Current usage and future of XML Database Management Systems
December 28, 2009
Purpose
The purpose of this document is to present a management report that analyzes the current usage of XML-based Database Management Systems (DBMS), their impact and future usage trends.
Database Management Systems (DBMS) have evolved constantly since the first systems were installed, and especially since Codd’s relational concepts were put into practice. We have seen a steady pace of innovation in DBMS technology and usage, from hierarchical databases to object/relational databases and from centralized to decentralized database systems. DBMS evolution has also been supported by innovative ways of collecting, storing, processing and retrieving data, ranging from basic genealogy data to complex forensic data (Hoffer J.A). These innovations show how closely DBMS technology is linked to advances in the general technology landscape, in overall systems architecture, and in the type and nature of Information Management (IM). For example, object-oriented systems development has supported the development of object-oriented DBMS. Similarly, distributed systems architecture has led to distributed DBMS setups. Also, the advanced needs of managing complex information have led to the development of various sophisticated spatial, clinical and genetic databases.
The current DBMS landscape is the product of the trends mentioned above, which have continued to evolve over the past 30 years. Different data management domains call for DBMS based on specific technologies, e.g., relational versus object-oriented. Within each specialized technology domain, in turn, a number of vendor products compete to provide efficient database management. For example, Oracle and DB2 compete with each other in the relational database domain. Competition is also prevalent in the way DBMS product licenses are offered, as with MySQL and PostgreSQL (IdeaByte) and, recently, Sybase (Product: Sybase ASE 15 Express), which announced that it would offer the product free on Linux (peterdobler.com).
General Overview
An XML-based DBMS (XML being the Extensible Markup Language developed by the W3C) is one such innovation, prompted by the pervasive use of web-based database applications and the related need to frequently store and retrieve loosely structured data in document form (e.g., web pages). This need fits the pattern described in the previous section, in which DBMS technology evolves to meet specific IM needs. At its most basic, XML is used to create, store and transport either data-centric content (such as a SOAP request and response) or document-centric content (such as XHTML documents). In either case, XML arranges data in nested (hierarchical) tags that can be easily read and processed using an XML query language (e.g., XQuery). Although XML was originally designed as a markup language for documents on the web (XHTML, for example, is HTML reformulated in XML), technologists and database vendors realized the importance of XML for storing and retrieving semi-structured data efficiently (Obasanjo, D).
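To make the idea of nested tags and querying concrete, the following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The directory document and its element names are hypothetical sample data. Note that ElementTree supports only a limited subset of XPath, not XQuery; it stands in here for a full XML query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a small data-centric document in which
# each <entry> record nests its fields as child tags.
DIRECTORY = """<directory>
  <entry id="e1"><name>Alice</name><city>Toronto</city><phone>555-0100</phone></entry>
  <entry id="e2"><name>Bob</name><city>Ottawa</city><phone>555-0101</phone></entry>
  <entry id="e3"><name>Carol</name><city>Toronto</city><phone>555-0102</phone></entry>
</directory>"""

root = ET.fromstring(DIRECTORY)

# Predicate on child text: names of all entries whose <city> is Toronto.
toronto_names = [e.findtext("name") for e in root.findall("entry[city='Toronto']")]

# Predicate on an attribute: the phone number of the entry with id="e2".
bob_phone = root.find("entry[@id='e2']").findtext("phone")

print(toronto_names)  # ['Alice', 'Carol']
print(bob_phone)      # 555-0101
```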
Strictly speaking, XML is not a database but an efficient medium for representing and transporting data across multiple systems. Nor is XML a DBMS in the strict sense, but it can provide some basic DBMS features through XML documents, XML query languages, programming interfaces, etc. (Bourret, R.): data is stored in XML documents (whose structure can be described by a DTD or an XML schema) and is queried and accessed with an XML query language (e.g., XQuery). This is the most basic model of an XML DBMS and forms the basis of modern native XML DBMS such as Sedna. Sedna supports several traditional DBMS features, such as update and query languages, query optimization, fine-grained concurrency control, various indexing techniques, recovery and security. Sedna also supports the W3C XQuery language, which can be used for complex data management operations such as querying XML data, transforming XML data and even computing business logic (ispras.ru). The other type of XML DBMS is a conventional DBMS, such as DB2 or Oracle, that supports XML data storage and retrieval through special storage and data management features. For example, recent releases of Oracle provide a native XML data type (XMLType) for storing XML data, and DB2 supports XQuery-based data management in which data can be imported into and exported from the database in XML format.
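The basic model described above (data kept in XML documents and accessed through a query language) can be sketched as follows. XmlStore is a hypothetical class, not part of any product, and it uses ElementTree path expressions in place of XQuery. It deliberately omits the indexing, concurrency control, recovery and security that a native XML DBMS such as Sedna provides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


class XmlStore:
    """Hypothetical in-memory stand-in for a native XML store.

    Documents are kept whole as parsed XML trees; there is no indexing,
    concurrency control, recovery or security.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, ET.Element] = {}

    def put(self, name: str, xml_text: str) -> None:
        # Parsing on insert rejects documents that are not well-formed
        # (ET.ParseError is raised and nothing is stored).
        self._docs[name] = ET.fromstring(xml_text)

    def query(self, path: str) -> List[Tuple[str, str]]:
        # Apply the same path expression to every stored document and
        # return (document name, element text) pairs.
        results = []
        for name, root in sorted(self._docs.items()):
            for elem in root.iterfind(path):
                results.append((name, (elem.text or "").strip()))
        return results


store = XmlStore()
store.put("page1.xml", "<page><title>Home</title><tag>news</tag></page>")
store.put("page2.xml", "<page><title>Music</title><tag>audio</tag><tag>news</tag></page>")
print(store.query("title"))  # [('page1.xml', 'Home'), ('page2.xml', 'Music')]
```

A real native XML DBMS would evaluate such queries against indexes rather than scanning every document, which is where most of its engineering effort lies.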
As the analysis above shows, the core use of an XML DBMS is handling large volumes of document-centric or XHTML-centric data. This is the basic feature that distinguishes XML DBMS from traditional DBMS technology. If a database application is web-based, requires heavy processing of documents or objects (storage, access and search of web pages, music/video files, directory or phone-book data, etc.), and does not require strictly structured data storage, then it could potentially benefit from a native XML DBMS. Conversely, a heavily transaction-oriented web application involving atomic transactions, such as banking, should use a traditional DBMS such as DB2 or Oracle (preferably one that also offers native XML support).
Impact and Future Directions
Degree of disruption:
XML DBMS has not brought any significant disruption to the current DBMS market or its usage. It has been developed and used as an add-on tool to support a specific IM need, mainly in web-based database applications. The name ‘XML DBMS’ is something of a misnomer, since neither XML nor a native XML DBMS provides all the features of a full-blown DBMS. Because XML and its query languages conform to W3C standards, they can easily be integrated with all popular relational and object-oriented DBMS as an add-on feature.
The costs of implementing an XML DBMS depend on the type of solution sought. Most native XML DBMS are free or open source, which eliminates licensing costs, although hosting, administration and development costs remain. Most present-day DBMS, such as Oracle and DB2, come with native XML support, so little or no extra cost is incurred by organizations already using one of them.
An XML DBMS provides the most benefit when used to drive document-heavy web applications, as described in the ‘General Overview’ section. It offers the most cost-effective way to store and process document data, because very little effort is required to present the data to users: the underlying data format of the transport and presentation layers is the same, i.e. XML or a derivative such as XHTML.
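As a rough illustration of why presentation takes little effort, the sketch below turns an XML document directly into an XHTML table using Python's ElementTree. In practice this transformation is often written in XSLT; the document structure and the function name directory_to_xhtml are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stored document, as it might be returned by an XML DBMS.
DIRECTORY = """<directory>
  <entry><name>Alice</name><phone>555-0100</phone></entry>
  <entry><name>Bob</name><phone>555-0101</phone></entry>
</directory>"""


def directory_to_xhtml(xml_text: str) -> str:
    """Render each <entry> as one row of an XHTML table."""
    source = ET.fromstring(xml_text)
    # The xmlns attribute is written literally to mark the output as XHTML.
    html = ET.Element("html", xmlns=XHTML_NS)
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table")
    for entry in source.iterfind("entry"):
        row = ET.SubElement(table, "tr")
        for field in ("name", "phone"):
            cell = ET.SubElement(row, "td")
            cell.text = entry.findtext(field)
    return ET.tostring(html, encoding="unicode")


print(directory_to_xhtml(DIRECTORY))
```

Because both sides are XML, the mapping is a tree-to-tree rewrite with no intermediate relational-to-markup conversion step.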
IT infrastructure changes: Since an XML DBMS is adopted as part of a strategy for cost-effective XML data management, most of the supporting system architecture is likely to be in place already, such as XML documents (following a specific XML schema), XML parsers/extractors and query tools (supported by almost all web scripting languages and native XML DBMSs), and native XML support in the underlying relational DBMS. For example, if we plan to implement an XML DBMS for a LAMP-based (Linux/Apache/MySQL/PHP) web application, the supporting technology (DTD/XML schema support, XPath-based query support, SAX/DOM-based programming interfaces, XML functions within MySQL, etc.) is already part of the underlying infrastructure. The most critical factor in selecting an XML DBMS is thus the specific need to support an XML-centric application architecture.
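The two programming interfaces mentioned above differ in how they expose a document: SAX delivers a stream of parse events, while DOM builds the whole document as an in-memory tree. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules (rather than PHP) to extract the same names both ways; the sample directory document and the helper names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = """<directory>
  <entry><name>Alice</name><phone>555-0100</phone></entry>
  <entry><name>Bob</name><phone>555-0101</phone></entry>
</directory>"""


class NameCollector(xml.sax.ContentHandler):
    """SAX handler: reacts to parse events as the document streams past."""

    def __init__(self) -> None:
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


def names_via_sax(xml_text: str) -> list:
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


def names_via_dom(xml_text: str) -> list:
    # DOM: the whole document is loaded into an in-memory tree first.
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("name")]


print(names_via_sax(DOC))  # ['Alice', 'Bob']
print(names_via_dom(DOC))  # ['Alice', 'Bob']
```

SAX suits very large documents because the parser does not hold the whole document in memory, whereas DOM is simpler to navigate when the document fits comfortably in memory.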
Skills required: The basic skills required to implement and manage an XML DBMS are knowledge of XML, XML schemas, XML query languages, XML parsers/extractors and XML programming interfaces, plus the use of native XML functions (if a relational DBMS is used) and knowledge of the native XML DBMS itself (if a native XML DBMS such as Sedna drives the database application).
Future directions in usage of XML DBMS:
XML technology in general is widely accepted as a standard medium for transporting data between disparate systems. The standards are W3C-compliant, and XML query tools and APIs are constantly being improved to interoperate with a wide range of relational, object-oriented and non-relational databases. This trend supports wider adoption of XML-based databases, including for systems that are less web-centric in nature. As an illustration of the potential of XML DBMS, consider the sample system in Figure 1 (infolab.cs.unipi.gr). The system uses XML DBMS technology to build and manage a Pattern Base Management System (PBMS), which enables users to store and retrieve patterns just as they would data.
Figure 1. XML DBMS based PBMS (Image courtesy – http://infolab.cs.unipi.gr/projects/pbms_description/pbms_site_v5b.htm)
The idea of using an XML-based DBMS here originates in the view that patterns are “compact and rich semantic representations of data”, which can be represented effectively with an XML schema. The figure shows how data is extracted from data sources via XML and fed into the underlying relational (or native XML) DBMS through an appropriate XML query tool, in this case a Pattern Definition/Query/Manipulation Language (PD/Q/ML).
Glossary
XML – Extensible Markup Language, developed by the World Wide Web Consortium (W3C) to deal with shortcomings of HTML.
IM – Information Management
SOAP – Simple Object Access Protocol, used as a medium of communication between two systems, e.g., an application and a web service.
Native XML DBMS – A purely XML-based DBMS without underlying relational or other traditional DBMS support.
XQuery – An XML query language
XHTML – Extensible HTML
SAX – Simple API for XML
DOM – Document Object Model
DTD – Document Type Definition
1. Hoffer, J. A., et al. (March 20, 2006). Modern Database Management (8th ed.). Prentice Hall.