«OBJECT DATABASES AND THE SEMANTIC WEB A THESIS SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ING. JAKUB ...»
Reversing the direction of rdfs:domain. When finding out information about a certain type, it is useful to ask a question like X(my:Person,Y) to find out all the attributes of my:Person and their types together with subclassing information and other specifications of the class. For this reason, it would be useful to define a predicate that has the same meaning as rdfs:domain, but its object and subject are exchanged (see the curved arrow in Figure 7.1).
Connecting edges to corresponding nodes. In RDF, it is natural that an object can act both as a predicate and as subject/object. To find out information about an attribute in an object database, one can usually examine the type of an object that the attribute is connected to.
However, the RDF model does allow extra object attributes that are not specified by the class of the object, so it should be possible to find out about the specifics of an attribute elsewhere. In Figure 7.1, the straight arrow indicates a new connection that needs to be made for the database to traverse correctly to the needed information. One practical way to do this is to create property extents that serve as access points to the property hierarchy.
Inverse relationships. With random access to any triple in the RDF graph, attributes and relationships can be traversed in both directions. However, in an object-oriented graph, a new inverse link needs to be added for every two-way relationship. On one hand, this adds extra triples, but on the other, it allows the clustering of triples by objects and lowers performance requirements for navigating the graph.
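The first and third modifications above can be sketched in a few lines. This is only an illustrative sketch, not the SODA implementation: triples are modelled as plain Python tuples, and the names `my:name`, `my:employer`, `my:employs`, `attributes_by_class` and `add_inverse_links` are hypothetical.

```python
from collections import defaultdict

RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Hypothetical example data: (subject, predicate, object) triples.
triples = [
    ("my:name", RDFS_DOMAIN, "my:Person"),
    ("my:name", RDFS_RANGE, "xsd:string"),
    ("my:employer", RDFS_DOMAIN, "my:Person"),
    ("my:employer", RDFS_RANGE, "my:Company"),
    ("my:alice", "my:employer", "my:acme"),
]


def attributes_by_class(triples):
    """Reverse rdfs:domain: map each class to its properties and their ranges."""
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    index = defaultdict(dict)
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            # s is the property, o is the class it is declared on.
            index[o][s] = ranges.get(s)
    return dict(index)


def add_inverse_links(triples, inverse_of):
    """Materialize one extra triple for each two-way relationship.

    inverse_of maps a predicate to the (assumed) name of its inverse.
    """
    extra = [(o, inverse_of[p], s) for s, p, o in triples if p in inverse_of]
    return triples + extra
```

With this data, `attributes_by_class(triples)["my:Person"]` lists `my:name` and `my:employer` with their ranges, and `add_inverse_links(triples, {"my:employer": "my:employs"})` adds the single triple `("my:acme", "my:employs", "my:alice")`, so that the company object can be navigated back to its employee without a search over all triples.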
Limiting access to several entry points, together with removing property-driven random access into the graph, certainly limits the ease of random access to data in the RDF graph. On the other hand, data can be physically organized by objects, and the usual OODB optimizations can be applied to things like attribute storage or type information. Accessing the RDF graph in a normal OODB way is made possible, which is useful for large RDF databases that need to retrieve whole RDF objects rather than process complex queries. The difference is similar to the one between OLTP (online transaction processing) and OLAP (online analytical processing) architectures in relational databases.
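The effect of restricting access to entry points can be illustrated with a small reachability sketch. It assumes, purely for illustration, that the object graph is held as a dictionary from object identifiers to the identifiers they reference; the names `objects`, `ext:Person` and `reachable` are hypothetical.

```python
from collections import deque

# Hypothetical object graph: object id -> ids of the objects it references.
objects = {
    "ext:Person": ["my:alice", "my:bob"],  # an extent acting as an entry point
    "my:alice": ["my:acme"],
    "my:bob": [],
    "my:acme": [],
    "my:orphan": ["my:alice"],  # nothing points here from an entry point
}


def reachable(objects, entry_points):
    """Return the ids reachable from the entry points via outgoing references."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for target in objects.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Here `reachable(objects, ["ext:Person"])` contains `my:alice`, `my:bob` and `my:acme` but not `my:orphan`: an object that no entry point leads to cannot be found by navigation, which is the price paid for giving up random access to triples.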
Semantic Web as an Object-oriented Database 65
7.3 IMPLEMENTATION NOTES
The proposed SODA model would obviously be very slow if built on top of existing RDF datasource implementations. Most of the existing implementations are programmed in high-level scripting languages and used for smaller-scale purposes rather than large data storage. However, RDF databases store triples in relational or postrelational databases, which significantly boosts performance.
Thanks to the structure imposed on RDFS data, an implementation similar to a traditional object-oriented database could be chosen.
Numerous optimizations can be achieved for object-oriented RDF data. For example, the database can group data by objects and only store literal values instead of full triples; similarly, blank nodes of collection and tuple types can be stored with class instances, forming in-memory records; assembling objects that inherit properties can be done without multiple joins on database tables; and some triples, such as soda:type or soda:_n, become redundant thanks to the default organization of the database schema.
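The first of these optimizations, grouping data by objects, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a description of an actual storage engine: triples are Python tuples, every attribute is assumed to be single-valued, and the function name `cluster_by_object` and the sample data are hypothetical. The predicate name soda:type is taken from the text above.

```python
from collections import defaultdict

TYPE = "soda:type"  # predicate named in the text; treated here as "instance of"

# Hypothetical example data: (subject, predicate, object) triples.
triples = [
    ("my:alice", TYPE, "my:Person"),
    ("my:alice", "my:name", "Alice"),
    ("my:alice", "my:age", "30"),
    ("my:acme", TYPE, "my:Company"),
]


def cluster_by_object(triples):
    """Group triples into per-class tables of per-object records.

    The type triples are not stored as data: the class is implied by the
    table an object's record lives in. Assumes single-valued attributes.
    """
    types = {s: o for s, p, o in triples if p == TYPE}
    store = defaultdict(dict)  # class -> {object id -> {attribute: value}}
    for subject, cls in types.items():
        store[cls].setdefault(subject, {})  # keep objects with no attributes
    for s, p, o in triples:
        if p != TYPE:
            store[types.get(s, "untyped")].setdefault(s, {})[p] = o
    return dict(store)
```

For the sample data the result holds one record for `my:alice` under `my:Person` with the values `"Alice"` and `"30"`, and an empty record for `my:acme` under `my:Company`; the two soda:type triples no longer appear as stored data.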
Why build on the semantic foundation of RDF instead of converting data to some other representation and storing them in an existing object-oriented database? The main advantages of having a sound RDF-compatible semantics are:
For the Semantic Web — from the conceptual point of view, all of the data, their structure and semantics are RDF-compatible and therefore easily accessible within the Semantic Web by other computers, and ready for automatic processing by software agents.
For system flexibility — RDF-based solutions are very flexible, which makes them ideal for prototyping, restructuring and further semantic extensions. It is often desirable to prefer flexibility to performance, and the SODA model is open to such modifications.
The Semantic Web is an emerging paradigm for sharing semantically rich data on the Web and using them for cooperation among software agents. As the Semantic Web grows, it needs to address issues that have been traditionally researched in the database community — security, transaction management, efficient data storage, embedded business logic.
For many years, the area of object-oriented databases has lacked a unifying formal foundation.
OODBs have been closely integrated with several object-oriented languages and independent of each other, and the ODMG effort at providing a common specification has only had partial success.
This thesis showed many similarities between the RDF-based Semantic Web and object-oriented data models. It highlighted some of the areas where these two can enrich each other — the area of OODB models and ontologies, and the area of formal specification of flexible RDF-based databases. A vocabulary extension for RDF was developed that structures data according to object-oriented database principles. This showed how to map the foundational OODB elements onto the world of the Semantic Web, and how to provide the Semantic Web with some important database concepts.
8.1 MAIN CONTRIBUTIONS The correspondence between a wide range of object-oriented data models and the Semantic Web RDF/S model was shown and analyzed.
An object-oriented data definition language (similar to the G2 CDL language) based on RDF/S model theory was designed and formally described.
Several extensions to the model were developed, such as the process of mining object data from arbitrary RDF/S graphs, and the modifications needed for providing reachability to RDF/S graphs based on access points.
Results of this research were presented to the international database and Semantic Web community at conferences in Canada [Güttner03c] (co-organized by Maebashi, Japan), Italy and Croatia [Güttner03b], and several in the Czech Republic: [GH02], [Güttner03] (which won the Best PhD Paper Award), [GH03], and [Güttner04].
8.2 DIRECTIONS FOR FURTHER RESEARCH
There are many areas that open up for future investigation. Much research activity is currently aimed at RDF servers and databases (see section 4.4), and the object-oriented data model presented in this thesis could facilitate the implementation of an object-based RDF database. Some of the topics that could be addressed within the scope of the Semantic Web are:
Access control for portions of RDF graphs. This idea is discussed in a recent article ([Güttner04]) that shows the emerging need for access control based on RDF data, which returns the appropriate RDF subgraph. A simple access model was suggested but the issue needs further investigation.
RDF object query language. OQL [CB00] and JDOQL [Craig03] are standard languages for querying object databases that have been extensively tested and used in a variety of applications.
One interesting topic is exploring how these languages could be adapted to work with RDF data objects, and what limitations would need to be overcome.
Storage optimizations gained by storing and subsequently navigating RDF data by objects. A fruitful topic would be to explore the tradeoffs of boosting navigation and storage capacity at the expense of limiting random access to triples — see section 7.2 for an introduction to the topic.
[Adobe04] Adobe Systems Inc.: XMP Specification. http://www.adobe.com/products/xmp/pdfs/xmpspec.pdf, USA 2004.
[ACK01] Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D., Tolle, K.: The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. In: Proceedings of 2nd Int'l Workshop SemWeb'01 at WWW'01, Hong Kong 2001.
[Atkinson90] Atkinson, M. et al: The Object-Oriented Database Systems Manifesto. In: Proceedings of DOOD ’90 – Deductive and Object-Oriented Databases, Elsevier Science Publishers, USA 1990.
[Atwood85] Atwood, T.: An object-oriented DBMS for design support applications. Ontologic, Inc. report, 1985.
[BKK87] Banerjee, J., Kim, W., Kim, K.C.: Queries in object-oriented databases. MCC Technical Report, no. DB 188-87, USA 1987.
[BBD00] Beged-Dov, G., Brickley, D., Dornfest, R., et al.: RDF Site Summary (RSS 1.0). http://purl.org/rss/1.0/spec, USA 2000.
[BvSvS87] Benesch, H., Von Saalfeld, H., Von Saalfeld, K.: Encyclopedic Atlas of Psychology. Deutscher Taschenbuch Verlag GmbH & Co, Germany 1987.
[BernersLee89] Berners-Lee, T.: Information Management: A Proposal. CERN, Switzerland 1989.
[BernersLee02] Berners-Lee, T.: Primer: Getting into RDF & Semantic Web using N3. http://www.w3.org/2000/10/swap/Primer, W3C, USA 2002.
[BHL00] Berners-Lee, T., Hendler, J., Lassila, O.: Semantic web. In: Scientific American May 5/02, USA 2000.
[BFM98] Berners-Lee, T., Fielding, R., Masinter, L.: Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396). IETF Standard, http://www.ietf.org/rfc/rfc2396.txt, USA 1998.
[BM00] Biron, P.V., Malhotra, A. (eds.): XML Schema Part 2: Datatypes. W3C Recommendation, http://www.w3.org/TR/xmlschema-2/, USA 2000.
[BK95] Bonner, A.J., Kifer, M.: Transaction Logic Programming (or a Logic of Declarative and Procedural Knowledge). Technical Report CSRI-323, ftp://ftp.cs.toronto.edu/pub/bonner/papers/transaction.logic/iclp93.ps, University of Toronto, Canada 1995.
[BEK00] Box, D., Ehnebuske, D., Kakivaya, G., Layman, A., Mendelsohn, N., Nielsen, H.F., Thatte, S., Winer, D.: Simple Object Access Protocol. W3C Recommendation, http://www.w3.org/TR/SOAP/, USA 2000.
[BKvH02] Broekstra, J., Kampman, A., Van Harmelen, F.: Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In: Proceedings of 1st Int'l Semantic Web Conference, http://www.openrdf.org/doc/papers/Sesame-ISWC2002.pdf, Italy 2002.
[CdWV88] Carey, M., DeWitt, D., Vandenberg, S.: A data model and query language for Exodus. In: Proceedings of the 1988 ACM SIGMOD conference, ACM Press, USA 1988.
All World Wide Web links in this thesis have been valid as of June 24, 2004.
[CDD03] Carroll, J., Dickinson, I., Dollin, C., Reynolds, D., Seaborne, A., Wilkinson, K.: The Jena Semantic Web Platform: Architecture and Design. HP Laboratories Technical Report HPLUSA 2003.
[CastroLeon04] CastroLeon, E.: The Web Within the Web. In: IEEE Spectrum 2/41, IEEE Press, USA 2004.
[CB00] Cattell, R.G.G., Barry, D.K. (eds.): The Object Data Standard: ODMG 3.0. Morgan Kaufmann Publishers, San Francisco, USA 2000.
[CKW93] Chen, W., Kifer, M., Warren, D.S.: HiLog: A foundation for higher-order logic programming. In: Journal of Logic Programming, 15/3, USA 1993.
[CODASYL80] Committee on Data Systems and Languages, Data Base Task Group: CODASYL Network Data Model. CODASYL, USA 1980.
[Codd70] Codd, E.F.: A Relational Model of Data for Large Data Banks, In: Communications of the ACM 13(6), ACM Press, USA 1970.
[CvHH01] Connolly, D., Van Harmelen, F., Horrocks, I., McGuinness, D.L., Patel-Schneider, P.F., Stein, L.A.: DAML+OIL Reference Description. W3C Note, http://www.w3.org/TR/daml+oil-reference, USA 2001.
[CSS94] Costa, J., Sernadas, A., Sernadas, C.: Object Inheritance Beyond Subtyping. In: Acta Informatica, Germany 1994.
[Cowan02] Cowan, J.: Metadata, Reuters Health Information, and Cross-Media Publishing. In: Proceedings of Seybold New York 2002 Enterprise Publishing Conference. http://seminars.seyboldreports.com/seminars/2002_new_york/files/presentations/014/cowan_john.ppt, USA 2002.
[Craig03] Craig, R. (ed.): Java Data Objects Specification 1.0.1. http://jcp.org/aboutJava/communityprocess/final/jsr012/index2.html, Sun Microsystems, USA 2003.
[Dataquest99] Dataquest, a Gartner group company: Dataquest market analysis. http://www.intersystems.com/cache/analysts/reviews/dataquest.html, England 1999.
[DC94] Diskin, Z., Cadish, B.: Algebraic Graph-Oriented = Category Theory Based. Manifesto of categorizing database theory. Technical Report 9406 – Frame Information Systems, Latvia 1994.
[DC03] Dublin Core Metadata Element Set, Version 1.1: Reference Description. NISO Standard Z39.85-2001, http://dublincore.org/documents/2003/06/02/dces/, USA 2003.