«OBJECT DATABASES AND THE SEMANTIC WEB A THESIS SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ING. JAKUB ...»
"Resource Description Framework (RDF) is a framework for representing information in the Web."[KC04] Although the Semantic Web is mostly mentioned from the viewpoint of ontologies or artificial intelligence, managing a body of information is a typical database problem. From this perspective, the Semantic Web can and should be viewed as a worldwide database1. Obviously, it is widely distributed and not centrally managed, often incomplete or inconsistent and very loose in its format — but it still is a database, albeit not a relational one. There are, however, other types of databases as well, ones whose structure is very close to the graph nature of RDF and the Web.
This section considers the Semantic Web as an object-oriented database. At the core of an OODB is a set of uniquely identifiable objects connected by relationships; more OODB principles are elaborated in chapter 3 and below. This chapter lists similar and corresponding concepts in object-oriented databases and the RDF-based Semantic Web, overviews design choices for object models and RDF, and discusses how the two can enrich each other by applying database concepts to the Semantic Web.
5.1 SIMILARITIES
A comparison of the Semantic Web, ontologies, and agents with object databases and their application layers reveals some important similarities. Certain structural elements are almost identical in the two models. Some of them include:
Unique identifiers — unique object identifiers are a strong requirement in every OODB because they allow objects to be connected in relationships. The scope of the word “unique” matters here. Several years ago, it would probably have meant "unique within an application". Today the information systems community strives for open and interoperable solutions. Uniqueness within an application, across all installations of a given database product, or even within an operating system is not enough. What object databases need is some kind of resource identifier with a uniform structure and worldwide validity. Uniform Resource Identifiers [BFM98] are one such system in wide use, and they also underlie the Semantic Web RDF framework. With a suitable data model, the ability to uniquely identify and publish data and their relationships would practically constitute a loosely bound, worldwide, distributed object-oriented database.
Object-oriented schemas — compared to relational schemas, object-oriented ones are much richer and closer to the way humans think about problems. This is also true of designing knowledge-based schemas (ontologies). The result is that these two paradigms have many concepts in common and share identical semantics (see Figure 5.1), and it would be beneficial for the ontological and object design communities to cooperate more closely.
Graph-theoretic foundation — In 1990, [Atkinson90] admitted that the OODB community had not reached a consensus on a common theoretical foundation. While relational databases stand firmly on relational algebra, the situation in object databases has not changed since. However, many models contain elements from graph theory, either as a supplement to a set-theoretical model ([LRV92]), a pure graph model or category theory ([NR95], [Tuijn94], [Schewe95], [Kolenčík98]). Graph theory is suitable for expressing relationships or inheritance hierarchies.
The underlying paradigm for the RDF layer of the Semantic Web is model theory that interprets graphs. An opportunity is open for laying a shared theoretical foundation for both the Semantic Web and object-oriented databases, which would essentially connect their two worlds together.
Description logics — Deductive object-oriented databases use description logics for areas ranging from describing the database schema through type checking and query languages to update semantics (F-logic, Transaction logic, HiLog [Kifer95]). Deductive databases are being extended to embrace the Semantic Web [YK00]. Since RDF model theory is also built on existentially quantified first order logic, yet another connection appears between the two areas.
RDF databases — research that focuses on this emerging area works to understand how RDF data can be organized, stored, queried, and processed. All of these efforts are more natural in an object-oriented or object-relational database than in a purely relational one.
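The point made under "Unique identifiers" can be illustrated with a minimal sketch: when two independently published data sets use the same URIs, their union already behaves like one connected object graph. The sketch assumes triples are plain Python tuples of strings; all example.org URIs and the helper names merge and follow are hypothetical and chosen only for illustration.

```python
# Triples are (subject, predicate, object) tuples of plain strings.
# All URIs below are hypothetical and exist only for this illustration.
EX = "http://example.org/"

source_a = {
    (EX + "people/alice", EX + "schema/name", "Alice"),
    (EX + "people/alice", EX + "schema/worksFor", EX + "orgs/acme"),
}
source_b = {
    (EX + "orgs/acme", EX + "schema/name", "Acme Ltd."),
    (EX + "people/alice", EX + "schema/worksFor", EX + "orgs/acme"),
}


def merge(*graphs):
    """Union of triple sets; identical triples collapse into one."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def follow(graph, subject, predicate):
    """Return all objects reachable from subject via predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


graph = merge(source_a, source_b)
employers = follow(graph, EX + "people/alice", EX + "schema/worksFor")
# Traverse a relationship that spans both sources.
employer_names = {
    name for e in employers for name in follow(graph, e, EX + "schema/name")
}
```

No key mapping between the two sources is needed here, because the shared URI plays the role of a global object identifier.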
5.2 SHARING CONCEPTS
Figure 5.1 shows that the structure of an object model is similar to that of simple ontology languages (like RDFS), despite some semantic nuances. This inevitably suggests that some well-developed concepts from the area of object-oriented databases could be adapted and used in the context of the Semantic Web. Some examples are:
Semantic Web as an Object-oriented Database
Access Control — the growing Semantic Web will soon include sensitive data that must not be accessible to everyone. Object databases offer several ways of restricting access to data graphs: permissions, views, roles, and other mechanisms. Some of these could be used for RDF graphs.
Query Languages — multiple query languages for RDF data exist (e.g. RDFSuite RQL [KCA02], Sesame SeRQL [MKvH02], and Jena RDQL [Seaborne02], see appendix C). Some of them stem from SQL, others from graph formalisms; it would be interesting to see how object query languages such as OQL [CB00] or JDOQL [Craig03] adapt to RDF/S data.
Efficient Storage — Today the Semantic Web still operates at a small and manageable scale and serious performance concerns are rare, but storing RDF in flat files is slowly becoming an obstacle. RDF databases try to address this problem, but they struggle when storing triples in relational tables: queries need too many joins, and there is no support for reification, complex data types, or inheritance [WSK03]. The Sesame RDF server already uses PostgreSQL, an object-relational system [MKvH02]. When storing RDFS data, object-oriented databases that cluster data by objects and types offer more efficiency.
High Level Processing — manipulating RDF data by triples is too fine-grained for some applications. Aggregating multiple edges into objects and finding dependencies between these objects moves RDF data to a higher level of abstraction, making them easier to understand and manage. This allows object-oriented languages to work with RDF objects¹, including typing, method invocations, and custom behavior.
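The aggregation described under "High Level Processing" can be sketched as follows. This is a minimal illustration, not the Jena approach: triples are assumed to be plain tuples, the ex: names are hypothetical, and a value counts as a dependency only if it is itself the subject of some triple.

```python
from collections import defaultdict


def aggregate(triples):
    """Group triples by subject: {subject: {predicate: [objects]}}."""
    objects = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        objects[s][p].append(o)
    return {s: dict(props) for s, props in objects.items()}


def dependencies(objects):
    """Map each aggregated object to the other aggregated objects it refers to.

    Simplification: any property value that is also a subject is treated
    as a reference; literals are assumed never to coincide with subjects.
    """
    deps = {}
    for s, props in objects.items():
        targets = {o for values in props.values() for o in values}
        deps[s] = {t for t in targets if t in objects and t != s}
    return deps


# Hypothetical sample data.
triples = [
    ("ex:book1", "ex:title", "Object Databases"),
    ("ex:book1", "ex:author", "ex:person1"),
    ("ex:book1", "ex:author", "ex:person2"),
    ("ex:person1", "ex:name", "A. Author"),
    ("ex:person2", "ex:name", "B. Author"),
]
objs = aggregate(triples)
deps = dependencies(objs)
```

After aggregation, an application can handle ex:book1 as one object with a title and a list of authors instead of three separate edges.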
5.3 DATA MODELING FROM AN RDF PERSPECTIVE
This section discusses the main concepts of object-oriented database models, gives a deeper rationale for their use, mentions some limitations they have in database implementations, and outlines their correspondence to RDF concepts together with the possible advantages this might offer.
The concepts mentioned here are derived from two sources:
If humans are to understand the object model, it is necessary to tie it closely to human thinking and reasoning — that is the domain of cognitive psychology [BvSvS87].
The concepts also correspond to the four fundamental object-oriented principles as proposed at a public meeting of experts in object technology standards and SQL3 development [Sutherland93].
5.3.1 ATOMIC, COLLECTION AND STRUCTURE NODES
A system is defined as a set of related objects. Human thinking is based on finding out about the qualities of these objects (theory of exploration and concepts) and abstracting from them (theory of meaning and classification) [BvSvS87]. Therefore, an object model should be able to express properties of objects.
Assigning properties to objects has two aspects. The first is that every specific feature needs to have a given place, role, or meaning within the parent object. This is modeled using properties, and the parent object is then a tuple, also called a structure or a record. The number of properties is constant because each one has a distinct meaning. The second construction deals with several properties that have the same role. This is usually not modeled by having many values of the same property, but by aggregating these values within a new subobject that represents the many values of the property: a set, bag, list, array, or, generally, a collection. The collection types differ in whether they are indexed and whether multiple occurrences of the same object are allowed. Since objects cannot be decomposed infinitely, we need some atomic objects, also called elementary or simple objects.
¹ A concept called RDF objects is being developed within the Jena platform [CDD03].
In the world of relational databases, cells in tables are required to be atomic objects, rows are tuples with named fields, and tables represent collections.
RDF COUNTERPARTS
In RDF, it is natural to identify atomic objects with typed literals; a good and widely used choice for their typing is XML Schema Datatypes [BM00]. Tuples are naturally modeled by triples that have the same subject and specify property values, while collections can use RDFS collection syntax with some additional semantic restrictions.
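The three kinds of nodes can be sketched in a few lines. The following is an illustration under stated assumptions: atomic objects are typed literals carrying an XML Schema datatype URI, a tuple is a set of triples sharing one subject, and a collection is encoded with the standard rdf:first / rdf:rest / rdf:nil list vocabulary, which is only one of the possible collection encodings. The Literal class, the helper functions, and the example.org URIs are hypothetical.

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace


class Literal(NamedTuple):
    """Atomic object: a lexical value plus its datatype URI."""
    value: str
    datatype: str


def make_list(items, prefix="_:l"):
    """Encode items as an rdf:first / rdf:rest chain of blank nodes.

    Returns (head_node, triples); an empty list is just rdf:nil.
    """
    triples = []
    head = RDF + "nil"
    for i in range(len(items) - 1, -1, -1):
        node = f"{prefix}{i}"
        triples.append((node, RDF + "first", items[i]))
        triples.append((node, RDF + "rest", head))
        head = node
    return head, triples


def read_list(head, triples):
    """Decode an rdf:first / rdf:rest chain back into a Python list."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    items = []
    while head != RDF + "nil":
        items.append(first[head])
        head = rest[head]
    return items


phones = [Literal("555-0100", XSD + "string"), Literal("555-0101", XSD + "string")]
head, list_triples = make_list(phones)

person = EX + "people/alice"
graph = [
    # Tuple: triples with the same subject, one per property.
    (person, EX + "schema/name", Literal("Alice", XSD + "string")),
    (person, EX + "schema/age", Literal("30", XSD + "integer")),
    # Collection: a single property pointing at the list head.
    (person, EX + "schema/phones", head),
] + list_triples
```

The list encoding keeps order and allows duplicates; a set or bag would need the additional semantic restrictions mentioned above.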
5.3.2 UNIQUE IDENTIFIERS
In [Sutherland93], the first principle is: "A first class object has unique, immutable identity within its scope in a distributed environment." Databases in a networked world need to avoid the problem of synonyms and homonyms in natural language, and uniquely identify the objects they describe.
In the object-oriented database world of nodes, a sharp distinction exists between objects (reference concepts, instances) and literals (data concepts, values). While objects have a unique object identifier (OID) that allows them to enter into relationships and keep their distinct nature without regard to their structure, literals are always parts of objects and the only relationship that targets them comes from their parent object. From the modeling point of view, this reflects the fact that some information does not make sense alone; it should only be considered within the parent object. All literals with the same value are essentially identical.
RDF COUNTERPARTS
In RDF graphs, objects naturally correspond to individuals with urirefs, since these denote resources that can be referenced. Literals correspond either to RDF (typed) literals or, in the case of complex data types like collections and structures, to blank nodes. RDF literals cannot be subjects of triples and blank nodes do not have universal identifiers; therefore, neither can be globally referenced. An additional condition has to ensure that there is only a single reference to a blank node, the one from the parent object¹.
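The single-reference condition on blank nodes can be checked mechanically. The sketch below assumes that blank nodes are strings starting with "_:", that literals are represented as Python tuples, and that a blank node is valid only if exactly one triple has it as its object; the function names and sample data are hypothetical.

```python
from collections import Counter


def is_blank(node):
    """Blank nodes are assumed to be strings with the '_:' prefix."""
    return isinstance(node, str) and node.startswith("_:")


def blank_node_violations(triples):
    """Return {blank_node: incoming_count} for every blank node that is
    not referenced exactly once (shared or orphaned structures)."""
    incoming = Counter(o for _, _, o in triples if is_blank(o))
    blanks = {n for s, _, o in triples for n in (s, o) if is_blank(n)}
    return {n: incoming[n] for n in blanks if incoming[n] != 1}


# A structure (address) owned by exactly one parent object.
good = [
    ("ex:alice", "ex:address", "_:a1"),
    ("_:a1", "ex:city", ("Prague", "xsd:string")),
]
# The same structure shared by a second parent breaks the condition.
bad = good + [("ex:bob", "ex:address", "_:a1")]

ok_result = blank_node_violations(good)
bad_result = blank_node_violations(bad)
```

A database layer could run such a check on every update, so that complex literal values stay private to their parent object.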
5.3.3 ATTRIBUTES AND RELATIONSHIPS
The second principle of [Sutherland93] says: "First class links occur only between first class objects."
There are two principal kinds of properties on the instance level of the OODB world: attributes and relationships. Properties are used to structure and relate things in tuples and collections. Attributes connect nodes to literals, and relationships connect nodes to objects. The attribute property is also called the part-of relation or aggregation. N-ary relationships are transformed to binary ones, and relationships of cardinality 1-to-N or M-to-N are expressed using collections. In object-oriented practice, properties have unique names only within a given object, which causes name conflicts when extending objects with new properties in case of multiple inheritance [Kolenčík98].
RDF COUNTERPARTS
In RDF, properties are not bound to a specific class of objects and their naming has to be just as unique as node urirefs. In an object-oriented RDF-based database, connecting property domains and ranges to specific classes is obligatory, but the advantage of unique property names is kept. In cases where a property is used in a sense more general than just a specific class (Dublin Core metadata), a strongly typed property can be subclassed from the more general one.
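A minimal sketch of this idea follows. It assumes that domains and ranges are enforced as constraints (in plain RDFS they are used for inference instead), that each property declares exactly one domain and one range, and that a strongly typed property may name one more general property. The property ex:bookCreator and the schema layout are hypothetical; dc:creator stands for the Dublin Core creator element.

```python
# Hypothetical schema: a strongly typed property specializing dc:creator.
schema = {
    "ex:bookCreator": {
        "domain": "ex:Book",
        "range": "ex:Person",
        "subPropertyOf": "dc:creator",
    },
}


def instance_types(triples):
    """Collect the declared types of every node from rdf:type triples."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return types


def check_typing(triples, schema):
    """Report triples whose subject or object violates domain or range."""
    types = instance_types(triples)
    errors = []
    for s, p, o in triples:
        decl = schema.get(p)
        if decl is None:
            continue  # untyped property: nothing to check
        if decl["domain"] not in types.get(s, set()):
            errors.append((s, p, o, "domain"))
        if decl["range"] not in types.get(o, set()):
            errors.append((s, p, o, "range"))
    return errors


def generalize(triples, schema):
    """Add the triples implied by subPropertyOf (one level only)."""
    implied = [
        (s, schema[p]["subPropertyOf"], o)
        for s, p, o in triples
        if p in schema and schema[p].get("subPropertyOf")
    ]
    return triples + implied


triples = [
    ("ex:b1", "rdf:type", "ex:Book"),
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:b1", "ex:bookCreator", "ex:p1"),   # well typed
    ("ex:p1", "ex:bookCreator", "ex:b1"),   # subject and object swapped
]
errors = check_typing(triples, schema)
general = generalize(triples[:3], schema)
```

Applications that only understand the general property can still read the data through the implied triples, while the database checks the strongly typed one.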
5.3.4 TYPING AND THE DATABASE SCHEMA
The third [Sutherland93] principle states: "A first class object always knows what type(s) it is." Human thinking depends on concept abstraction (D. Frege, semiotics, and the theory of meaning [BvSvS87]).