«OBJECT DATABASES AND THE SEMANTIC WEB A THESIS SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ING. JAKUB ...»
Typegraph model of Tuijn [Tuijn94] generates the database schema category from a graph (called typegraph). Instances are given by functors from this category to another one, not necessarily a category of sets and mappings. Tuijn’s model is structurally equivalent to IQL with the query language defined only by universal constructions. Attributes, relationships and generalization are modeled as arrows.
Theory-based approach of Fiadeiro et al. [FSM90] focuses on building an information system from a formal logical specification (a theory presentation) that describes properties and behavior of objects. Aggregation, inheritance and relationships are expressed by constructs on a diagram in the category of theory presentations.
Process-based approach of Costa et al. [CSS94] is very similar to the previous one, but theory presentations are substituted by process models. This shows the unifying power of CT because instead of the usual representation of objects by sets and their structure by mappings, every object is represented by a Marcovian process and arrows denote partial mappings between these processes.
Limit Data Model (LDM) of Kolenčík [Kolenčík98] works with a category whose objects represent classes, and arrows model attributes, methods, inheritance and relationships. Limits such as pullback or pushout are used to define additional constraints on the schema, such as virtual base classes and polymorphic behavior.
Semantic Web as an Object-oriented Database 24
Figure 3.4 shows a category that defines the schema of a simple database.
Each arrow represents a projection of a product object. Such projections can represent simple string-valued attributes (person’s name), OID-valued relationships (PhD. student’s superior), or subclassing relationships (lecturer is a subclass of person). Since the graphical schema is a category, arrow compositions must be defined — these correspond to dotted naming conventions. The superscript in object names shows that the classes are “pure” — they do not contain objects of their subclasses, and specific arrows connect them to the respective “full” classes. Additionally, for Person to function as a virtual class, it is required that the D–S–P–L diamond is a pullback.
3.6.4 STATE OF THE ART AND IMPLEMENTATIONSMost categorical models focus on formal proofs for certain OODB concepts. Different authors use different categorical constructs for their models, without a standard way to express them. Most works only give a partial formalization of the data model. All of these factors contribute to the fact that implementations based on CT are scarce.
However, the Vision database ([FactSet00]) is based on categorical foundations — instead of storing objects as records, it works only with featureless sets and mappings. All properties and methods of an object are given through its mapping onto other objects, and database concepts such as memberships and keys are modeled by constraining these mappings (e.g. by injectivity). Families of appropriately constrained maps represent inheritance and polymorphism, and map transformations are powerful enough to model joins, projections, updates etc. to the point that they completely define a database query language.
3.7 LOGIC MODELS (F–LOGIC)
WHY WAS IT CHOSENThis section is an important example of modeling in logic. The new RDF-based OODB model proposed in this thesis is a logical one, and therefore this section is an introduction to some of its concepts. Moreover, many deductive object-oriented systems are currently being extended to handle Semantic Web data and provide entailment engines.
INTRODUCTIONMost of the previous work was focused on imperative and functional languages, but good reasons exist for bringing the object database paradigm into the world of logic programming [Kifer95]. The objectoriented paradigm provides a good way of specifying and manipulating structured objects, while logic and deduction offer the power of ad hoc reasoning and querying. This is increasingly important with growing database size, especially in the Semantic Web.
Some attempts have tried to integrate Prolog with object-oriented languages (C++), others define object-oriented features in existing logics. F-logic [KLW95] is an object-oriented logic designed for use in object-oriented logical (deductive) databases. Its main limitation is that it can only define and query object state, not change it through method side effects. For expressing the semantics of database updates, Transaction logic was designed by the same authors ([BK95]). F-logic has a sound and complete proof theory.
3.7.2 F-LOGIC DATA MODEL Specifications and Implementations. In F-logic, all types are classes — there are no equivalents of ODMG interfaces or literals. The semantics of F-logic do not address the question of separating interfaces from implementations, but if desired, signatures may be specified separately for practical encapsulation and reuse issues, although both use the same logic language1.
Object Types. Every object has a type and every operation requires typed operands. Type correctness is ensured by F-logic inference and consistency rules. Atomic, collection and structured objects are not distinguished — all objects are structured. The only collection tools of F-logic are set-valued object attributes. There is no direct equivalent of ordered sets, bags or collection types.
Subtyping and Inheritance. F-logic groups semantically related objects in IS-A class hierarchies that support multiple inheritance. Any data expression in a class can be either inheritable or noninheritable — an important distinction for class attributes. Semantically, inheritance in F-logic is set inclusion.
Object myCar is an instance of class peugeot, which is a subclass of frenchCar — an instance of class carType (see the paragraph on metadata below). The last line is a query that retrieves all classes that peugeot belongs to; observe that carType is one of the types it returns.
myCar : peugeot peugeot :: frenchCar frenchCar : carType ?- peugeot : X Literals. While in many object-oriented models a distinction is made between entities with an OID (objects) and without one (literals), this is not the case in F-logic. Here, all elementary values are objects — “Mr. Wong” and 1772 are actually OIDs of a string and integer object, respectively.
Extents. Extents as such do not exist in F-logic since its model theory requires that all data atoms are accessible to the query mechanism, and to preview an extent, a simple query returns all objects of a given type. From this point of view, maintaining extents is only a performance issue.
Queries: This query returns a set of employees who work at the “IT” department along with their names and ages.
?– X : empl ∧ X [ name → Y; age → Z : midaged; affiliation → D [ dname → “IT” ] ] Metadata. F-logic uses higher-order syntax by eliminating the hard syntactic distinction between objects, classes and attribute names. This is desirable because the same language can be used for defining, manipulating and reasoning about classes and their instances; for defining virtual classes using deductive rules; and for schema exploration and browsing. A class can be an instance of another class, and queries may return sets of classes or attributes. This does not lead to computational problems because the semantics remain first-order (higher-order variables range over intensional representations of sets rather than the sets themselves).
Attributes, relationships and operations. In F-logic, complex objects are defined by specifying logic functions from a set of all objects. These functions are partial since they are only defined for objects of a given type. Operations of an object take a given number of typed arguments and return a typed result. Attributes are simply functions that take the context object and return a constant value. Since everything is an object in F-logic, there is no difference between an attribute and a 1:1 relationship. Set-valued attributes and 1:N relationships are functions that return sets of objects.
F-logic has no notion of inverse or subordination relationships, exceptions or operation side effects (since it has no update semantics). It is clear that unlike the ODMG specification, F-logic has exact method semantics. Another important point is that some facts are saved in the database (extensional ones), while others are deduced from database contents at runtime (intensional facts).
Semantic Web as an Object-oriented Database 27 Deductive rules: The following rule says that an employee’s boss is automatically derived from the manager property of his department.
E [ boss → M ] ← E : empl ∧ D: dept ∧ E [ affiliation → D [ manager → M :empl ] ] Object Identifiers and Names are considered “syntactic gadgets” needed to refer to objects in the physical representation of a database or in a programming language, respectively. F-logic uses logical OIDs in the form of tree-like first-order terms.
Some examples of F-logic OIDs are mark, 17, “A duck” and father(adam).
3.7.3 EXAMPLE OF FORMAL SEMANTICS Semantic structures in F-logic are called F-structures, and they are represented by tuples.
Given a statement, called an F-molecule, with a corresponding variable assignment, rules are defined for its satisfaction by an F-structure that makes the statement true. This mechanism serves to define and verify facts about the F-structure data model environment that has been created. Properties of Fstructures can be stated using axiomatic F-molecules: reflexivity of the subclassing relationship is expressed by p :: p is satisfied by I.
The proof theory for the satisfaction (or logical entailment) relation is sound and complete. It starts with the definition of substitutions and unifiers and proceeds to define a suite of 12 inference rules. In addition to common ones for logic systems — resolution, factoring and paramodulation — additional rules are defined to capture the semantics of an object-oriented system; these include behavior of the subclassing IS-A relationship, type inheritance, restriction of method input types and relaxation of output types, restriction of return values for scalar methods, merging and elimination of simple idterm tautologies.
3.7.4 STATE OF THE ART AND IMPLEMENTATIONSOne example of a system that is based on F-logic is Flora [YK00]. It represents a second generation of deductive and object-oriented database systems (DOOD) — the first one attracted attention in early 1990’s but never gained wide acceptance due to performance problems and other difficulties.
However, renewed interest was sparked by the interest in autonomous agents, Semantic Web, RDF and ontology languages.
Semantic Web as an Object-oriented Database 28 Flora aims at being a practical system with high expressive power, strong theoretical foundations, competitive performance, and a convenient development environment. It is based on F-logic (for object-oriented model), HiLog [CKW93] (for higher-order programming) and Transaction Logic [BK95] (for implementing updates). Rather than implementing its own deductive engine, it translates the program at source level into predicate calculus and uses XSB [SSW94], which has the advantage of very high performance and optimizations such as tabling, trie-based indices and unification factoring.
Flora has been successfully used to implement a number of sophisticated Web-based information systems, including a model-based mediation system, a webbase, and a knowledge-based integration system. The current FLORA-2 implementation is being developed at http://flora.sourceforge.net.
3.7.5 SOURCES A keynote on integrating object-oriented and logical languages is [Kifer95]. F-logic is defined in [KLW95], Transaction logic in [BK95], HiLog in [CKW93], and the implementation of Flora was presented in [YK00].
Semantic Web as an Object-oriented Database 29