«OBJECT DATABASES AND THE SEMANTIC WEB A THESIS SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ING. JAKUB ...»
3 OVERVIEW OF OBJECT–ORIENTED DATABASES
Relational database management systems (RDBMSs) are designed to store data according to the most efficient method of data cataloging… In many cases, however, [this] is not the most efficient method for storing and retrieving such data… When dealing with data in complex interdependent structures, or [rapidly retrieving data] by following paths of associations, the relational database begins to show impediments… In some cases, RDBMS must be regarded as impractical for certain data management tasks.1
The limitations of the relational data model in terms of its ability to handle complex data types, complex data relationships, and multiple access methods are starting to be recognized.
Relational databases… do not scale well to accommodate complex transactions. The unique needs of the Internet… have been a major catalyst in this need for change.2
The first widespread database standard was the CODASYL norm of 1980 [CODASYL80]; it formalized the field of network databases with records interconnected by physical address references.
However, the model suffered from many problems with distribution, consistency checking, and migration. These shortcomings were addressed by the relational approach, first presented by E. F. Codd in 1970 [Codd70], which remains the prevalent paradigm to this day. It stands on the firm mathematical foundation of relational algebra, which was fully exploited in the design of the SQL query language.
The third generation, object databases, tries to address problems that arose from the relational approach.
These include impedance mismatch between relational datatypes and object-oriented language data structures; lack of support for complex data types and relationships (such as object and type hierarchies, large binary objects and semistructured data); no systematic approach to storing and encapsulating algorithms in the database; and problems with efficient lookup of objects due to indexing by a large number of key types.
DB paradigm        Unique ID                     Relationships   Lookup       Embedded data
Network model      Direct physical address       Yes             By address   No
Relational model   Many table-unique key types   No              By values    No
Object model       Single OID type               Yes             By OID       Yes

This chapter presents an overview of data models in object-oriented databases. It introduces multiple standards, languages, implementations, and formalisms, ranging from influential papers like the OODBS Manifesto of 1990 to the recent object-oriented data management standard for application servers, Java Data Objects 1.0.1 of 2003 (see the chronological outline in Figure 3.1). All sections of this chapter are arranged in a similar structure so the reader can better compare them.
Footnote: IDC Consulting white paper on user data management [IDC03]
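The Relationships and Lookup columns of the paradigm comparison above can be illustrated with a toy example. The following is a minimal Python sketch, not a model of any real DBMS: the sample rows, the `store` dictionary, and the helpers `new_object`, `department_of_relational`, and `department_of_object` are all hypothetical names introduced only for illustration.

```python
import itertools

# Relational style: rows are plain tuples identified by key values;
# a relationship is a foreign-key value that must be resolved by searching.
departments = [(10, "Research"), (20, "Sales")]   # (dept_id, name)
employees = [(1, "Alice", 10), (2, "Bob", 20)]    # (emp_id, name, dept_id)


def department_of_relational(emp_id):
    """Lookup by values: find the employee row, then match its foreign key."""
    dept_id = next(e[2] for e in employees if e[0] == emp_id)
    return next(d[1] for d in departments if d[0] == dept_id)


# Object style (assumed layout): every object receives one system-generated
# OID of a single type, and a relationship stores the OID of its target.
_oid_counter = itertools.count(1)
store = {}                                        # OID -> object state


def new_object(**state):
    """Register a new object and return its OID."""
    oid = next(_oid_counter)
    store[oid] = state
    return oid


research = new_object(name="Research")
alice = new_object(name="Alice", department=research)   # reference by OID


def department_of_object(emp_oid):
    """Lookup by OID: follow the stored reference directly."""
    return store[store[emp_oid]["department"]]["name"]


print(department_of_relational(1))   # Research
print(department_of_object(alice))   # Research
```

In the relational variant the relationship exists only implicitly, as a pair of matching values, while in the object variant it is stored explicitly as a reference that can be followed without knowing any key attribute of the target.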
3.1 OBJECT–ORIENTED DATABASE SYSTEM MANIFESTO
WHY WAS IT CHOSEN
This Manifesto is one of the first and most influential attempts to characterize the area of object databases. It is not a formal model, and not even a standard, but it points out the most important features that an object-oriented database system should have. Compared to the exact definition of a “relational database”, the notion of “object-oriented database” has always been quite informal, and this brief, 15-page document tries to underpin it with several useful axioms.
The Object–Oriented Database System Manifesto [Atkinson90] of 1990 is an attempt to summarize:
Mandatory features required of a program to be both a database and an object-oriented system.
Optional features that clearly improve the system, yet the system is still an object-oriented database without them.
Open choices are up to the individual database implementors to decide. In these, the scientific community has not yet reached a consensus and it is not clear which option is the most suitable.
3.1.2 MANDATORY FEATURES
FEATURES OF AN OBJECT-ORIENTED SYSTEM
Complex Objects. Complex objects are constructed from simpler ones. Examples of the simplest objects are integers, floats, strings and characters. Examples of constructors are tuples, sets (bags), lists or arrays. Any constructor can be applied to any object. Retrieval, deletion or copying of complex objects is available.
Object Identity. Every object has a unique identifier independent of its value. Objects can be shared through relationships and independently updated.
Types and Classes. The database schema is given by a set of classes or types that describe the format of objects. While types (C++, Simula, O2) are mainly a static notion used to ensure program correctness at compile time, classes (Smalltalk, Lisp) are rather first-class citizens used for creating and warehousing objects at runtime. The system should be able to maintain extents of selected classes or types.
Class or Type Hierarchies. The system has support for inheritance between types or classes — it is able to derive more specialized classes or types from existing ones. No specific type of inheritance is prescribed.
Overriding, Overloading, and Late Binding. Different algorithms can have the same name and which one will run is only chosen at runtime. For example, invoking the display operation on a general type gives different results for different types of graphical primitives.
Computational Completeness. Any computable function is expressible using the data manipulation language of the database system.
Extensibility. The set of predefined types must be extensible. There must be no distinction in using system-defined and user-defined types.
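Several of the features above (complex objects, object identity, and overriding with late binding) can be seen together in a few lines of code. The following Python sketch is only an in-memory analogy: the classes `Point`, `Shape`, `Circle`, and `Square` are hypothetical, and Python's `is` operator stands in for the comparison of database OIDs.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Shape:
    """General type; display() is overridden by each graphical primitive."""
    origin: Point

    def display(self) -> str:
        return "shape"


@dataclass
class Circle(Shape):
    radius: float = 1.0

    def display(self) -> str:
        return f"circle r={self.radius}"


@dataclass
class Square(Shape):
    side: float = 1.0

    def display(self) -> str:
        return f"square a={self.side}"


# Object identity: two points with equal values are still distinct objects.
shared = Point(0.0, 0.0)
twin = Point(0.0, 0.0)
print(shared == twin, shared is twin)        # True False

# Sharing: both shapes reference the same point, so one update is seen by both.
circle = Circle(origin=shared, radius=2.0)
square = Square(origin=shared, side=3.0)
shared.x = 5.0
print(circle.origin.x, square.origin.x, twin.x)   # 5.0 5.0 0.0

# Complex object built with the list constructor; late binding selects the
# display() implementation from the runtime type of each element.
drawing = [circle, square]
rendered = [shape.display() for shape in drawing]
print(rendered)                              # ['circle r=2.0', 'square a=3.0']
```

The caller of `display()` is written against the general type only; which algorithm runs is decided when the call is made, which is the behavior the Manifesto describes for the display operation on graphical primitives.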
FEATURES OF A DATABASE SYSTEM
Persistence. Each object should be allowed to survive the execution of a process and to be reused later without explicit load/store operations.
Secondary Storage Management. The system must supply features such as index management, data clustering, buffering and query optimization. These are so performance-critical that they must be present if the database is to complete certain tasks in a realistic time frame.
Concurrency. The system should ensure harmonious coexistence among multiple users working simultaneously on the database by providing atomicity of operations and controlled sharing.
Recovery. In case of failures, the system should restart to some coherent state of its data.
Ad Hoc Query Facility. The functionality of an ad hoc query language should be provided to express simple queries. This does not necessarily require a full query language; for example, a graphical browser can achieve the same thing.
3.1.3 OPTIONAL FEATURES
Multiple Inheritance. An object should be able to inherit from multiple predecessors. Not everyone in the object-oriented community agrees that this should be a required feature.
Type Checking and Type Inference. The degree of compile-time type checking and type inference is left open, but the more, the better.
Distribution. Distribution is orthogonal to the object-oriented nature of the system; that is, the database system may or may not be distributed.
Versions. Different versions of database contents and their management may be supported.
3.1.4 OPEN CHOICES
Programming Paradigm. The choice of logic, functional, imperative, or any other programming style is left to the designers along with language syntax.
Representation System. The choice of specific atomic types and type constructors is left to the designers.
Type System. Any kind of type formers beyond type constructors can be implemented — generic types, restrictions, boolean operations, functions etc. The type system for variables can be richer than the one for objects.
Uniformity. It is up to the designers to decide whether schema information should be stored as normal objects, whether types are first-class citizens in the programming language and whether the user sees any difference between types, objects and methods.
3.1.5 SOURCES
The OODB Manifesto can be found in [Atkinson90].
3.2 THE ODMG STANDARD
WHY WAS IT CHOSEN
ODMG 3.0 is a de facto standard for object databases and object-to-database mappings. Members of the Object Data Management Group included major object database vendors like Ardent, POET, Object Design, Objectivity, GemStone, Micro Data Base Systems, Computer Associates and Versant along with other companies (Sun Microsystems, NEC, CERN, Baan, Hitachi, Barry & Associates, Microsoft etc.). In 2001, ODMG activity was suspended.
INTRODUCTION
The ODMG standard gives a set of specifications for writing applications that are portable at the source code level among different object data management systems, that is, systems that integrate database capability with object-oriented language features. Thus, object-oriented languages are extended with transparently persistent data, concurrency control, data recovery, associative queries etc.
The ODMG standard was also meant to align with other standards efforts such as the Java Community Process (in 2003, the Java bindings were superseded by JDO, the Java Data Objects Specification, section 3.3), OMG (which adopted ODMG-93 in 1994 and OQL in 1995), SQL (the goal of INCITS X3H2 was to converge SQL3 and OQL), C++ and Smalltalk (X3J16 and X3J20).
The components of ODMG 3.0 are:
The Object Model is presented more thoroughly in the following subsection. Its semantics is not formally defined, although its structure is described by ODL metamodel interfaces.
Object Specification Languages describe ODL (Object Definition Language) for defining ODMG data types; and OIF (Object Interchange Format) for migrating the contents of a database in a standard way.
Object Query Language (OQL) is a declarative language based on SQL, but more powerful than SQL, for querying and updating objects.
Programming Language Bindings for C++, Smalltalk and Java explain how to write portable code for manipulating persistent objects. They define a mapping to and from ODL along with bindings for invoking OQL, managing the database, and executing transactions.
3.2.2 ODMG 3.0 OBJECT MODEL
TYPES
Specifications and Implementations. The ODMG object model supports encapsulation because every type has a specification consisting of implementation-independent signatures of operations, properties and exceptions, and one or more implementations through prescribed bindings to programming language data structures and methods. There are three kinds of types: interfaces specify behavior (signatures of operations and properties), literals specify abstract state, and classes specify both (see Figure 3.2).
Classes can instantiate objects and they contain enough state and behavior information to be incorporated in the OODB schema. Their operations are implemented through methods and properties mapped onto data structures.
Interfaces, on the other hand, cannot be instantiated (they function as “abstract classes”). Implementations are supplied by classes that inherit from them.
Literals are instantiated as data structures with no operations and no OID.
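The distinction between the three kinds of types can be approximated in an ordinary object-oriented language. The sketch below uses Python rather than ODL and is only an analogy under stated assumptions: `Displayable`, `Address`, `Person`, and the `oid` counter are hypothetical names, and a real ODMG binding would generate OIDs and persistence code itself.

```python
import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Displayable(ABC):
    """Interface: behavior only; it cannot be instantiated."""

    @abstractmethod
    def display(self) -> str:
        ...


@dataclass(frozen=True)
class Address:
    """Literal: immutable state with no OID, compared purely by value."""
    street: str
    city: str


_next_oid = itertools.count(1)   # assumed stand-in for system-generated OIDs


class Person(Displayable):
    """Class: state and behavior; every instance carries its own OID."""

    def __init__(self, name: str, address: Address):
        self.oid = next(_next_oid)
        self.name = name
        self.address = address

    def display(self) -> str:
        # Supplies the implementation of the inherited interface operation.
        return f"{self.name}, {self.address.city}"


ann = Person("Ann", Address("Main St 1", "Prague"))
bob = Person("Bob", Address("Main St 1", "Prague"))

print(ann.display())                 # Ann, Prague
print(ann.address == bob.address)    # True: literals are equal by value
print(ann.oid == bob.oid)            # False: objects have distinct identity
```

Attempting to call `Displayable()` directly raises `TypeError`, which mirrors the rule that interfaces cannot be instantiated.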
Object Types. Every object or literal has a type and every operation requires typed operands. Type equivalence is determined only by the type’s name; no implicit type conversions are provided.
Collection Objects are composed of a number of instances of the same type (an atomic type, a collection or a literal). Supported collection types are set, bag, list, array, and dictionary. They function as type generators parameterized by the type of their members. Standard operations include tests for membership, emptiness, cardinality, collection-specific boolean operations, concatenation, indexing etc. Collections also have support for OQL queries and iterators.
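The idea of a collection as a type generator parameterized by its member type can be sketched with a generic class. The following Python example implements a hypothetical `Bag` with a few of the standard operations named above; the method names and the additive semantics chosen for `union` are assumptions made for illustration and are not taken from the ODMG interfaces.

```python
from collections import Counter
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class Bag(Generic[T]):
    """Unordered collection of same-typed members that allows duplicates."""

    def __init__(self, items: Iterable[T] = ()):
        self._counts = Counter(items)

    def insert(self, item: T) -> None:
        self._counts[item] += 1

    def contains(self, item: T) -> bool:
        # Membership test; reading a missing key from a Counter yields 0.
        return self._counts[item] > 0

    def cardinality(self) -> int:
        # Number of members, counting duplicates.
        return sum(self._counts.values())

    def is_empty(self) -> bool:
        return self.cardinality() == 0

    def union(self, other: "Bag[T]") -> "Bag[T]":
        # Assumed semantics: occurrence counts of both bags are added up.
        result = Bag()
        result._counts = self._counts + other._counts
        return result

    def __iter__(self) -> Iterator[T]:
        # Iterator support: yields each member as often as it occurs.
        return self._counts.elements()


# Bag[str] plays the role of a type generated from the member type str.
names: Bag[str] = Bag(["Ann", "Bob", "Ann"])
merged = names.union(Bag(["Bob"]))

print(names.cardinality())      # 3
print(names.contains("Ann"))    # True
print(sorted(merged))           # ['Ann', 'Ann', 'Bob', 'Bob']
```

Python's built-in `set`, `list`, and `dict` cover the roles of set, list or array, and dictionary in the same way; only the bag needs a dedicated class here.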