«OBJECT DATABASES AND THE SEMANTIC WEB A THESIS SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ING. JAKUB ...»
Subject, Description, Type: A list of keywords or phrases (ideally from some controlled vocabulary); an overview of the resource in form of free-text, abstract or a table of contents; and the nature or genre of the content (DC–types vocabulary).
Creator, Publisher, Contributor: Name of an entity — person, organization, or service — primarily responsible for making, publishing and contributing to the content of the resource.
Source, Relation: A reference to a resource from which this one is derived; a reference to a related resource (in the same form as identifiers).
Language, Coverage, Rights: The language of the resource (RFC 3066); the extent of its content, typically spatial information (TGN, Thesaurus of Geographic Names), a time period (ISO 8601) or a jurisdiction; and information about rights held over it, such as copyright or intellectual property rights.
Information using the Dublin Core elements may be represented in any suitable language (e.g., in HTML meta elements); however, RDF is an ideal representation for Dublin Core information, and a widely used one. Moreover, nearly all of the controlled vocabularies mentioned above exist in the form of RDF URI identifiers or XML Schema Datatypes, and can be formally expressed in RDF.
A description of a World Wide Web page in DC/RDF could look like this:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.fit.vutbr.cz/">
    <dc:title>SODA – Semantic Object–Oriented Database Program</dc:title>
    <dc:publisher>Faculty of Information Technology, VUT Brno</dc:publisher>
    <dc:date>2003-05-19</dc:date>
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Semantic Web</rdf:li>
        <rdf:li>Object–Oriented Databases</rdf:li>
        <rdf:li>SODA</rdf:li>
      </rdf:Bag>
    </dc:subject>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>
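To illustrate how an application might consume such a description, the following minimal Python sketch reads Dublin Core properties from an RDF/XML fragment using only the standard library. It is not a general RDF parser: it assumes the simple shape used above (properties as direct children of rdf:Description, with multiple values wrapped in an rdf:Bag), and the function name dublin_core is a hypothetical helper introduced for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# A shortened version of the description of http://www.fit.vutbr.cz/.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.fit.vutbr.cz/">
    <dc:publisher>Faculty of Information Technology, VUT Brno</dc:publisher>
    <dc:date>2003-05-19</dc:date>
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Semantic Web</rdf:li>
        <rdf:li>SODA</rdf:li>
      </rdf:Bag>
    </dc:subject>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core(xml_text):
    """Return {resource URI: {DC element name: [values]}}.

    Assumption: only plain literals and rdf:Bag containers are handled.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        props = result.setdefault(about, {})
        for child in desc:
            # Skip anything that is not a Dublin Core element.
            if not child.tag.startswith(f"{{{DC_NS}}}"):
                continue
            name = child.tag[len(DC_NS) + 2:]  # strip the "{namespace}" prefix
            items = child.findall(f"{{{RDF_NS}}}Bag/{{{RDF_NS}}}li")
            if items:
                values = [li.text for li in items]
            else:
                values = [(child.text or "").strip()]
            props.setdefault(name, []).extend(values)
    return result


metadata = dublin_core(DOC)
```

Here metadata maps the page URI to its Dublin Core properties, so that metadata["http://www.fit.vutbr.cz/"]["subject"] holds the keywords from the rdf:Bag.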
RSS – REMOTE SITE SYNDICATION
People need to access many small pieces of information on the Web on a day-to-day basis: schedules, to-do lists, news headlines, search results, "What's New" pages, etc. As the number of sources and the diversity of information on the Web increase, a language was created to provide a lightweight information description format for timely, large-scale distribution and reuse.
RSS 1.0 ("RDF Site Summary" RDF vocabulary, [BBD00]) is perhaps the most widely deployed RDF application on the Web; it is used by many news syndication sites and desktop readers. It keeps evolving, and the recent 2.0 version is no longer an RDF application; instead, it contains a module (Simple Semantic Resolution) that can translate its simple XML files into RDF/XML.
RSS defines properties that describe news sources and news items, including their uriref, name and short description, along with many optional features such as publication date, category, time to live or ratings. These descriptions can be transparently extended by additional modules. An example of an RSS 1.0 feed from [MM04] is:
4.4.2 XMP – EXTENSIBLE METADATA PLATFORM
"XMP is an important piece that brings the Semantic Web closer to realization"
Eric Miller, W3C Semantic Web Activity Lead
XMP [Adobe04] is a labeling technology for embedding metadata in files. It was developed by Adobe Corp., is now implemented in all Adobe applications (including Photoshop, Acrobat and Illustrator) and comes with a downloadable kit for embedding the technology in other software packages and file types. The current version of the specification provides the means for inserting XMP data into the following file types: TIFF, JPEG, JPEG 2000, GIF, PNG, AI (Adobe Illustrator), PSD (Adobe Photoshop), and SVG/XML images; and HTML, PDF, PostScript, XML and EPS documents.
The XMP data model is based on RDF/XML, although it has several limitations — the top-level resources it describes are always documents or their portions, serialization to XML only has limited syntax, some RDF tags are ignored (rdf:ID, rdf:parseType=’Literal’, rdf:aboutEach). On the other hand, XMP supports most RDF concepts, including advanced ones like blank nodes, complete container vocabulary, and even reification. For each property, it also defines its range and type (among these are Boolean values, several integer and real formats, vocabulary choices, dates, MIME types, and dimensions).
XMP uses RDF vocabularies from other sources (Dublin Core and an RDF version of the EXIF schemas [EXIF02]) and defines several new vocabulary extensions for areas such as:
Basic metadata, such as the unique identifier of the resource (including the name of the identification system), timestamps, thumbnails, and information about the tool that created the document.
Rights management, including copyright owners, usage terms, and a reference to an online rights management certificate or a WWW page with human-readable information about the rights.
Workflow automation, which includes versioning information, history, references to corresponding management systems and their user interfaces, and references to any jobs that the document participates in.
Application-specific properties, which include page and document size, PDF version and keywords, Photoshop metadata etc.
Adobe XMP serves as a good example of a large company adopting and promoting the RDF standard as a foundation for the emerging Semantic Web.
4.4.3 JENA
Jena is a “Semantic Web framework” that provides the user with tools for developing Semantic Web applications. It is written in Java and available at http://jena.sourceforge.net. Jena consists of:
RDF API that can manipulate RDF graphs (triple-oriented or resource-oriented), and includes import and export for RDF/XML (with its own ARP parser), N3 [BernersLee02] and N-Triples [MM04]. It has full support for RDF constructs (containers, typed literals) and allows the application to extend the behavior of resources, for example by adding method calls.
Persistence subsystem provides RDF graphs with back-end database storage in relational databases (MySQL, Oracle and PostgreSQL, portable to any SQL DBMS). The user can influence the degree of denormalization for tables and set the optimum balance between speed and storage size. While other RDF databases often include optimizations for RDFS classes (as standalone tables), this is not the case with Jena, which leads to more flexibility in case of schema changes.
Reasoning subsystem provides a generic rule-based inference engine that can be extended with arbitrary rule sets. Configurations currently available for the engine include a stable implementation of RDFS and a development version of OWL Lite.
Ontology subsystem supports programmers who work with RDF-based ontology data, specifically OWL, DAML+OIL and RDFS. The Java classes for RDF Resource and Property are extended to model more directly the classes, properties and relationships found in ontologies. The ontology API works closely with the reasoning subsystem to derive additional information from ontologies. The subsystem also includes a document manager for imported ontologies.
Query language (RDQL) for RDF data, whose implementation is coupled to the relational database storage so that optimized queries are performed over data held in a Jena relational persistence store. These optimizations include FastPath support and partial translation of RDQL queries into direct SQL queries.
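The idea of keeping an RDF graph in a relational back end and answering a triple pattern with a single SQL query can be sketched in a few lines of Python. This is only an illustration of the general technique under stated assumptions: it uses SQLite with one generic statements table, which is not Jena's actual schema, and the helper names open_store, add and match are hypothetical and do not correspond to the Jena API or to RDQL.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a triple store backed by a single generic table (assumed layout)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, "
        "UNIQUE (subject, predicate, object))"
    )
    return conn


def add(conn, s, p, o):
    """Store one triple; duplicates are ignored because a graph is a set."""
    conn.execute("INSERT OR IGNORE INTO statements VALUES (?, ?, ?)", (s, p, o))


def match(conn, s=None, p=None, o=None):
    """Translate a triple pattern (None acts as a wildcard) into one SQL query."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM statements"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, predicate, object"
    return conn.execute(sql, params).fetchall()


DC = "http://purl.org/dc/elements/1.1/"
store = open_store()
add(store, "http://www.fit.vutbr.cz/", DC + "language", "en")
add(store, "http://www.fit.vutbr.cz/", DC + "date", "2003-05-19")
language_triples = match(store, p=DC + "language")
```

Because no table is tied to a particular RDFS class, a schema change in the data requires no change to the table layout; the price is that every pattern is answered from the same generic table.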
4.4.4 TAP
TAP [GmC03] is a platform developed by IBM in cooperation with Stanford. It aims at resolving different concerns that appear as the Semantic Web is enabled in the form of a worldwide virtual database. With the growing amount of RDF data, it becomes necessary to concentrate not only on RDF logical foundations, inference engines and ontological extensions, but also on issues that enable the networking of Semantic Web data in a simple and unified way. TAP tries to address the following issues:
A query language called GetData that could be provided by all Semantic Web servers. While numerous RDF query languages already exist, GetData is intended to be very simple and predictable, so that query processing uses only a small amount of the server’s computational resources (unlike, for example, SQL), and queries are easy to formulate and evaluate.
A GetData query simply states the subject and property name and returns the object of a corresponding RDF triple. Additional commands include searching resource names by a substring, getting all properties of a given object, and querying for the object of a triple. GetData runs as a Web service and communicates using the SOAP protocol [BEK00]. In addition, it retrieves data globally, not just from one given server — in a way, this is similar to the distributed DNS domain name service, only instead of retrieving an IP address for a host name, it retrieves RDF triples.
Semantic Negotiation (earlier called reference by description) tries to overcome the problem of reasoning about resources that have the same meaning but are identified by different unique identifiers by different parties. Resources are matched based on the values of their properties instead of their identifiers. When two parties communicate with each other, they start out with a small shared vocabulary (like Dublin Core metadata [DC03]) and continue to agree on a growing number of vocabulary terms.
Trust management is necessary to make the Semantic Web useful for business purposes. Since anyone can publish any RDF information, not all sources on the Internet can be trusted. TAP creates a web of trust by building a network of registries that contain lists of other trusted registries, so the user just lists several trusted registries and they, in turn, only provide information that can be found within their own web of trust.
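The first two ideas in this list can be sketched over a small in-memory set of triples. The Python code below is a simplified illustration only: it runs locally rather than as a SOAP Web service, it does not retrieve data from other servers, and the sample identifiers (the ex: and other: prefixes) as well as the function names are assumptions made for this example rather than part of the TAP interface.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data: two parties describe the same project
# under different identifiers.
TRIPLES: List[Triple] = [
    ("ex:fit", "dc:title", "Faculty of Information Technology"),
    ("ex:fit", "dc:language", "en"),
    ("ex:soda", "dc:title", "SODA"),
    ("ex:soda", "dc:publisher", "ex:fit"),
    ("other:soda-project", "dc:title", "SODA"),
]


def get_data(triples: List[Triple], subject: str, prop: str) -> List[str]:
    """GetData-style lookup: objects of all triples with this subject and property."""
    return [o for s, p, o in triples if s == subject and p == prop]


def properties_of(triples: List[Triple], subject: str) -> Dict[str, List[str]]:
    """All properties of a given resource, together with their values."""
    result: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


def search_names(triples: List[Triple], substring: str) -> List[str]:
    """Resources whose identifier contains the substring (case-insensitive)."""
    needle = substring.lower()
    return sorted({s for s, _, _ in triples if needle in s.lower()})


def match_by_description(triples: List[Triple], description: Dict[str, str]) -> List[str]:
    """Reference by description: resources whose properties include all given values."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all(v in get_data(triples, s, p) for p, v in description.items())
    )


language = get_data(TRIPLES, "ex:fit", "dc:language")
same_project = match_by_description(TRIPLES, {"dc:title": "SODA"})
```

In this sketch same_project contains both ex:soda and other:soda-project: the two identifiers are matched because their dc:title values agree, not because their identifiers do.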
The TAP system currently consists of: TAPache, a module for the Apache HTTP server that implements the GetData interface for RDF files located in a special directory, much like what the server does with HTML pages; a client library with an application programming interface that allows user programs to obtain and consume data through GetData; and a registry that contains a mechanism similar to the DNS registry, implements the tools for the distributed Web of Trust and caches query results.
4.4.5 THE MOZILLA PROJECT
Mozilla [SA00] is a popular open-source communicator that includes a Web browser, e-mail and news client, and HTML composer. It is less known that it also provides a complete framework for the development of cross-platform distributed applications, and that its internal data representation format is based on RDF. This gives a very interesting example of using the Semantic Web data format within the environment of a large software system that stretches the common understanding of “Web resources” to the practical level of application development.
In Mozilla, RDF datasources are transparently accessible through the component model or through a custom implementation. The component framework encapsulates them and adds some unique manipulation possibilities — RDF graphs can be stored and used both locally and remotely, portions of RDF graphs can easily be exchanged or integrated between components or different computers, and every application can define its own way of handling graph updates. Displaying RDF data is possible through XUL templates that match property names to template expressions, automatically display the graph structure and provide means for conditional formatting and layout of different parts of the RDF graph.
In Mozilla, RDF datasources are currently used to store information about bookmarks, remote site maps, local filesystems, networks of “related links”, Web search engines, browsing profiles, address books etc.