CS835 - Data and Document Representation & Processing
Lecture 9 - Data: Semantic Web, Ontologies, RDF
· The Semantic Web is worldwide information linked in such a way as to be easily understandable by machines
· Idea created by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML.
· Problem: Most data on the Web is in a form that is difficult to use on a large scale
· There is no global system for publishing data that can be easily processed by anyone
· Solution - Semantic Web
What is the Problem?
What we see:
Everything you can imagine is real.
Sketches program is one of the most dynamic programs of the annual SIGGRAPH
conference, providing a forum for ideas, techniques, and uses of computer
graphics and interactive techniques.
What the Computer Sees:
New for SIGGRAPH 2005
Frequently Asked Questions
Submission Procedure Checklist
Review and Upon Acceptance
New for SIGGRAPH 2005
How to Submit Your Work
Conference Volunteer Application
Share the SIGGRAPH
Need to Add "Semantics"
–Agree on the meaning of a fixed set of annotation tags, e.g., Dublin Core
–Problems with this approach
Use ontologies to specify the meaning of annotations
–Ontologies provide a vocabulary of terms
–New terms can be formed by combining existing ones
–Meaning (semantics) of such terms is formally specified
–Can also specify relationships between terms in multiple ontologies
A Semantic Web — First Steps
Ontology in Computer Science
· An ontology is an engineering artifact:
o It is constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary.
o Provides a shared understanding of a domain of interest
o Provides a formal, machine-manipulable model of a domain of interest
Structure of an Ontology
o Ontologies typically have two distinct components:
Names for important concepts in the domain
Elephant is a concept whose members are a kind of animal
Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
Background knowledge/constraints on the domain
Adult_Elephants weigh at least 2,000 kg
All Elephants are either African_Elephants or Indian_Elephants
No individual can be both a Herbivore and a Carnivore
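The two components above (concept definitions and background constraints) can be mimicked, very loosely, by treating each concept as a predicate over individuals. This is a minimal sketch under stated assumptions: the `Individual` record, its fields (`kind`, `subkind`, `age`, `weight_kg`, `eats`) and the food labels are hypothetical, "carnivore = eats only meat" is an assumption, and a real ontology language states all of this declaratively rather than as procedures.

```python
from dataclasses import dataclass

PLANT_FOOD = frozenset({"plants", "plant_parts"})

@dataclass(frozen=True)
class Individual:
    # Hypothetical record layout, chosen only for this sketch
    name: str
    kind: str                      # e.g. "Elephant"
    subkind: str = ""              # e.g. "African_Elephant"
    age: int = 0                   # years
    weight_kg: float = 0.0
    eats: frozenset = frozenset()  # what the individual eats

# Concept definitions: names for important concepts in the domain
def is_elephant(x: Individual) -> bool:
    return x.kind == "Elephant"

def is_herbivore(x: Individual) -> bool:
    # exactly those that eat only plants or parts of plants
    return bool(x.eats) and x.eats <= PLANT_FOOD

def is_carnivore(x: Individual) -> bool:
    # assumption: a carnivore eats only meat
    return bool(x.eats) and x.eats <= {"meat"}

def is_adult_elephant(x: Individual) -> bool:
    return is_elephant(x) and x.age > 20

# Background knowledge/constraints on the domain
def violations(x: Individual) -> list:
    found = []
    if is_adult_elephant(x) and x.weight_kg < 2000:
        found.append("Adult_Elephant weighing under 2,000 kg")
    if is_elephant(x) and x.subkind not in {"African_Elephant", "Indian_Elephant"}:
        found.append("Elephant that is neither African nor Indian")
    if is_herbivore(x) and is_carnivore(x):
        found.append("individual that is both Herbivore and Carnivore")
    return found

jumbo = Individual("Jumbo", "Elephant", "African_Elephant",
                   age=25, weight_kg=1500, eats=frozenset({"plants"}))
print(is_herbivore(jumbo), is_adult_elephant(jumbo))  # True True
print(violations(jumbo))  # ['Adult_Elephant weighing under 2,000 kg']
```

With these particular definitions Herbivore and Carnivore can never overlap, so the last check always passes; it is kept only to mirror the stated disjointness constraint.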
· The Semantic Web is built on syntaxes that use URIs to represent data; these are called "Resource Description Framework" (RDF) syntaxes.
· The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.
· Intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.
· By generalizing the concept of a "Web resource", RDF can be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web.
· RDF statements are written with URIs (Uniform Resource Identifiers), typically one each for the subject, predicate and object
· In RDF, information is a collection of statements, each with a subject, verb and object - and nothing else.
· Once information is in RDF form, it can be processed
The RDF Data Model
o Statements are <subject, predicate, object> triples:
Can be represented as a graph:
o Statements describe properties of resources
A resource is any object that can be pointed to by a URI:
a document, a picture, a paragraph on the Web;
a book in the library, a real person (?)
Properties themselves are also resources (URIs)
o The subject of one statement can be the object of another
Such collections of statements form a directed, labeled graph
Note that the object of a triple can also be a “literal” (a string)
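To make the graph view concrete, here is a minimal sketch that stores triples as plain tuples and indexes them as a directed, labeled graph. The `http://example.org/#` namespace, the `page1`/`author` resources and the `Literal` marker class are assumptions of this sketch, not part of RDF itself.

```python
from collections import defaultdict

class Literal(str):
    """A plain string value, as opposed to a resource identified by a URI."""

EX = "http://example.org/#"  # hypothetical namespace for this sketch

triples = [
    (EX + "page1", EX + "author", EX + "Sean"),   # object is a resource
    (EX + "Sean", EX + "name", Literal("Sean")),  # same resource as subject; object is a literal
]

def build_graph(statements):
    """Directed, labeled graph: each subject maps to its outgoing (label, target) edges."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))
    return graph

def objects(graph, subject, predicate):
    """All targets reachable from `subject` along an edge labeled `predicate`."""
    return [o for (p, o) in graph.get(subject, []) if p == predicate]

g = build_graph(triples)
author = objects(g, EX + "page1", EX + "author")[0]
print(objects(g, author, EX + "name"))  # ['Sean']
```

Because the object of the first statement is the subject of the second, a query can simply follow edges from one statement into the next.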
o RDF has an XML syntax that has a specific meaning:
Every Description element describes a resource
Every attribute or nested element inside a Description is a property of that resource
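The two reading rules above can be applied mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` to turn `rdf:Description` elements into triples; the sample document and its resources are hypothetical, and only this small subset of RDF/XML is handled (no `rdf:ID`, typed nodes, `parseType`, or unprefixed attributes).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/page" dc:creator="Sean">
    <dc:title>My Page</dc:title>
    <dc:publisher rdf:resource="http://example.org/W3C"/>
  </rdf:Description>
</rdf:RDF>"""

def to_uri(clark_name):
    # ElementTree writes names as "{namespace}local"; RDF concatenates the two parts
    ns, local = clark_name[1:].split("}", 1)
    return ns + local

def description_triples(xml_text):
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        # Rule 2a: attributes (other than rdf:about) are properties with literal values
        for name, value in desc.attrib.items():
            if name != f"{{{RDF}}}about":
                triples.append((subject, to_uri(name), value))
        # Rule 2b: nested elements are properties; rdf:resource points at another resource
        for child in desc:
            obj = child.get(f"{{{RDF}}}resource", child.text)
            triples.append((subject, to_uri(child.tag), obj))
    return triples

for t in description_triples(DOC):
    print(t)
```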
We can refer to resources by using URIs
<http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .
<http://xyz.org/#Sean> <http://xyz.org/#name> "Sean" .
The above reads as subject, verb and object: Sean has the name "Sean"
_:a1 <http://xyz.org/#name> "Sean" .
This may be read as "there is something that has the name Sean", or "a1 has the name Sean, for some value of a1".
· These are called anonymous nodes, because they don't have a URI.
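The three kinds of term seen so far (URIs in angle brackets, quoted literals, and `_:` anonymous nodes) are easy to tell apart syntactically. This is a minimal sketch for one-line statements written with full URIs only; it assumes literals contain no escaped quotes and ignores language tags, datatypes and prefixed names.

```python
import re

# A term is <uri>, "literal" (no escapes in this sketch), or _:blankNodeLabel
TERM = r'(<[^>]*>|"[^"]*"|_:\w+)'
STATEMENT = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.?\s*$')

def classify(term):
    """Tag a term as a URI, a literal, or an anonymous (blank) node."""
    if term.startswith("<"):
        return ("uri", term[1:-1])
    if term.startswith('"'):
        return ("literal", term[1:-1])
    return ("blank", term[2:])  # strip the "_:" prefix

def parse_statement(line):
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple triple: {line!r}")
    return tuple(classify(term) for term in match.groups())

print(parse_statement('_:a1 <http://xyz.org/#name> "Sean" .'))
# (('blank', 'a1'), ('uri', 'http://xyz.org/#name'), ('literal', 'Sean'))
```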
· Given: <http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .
o We can declare a prefix for the namespace:
@prefix xyz: <http://xyz.org/#> .
o This gives:
@prefix xyz: <http://xyz.org/#> .
xyz:a xyz:b xyz:c .
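o The prefix label is arbitrary, and several labels may map to the same namespace; each of the following denotes the same triple:
@prefix blargh: <http://xyz.org/#> .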
blargh:a blargh:b blargh:c .
@prefix blargh: <http://xyz.org/#> .
@prefix xyz: <http://xyz.org/#> .
blargh:a xyz:b blargh:c .
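Prefix expansion is plain string concatenation of the namespace and the local name, which is why `blargh:a` and `xyz:a` end up as the same URI. A minimal sketch, assuming one `@prefix` declaration per line and no escaping:

```python
import re

PREFIX_DECL = re.compile(r'@prefix\s+(\w*):\s*<([^>]*)>\s*\.?')

def parse_prefixes(lines):
    """Collect '@prefix label: <namespace> .' declarations into a dict."""
    prefixes = {}
    for line in lines:
        match = PREFIX_DECL.match(line.strip())
        if match:
            prefixes[match.group(1)] = match.group(2)
    return prefixes

def expand(qname, prefixes):
    """Turn 'label:local' into a full URI by concatenation."""
    label, local = qname.split(":", 1)
    return prefixes[label] + local

decls = ["@prefix blargh: <http://xyz.org/#> .", "@prefix xyz: <http://xyz.org/#> ."]
p = parse_prefixes(decls)
print(tuple(expand(t, p) for t in ["blargh:a", "xyz:b", "blargh:c"]))
# ('http://xyz.org/#a', 'http://xyz.org/#b', 'http://xyz.org/#c')
```

The empty label used by `@prefix : <#> .` is also accepted, because the label pattern allows zero characters.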
o Commonly used prefix declarations:
@prefix : <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix daml: <http://www.daml.org/2001/03/daml+oil#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
o CWM can convert an N3 file to RDF/XML:
python cwm.py a.n3 -rdf > a.rdf
· RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type.
RDF Schema allows you to define vocabulary terms and the relations between those terms
it gives “extra meaning” to particular RDF predicates and resources
this “extra meaning”, or semantics, specifies how a term should be interpreted
· RDF Schema (described in the W3C RDF Schema Candidate Recommendation) was designed to be a simple datatyping model for RDF.
· Using RDF Schema, we can say:
"Fido" is a type of "Dog"
"Dog" is a subclass of "Animal".
· We can also create properties and classes, and specify ranges and domains for properties.
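· RDF Schema terms start with "http://www.w3.org/2000/01/rdf-schema#" (prefix rdfs:); a few core terms, such as rdf:type and rdf:Property, belong to the RDF namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" (prefix rdf:)
· The three most important terms are: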
1. "Resource" (rdfs:Resource)
2. "Class" (rdfs:Class)
3. "Property" (rdf:Property)
rdfs:Resource rdf:type rdfs:Class
rdfs:Class rdf:type rdfs:Class
rdf:Property rdf:type rdfs:Class
rdf:type rdf:type rdf:Property
o We first declare that "Dog" is a class:
:Dog rdf:type rdfs:Class
o Now we can say that "Fido is a type of Dog":
:Fido rdf:type :Dog
o Can create properties by saying a term is a type of rdf:Property, and then use those properties in the RDF:
:name rdf:type rdf:Property
:Fido :name "Fido"
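· Why say that Fido's name is "Fido"?
· The term ":Fido" is just a URI; any other URI could have been chosen for Fido, including ":Squiggle" or ":n508s0srh"
· The URI ":Fido" was chosen only because it is easier for people to remember
· Machines do not read meaning into URIs, so we must tell them explicitly that his name is "Fido"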
· More properties: rdfs:subClassOf and rdfs:subPropertyOf.
o We can say that one class or property is a subclass or subproperty of another.
o e.g., "Dog" is a subclass of the class "Animal":
:Dog rdfs:subClassOf :Animal
o We can also say that there are other subclasses of Animal:
:Human rdfs:subClassOf :Animal
:Duck rdfs:subClassOf :Animal
o and create new instances of those classes:
:Bob rdf:type :Human
:Quacky rdf:type :Duck
Can invent another property, use that, and build up more information...
:owns rdf:type rdf:Property
:Bob :owns :Fido
:Bob :owns :Quacky
:Bob :name "Bob Fleming"
:Quacky :name "Quacky"
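What rdfs:subClassOf adds over plain RDF is inference: from ":Fido rdf:type :Dog" and ":Dog rdfs:subClassOf :Animal" a processor may conclude ":Fido rdf:type :Animal". The sketch below applies just two RDFS entailment rules (subClassOf is transitive; instances of a subclass are instances of the superclass) until nothing new appears. Prefixed names are kept as plain strings, which is an assumption of the sketch, not how a real RDF store represents them.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

facts = {
    (":Dog", SUBCLASS, ":Animal"),
    (":Human", SUBCLASS, ":Animal"),
    (":Duck", SUBCLASS, ":Animal"),
    (":Fido", TYPE, ":Dog"),
    (":Bob", TYPE, ":Human"),
    (":Quacky", TYPE, ":Duck"),
    (":Bob", ":owns", ":Fido"),
    (":Bob", ":owns", ":Quacky"),
}

def subclass_closure(triples):
    """Repeatedly apply two RDFS rules until no new triple appears:
    (1) A subClassOf B, B subClassOf C  =>  A subClassOf C
    (2) x type A,       A subClassOf B  =>  x type B
    """
    closed = set(triples)
    while True:
        derived = set()
        for (a, p, b) in closed:
            if p not in (TYPE, SUBCLASS):
                continue
            for (b2, p2, c) in closed:
                if p2 == SUBCLASS and b2 == b:
                    derived.add((a, p, c))  # keeps the predicate of the first triple
        if derived <= closed:
            return closed
        closed |= derived

inferred = subclass_closure(facts) - facts
print(sorted(inferred))
# [(':Bob', 'rdf:type', ':Animal'), (':Fido', 'rdf:type', ':Animal'), (':Quacky', 'rdf:type', ':Animal')]
```

The nested loop is quadratic per round, which is fine for a toy example; real reasoners index triples by predicate instead.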
· RDF Schema provides ranges and domains
o Ranges and domains specify which classes the objects (range) and subjects (domain) of a property belong to.
o e.g., to constrain ":bookTitle" so that its subject is a Book and its value is a literal:
:Book rdf:type rdfs:Class
:bookTitle rdf:type rdf:Property
:bookTitle rdfs:domain :Book
:bookTitle rdfs:range rdfs:Literal
:MyBook rdf:type :Book
:MyBook :bookTitle "My Book"
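In RDFS semantics, rdfs:domain and rdfs:range license inferences rather than reject data: even if ":MyBook rdf:type :Book" had not been stated, it would follow from ":MyBook :bookTitle "My Book"" and the domain declaration. A minimal sketch of those two rules, assuming literals are strings that start with a double quote:

```python
TYPE = "rdf:type"

schema = {
    (":bookTitle", "rdfs:domain", ":Book"),
    (":bookTitle", "rdfs:range", "rdfs:Literal"),
}
data = {(":MyBook", ":bookTitle", '"My Book"')}  # :MyBook's type is NOT stated here

def is_literal(term):
    return term.startswith('"')

def domain_range_inferences(schema, data):
    """rdfs:domain types the subject, rdfs:range types the object.
    Literal objects are skipped, since a plain RDF triple cannot have a literal subject."""
    inferred = set()
    for (prop, rel, cls) in schema:
        for (s, p, o) in data:
            if p != prop:
                continue
            if rel == "rdfs:domain":
                inferred.add((s, TYPE, cls))
            elif rel == "rdfs:range" and not is_literal(o):
                inferred.add((o, TYPE, cls))
    return inferred

print(domain_range_inferences(schema, data))  # {(':MyBook', 'rdf:type', ':Book')}
```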
· RDF Schema contains a set of properties for annotating schemata, providing comments, labels, and the like.
· Two properties for doing this are rdfs:label and rdfs:comment
:bookTitle rdfs:label "bookTitle";
rdfs:comment "the title of a book" .
· RDFS is too weak to describe resources in sufficient detail
No localized range and domain constraints
Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants
No existence/cardinality constraints
Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
No transitive, inverse or symmetrical properties
Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical
Difficult to provide reasoning support
No “native” reasoners for non-standard semantics
Desirable features identified for Web Ontology Language:
Extends existing Web standards
–Such as XML, RDF, RDFS
Easy to understand and use
–Should be based on familiar KR idioms
Of “adequate” expressive power
Possible to provide automated reasoning support
· Two languages developed to satisfy above requirements
–OIL: developed by group of (largely) European researchers (several from EU OntoKnowledge project)
–DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML program)
Efforts merged to produce DAML+OIL
–Development was carried out by “Joint EU/US Committee on Agent Markup Languages”
–Extends (“DL subset” of) RDF
DAML+OIL submitted to W3C as basis for standardisation
–Web-Ontology (WebOnt) Working Group formed
–WebOnt group developed OWL language based on DAML+OIL
–OWL language was a W3C Candidate Recommendation when these notes were written
–It became a W3C Recommendation in February 2004
· DAML (the DARPA Agent Markup Language) aims to provide a language and toolset that enables the Web to transform from a platform that focuses on presenting information to a platform that focuses on understanding and reasoning with information.
· DAML extends RDF Schema with more expressive properties and classes.
· DAML provides simple terms for creating inferences.
· DAML+OIL is a language for describing ontologies, building on RDF Schema and XML Schema.
· It can be used to describe types of objects and the kinds of relationships expected between them.
· It uses references to XML Schema datatypes to describe integers, dates and other datatypes.
· DAML provides a method of saying things such as inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, and so on.
· DAML construct - daml:inverseOf
· Can say that one property is the inverse of another.
· The rdfs:range and rdfs:domain values of daml:inverseOf are rdf:Property.
· example of daml:inverseOf:
:hasName daml:inverseOf :isNameOf
:Sean :hasName "Sean"
"Sean" :isNameOf :Sean
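The inverseOf example above can be reproduced with a one-pass rule: for every "p daml:inverseOf q", each statement "s p o" yields "o q s", and vice versa. This is a minimal sketch over string triples. Note that the derived statement has a literal subject, which N3 (as used in these notes) permits but plain RDF does not.

```python
def apply_inverses(triples):
    """For every 'p daml:inverseOf q', add 'o q s' for each 's p o' (and 'o p s' for each 's q o')."""
    pairs = {(p, q) for (p, rel, q) in triples if rel == "daml:inverseOf"}
    result = set(triples)
    for (p, q) in pairs:
        for (s, pred, o) in triples:
            if pred == p:
                result.add((o, q, s))
            elif pred == q:
                result.add((o, p, s))
    return result

facts = {
    (":hasName", "daml:inverseOf", ":isNameOf"),
    (":Sean", ":hasName", '"Sean"'),
}
print(apply_inverses(facts) - facts)  # {('"Sean"', ':isNameOf', ':Sean')}
```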
· DAML construct - daml:UnambiguousProperty class.
· Saying that a Property is a daml:UnambiguousProperty means that if the object of the property is the same, then the subjects are equivalent.
foaf:mbox rdf:type daml:UnambiguousProperty .
o So if :x and :y have the same foaf:mbox value, a processor may conclude:
:x daml:equivalentTo :y
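The rule behind daml:UnambiguousProperty (called owl:InverseFunctionalProperty in OWL) is: group subjects by the value they have for such a property, and any two subjects in the same group are the same thing. A minimal sketch; the mailbox `<mailto:someone@example.org>` is a hypothetical value introduced for illustration.

```python
from collections import defaultdict
from itertools import combinations

def unambiguous_equivalences(triples):
    """If p is a daml:UnambiguousProperty, subjects sharing the same p-value are the same thing."""
    unambiguous = {s for (s, p, o) in triples
                   if p == "rdf:type" and o == "daml:UnambiguousProperty"}
    subjects_by_value = defaultdict(set)
    for (s, p, o) in triples:
        if p in unambiguous:
            subjects_by_value[(p, o)].add(s)
    inferred = set()
    for subjects in subjects_by_value.values():
        for x, y in combinations(sorted(subjects), 2):
            inferred.add((x, "daml:equivalentTo", y))
    return inferred

facts = {
    ("foaf:mbox", "rdf:type", "daml:UnambiguousProperty"),
    (":x", "foaf:mbox", "<mailto:someone@example.org>"),  # hypothetical mailbox
    (":y", "foaf:mbox", "<mailto:someone@example.org>"),
}
print(unambiguous_equivalences(facts))  # {(':x', 'daml:equivalentTo', ':y')}
```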
· Inference is one of the driving principles of the Semantic Web
:MyCar de:macht "160KW" .
· A German Semantic Web processor may understand ":macht"
· An English processor may not
· Here is a piece of inference data that makes things clearer to the processor:
de:macht daml:equivalentTo en:power
· The DAML "equivalentTo" property is used to say that "macht" in the German system is equivalent to "power" in the English system.
· Using an inference engine, a Semantic Web client could successfully determine that:
:MyCar en:power "160KW"
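The macht/power inference can be sketched as a rewriting rule: for each "p daml:equivalentTo q", every "s p o" is restated as "s q o" and vice versa. This minimal sketch handles property equivalence only, in a single pass (chains of equivalences are not followed); the same idea underlies the database-merging scenario below, with column names playing the role of properties.

```python
def apply_property_equivalence(triples):
    """For every 'p daml:equivalentTo q', restate each 's p o' as 's q o' and vice versa.
    Single pass: chains of equivalences are not followed in this sketch."""
    pairs = {(p, q) for (p, rel, q) in triples if rel == "daml:equivalentTo"}
    result = set(triples)
    for (p, q) in pairs:
        for (s, pred, o) in triples:
            if pred == p:
                result.add((s, q, o))
            elif pred == q:
                result.add((s, p, o))
    return result

facts = {
    (":MyCar", "de:macht", '"160KW"'),
    ("de:macht", "daml:equivalentTo", "en:power"),
}
print(apply_property_equivalence(facts) - facts)  # {(':MyCar', 'en:power', '"160KW"')}
```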
· Merging databases becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it.
· CWM can do this
· Greater levels of inference can only be provided by "First Order Predicate Logic" (FOPL) languages, and DAML is not entirely a FOPL language.
· Three species of OWL
–OWL full is union of OWL syntax and RDF
–OWL DL restricted to FOL fragment (≈ DAML+OIL)
–OWL Lite is “easier to implement” subset of OWL DL
–OWL DL ≈ OWL full within DL fragment
–DL semantics officially definitive
OWL DL based on SHIQ Description Logic
–In fact it is equivalent to SHOIN(Dn) DL
OWL DL benefits from many years of DL research
–Well defined semantics
–Formal properties well understood (complexity, decidability)
–Known reasoning algorithms
–Implemented systems (highly optimised)
XMLS datatypes as well as classes in $\forall P.C$ and $\exists P.C$
Arbitrarily complex nesting of constructors
–E.g., $\mathit{Person} \sqcap (\forall \mathit{hasChild}.\mathit{Doctor} \sqcup \exists \mathit{hasChild}.\mathit{Doctor})$
e.g., the same class in RDF/XML syntax:
<owl:intersectionOf rdf:parseType="Collection">
<owl:unionOf rdf:parseType="Collection">
· Axioms (mostly) reducible to inclusion ($\sqsubseteq$)
$C \equiv D$ iff both $C \sqsubseteq D$ and $D \sqsubseteq C$
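The set-based reading of the constructors and axioms above can be illustrated in one small, finite interpretation: a class denotes a set of individuals, a role denotes a set of pairs, intersection and union are set operations, "for all" and "there exists" look at each individual's role successors, and inclusion is the subset test. This is a model check in a single hypothetical interpretation (the individuals and memberships are invented for illustration), not a DL reasoner: a real subsumption must hold in every model of the ontology.

```python
# A small finite interpretation; all individuals and memberships are hypothetical.
DOMAIN = {"ann", "bob", "cat", "dan"}
PERSON = {"ann", "bob", "cat", "dan"}
DOCTOR = {"cat"}
HAS_CHILD = {("ann", "cat"), ("bob", "dan")}

def successors(role, x):
    return {y for (a, y) in role if a == x}

def all_values(role, concept):
    """Extension of (forall role.concept): every role-successor is in concept."""
    return {x for x in DOMAIN if successors(role, x) <= concept}

def some_values(role, concept):
    """Extension of (exists role.concept): at least one role-successor is in concept."""
    return {x for x in DOMAIN if successors(role, x) & concept}

def included(c, d):
    """C is included in D (in this interpretation) when C's extension is a subset of D's."""
    return c <= d

def equivalent(c, d):
    """C is equivalent to D iff both inclusions hold."""
    return included(c, d) and included(d, c)

# Person and (forall hasChild.Doctor or exists hasChild.Doctor)
expr = PERSON & (all_values(HAS_CHILD, DOCTOR) | some_values(HAS_CHILD, DOCTOR))
print(sorted(expr))              # ['ann', 'cat', 'dan']
print(included(expr, PERSON))    # True
print(equivalent(expr, PERSON))  # False: bob's only child is not a Doctor
```

Individuals with no children (cat, dan) satisfy the "for all" restriction vacuously, which is why they end up in the class.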