CS835 - Data and Document Representation & Processing
Lecture 9 - Data: Semantic Web, Ontologies, RDF
· The Semantic Web is worldwide information linked in such a way that it can be easily understood by machines.
· The idea was created by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML.
· Problem: most data on the Web is in a form that is difficult to use on a large scale.
· There is no global system for publishing data that can be easily processed by anyone.
· Solution: the Semantic Web.
What is the Problem?
What we see:
SIGGRAPH 2005 Sketches
"Everything you can imagine is real." (Pablo Picasso)
The Sketches program is one of the most dynamic programs of the annual SIGGRAPH conference, providing a forum for ideas, techniques, and uses of computer graphics and interactive techniques.
What the Computer Sees:
SKETCHES INFORMATION
New for SIGGRAPH 2005
Implementation Sketches
Frequently Asked Questions
Submission Guidelines
Submission Procedure Checklist
Review and Upon Acceptance
GENERAL INFORMATION
New for SIGGRAPH 2005
Deadlines
How to Submit Your Work
Online Submission
Uploading Files
Presenter Information
Award Nominations
Conference Volunteer Application
Share the SIGGRAPH
Need to Add “Semantics”
–E.g., Dublin Core (an agreed, standard set of metadata elements)
–Problems with this approach
–Ontologies provide a vocabulary of terms
–New terms can be formed by combining existing ones
–Meaning (semantics) of such terms is formally specified
–Can also specify relationships between terms in multiple ontologies
A Semantic Web — First Steps
Ontology in Computer Science
· An ontology is an engineering artifact:
o It is constituted by a specific vocabulary used to describe a certain reality,
o plus a set of explicit assumptions regarding the intended meaning of the vocabulary.
· It provides:
o a shared understanding of a domain of interest
o a formal and machine-manipulable model of a domain of interest
Structure of an Ontology
o Ontologies typically have two distinct components:
–Names for important concepts in the domain
  Elephant is a concept whose members are a kind of animal
  Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
  Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
–Background knowledge/constraints on the domain
  Adult_Elephants weigh at least 2,000 kg
  All Elephants are either African_Elephants or Indian_Elephants
  No individual can be both a Herbivore and a Carnivore
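The two components can be illustrated with a small Python sketch: concept definitions become a function that classifies an individual, and background constraints become checks on the result. This is only an illustration under a closed-world assumption (the facts about each individual are taken as complete), whereas an OWL reasoner works under open-world semantics. The `Individual` class, its attributes (`age`, `weight_kg`, `eats`) and the food labels are hypothetical, and Carnivore is only asserted, never defined.

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    name: str
    asserted: set                 # concept names stated directly, e.g. {"African_Elephant"}
    age: int = 0                  # years
    weight_kg: float = 0.0
    eats: set = field(default_factory=set)   # hypothetical food labels, e.g. {"plant"}

ELEPHANT_KINDS = {"African_Elephant", "Indian_Elephant"}
PLANT_FOOD = {"plant", "plant_part"}

def concepts(ind):
    """Asserted concepts plus those that follow from the concept definitions."""
    c = set(ind.asserted)
    if c & ELEPHANT_KINDS:
        c.add("Elephant")                 # African/Indian elephants are Elephants
    if "Elephant" in c:
        c.add("Animal")                   # Elephant members are a kind of animal
    if "Animal" in c and ind.eats and ind.eats <= PLANT_FOOD:
        c.add("Herbivore")                # animals that eat only plants or parts of plants
    if "Elephant" in c and ind.age > 20:
        c.add("Adult_Elephant")           # elephants older than 20 years
    return c

def violations(ind):
    """Background constraints that the known facts contradict."""
    c = concepts(ind)
    found = []
    if "Adult_Elephant" in c and ind.weight_kg < 2000:
        found.append("Adult_Elephant weighs less than 2,000 kg")
    if {"Herbivore", "Carnivore"} <= c:
        found.append("both Herbivore and Carnivore")
    return found

jumbo = Individual("Jumbo", {"African_Elephant"}, age=25, weight_kg=1500, eats={"plant"})
print(sorted(concepts(jumbo)))
print(violations(jumbo))   # ['Adult_Elephant weighs less than 2,000 kg']
```

The covering axiom (every Elephant is African or Indian) is used in one direction only here; reporting an Elephant of unknown kind as a violation would wrongly treat missing information as false.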
· The Semantic Web is built on syntaxes that use URIs to represent data; these are called "Resource Description Framework" (RDF) syntaxes.
· The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.
· It is intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.
· By generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web.
· An RDF statement is built from URIs (Uniform Resource Identifiers): typically three, one each for the subject, predicate, and object (the object may instead be a literal).
· In RDF, information is a collection of statements, each with a subject, verb (predicate) and object, and nothing else.
· Once information is in RDF form, it can be processed by generic RDF tools, since RDF is a common, generic format.
The RDF Data Model
o Statements are <subject, predicate, object> triples:
  <Frank,hasColleague,Richard>
o They can be represented as a graph:
  Frank --hasColleague--> Richard
o Statements describe properties of resources
o A resource is any object that can be pointed to by a URI:
  –a document, a picture, a paragraph on the Web, e.g. http://www.cs.man.ac.uk/index.html
  –a book in the library, a real person (?), e.g. isbn://5031-4444-3333
  –…
o Properties themselves are also resources (URIs)
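A minimal sketch of the data model in Python, assuming a graph is simply a set of (subject, predicate, object) tuples of URI strings. The `http://example.org/` namespace used for Frank, Richard and hasColleague is hypothetical; the `rdf:type` and `rdf:Property` URIs are the standard RDF ones. The second triple shows that a property is itself a resource that can be described by further statements.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
EX = "http://example.org/"   # hypothetical namespace for Frank, Richard and hasColleague

graph = {
    (EX + "Frank", EX + "hasColleague", EX + "Richard"),
    # the property is itself a resource, so it can be the subject of a statement
    (EX + "hasColleague", RDF_TYPE, RDF_PROPERTY),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }

print(match(graph, s=EX + "Frank"))
print(match(graph, p=RDF_TYPE))
```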
Linking Statements
o The subject of one statement can be the object of another
–Such collections of statements form a directed, labeled graph
–Note that the object of a triple can also be a “literal” (a string)
o RDF has an XML syntax that has a specific meaning:
–Every Description element describes a resource
–Every attribute or nested element inside a Description is a property of that resource
–We can refer to resources by using URIs
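<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://some.uri/terms#">
  <!-- ex: is a placeholder namespace for the property names -->
  <rdf:Description rdf:about="http://some.uri/person/ian_horrocks">
    <ex:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/uli_sattler">
    <ex:hasHomePage>http://www.cs.man.ac.uk/~sattler</ex:hasHomePage>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/carole_goble">
    <ex:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
  </rdf:Description>
</rdf:RDF>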
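To show how the Description syntax maps onto triples, here is a Python sketch using only the standard library's `xml.etree.ElementTree`. It assumes a namespaced form of the example (`rdf:about`, `rdf:resource`, full `http://` URIs and a placeholder `ex:` vocabulary namespace) and handles only this simple subset of RDF/XML, not the full grammar. It also picks out the resources that link statements together by being both a subject and an object.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://some.uri/terms#">
  <rdf:Description rdf:about="http://some.uri/person/ian_horrocks">
    <ex:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/uli_sattler">
    <ex:hasHomePage>http://www.cs.man.ac.uk/~sattler</ex:hasHomePage>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/carole_goble">
    <ex:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
  </rdf:Description>
</rdf:RDF>"""

def description_triples(xml_text):
    """Turn each rdf:Description into (subject, predicate, object) triples.

    Only this subset is handled: rdf:about subjects, rdf:resource objects and
    plain-text literal objects, which are returned as ("literal", text).
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; RDF/XML forms the
            # property URI by concatenating the namespace and the local name.
            ns, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else ("literal", (prop.text or "").strip())
            triples.append((subject, ns + local, obj))
    return triples

def linking_nodes(triples):
    """Resources that are the object of one statement and the subject of another."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples if isinstance(o, str)}   # skip literals
    return subjects & objects

triples = description_triples(DOC)
for t in triples:
    print(t)
print(linking_nodes(triples))   # {'http://some.uri/person/uli_sattler'}
```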
<http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .
<http://xyz.org/#Sean> <http://xyz.org/#name> "Sean" .
The above reads as subject, verb and object: Sean has the name “Sean”.
_:a1 <http://xyz.org/#name> "Sean" .
This may be read as "there is something that has the name Sean", or "a1 has the name Sean, for some value of a1".
· These are called anonymous (or blank) nodes, because they don't have a URI.
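The N-Triples lines above follow a regular shape, so a small subset can be parsed with a regular expression. The Python sketch below handles only `<uri>` terms, `_:label` blank nodes and simple quoted literals (no escapes, language tags or datatypes), and does not check which positions a term may occupy; real applications would use a complete N-Triples parser.

```python
import re

# One term: a URI in angle brackets, a blank node label, or a simple quoted literal.
TERM = r'(<[^>]*>|_:\w+|"[^"]*")'
STATEMENT = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')

def to_term(token):
    """Tag each token with its kind so URIs, blank nodes and literals stay distinct."""
    if token.startswith("<"):
        return ("uri", token[1:-1])
    if token.startswith("_:"):
        return ("bnode", token[2:])
    return ("literal", token[1:-1])

def parse_ntriples(text):
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue                      # skip blank lines and comments
        m = STATEMENT.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.append(tuple(to_term(t) for t in m.groups()))
    return triples

DATA = """<http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .
<http://xyz.org/#Sean> <http://xyz.org/#name> "Sean" .
_:a1 <http://xyz.org/#name> "Sean" ."""

for triple in parse_ntriples(DATA):
    print(triple)
```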
· Given: <http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .
and the prefix declaration:
@prefix xyz: <http://xyz.org/#> .
o This gives:
xyz:a xyz:b xyz:c .
o The prefix name is arbitrary; the same triple can be written:
@prefix blargh: <http://xyz.org/#> .
blargh:a blargh:b blargh:c .
o Several prefixes may even be bound to the same namespace and mixed:
@prefix blargh: <http://xyz.org/#> .
@prefix xyz: <http://xyz.org/#> .
blargh:a xyz:b blargh:c .
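Prefix expansion is plain string concatenation of the namespace URI and the local name, which is why `blargh:a` and `xyz:a` denote the same resource. A Python sketch, assuming one `@prefix` declaration per statement and ignoring relative URIs such as `<#>` (which a real N3 parser would resolve against the document's base URI):

```python
import re

# Matches declarations such as "@prefix xyz: <http://xyz.org/#> ." (empty prefix allowed).
PREFIX_DECL = re.compile(r'@prefix\s+(\w*):\s*<([^>]*)>\s*\.')

def read_prefixes(text):
    """Map each declared prefix name to its namespace URI."""
    return dict(PREFIX_DECL.findall(text))

def expand(qname, prefixes):
    """Turn a prefixed name like blargh:a into a full URI by concatenation."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

N3 = """@prefix blargh: <http://xyz.org/#> .
@prefix xyz: <http://xyz.org/#> ."""
prefixes = read_prefixes(N3)
print([expand(t, prefixes) for t in "blargh:a xyz:b blargh:c".split()])
# ['http://xyz.org/#a', 'http://xyz.org/#b', 'http://xyz.org/#c']
```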
o A typical set of prefix declarations:
@prefix : <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix daml: <http://www.daml.org/2001/03/daml+oil#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
o To convert an N3 file to RDF/XML using CWM:
python cwm.py a.n3 -rdf > a.rdf
· RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type.
–RDF Schema allows you to define vocabulary terms and the relations between those terms
–It gives “extra meaning” to particular RDF predicates and resources
–This “extra meaning”, or semantics, specifies how a term should be interpreted
· RDF Schema (also: RDF Schema Candidate Recommendation) was designed to be a simple datatyping model for RDF.
· Using RDF Schema, we can say:
–"Fido" is a type of "Dog"
–"Dog" is a subclass of "Animal"
· We can create properties and classes, and create ranges and domains for properties.
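· Most RDF Schema terms start with "http://www.w3.org/2000/01/rdf-schema#". The three most important concepts are:
1. "Resource" (rdfs:Resource)
2. "Class" (rdfs:Class)
3. "Property" (rdf:Property), which lives in the RDF namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#"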
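o These core terms are themselves described in RDF:
rdfs:Resource rdf:type rdfs:Class .
rdfs:Class rdf:type rdfs:Class .
rdf:Property rdf:type rdfs:Class .
rdf:type rdf:type rdf:Property .
o A new class, such as Dog, is declared in the same way:
:Dog rdf:type rdfs:Class .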
o Now we can say that "Fido is a type of Dog":
:Fido rdf:type :Dog .
o We can create properties by saying a term is a type of rdf:Property, and then use those properties in the RDF:
:name rdf:type rdf:Property .
:Fido :name "Fido" .
· Why say that Fido's name is "Fido"?
· The term ":Fido" is only a URI; any URI could have been chosen for Fido, including ":Squiggle" or ":n508s0srh".
· The URI ":Fido" is simply easier for people to remember.
· Machines cannot work out a name from a URI, so we must tell them explicitly that his name is Fido.
· More properties: rdfs:subClassOf and rdfs:subPropertyOf.
o These say that one class or property is a subclass or subproperty of another.
o E.g., "Dog" is a subclass of the class "Animal":
:Dog rdfs:subClassOf :Animal .
o We can also say that there are other subclasses of Animal:
:Human rdfs:subClassOf :Animal .
:Duck rdfs:subClassOf :Animal .
o And create new instances of those classes:
:Bob rdf:type :Human .
:Quacky rdf:type :Duck .
o We can invent another property, use it, and build up more information:
:owns rdf:type rdf:Property .
:Bob :owns :Fido .
:Bob :owns :Quacky .
:Bob :name "Bob Fleming" .
:Quacky :name "Quacky" .
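The value of rdfs:subClassOf lies in the inferences it licenses: because Fido is a Dog and every Dog is an Animal, Fido is an Animal. The Python sketch below applies two rules from the RDF Semantics specification (rdfs9 and rdfs11) until no new triples appear. Triples are written with prefixed names as plain strings purely for brevity; this is an illustration, not how CWM or any particular reasoner is implemented.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def rdfs_closure(triples):
    """Apply two RDFS entailment rules until nothing new is derived:
    rdfs11: rdfs:subClassOf is transitive;
    rdfs9:  an instance of a subclass is an instance of its superclass."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
        new |= {(x, TYPE, sup) for x, p, cls in facts if p == TYPE
                for sub_cls, sup in sub if sub_cls == cls}
        if new <= facts:
            return facts
        facts |= new

facts = {
    (":Dog", SUBCLASS, ":Animal"), (":Human", SUBCLASS, ":Animal"), (":Duck", SUBCLASS, ":Animal"),
    (":Fido", TYPE, ":Dog"), (":Bob", TYPE, ":Human"), (":Quacky", TYPE, ":Duck"),
    (":Bob", ":owns", ":Fido"), (":Bob", ":owns", ":Quacky"),
}
closure = rdfs_closure(facts)
print(sorted(closure - facts))   # the three new ":X rdf:type :Animal" triples
```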
· RDF Schema provides ranges and domains
o Ranges and domains specify the classes to which the object (range) and subject (domain) of each property belong.
o E.g., to constrain ":bookTitle" to apply to a book and take a literal value:
:Book rdf:type rdfs:Class .
:bookTitle rdf:type rdf:Property .
:bookTitle rdfs:domain :Book .
:bookTitle rdfs:range rdfs:Literal .
:MyBook rdf:type :Book .
:MyBook :bookTitle "My Book" .
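In RDF Schema, rdfs:domain and rdfs:range are used to draw conclusions rather than to reject data: anything that has a :bookTitle is inferred to be a :Book. A Python sketch of rules rdfs2 and rdfs3, assuming literals are kept as strings with their double quotes; :OtherBook is a hypothetical extra resource with no declared type.

```python
from collections import defaultdict

def is_literal(term):
    # Convention for this sketch only: literals keep their surrounding double quotes.
    return term.startswith('"')

def domain_range_inferences(triples):
    """New types implied by rdfs:domain (rule rdfs2) and rdfs:range (rule rdfs3)."""
    domains, ranges = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:domain":
            domains[s].add(o)
        elif p == "rdfs:range":
            ranges[s].add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, "rdf:type", cls))      # subject belongs to the domain class
        if not is_literal(o):                       # RDF does not allow a literal as a subject
            for cls in ranges.get(p, ()):
                inferred.add((o, "rdf:type", cls))  # object belongs to the range class
    return inferred - set(triples)

books = {
    (":Book", "rdf:type", "rdfs:Class"),
    (":bookTitle", "rdf:type", "rdf:Property"),
    (":bookTitle", "rdfs:domain", ":Book"),
    (":bookTitle", "rdfs:range", "rdfs:Literal"),
    (":MyBook", "rdf:type", ":Book"),
    (":MyBook", ":bookTitle", '"My Book"'),
    (":OtherBook", ":bookTitle", '"Other Book"'),   # hypothetical: no rdf:type given
}
print(domain_range_inferences(books))   # {(':OtherBook', 'rdf:type', ':Book')}
```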
· RDF Schema contains a set of properties for annotating schemata, providing comments, labels, and the like.
· Two properties for doing this are rdfs:label and rdfs:comment.
· E.g.:
:bookTitle rdfs:label "bookTitle";
    rdfs:comment "the title of a book" .
· RDFS is too weak to describe resources in sufficient detail:
–No localized range and domain constraints
  Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants
–No existence/cardinality constraints
  Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
–No transitive, inverse or symmetrical properties
  Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical
–Difficult to provide reasoning support
  No “native” reasoners for non-standard semantics
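To make one of the missing features concrete, the Python sketch below shows the extra statements a reasoner could derive if isPartOf were declared transitive and touches symmetric (OWL offers owl:TransitiveProperty and owl:SymmetricProperty for this; RDFS has no equivalent). The individuals :wheel, :car, :fleet, :a and :b are hypothetical, and the naive fixpoint loop is for illustration only.

```python
def saturate(triples, transitive=frozenset(), symmetric=frozenset()):
    """Add consequences of transitive and symmetric property declarations until nothing changes."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))                 # touches(a, b) gives touches(b, a)
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))        # isPartOf(x, y), isPartOf(y, z) gives isPartOf(x, z)
        if new <= facts:
            return facts
        facts |= new

facts = {(":wheel", ":isPartOf", ":car"), (":car", ":isPartOf", ":fleet"), (":a", ":touches", ":b")}
derived = saturate(facts, transitive={":isPartOf"}, symmetric={":touches"}) - facts
print(sorted(derived))   # [(':b', ':touches', ':a'), (':wheel', ':isPartOf', ':fleet')]
```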
Desirable features identified for Web Ontology Language:
o Extends existing Web standards
–Such as XML, RDF, RDFS
o Easy to understand and use
–Should be based on familiar KR idioms
o Formally specified
o Of “adequate” expressive power
o Possible to provide automated reasoning support
· Two languages were developed to satisfy the above requirements
–OIL: developed by a group of (largely) European researchers (several from the EU OntoKnowledge project)
–DAML-ONT: developed by a group of (largely) US researchers (in the DARPA DAML program)
· Efforts merged to produce DAML+OIL
–Development was carried out by the “Joint EU/US Committee on Agent Markup Languages”
–Extends (“DL subset” of) RDF
· DAML+OIL submitted to W3C as basis for standardisation
–Web-Ontology (WebOnt) Working Group formed
–WebOnt group developed OWL language based on DAML+OIL
–OWL language now a W3C Candidate Recommendation
–Will soon become Proposed Recommendation
· DAML, the DARPA Agent Markup Language, is a language created by DARPA.
· It aims to provide a language and toolset that enable the Web to transform from a platform that focuses on presenting information to a platform that focuses on understanding and reasoning with information.
· DAML gives RDF Schema more in-depth properties and classes.
· DAML provides simple terms for creating inferences.
· DAML+OIL is a language for describing ontologies, building on RDF Schema and XML Schema.
· It can be used to describe types of objects and the kinds of relationships expected between them.
· It uses references to XML Schema datatypes to describe integers, dates and other datatypes.
· DAML provides a way of saying things such as inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, and so on.
· DAML construct: daml:inverseOf
· It says that one property is the inverse of another.
· The rdfs:domain and rdfs:range of daml:inverseOf are both rdf:Property.
· Example of daml:inverseOf:
:hasName daml:inverseOf :isNameOf .
o Given this, the following two statements are equivalent:
:Sean :hasName "Sean" .
"Sean" :isNameOf :Sean .
· DAML construct: the daml:UnambiguousProperty class.
· Saying that a property is a daml:UnambiguousProperty means that if two statements using it have the same object, then their subjects are equivalent.
· Example:
foaf:mbox rdf:type daml:UnambiguousProperty .
:x foaf:mbox <mailto:someone@example.org> .   # hypothetical mailbox, same in both statements
:y foaf:mbox <mailto:someone@example.org> .
implies that:
:x daml:equivalentTo :y .
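The two DAML constructs above can be sketched as inference steps in Python. `apply_inverses` adds the mirrored statement for each daml:inverseOf pair; note that plain RDF does not allow a literal such as "Sean" as a subject, so a stricter store would have to drop or flag that result. `merge_unambiguous` groups subjects that share a value for an unambiguous property, using a union-find structure. Triples are plain strings with prefixed names, and the mailbox URIs are hypothetical placeholders.

```python
def apply_inverses(triples):
    """For each P daml:inverseOf Q, derive (o, Q, s) from (s, P, o) and vice versa."""
    inverse = {}
    for s, p, o in triples:
        if p == "daml:inverseOf":
            inverse[s], inverse[o] = o, s
    return set(triples) | {(o, inverse[p], s) for s, p, o in triples if p in inverse}

def merge_unambiguous(triples):
    """Group subjects that share the object of an unambiguous property, via union-find."""
    unambiguous = {s for s, p, o in triples
                   if p == "rdf:type" and o == "daml:UnambiguousProperty"}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving keeps the trees shallow
            x = parent[x]
        return x

    first_subject = {}                        # (property, object) -> first subject seen
    for s, p, o in triples:
        if p in unambiguous:
            key = (p, o)
            if key in first_subject:
                parent[find(s)] = find(first_subject[key])
            else:
                first_subject[key] = s
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return [g for g in groups.values() if len(g) > 1]

names = {(":hasName", "daml:inverseOf", ":isNameOf"), (":Sean", ":hasName", '"Sean"')}
print(apply_inverses(names) - names)   # {('"Sean"', ':isNameOf', ':Sean')}

mail = [
    ("foaf:mbox", "rdf:type", "daml:UnambiguousProperty"),
    (":x", "foaf:mbox", "<mailto:someone@example.org>"),   # hypothetical mailbox
    (":y", "foaf:mbox", "<mailto:someone@example.org>"),
    (":z", "foaf:mbox", "<mailto:other@example.org>"),
]
print(merge_unambiguous(mail))          # one group containing :x and :y
```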
· Inference is one of the driving principles of the Semantic Web.
· Example:
:MyCar de:macht "160KW" .
· A German Semantic Web processor may understand "de:macht", but an English processor may not.
· Here is a piece of inference data that makes things clearer to the processor:
de:macht daml:equivalentTo en:power .
· The DAML "equivalentTo" property is used to say that "macht" in the German system is equivalent to "power" in the English system.
· Using an inference engine, a Semantic Web client could then determine that:
:MyCar en:power "160KW" .
· Merging databases becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, then throwing all of the information together and getting a processor to reason about it.
· CWM can do this.
· Greater levels of inference can only be provided by "First Order Predicate Logic" (FOPL) languages, and DAML is not entirely an FOPL language.
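The macht/power example and the database merge rest on the same step: copying statements across a declared equivalence between predicates. The Python sketch below applies each daml:equivalentTo link once (chains of equivalences would need a fixpoint loop); the my:/your: prefixes, :alice and :bob are hypothetical, and this is not CWM's actual implementation.

```python
def add_equivalent_predicates(triples):
    """Copy each statement across daml:equivalentTo links between predicates (one hop)."""
    equivalents = {}
    for s, p, o in triples:
        if p == "daml:equivalentTo":
            equivalents.setdefault(s, set()).add(o)   # equivalence works in both directions
            equivalents.setdefault(o, set()).add(s)
    inferred = set(triples)
    for s, p, o in triples:
        for q in equivalents.get(p, ()):
            inferred.add((s, q, o))
    return inferred

car = {("de:macht", "daml:equivalentTo", "en:power"), (":MyCar", "de:macht", '"160KW"')}
print((":MyCar", "en:power", '"160KW"') in add_equivalent_predicates(car))   # True

# Merging two hypothetical databases: record one equivalence, pool the data, then query.
mine = {(":alice", "my:PersonName", '"Alice"')}
yours = {(":bob", "your:Name", '"Bob"')}
mapping = {("my:PersonName", "daml:equivalentTo", "your:Name")}
merged = add_equivalent_predicates(mine | yours | mapping)
print(sorted(s for s, p, o in merged if p == "your:Name"))   # [':alice', ':bob']
```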
· Three species of OWL
–OWL Full is the union of OWL syntax and RDF
–OWL DL is restricted to the FOL fragment ($\approx$ DAML+OIL)
–OWL Lite is an “easier to implement” subset of OWL DL
· Semantic layering
–OWL DL $\approx$ OWL Full within the DL fragment
–DL semantics officially definitive
· OWL DL based on SHIQ Description Logic
–In fact it is equivalent to the SHOIN(Dn) DL
OWL DL benefits from many years of DL research
–Well-defined semantics
–Formal properties well understood (complexity, decidability)
–Known reasoning algorithms
–Implemented systems (highly optimised)
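XMLS datatypes as well as classes in $\forall P.C$ and $\exists P.C$
–E.g., $\exists\,\mathit{hasAge}.\mathit{nonNegativeInteger}$
Arbitrarily complex nesting of constructors
–E.g., $\mathit{Person} \sqcap \forall\,\mathit{hasChild}.(\mathit{Doctor} \sqcup \exists\,\mathit{hasChild}.\mathit{Doctor})$
In RDF/XML, $\mathit{Person} \sqcap \forall\,\mathit{hasChild}.(\mathit{Doctor} \sqcup \exists\,\mathit{hasChild}.\mathit{Doctor})$ is written: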
<!-- OWL uses owl:allValuesFrom / owl:someValuesFrom (DAML+OIL called these daml:toClass / daml:hasClass);
     rdf:parseType="Collection" is case-sensitive -->
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Person"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasChild"/>
      <owl:allValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Doctor"/>
            <owl:Restriction>
              <owl:onProperty rdf:resource="#hasChild"/>
              <owl:someValuesFrom rdf:resource="#Doctor"/>
            </owl:Restriction>
          </owl:unionOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
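The class expression $\mathit{Person} \sqcap \forall\,\mathit{hasChild}.(\mathit{Doctor} \sqcup \exists\,\mathit{hasChild}.\mathit{Doctor})$ can be read off set by set: $\sqcap$ is intersection, $\sqcup$ is union, and the quantifiers look at each element's hasChild successors. The Python sketch below evaluates it in one small, hypothetical interpretation (who is a Person, who is a Doctor, who hasChild whom). This is model checking in a single finite model, not DL reasoning: an OWL DL reasoner considers every model consistent with the ontology.

```python
# Hypothetical interpretation: a finite domain plus the extensions of two classes and one role.
domain = {"ann", "bob", "cat", "dan", "eve"}
person = {"ann", "bob", "cat", "dan"}
doctor = {"cat"}
has_child = {("ann", "bob"), ("bob", "cat"), ("dan", "eve")}

def some(domain, role, cls):
    """Extension of ∃role.cls: elements with at least one role-successor in cls."""
    return {x for x in domain if any((x, y) in role for y in cls)}

def only(domain, role, cls):
    """Extension of ∀role.cls: elements all of whose role-successors are in cls
    (true vacuously for elements with no successors)."""
    return {x for x in domain if all(y in cls for (a, y) in role if a == x)}

def target(domain, person, doctor, has_child):
    """Extension of Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)."""
    filler = doctor | some(domain, has_child, doctor)      # ⊔ is set union
    return person & only(domain, has_child, filler)        # ⊓ is set intersection

print(sorted(target(domain, person, doctor, has_child)))   # ['ann', 'bob', 'cat']
```

Here dan is excluded because his only child, eve, is neither a Doctor nor has a Doctor child, while cat qualifies vacuously because she has no children in this interpretation.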
• Axioms are (mostly) reducible to inclusion ($\sqsubseteq$)
–$C \equiv D$ iff both $C \sqsubseteq D$ and $D \sqsubseteq C$
Ontology Editors
Swoop http://www.mindswap.org/2004/SWOOP/
OilEd http://oiled.man.ac.uk/
Protégé http://protege.stanford.edu/
OntoEdit http://www.ontoprise.de/products/ontoedit_en