CS835 - Data and Document Representation & Processing
Lecture 2 – XML I
Recommended textbooks:
S. Holzner, Sams Teach Yourself XML in 21 Days, 3rd edition, 2004.
C. Bates, XML in Theory and Practice, Wiley, 2003.
The following examples are from Holzner:
Sample HTML doc:
Text View |
Browser View |
<HTML> <HEAD> <TITLE>Hello From HTML</TITLE> </HEAD> <BODY> <CENTER> <H1> An HTML
Document </H1> </CENTER> This is an HTML
document! </BODY> </HTML> |
|
Sample XML doc:
Text View |
Browser View |
<?xml version="1.0" encoding="UTF-8"?> <document> <heading> Hello From XML </heading> <message> This is an XML
document! </message> </document> |
<?xml
version="1.0" encoding="UTF-8" ?> - <document> <heading>Hello From
XML</heading> <message>This text is
inside a <message> element.</message> </document> |
Sample XML with Stylesheets:
Text View |
Browser View |
XML file contents:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="ch01_04.css"?>
<document>
  <heading> Hello From XML </heading>
  <message> This is an XML document! </message>
</document>
|
Stylesheet file contents (ch01_04.css):
heading {display: block; font-size: 24pt; color: #ff0000; text-align: center}
message {display: block; font-size: 18pt; color: #0000ff; text-align: center}
Extracting Content:
JavaScript:
Text View |
|
<HTML> <HEAD> <TITLE> Retrieving
data from an XML document </TITLE> <XML ID="firstXML" SRC="ch01_02.xml"></XML> <SCRIPT
LANGUAGE="JavaScript"> function
getData() { xmldoc=
document.all("firstXML").XMLDocument; nodeDoc
= xmldoc.documentElement; nodeHeading
= nodeDoc.firstChild;
outputMessage = "Heading: " +
nodeHeading.firstChild.nodeValue;
message.innerHTML=outputMessage; } </SCRIPT> </HEAD> <BODY> <CENTER> <H1>
Retrieving data from an XML document </H1> <DIV ID="message"></DIV> <P> <INPUT TYPE="BUTTON" VALUE="Read the heading" ONCLICK="getData()"> </CENTER> </BODY> </HTML> |
|
Source Document: ch01_02.xml <?xml version="1.0" encoding="UTF-8"?> <document> <heading> Hello From XML </heading> <message> This is an XML
document! </message> </document> |
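The same extraction can be sketched outside the browser. The following is a minimal Python sketch, assuming the standard-library xml.etree.ElementTree module, with the contents of ch01_02.xml inlined as a string. The names SOURCE and get_heading are illustrative and do not come from Holzner.

```python
import xml.etree.ElementTree as ET

# Contents of ch01_02.xml, inlined so the sketch is self-contained.
SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<document>
    <heading>
        Hello From XML
    </heading>
    <message>
        This is an XML document!
    </message>
</document>"""


def get_heading(xml_text):
    """Return the text of the <heading> child of the root element."""
    # Encode to bytes because the text carries its own encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    heading = root.find("heading")
    # Strip the white space that surrounds the text inside the element.
    return "Heading: " + heading.text.strip()


print(get_heading(SOURCE))
```

Like the JavaScript version, the sketch walks from the root element to its heading child and reads the text node inside it.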
XML Editors:
Amaya - free
XML Spy – free home edition
XMLWriter – 30 day trial
o Valid tag names begin with a letter (A to Z, a to z) or an underscore ( _ )
o Subsequent characters may also be digits 0 – 9, hyphens ( - ), and periods ( . )
o Tag names are case sensitive
o Tag names cannot include white space
<book> XML in Theory and Practice </book>
<name> Professor F. T. Marchese </name>
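The naming rules above can be approximated with a regular expression. This is a minimal Python sketch under the simplified, ASCII-only rules listed in these notes; the full XML specification also admits many non-ASCII letters, and the name is_valid_tag_name is hypothetical.

```python
import re

# Simplified ASCII-only pattern taken from the rules above: a letter or
# underscore first, then letters, digits, underscores, periods, or hyphens.
TAG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_tag_name(name):
    """Return True if name satisfies the simplified tag-name rules."""
    return TAG_NAME.fullmatch(name) is not None


for candidate in ["book", "_name", "line-1", "1stline", "first name"]:
    print(candidate, is_valid_tag_name(candidate))
```

Because tag names are case sensitive, book and Book pass the check as two different names.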
Rules:
o An element must have start and end tags unless it is an
empty element
o Start and end tags must form a matched pair
Empty elements have only one tag. Syntax:
<… />
e.g.
<heading/>
<heading text="Hello from XML" />
o Each well formed document must contain a root element
with any legal name
o This element contains all other elements
e.g.
<document>
<heading>
</heading>
<message>
This is an XML document!
</message>
</document>
Nesting elements: tags must pair up inside XML, so elements are closed in the reverse of the order in which they were opened:
<document>
<heading>
</heading>
<message>
This is an XML document!
</message>
</document>
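These well-formedness rules are what a parser enforces. A minimal Python sketch, assuming the standard-library xml.etree.ElementTree parser; the helper is_well_formed and the three sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


GOOD = "<document><heading></heading><message>This is an XML document!</message></document>"
# <heading> is opened inside <document> but closed after it, so the tags overlap.
BAD_NESTING = "<document><heading></document></heading>"
# Two top-level elements, so there is no single root element.
NO_ROOT = "<heading></heading><message></message>"

for sample in (GOOD, BAD_NESTING, NO_ROOT):
    print(is_well_formed(sample))
```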
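o ASCII – 7 bits (stored in 1 byte) – 128 characters
o Unicode (UCS-2) – 2 bytes – 65536 characters
o UCS – Universal Character Set – 4 bytes – 4.3 billion code values
XML supports:
US-ASCII – US ASCII
UTF-8 -- variable-width Unicode encoding -- 1 to 4 bytes per character; ASCII characters take a single byte.
UTF-16 – variable-width Unicode encoding -- 2 or 4 bytes per character
ISO-10646-UCS-2 -- fixed-width 2-byte Unicode
In practice, every XML processor must support UTF-8 and UTF-16, and UTF-8 is assumed when no encoding is declared.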
<?xml
version="1.0" encoding="UTF-8"?>
<document>
<heading>
Hello From XML
</heading>
<message>
This is an XML
document!
</message>
</document>
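How many bytes a character occupies depends on the encoding named in the XML declaration. The following Python sketch measures a few sample characters; the choice of samples and the helper name encoded_length are assumptions made for illustration.

```python
# Sample characters: a Latin letter, an accented letter, the euro sign,
# and a character outside the 16-bit range (written as escapes).
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_length(char, encoding):
    """Return the number of bytes char occupies in the given encoding."""
    return len(char.encode(encoding))


for char in SAMPLES:
    # "utf-16-be" is used so that no byte-order mark is counted.
    print(ascii(char), encoded_length(char, "utf-8"), encoded_length(char, "utf-16-be"))
```

ASCII characters stay at one byte in UTF-8, which is why UTF-8 files containing only ASCII text are byte-for-byte identical to US-ASCII files.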
Character Reference
Character | Sequence
< | &lt;
> | &gt;
' | &apos;
& | &amp;
" | &quot;
e.g.
<message> This text is inside a &lt;message&gt; element. </message>
Result: This text is inside a <message> element.
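Escaping can be done programmatically before text is placed inside an element. A minimal Python sketch, assuming the standard-library xml.sax.saxutils.escape function and the xml.etree.ElementTree parser; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "This text is inside a <message> element."

# escape() replaces &, < and > with the corresponding character references.
escaped = escape(text)
xml_doc = "<message>" + escaped + "</message>"

# The parser turns the references back into the original characters.
parsed = ET.fromstring(xml_doc).text

print(escaped)
print(parsed)
```

Quotation marks are not escaped by default; escape() accepts a second argument, a dictionary of extra replacements, for use inside attribute values.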
<!-- This is a comment -->
Attributes may appear in:
o Elements
o Processing instructions
o XML declarations
Syntax:
attributename="value"
e.g.
<brush width="10" height="5" color="cyan" />
<point x="10" y="100" />
<book title="Home Alone 2" review="bad" />
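Once parsed, attributes are available as name/value pairs whose values are always strings. A minimal Python sketch using the brush element from above, assuming the standard-library xml.etree.ElementTree module; the variable area is only there to show the conversion to numbers.

```python
import xml.etree.ElementTree as ET

brush = ET.fromstring('<brush width="10" height="5" color="cyan" />')

# attrib is a dictionary that maps attribute names to string values.
print(brush.attrib)

# Values are strings, so numeric use needs an explicit conversion.
area = int(brush.get("width")) * int(brush.get("height"))
print(area)

# get() returns None for an attribute that is not present.
print(brush.get("opacity"))
```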
CDATA sections are parts of the XML document that the parser does not interpret as markup.
CDATA – Character Data
PCDATA – Parsed Character Data
<?xml version="1.0" standalone="yes" ?>
<document>
  <text>
    Here's how the element starts:
    <![CDATA[
      <employee status="retired">
        <name>
          <lastname>Kelly</lastname>
          <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
          <project>
            <product>Printer</product>
            <id>111</id>
            <price>$111.00</price>
          </project>
          . . .
    ]]>
  </text>
</document>
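The effect of a CDATA section can be observed by parsing a shortened version of the document above. A minimal Python sketch, assuming the standard-library xml.etree.ElementTree module; the abbreviated DOC string is an illustration, not Holzner's full file.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example: the CDATA section holds text that looks
# like markup but is not parsed as markup.
DOC = """<?xml version="1.0" standalone="yes"?>
<document>
<text>Here's how the element starts:
<![CDATA[<employee status="retired"> <name> ... ]]>
</text>
</document>"""

root = ET.fromstring(DOC)
text_element = root.find("text")

# The <text> element has no child elements; everything is character data.
print(len(text_element))
print(text_element.text)
```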
Namespace – a unique identifier for a set of names within an
XML document
Declaring a Namespace: assign an xmlns:prefix attribute to a unique identifier, e.g.
xmlns:hr="http://www.superduperbigco.com/human_resources"
The URIs (Uniform Resource Identifiers) or URLs specified serve as unique names. They can point to a document such as a DTD or schema, but an XML processor does not require them to resolve to anything.
Original document |
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>
Document using namespaces |
<hr:employee xmlns:hr="http://www.superduperbigco.com/human_resources" xmlns:boss="http://www.superduperbigco.com/big_boss">
  <hr:name>
    <hr:lastname>Kelly</hr:lastname>
    <hr:firstname>Grace</hr:firstname>
  </hr:name>
  <hr:hiredate>October 15, 2005</hr:hiredate>
  <boss:comment>Needs much supervision.</boss:comment>
  <hr:projects>
    <hr:project>
      <hr:product>Printer</hr:product>
      <hr:id>111</hr:id>
      <hr:price>$111.00</hr:price>
    </hr:project>
    <hr:project>
      <hr:product>Laptop</hr:product>
      <hr:id>222</hr:id>
      <hr:price>$989.00</hr:price>
    </hr:project>
  </hr:projects>
</hr:employee>
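A namespace-aware parser identifies each element by its namespace URI plus local name, not by the prefix. A minimal Python sketch on a shortened version of the document above, assuming the standard-library xml.etree.ElementTree module; the NS dictionary is a lookup table defined for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<hr:employee
    xmlns:hr="http://www.superduperbigco.com/human_resources"
    xmlns:boss="http://www.superduperbigco.com/big_boss">
  <hr:name>
    <hr:lastname>Kelly</hr:lastname>
    <hr:firstname>Grace</hr:firstname>
  </hr:name>
  <boss:comment>Needs much supervision.</boss:comment>
</hr:employee>"""

# Prefix-to-URI map used only for searching; the prefixes chosen here need
# not match the ones used in the document, only the URIs matter.
NS = {
    "hr": "http://www.superduperbigco.com/human_resources",
    "boss": "http://www.superduperbigco.com/big_boss",
}

root = ET.fromstring(DOC)
lastname = root.find("hr:name/hr:lastname", NS).text
comment = root.find("boss:comment", NS).text

# The parser stores the tag as {namespace-uri}localname.
print(root.tag)
print(lastname, comment)
```

Searching for the unqualified name "name" finds nothing, because the element belongs to the hr namespace.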
o A DTD (Document Type Definition) defines the formal rules of a document's structure
o Lists elements, attributes, and entities that may be used in the document
o Defines the relationship among elements, attributes, and entities
o DTDs outline the tree structure of an XML document
o DTDs have their own structure and syntax
o A DTD is a series of declarations of the form <! >
o DTDs contain 4 keywords:
   o ELEMENT – which defines a tag
   o ATTLIST – which defines the attributes of an ELEMENT
   o ENTITY – which is used to define an ENTITY
   o NOTATION – which defines a data type
e.g. from Bates:
<!DOCTYPE letter [
<!ELEMENT letter (address)>
<!ELEMENT address (line1, line2?, line3*, city, (county|state)?, country?, code?)>
<!ELEMENT line1 (#PCDATA)>
<!ELEMENT line2 (#PCDATA)>
<!ELEMENT line3 (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT county (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT code (#PCDATA)>
]>
o DTD describes structure of XML document starting with
root node – letter
o DTD is declared by using a <!DOCTYPE> declaration
o <!DOCTYPE> declaration syntax:
   o <!DOCTYPE rootname [DTD]>
   o <!DOCTYPE rootname SYSTEM URI>
   o <!DOCTYPE rootname SYSTEM URI [DTD]>
o Each tag is declared as an ELEMENT
   o Each element may contain data or more elements, and may have further attributes
   o The root of the structure is declared as the 1st element, e.g.
<!ELEMENT letter (address)>
   o ELEMENT content follows name and is in parentheses
o Content is a list of items separated by “,” or “|” –
known as content model
o Root node has another ELEMENT as its content -(address)
o Address element contains all components:
<!ELEMENT address (line1, line2?, line3*, city, (county|state)?,
country?, code?)>
o Comma between elements means that they must appear in the XML document in the order listed
o Element ordering is therefore enforced by a validating parser; it is not just a convenience for human readers
o Parentheses used for grouping, and | is logical OR
o Symbols after items signify appearance:
Symbol | Example | Meaning
Asterisk | item* | Item appears zero or more times
Comma | (item1, item2, item3) | Separates items in sequence
None | item | Item appears exactly once
Parentheses | (item1, item2) | Encloses group of items
Pipe | (item1 | item2) | Separates a set of alternatives
Plus | item+ | Item appears at least once
Question Mark | item? | Item appears once or not at all
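The content-model symbols behave much like regular-expression operators. The sketch below is not a DTD validator; it is a minimal Python illustration that hand-translates the address content model from the Bates example into a regular expression over the sequence of child element names. The names ADDRESS_MODEL and matches_address_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (line1, line2?, line3*, city, (county|state)?, country?, code?)
# rewritten by hand as a regular expression over child tag names, where
# each name is followed by a single space.
ADDRESS_MODEL = re.compile(
    r"line1 (line2 )?(line3 )*city ((county|state) )?(country )?(code )?"
)


def matches_address_model(address_xml):
    """Check the order and number of the children of an <address> element."""
    address = ET.fromstring(address_xml)
    child_names = "".join(child.tag + " " for child in address)
    return ADDRESS_MODEL.fullmatch(child_names) is not None


VALID = "<address><line1>1 Pace Plaza</line1><city>New York</city><state>NY</state></address>"
WRONG_ORDER = "<address><city>New York</city><line1>1 Pace Plaza</line1></address>"

print(matches_address_model(VALID))
print(matches_address_model(WRONG_ORDER))
```

The second document contains only declared elements but fails the check, because the comma operator fixes the order in which they may appear.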
o Parsed character data - <!ELEMENT line1 (#PCDATA)>
o Mixed content model - <!ELEMENT line1 (#PCDATA | house_number | street_name)*>
o Must obey this form: #PCDATA first, then the other elements separated by pipes, then the whole group followed by *
o Attributes give additional info about element or content
o Attributes are declared separately and associated with an element:
<!ATTLIST element attribute type default>
o element – name of element to which the attribute applies
o attribute - attribute name
o type – XML data type
o default - XML attribute defaults
e.g.
<!ELEMENT country (#PCDATA)>
<!ATTLIST country
   continent (Europe | Asia | Africa | North_America) "Asia"
   language CDATA #IMPLIED>
o element – country
o attribute - continent – followed by an enumerated list of values (each value must be a single token, so it cannot contain white space)
o default - Asia
o attribute - language – followed by CDATA
o default - #IMPLIED
XML Attribute Types
Type | Usage
CDATA | Character data – any string of text
ENTITY | Attribute value is a reference to an entity declared elsewhere in DTD
ENTITIES | Multiple entities referenced
ID | Unique identifier for an element within the document
IDREF | References an ID declared elsewhere in the document – used for linking within the document
IDREFS | Multiple IDs linked
NMTOKEN | Value can be a word or token
NMTOKENS | A list of tokens
NOTATION | NOTATION declared elsewhere
Enumeration | List of possible values in parens
XML Attribute Defaults
Default | Usage
#REQUIRED | Value must be given for each element that has the attribute
#IMPLIED | Attribute is optional – no value need be given
#FIXED value | Attribute must always have the value given
Default | Default value is given for attribute and is used when the element omits it
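The four kinds of default can be summarized as a small decision procedure. This is a minimal Python sketch of the rules in the table above, not the behaviour of any particular parser; the function resolve_attribute and its parameters are hypothetical.

```python
REQUIRED = "#REQUIRED"
IMPLIED = "#IMPLIED"
FIXED = "#FIXED"
DEFAULT = "default"


def resolve_attribute(specified, kind, declared_value=None):
    """Return the effective attribute value, or raise ValueError if invalid.

    specified is the value written in the document (None if omitted),
    kind is one of the four default kinds, and declared_value is the
    value given in the ATTLIST declaration, if any.
    """
    if kind == REQUIRED:
        if specified is None:
            raise ValueError("required attribute is missing")
        return specified
    if kind == IMPLIED:
        # Optional attribute: it may simply be absent.
        return specified
    if kind == FIXED:
        if specified is not None and specified != declared_value:
            raise ValueError("fixed attribute has a different value")
        return declared_value
    # Plain default: use the declared value when none was written.
    return declared_value if specified is None else specified


# continent (...) "Asia" and language CDATA #IMPLIED from the example above.
print(resolve_attribute(None, DEFAULT, "Asia"))
print(resolve_attribute("Europe", DEFAULT, "Asia"))
print(resolve_attribute(None, IMPLIED))
```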
o An XML document can be separated into a number of components called entities
o Each entity has a unique name
o Entities are used to:
   o Split large documents
   o Reuse content in a number of places within a document without duplication
   o Let different systems render the same content in different ways
o Declaration:
   o <!ENTITY name definition>
   o <!ENTITY name SYSTEM system_identifier [NDATA notation]>
   o <!ENTITY name PUBLIC public_identifier system_identifier [NDATA notation]>
o Internal entity - simplest definition – declared within the DTD – wherever it is referenced in the XML document, the content defined in the DTD is substituted for the reference.
   o Internal entity definition - <!ENTITY name definition>
o External entity – refers to content outside DTD and XML file – may be on remote system
o <!ENTITY locationmap SYSTEM "./images/home.png" NDATA PNG>
   § URI - "./images/home.png"
   § NDATA – Notation data type follows
   § PNG – type of data
NOTATIONS normally specify applications that can process data:
e.g.
<!NOTATION PNG SYSTEM "/usr/bin/display">
<!NOTATION gif SYSTEM "gifviewer.exe">
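Substitution of an internal entity can be observed with a small document. A minimal Python sketch, assuming the standard-library xml.etree.ElementTree parser, which expands internal entities declared in the document's own DTD; the entity name signature and its text are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares one internal entity, which the document
# body then references twice as &signature;.
DOC = """<!DOCTYPE letter [
<!ENTITY signature "Yours sincerely, the XML class">
]>
<letter>First copy: &signature;. Second copy: &signature;.</letter>"""

letter = ET.fromstring(DOC)

# Each reference has been replaced by the entity's definition.
print(letter.text)
```

A reference to an entity that has not been declared is a well-formedness error, so the parser rejects the document instead of leaving the reference in place.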
Internal DTD –
<!DOCTYPE rootnode [ ]>
External DTD –
<?xml version="1.0"?>
<!DOCTYPE rootnode SYSTEM | PUBLIC [public_identifier] URI>
Example:
from Holzner
XML file |
<?xml
version = "1.0" encoding="UTF-8"
standalone="no"?> <!DOCTYPE
document SYSTEM "ch04_07.dtd"> <document> <employee> <name> <lastname>Kelly</lastname> <firstname>Grace</firstname> </name> <hiredate>October 15,
2005</hiredate> <projects> <project> <product>Printer</product> <id>111</id> <price>$111.00</price> </project> <project> <product>Laptop</product> <id>222</id> <price>$989.00</price> </project> </projects> </employee> </document> |
DTD file |
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate, projects)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
<!ELEMENT projects (project)*>
<!ELEMENT project (product,id,price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
XML Schemas:
o Provide a means for defining
the structure, content and semantics of XML documents through XML itself.
o Define a richer set of data
types such as booleans, numbers, dates and times, and currencies than the more
traditional DTD
o XML Schemas make it easier
to validate documents based on namespaces
o Defined by the W3C's XML Schema Working Group
Purpose - to define the legal building blocks of an
XML document
An XML Schema:
Using Schema:
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <!-- Define the actual document --> <xsd:element name="letter"> </xsd:element> </xsd:schema> |
o Content of schema – mostly element definitions
o Elements may contain data (e.g. strings or numbers), sub-elements, or both
o Simple types - Elements that contain only data and have no attributes
o Complex types – all others
Example: Mortgage file (Holzner)
XML file |
<?xml
version="1.0" encoding="UTF-8"?> <document documentDate="2005-03-02"> <comment>Good risk</comment> <mortgagee phone="888.555.1234"> <name>James
Blandings</name> <location>1234 299th St</location> <city>New York</city> <state>NY</state> </mortgagee> <mortgages> <mortgage loanNumber="66 7777 88"> <property>The Hackett
Place</property> <date>2005-03-01</date> <loanAmount>80000</loanAmount> <term>15</term> </mortgage> <mortgage loanNumber="11 8888 22"> <property>123 Acorn
Drive</property> <date>2005-03-01</date> <loanAmount>90000</loanAmount> <term>15</term> </mortgage> <mortgage loanNumber="33 4444 11"> <property>99 West Pocusset
St</property> <date>2005-03-02</date> <loanAmount>100000</loanAmount> <term>30</term> </mortgage> <mortgage loanNumber="55 3333 88"> <property>19 Johnson
Place</property> <date>2005-03-02</date> <loanAmount>110000</loanAmount> <term>30</term> </mortgage> <mortgage loanNumber="22 6666 99"> <property>345 Notingham
Court</property> <date>2005-03-02</date> <loanAmount>120000</loanAmount> <term>30</term> </mortgage> </mortgages> <bank phone="888.555.8888"> <name>XML Bank</name> <location>12 Schema
Place</location> <city>New York</city> <state>NY</state> </bank> </document> |
XSD file |
<?xml
version="1.0" encoding="UTF-8"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:annotation> <xsd:documentation> Mortgage record XML schema. </xsd:documentation> </xsd:annotation> <xsd:element name="document" type="documentType"/> <xsd:complexType name="documentType"> <xsd:sequence> <xsd:element ref="comment"/> <xsd:element name="mortgagee" type="recordType"/> <xsd:element name="mortgages" type="mortgagesType"/> <xsd:element name="bank" type="recordType"/> </xsd:sequence> <xsd:attribute name="documentDate" type="xsd:date"/> </xsd:complexType> <xsd:complexType name="recordType"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="location" type="xsd:string"/> <xsd:element name="city" type="xsd:string"/> <xsd:element name="state" type="xsd:string"/> </xsd:sequence> <xsd:attribute name="phone" type="xsd:string" use="optional"/> </xsd:complexType> <xsd:complexType name="mortgagesType"> <xsd:sequence> <xsd:element name="mortgage" minOccurs="0" maxOccurs="8"> <xsd:complexType> <xsd:sequence> <xsd:element name="property" type="xsd:string"/> <xsd:element name="date" type="xsd:date" minOccurs="0"/> <xsd:element name="loanAmount" type="xsd:decimal"/> <xsd:element name="term"> <xsd:simpleType> <xsd:restriction base="xsd:integer"> <xsd:maxInclusive value="30"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> <xsd:attribute name="loanNumber" type="loanNumberType"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> <xsd:simpleType name="loanNumberType"> <xsd:restriction base="xsd:string"> <xsd:pattern value="\d{2} \d{4}
\d{2}"/> </xsd:restriction> </xsd:simpleType> <xsd:element name="comment" type="xsd:string"/> </xsd:schema> |
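The simple-type constraints in this schema can be mimicked by hand to see what a schema validator would check. The following Python sketch is not a schema validator and only approximates the XSD types with standard-library classes (for example, Decimal accepts a few spellings that xsd:decimal does not); the function check_mortgage is hypothetical.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation

LOAN_NUMBER = re.compile(r"\d{2} \d{4} \d{2}")  # pattern of loanNumberType
MAX_TERM = 30                                   # maxInclusive on <term>


def check_mortgage(loan_number, date_text, loan_amount, term):
    """Return the names of the fields that fail; an empty list means success."""
    problems = []
    if LOAN_NUMBER.fullmatch(loan_number) is None:
        problems.append("loanNumber")
    try:
        date.fromisoformat(date_text)   # xsd:date written as CCYY-MM-DD
    except ValueError:
        problems.append("date")
    try:
        Decimal(loan_amount)            # approximation of xsd:decimal
    except InvalidOperation:
        problems.append("loanAmount")
    try:
        if int(term) > MAX_TERM:        # xsd:integer restricted to <= 30
            problems.append("term")
    except ValueError:
        problems.append("term")
    return problems


print(check_mortgage("66 7777 88", "2005-03-01", "80000", "15"))
print(check_mortgage("667777 88", "2005-13-01", "80000", "45"))
```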
XML Schema elements are grouped by their function: top level elements, particles, multiple XML documents and namespaces, identity constraints, attributes, named attributes, complex type definitions, and simple type definitions.
The following are elements that appear at the top level of a schema document.
Element | Description
annotation | Defines an annotation.
attribute | Declares an attribute.
attributeGroup | Groups a set of attribute declarations so that they can be incorporated as a group for complex type definitions.
complexType | Defines a complex type, which determines the set of attributes and the content of an element.
element | Declares an element.
group | Groups a set of element declarations so that they can be incorporated as a group into complex type definitions.
import | Identifies a namespace whose schema components are referenced by the containing schema.
include | Includes the specified schema document in the target namespace of the containing schema.
notation | Contains the definition of a notation to describe the format of non-XML data within an XML document. An XML Schema notation declaration is a reconstruction of XML 1.0 NOTATION declarations.
redefine | Allows simple and complex types, groups, and attribute groups that are obtained from external schema files to be redefined in the current schema.
simpleType | Defines a simple type, which determines the constraints on and information about the values of attributes or elements with text-only content.
The following are elements that can have minOccurs and maxOccurs attributes. Such elements always appear as part of a complex type definition or as part of a named model group.
Element | Description
all | Allows the elements in the group to appear (or not appear) in any order in the containing element.
any | Enables any element from the specified namespace(s) to appear in the containing sequence or choice element.
choice | Allows one and only one of the elements contained in the selected group to be present within the containing element.
element | Declares an element.
group | Groups a set of element declarations so that they can be incorporated as a group into complex type definitions.
sequence | Requires the elements in the group to appear in the specified sequence within the containing element.
The following are elements that bring in schema elements from other namespaces or redefine schema elements in the same namespace.
Element | Description
import | Identifies a namespace whose schema components are referenced by the containing schema.
include | Includes the specified schema document in the target namespace of the containing schema.
redefine | Allows simple and complex types, groups, and attribute groups that are obtained from external schema files to be redefined in the current schema.
The following are elements that are related to identity constraints.
Element | Description
field | Specifies an XML Path Language (XPath) expression that specifies the value (or one of the values) used to define an identity constraint (unique, key, and keyref elements).
key | Specifies that an attribute or element value (or set of values) must be a key within the specified scope. The scope of a key is the containing element in an instance document. A key must be unique, non-nillable, and always present.
keyref | Specifies that an attribute or element value (or set of values) correspond to those of the specified key or unique element.
selector | Specifies an XPath expression that selects a set of elements for an identity constraint (unique, key, and keyref elements).
unique | Specifies that an attribute or element value (or a combination of attribute or element values) must be unique within the specified scope. The value must be unique or nil.
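A uniqueness constraint pairs a selector (which elements to look at) with a field (which value of each element to compare). The Holzner mortgage schema does not declare such a constraint; the following Python sketch only illustrates the idea by checking, by hand, that loanNumber values are unique within a mortgages element. The function duplicate_values and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOC = """<mortgages>
  <mortgage loanNumber="66 7777 88"/>
  <mortgage loanNumber="11 8888 22"/>
  <mortgage loanNumber="66 7777 88"/>
</mortgages>"""


def duplicate_values(scope, selector, field):
    """Return the field values that occur more than once within scope.

    scope is the containing element, selector is a path that picks the
    elements to compare, and field is the attribute holding the value.
    """
    values = [element.get(field) for element in scope.findall(selector)]
    # Elements without the attribute are ignored, as for a unique constraint.
    counts = Counter(value for value in values if value is not None)
    return sorted(value for value, count in counts.items() if count > 1)


duplicates = duplicate_values(ET.fromstring(DOC), "mortgage", "loanNumber")
print(duplicates)
```

A key constraint would additionally require every selected element to carry the field, which this sketch does not check.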
The following are elements that define attributes in schemas.
Element | Description
anyAttribute | Enables any attribute from the specified namespace(s) to appear in the containing complexType element or in the containing attributeGroup element.
attribute | Declares an attribute.
attributeGroup | Groups a set of attribute declarations so that they can be incorporated as a group for complex type definitions.
The following are elements that define named constructs in schemas. Named constructs are referred to with a QName by other schema elements.
Element | Description
attribute | Declares an attribute.
attributeGroup | Groups a set of attribute declarations so that they can be incorporated as a group for complex type definitions.
complexType | Defines a complex type, which determines the set of attributes and the content of an element.
element | Declares an element.
group | Groups a set of element declarations so that they can be incorporated as a group into complex type definitions.
key | Specifies that an attribute or element value (or set of values) must be a key within the specified scope. The scope of a key is the containing element in an instance document. A key must be unique, non-nillable, and always present.
keyref | Specifies that an attribute or element value (or set of values) correspond to those of the specified key or unique element.
notation | Contains the definition of a notation to describe the format of non-XML data within an XML document. An XML Schema notation declaration is a reconstruction of XML 1.0 NOTATION declarations.
simpleType | Defines a simple type, which determines the constraints on and information about the values of attributes or elements with text-only content.
unique | Specifies that an attribute or element value (or a combination of attribute or element values) must be unique within the specified scope. The value must be unique or nil.
The following are elements that create complex type definitions.
Element | Description
all | Allows the elements in the group to appear (or not appear) in any order in the containing element.
annotation | Defines an annotation.
any | Enables any element from the specified namespace(s) to appear in the containing sequence or choice element.
anyAttribute | Enables any attribute from the specified namespace(s) to appear in the containing complexType element or in the containing attributeGroup element.
appinfo | Specifies information to be used by applications within an annotation element.
attribute | Declares an attribute.
attributeGroup | Groups a set of attribute declarations so that they can be incorporated as a group for complex type definitions.
choice | Allows one and only one of the elements contained in the selected group to be present within the containing element.
complexContent | Contains extensions or restrictions on a complex type that contains mixed content or elements only.
documentation | Specifies information to be read or used by users within an annotation element.
element | Declares an element.
extension (simpleContent) | Contains extensions on simpleContent. This extends a simple type or a complex type that has simple content by adding specified attribute(s), attribute group(s) or anyAttribute.
extension (complexContent) | Contains extensions on complexContent.
group | Groups a set of element declarations so that they can be incorporated as a group into complex type definitions.
restriction (simpleContent) | Defines constraints on a simpleContent definition.
restriction (complexContent) | Defines constraints on a complexContent definition.
sequence | Requires the elements in the group to appear in the specified sequence within the containing element.
simpleContent | Contains extensions or restrictions on a complexType element with character data or a simpleType element as content and contains no elements.
The following are elements that create simple type definitions.
Element | Description
annotation | Defines an annotation.
appinfo | Specifies information to be used by applications within an annotation element.
documentation | Specifies information to be read or used by users within an annotation element.
element | Declares an element.
list | Defines a collection of a single simpleType definition.
restriction (simpleType) | Defines constraints on a simpleType definition.
union | Defines a collection of multiple simpleType definitions.
The following table lists primitive XML schema data types, facets that can be applied to the data type, and a description of the data type.
Facets can only appear once in a type definition except for enumeration and pattern facets. Enumeration and pattern facets can have multiple entries and are grouped together.
Data Type |
Facets |
Description |
string |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents character strings. |
boolean |
pattern, whiteSpace |
Represents Boolean values, which are either true or
false. |
decimal |
enumeration, pattern, totalDigits, fractionDigits, minInclusive,
minExclusive, maxInclusive, maxExclusive, whiteSpace |
Represents arbitrary precision numbers. |
float |
pattern, enumeration, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents single-precision 32-bit floating-point numbers. |
double |
pattern, enumeration, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents double-precision 64-bit floating-point numbers. |
duration |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a duration of time. The pattern for duration is PnYnMnDTnHnMnS. |
dateTime |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a specific instant of time. The pattern for dateTime is CCYY-MM-DDThh:mm:ss. This representation may be immediately followed by a "Z" to indicate Coordinated Universal Time (UTC) or, to indicate the time zone (the difference between the local time and Coordinated Universal Time), by a sign, + or -, followed by the difference from UTC represented as hh:mm. |
time |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents an instant of time that recurs every day. The pattern for time is hh:mm:ss. |
date |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a calendar date. The pattern for date is CCYY-MM-DD. |
gYearMonth |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a specific Gregorian month in a specific
Gregorian year. A set of one-month long, nonperiodic instances. The pattern for gYearMonth is CCYY-MM. |
gYear |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a Gregorian year. A set of one-year long,
nonperiodic instances. The pattern for gYear is CCYY. |
gMonthDay |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a specific Gregorian date that recurs,
specifically a day of the year such as the third of May. A gMonthDay
is the set of calendar dates. Specifically, it is a set of one-day long,
annually periodic instances. The pattern for gMonthDay is --MM-DD. |
gDay |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a Gregorian day that recurs, specifically a day
of the month such as the fifth day of the month. A gDay is the space
of a set of calendar dates. Specifically, it is a set of one-day long,
monthly periodic instances. The pattern for gDay is ---DD. |
gMonth |
enumeration, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, whiteSpace |
Represents a Gregorian month that recurs every year. A gMonth
is the space of a set of calendar months. Specifically, it is a set of
one-month long, yearly periodic instances. The pattern for gMonth is --MM. |
hexBinary |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents arbitrary hex-encoded binary data. A hexBinary
is the set of finite-length sequences of binary octets. Each binary octet is
encoded as a character tuple, consisting of two hexadecimal digits
([0-9a-fA-F]) representing the octet code. |
base64Binary |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents Base64-encoded arbitrary binary data. A base64Binary
is the set of finite-length sequences of binary octets. |
anyURI |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents a URI as defined by RFC 2396. An anyURI
value can be absolute or relative, and may have an optional fragment identifier. |
QName |
length, enumeration, pattern, maxLength, minLength,
whiteSpace |
Represents a qualified name. A qualified name is composed
of a prefix and a local name separated by a colon. Both the prefix and local names
must be an NCName. The prefix must be associated with a namespace URI
reference, using a namespace declaration. |
NOTATION |
length, enumeration, pattern, maxLength, minLength,
whiteSpace |
Represents a NOTATION attribute type. A set of
QNames. |
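Several of the primitive types correspond closely to types in a general-purpose language. The following Python sketch converts a few lexical values using only the standard library; the correspondence is approximate (time zones, durations, and the g* types are not handled), and the helper parse_boolean and the sample values are assumptions made for illustration.

```python
import base64
from datetime import date, datetime, time
from decimal import Decimal


def parse_boolean(text):
    """Convert an xsd:boolean lexical value (true, false, 1, 0) to a bool."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean value: " + text)


parsed = {
    "boolean": parse_boolean("true"),
    "decimal": Decimal("80000.50"),
    "date": date.fromisoformat("2005-03-02"),
    "time": time.fromisoformat("13:20:00"),
    "dateTime": datetime.fromisoformat("2005-03-02T13:20:00"),
    # The same two octets written in the two binary encodings.
    "hexBinary": bytes.fromhex("0FB7"),
    "base64Binary": base64.b64decode("D7c="),
}

for name, value in parsed.items():
    print(name, repr(value))
```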
The following table lists derived XML schema data types, facets that can be applied to the derived data type, and a description of the derived data type.
Data Type |
Facets |
Description |
normalizedString |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents white space normalized strings. This data type
is derived from string. |
token |
enumeration, pattern, length, minLength, maxLength,
whiteSpace |
Represents tokenized strings. This data type is derived
from normalizedString. |
language |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents natural language identifiers (defined by RFC
1766). This data type is derived from token. |
IDREFS |
length, maxLength, minLength, enumeration, whiteSpace |
Represents the IDREFS attribute type. Contains a
set of values of type IDREF. |
ENTITIES |
length, maxLength, minLength, enumeration, whiteSpace |
Represents the ENTITIES attribute type. Contains a set
of values of type ENTITY. |
NMTOKEN |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents the NMTOKEN attribute type. An NMTOKEN is a set of name characters (letters, digits, and other characters) in any combination. Unlike Name and NCName, NMTOKEN has no restrictions on the starting character. This data type is derived from token. |
NMTOKENS |
length, maxLength, minLength, enumeration, whiteSpace |
Represents the NMTOKENS attribute type. Contains a
set of values of type NMTOKEN. |
Name |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents names in XML. A Name is a token that
begins with a letter, underscore, or colon and continues with name characters
(letters, digits, and other characters). This data type is derived from token. |
NCName |
length, pattern, maxLength, minLength, enumeration,
whiteSpace |
Represents noncolonized names. This data type is the same as Name, except it cannot contain a colon. This data type is derived from Name. |
ID |
length, enumeration, pattern, maxLength, minLength,
whiteSpace |
Represents the ID attribute type defined in the XML
1.0 Recommendation. The ID must be a no-colon-name (NCName) and must be
unique within an XML document. This data type is derived from NCName. |
IDREF |
length, enumeration, pattern, maxLength, minLength,
whiteSpace |
Represents a reference to an element that has an ID
attribute that matches the specified ID. An IDREF must be an
NCName and must be a value of an element or attribute of type ID within the
XML document. This data type is derived from NCName. |
ENTITY |
length, enumeration, pattern, maxLength, minLength,
whiteSpace |
Represents the ENTITY attribute type in XML 1.0
Recommendation. This is a reference to an unparsed entity with a name that
matches the specified name. An ENTITY must be an NCName and must be
declared in the schema as an unparsed entity name. This data type is derived
from NCName. |
integer |
enumeration, fractionDigits, pattern, minInclusive,
minExclusive, maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents a sequence of decimal digits with an optional
leading sign (+ or -). This data type is derived from decimal. |
nonPositiveInteger |
enumeration, fractionDigits, pattern, minInclusive,
minExclusive, maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer that is less than or equal to zero.
A nonPositiveInteger consists of a negative sign (-) and sequence of
decimal digits. This data type is derived from integer. |
negativeInteger |
enumeration, fractionDigits, pattern, minInclusive,
minExclusive, maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer that is less than zero. Consists of
a negative sign (-) and sequence of decimal digits. This data type is derived
from nonPositiveInteger. |
long |
enumeration, fractionDigits, pattern, minInclusive,
minExclusive, maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum value of
-9223372036854775808 and maximum of 9223372036854775807. This data type is
derived from integer. |
int |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum value of -2147483648
and maximum of 2147483647. This data type is derived from long. |
short |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum value of -32768 and
maximum of 32767. This data type is derived from int. |
byte |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum value of -128 and
maximum of 127. This data type is derived from short. |
nonNegativeInteger |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer that is greater than or equal to
zero. This data type is derived from integer. |
unsignedLong |
enumeration, fractionDigits, pattern, minInclusive,
minExclusive, maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum of zero and maximum
of 18446744073709551615. This data type is derived from nonNegativeInteger. |
unsignedInt |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum of zero and maximum
of 4294967295. This data type is derived from unsignedLong. |
unsignedShort |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum of zero and maximum
of 65535. This data type is derived from unsignedInt. |
unsignedByte |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer with a minimum of zero and maximum
of 255. This data type is derived from unsignedShort. |
positiveInteger |
enumeration, fractionDigits, pattern, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits, whiteSpace |
Represents an integer that is greater than zero. This data
type is derived from nonNegativeInteger. |
o Simple Types
– used for an element that contains only text content (no child elements and no attributes)
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="today" type="xsd:date" />
  <xsd:element name="user" type="xsd:string" />
</xsd:schema> |
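As a rough illustration of how such a declaration is used, the sketch below shows one possible instance document. It assumes the schema above has been saved under the hypothetical file name simple.xsd; the date value is purely illustrative.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance document; "simple.xsd" is an assumed file name -->
<today xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="simple.xsd">2004-09-15</today>
```

Because both elements are declared at the top level of the schema, a document whose root is a "user" element holding a string would be accepted as well. A value such as "15/09/2004" would be rejected for "today", since xsd:date expects the form YYYY-MM-DD.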
o Defining simple types – take an existing simple type and apply a restriction using a facet
o Facets – rules which are applied to a base type to change it in some way
Example 1: Defining myInteger, Range 10000-99999
<xsd:element name="workingInts" type="myInteger" />
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
|
Example 2: Using the Enumeration Facet
<xsd:element name="USA" type="USState" />
<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on ... -->
  </xsd:restriction>
</xsd:simpleType>
|
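The other facets listed in the data type table are applied in the same way. The following is only a sketch: the element name "zip" and the type name "ZipCode" are hypothetical, chosen to show the pattern facet restricting xsd:string to exactly five digits.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element that uses the restricted type below -->
  <xsd:element name="zip" type="ZipCode"/>

  <!-- The pattern facet takes a regular expression the whole value must match -->
  <xsd:simpleType name="ZipCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{5}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Under this sketch, <zip>12345</zip> would be accepted, while <zip>1234</zip> and <zip>12a45</zip> would not.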
o Complex Types – defined using the complexType element
o May include subelements, element content and attributes
Sequence - Requires the elements in the group to
appear in the specified sequence within the containing element.
Example 1: Defining the USAddress Type
<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:element name="zip" type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
|
Example 2: Defining PurchaseOrderType
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
|
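To make the USAddress type more concrete, here is a sketch of an instance fragment that should satisfy it. The element name "shipTo" comes from PurchaseOrderType above; the address values are illustrative only.

```xml
<!-- Fragment of a purchase order; children must appear in the declared order -->
<shipTo country="US">
  <name>Alice Smith</name>
  <street>123 Maple Street</street>
  <city>Mill Valley</city>
  <state>CA</state>
  <zip>90952</zip>
</shipTo>
```

The "country" attribute may be omitted, but because it is declared with fixed="US", any value other than "US" would make the fragment invalid. Swapping two children, for example putting "city" before "street", would also be rejected because of the sequence compositor.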
o XML Schemas can specify the types of attributes
o Declaring: <xsd:attribute name="orderDate" type="xsd:date"/>
o Used as in the example above, this means that every element of type PurchaseOrderType may carry the orderDate attribute.
o The ref attribute references an existing global element declaration instead of defining a new one
o e.g. <xsd:element ref="comment" minOccurs="0"/>
Compositors
o Sequence - Requires the elements in the group to appear in the
specified sequence within the containing element.
The root element is named "AAA", is in no
namespace, and contains one "BBB" element followed by one
"CCC" element. Use the "sequence" element to specify the
exact order of the elements. The attributes "minOccurs" and
"maxOccurs" are not necessary, because their default value is 1. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid Document: <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> |
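The listings above show only the opening tags of the schema and of the instance document. A minimal sketch of a complete pair matching the description might look as follows; typing both children as xsd:string and the text values are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="AAA">
    <xsd:complexType>
      <!-- BBB must come first, then CCC; each exactly once by default -->
      <xsd:sequence>
        <xsd:element name="BBB" type="xsd:string"/>
        <xsd:element name="CCC" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

```xml
<AAA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="correct_0.xsd">
  <BBB>first</BBB>
  <CCC>second</CCC>
</AAA>
```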
o Restriction – limits the range of values
Here the value of the element "root" must
be an integer less than 25.
|
Valid document: <root xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>24</root> |
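No schema listing accompanies this example. One way such a schema could be written, as a sketch only, is with an anonymous simple type and the maxExclusive facet:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <!-- Exclusive upper bound: 24 is accepted, 25 is not -->
        <xsd:maxExclusive value="25"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

An equivalent formulation would use <xsd:maxInclusive value="24"/> instead.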
o All – sets up an unordered set of elements
The root element is named "AAA", is in no
namespace, and contains one "BBB" and one "CCC"
element. Their order is not important. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid document: <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> |
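Only the opening tags of the listings are shown above. A sketch of a complete schema for this description, assuming string-typed children, could be:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="AAA">
    <xsd:complexType>
      <!-- Both children are required, but may appear in either order -->
      <xsd:all>
        <xsd:element name="BBB" type="xsd:string"/>
        <xsd:element name="CCC" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

```xml
<AAA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="correct_0.xsd">
  <CCC>second declared, first written</CCC>
  <BBB>first declared, second written</BBB>
</AAA>
```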
o Choice – creates a set of alternative elements, of which only one may be selected
The root element is named "AAA", is in no
namespace, and contains either a "BBB" or a "CCC"
element (but not both). Use the "choice" element. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid Document: <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> |
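As with the previous examples, the listings above stop at the opening tags. A sketch of a schema that fits the description, again assuming string-typed children:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="AAA">
    <xsd:complexType>
      <!-- Exactly one of the alternatives must appear -->
      <xsd:choice>
        <xsd:element name="BBB" type="xsd:string"/>
        <xsd:element name="CCC" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

```xml
<AAA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="correct_0.xsd">
  <BBB>only one alternative is present</BBB>
</AAA>
```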
o List – defines a whitespace-separated list of values of a simple type
Now we want the "root" element to
contain a list of three integers. We define a general list (element
"list") of integers and then restrict it (element
"restriction") to an exact length (element "length") of
three items. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid Document: <root xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>0 0 1</root> |
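The schema listing above shows only its opening tag. The two steps in the description (define a general list, then restrict its length) could be sketched as follows; the type names "integerList" and "threeIntegers" are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root" type="threeIntegers"/>

  <!-- Step 1: a general whitespace-separated list of integers -->
  <xsd:simpleType name="integerList">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- Step 2: restrict the list to exactly three items -->
  <xsd:simpleType name="threeIntegers">
    <xsd:restriction base="integerList">
      <xsd:length value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With this sketch, the content "0 0 1" is accepted, while "0 1" (two items) and "0 0 1 2" (four items) are not. Note that for a list type the length facet counts list items, not characters.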
o Union – combines the value spaces of two or more simple types
The value of the element "root" must lie in the range
0-100 or 300-400 (including the border values). We make a union of the two
intervals. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid Document: <root xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>50</root> |
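Since the schema listing above is cut off after its opening tag, here is a sketch of how the two intervals could be combined. Using anonymous member types inside the union is one option among several.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root">
    <xsd:simpleType>
      <xsd:union>
        <!-- First interval: 0 to 100 inclusive -->
        <xsd:simpleType>
          <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="0"/>
            <xsd:maxInclusive value="100"/>
          </xsd:restriction>
        </xsd:simpleType>
        <!-- Second interval: 300 to 400 inclusive -->
        <xsd:simpleType>
          <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="300"/>
            <xsd:maxInclusive value="400"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

Values such as 50, 100 and 300 satisfy one of the member types and are accepted; 200 satisfies neither and is rejected.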
o Group
Defines a group of common attributes that can
be reused. The root element is named "root"; it must contain the
"aaa" and "bbb" elements, and these elements must have the
attributes "x" and "y". <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
|
Valid Document: <root xsi:noNamespaceSchemaLocation="correct_0.xsd"
xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> |
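The listings above are cut off after their opening tags. A sketch of how an attribute group might be declared once and reused is given below. The names "xyAttributes" and "withXY" are hypothetical, and typing the attributes as xsd:integer is an assumption; use="required" reflects the statement that the elements must have both attributes.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="aaa" type="withXY"/>
        <xsd:element name="bbb" type="withXY"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Empty-content type that pulls in the shared attributes -->
  <xsd:complexType name="withXY">
    <xsd:attributeGroup ref="xyAttributes"/>
  </xsd:complexType>

  <!-- Declared once, referenced wherever x and y are needed -->
  <xsd:attributeGroup name="xyAttributes">
    <xsd:attribute name="x" type="xsd:integer" use="required"/>
    <xsd:attribute name="y" type="xsd:integer" use="required"/>
  </xsd:attributeGroup>
</xsd:schema>
```

```xml
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="correct_0.xsd">
  <aaa x="1" y="2"/>
  <bbb x="3" y="4"/>
</root>
```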
Tutorial 1: http://www.w3schools.com/css/default.asp
Tutorial 2: http://www.tizag.com/cssT/
CSS – Cascading Style Sheets
· A style sheet technology originally developed for HTML that can also be used to format XML
· Levels: CSS1, CSS2
· Style sheets are collections of style rules for formatting XML content marked up by tags
e.g.
title {display: block; font-size: 36pt; font-weight: bold;
text-align: center;
text-decoration: underline}
Which XML elements to format { how to format }
selector { property: value; property: value; ... }
e.g. from Holzner
Style sheet: ch08_02.css
title {display: block; font-size: 36pt; font-weight: bold; text-align: center; text-decoration: underline}
philosopher {display: block; font-size: 16pt; text-align: center}
book {display: block; font-size: 28pt; text-align: center; font-style: italic}
paragraph {display: block; margin-top: 10} |
XML document: <?xml version="1.0" standalone="yes"?> <?xml-stylesheet
type="text/css" href="ch08_02.css"?> <document> <title>The Discourses</title> <philosopher>Epictetus</philosopher> <book>Book Four</book> <paragraph> He is free who lives
as he wishes to live; who is neither subject to
compulsion nor to hindrance, nor to force; whose movements to
action are not impeded, whose desires attain their
purpose, and who does not fall into that which he would avoid. </paragraph> <paragraph> Who, then, chooses
to live in error? No man. Who chooses to live deceived,
liable to mistake, unjust, unrestrained, discontented,
mean? No man. </paragraph> <paragraph> Not one then of
the bad lives as he wishes; nor is he, then, free. And
who chooses to live in sorrow, fear, envy, pity, desiring and
failing in his desires, attempting to avoid something
and falling into it? Not one. </paragraph> <paragraph> Do we then find
any of the bad free from sorrow, free from fear, who does not
fall into that which he would avoid, and does not obtain
that which he wishes? Not one; nor then do we find any bad
man free. </paragraph> </document> |
|
Background Color:
<html> <head> <style type="text/css"> body {background-color: yellow} h1 {background-color: #00ff00} h2 {background-color: transparent} p {background-color: rgb(250,0,255)} </style> </head> <body> <h1>This is header 1</h1> <h2>This is header 2</h2> <p>This is a paragraph</p> </body> </html> |
Text: Color
<html> <head> <style
type="text/css"> h1
{color: #00ff00} h2
{color: #dda0dd} p
{color: rgb(0,0,255)} </style>
</head> <body> <h1>This
is header 1</h1> <h2>This
is header 2</h2> <p>This
is a paragraph</p> </body> </html> |
Text: Alignment
<html> <head> <style
type="text/css"> h1
{text-align: center} h2
{text-align: left} h3
{text-align: right} </style>
</head> <body> <h1>This
is header 1</h1> <h2>This
is header 2</h2> <h3>This
is header 3</h3> </body> </html> |
Font: Style
<html> <head> <style
type="text/css"> h3
{font-family: times} p
{font-family: courier} p.sansserif
{font-family: sans-serif} </style> </head> <body> <h3>This
is header 3</h3> <p> This
is a paragraph</p> <p
class="sansserif"> This
is a paragraph</p> </body> </html>
|
Font: Size
<html> <head> <style
type="text/css"> h1
{font-size: 150%} h2
{font-size: 20px} p
{font-size: x-large} </style> </head> <body> <h1>This
is header 1</h1> <h2>This
is header 2</h2> <p>This
is a paragraph</p> </body> </html> |
CSS Classes can give HTML multiple renderings
<html> <head> <style> p.first
{ background-color: gray; color: blue;}
p.second
{ background-color: red; } p.third
{ background: purple; color:
yellow; } </style> </head> <body> <h2>CSS
Classes</h2> <p
class="first">This is the p.first paragraph</p> <p
class="second">This is the p.second paragraph</p> <p
class="third">This is the p.third paragraph</p> </body> </html> |
Borders:
<html> <head> <style> p.solid
{border-style: solid; } p.double
{border-style: double; } p.groove
{border-style: groove; } p.dotted
{border-style: dotted; } p.dashed
{border-style: dashed; } p.inset
{border-style: inset; } p.outset
{border-style: outset; } p.ridge
{border-style: ridge; } p.hidden
{border-style: hidden; } </style> </head> <body> <p
class="solid">This is the solid style</p> <p
class="double">This is the double style</p> <p
class="groove">This is the groove style</p> <p
class="dotted">This is the dotted style</p> <p
class="dashed">This is the dashed style</p> <p
class="inset">This is the inset style</p> <p
class="outset">This is the outset style</p> <p
class="ridge">This is the ridge style</p> <p
class="hidden">This is the hidden style</p> </body> </html>
|
Padding: changes the default padding that appears
inside various HTML elements (paragraphs, tables, etc.).
<html> <head> <style
type="text/css"> td
{padding: 1.5cm} td.twovalues
{padding: 0.5cm 2.5cm} </style> </head> <body> <table
border="1"> <tr> <td> This
is a table cell with padding on each side </td> </tr> </table> <br> <table
border="1"> <tr> <td
class="twovalues"> This
is a table cell with padding on each side. The top and bottom padding have the
same value (0.5cm), while the left and right padding have another value (2.5cm) </td> </tr> </table> </body> </html>
|
Margins: define the space around elements.
<html> <head> <style
type="text/css"> p.margin
{margin: 2cm 4cm 3cm 4cm} </style> </head> <body> <p> This
is a paragraph </p> <p
class="margin"> This
is a paragraph with margins </p> <p> This
is a paragraph </p> </body> </html> |