1
|
- Carol Wolf
- Computer Science Department
|
2
|
- XML stands for eXtensible Markup Language.
- A markup language is used to provide information about a document.
- Tags are added to the document to provide the extra information.
- HTML tags tell a browser how to display the document.
- XML tags give a reader some idea what some of the data means.
|
3
|
- XML documents are used to transfer data from one place to another, often
over the Internet.
- XML subsets are designed for particular applications.
- One is RSS (Rich Site Summary or Really Simple Syndication). It is used to send breaking news
bulletins from one web site to another.
- A number of fields have their own subsets. These include chemistry, mathematics,
and book publishing.
- Many of these subsets are published as open standards, some through the W3C
(World Wide Web Consortium), and are available for anyone's use.
|
4
|
- XML is text (Unicode) based.
- Takes up less space.
- Can be transmitted efficiently.
- One XML document can be displayed differently in different media.
- HTML, video, CD, DVD.
- You only have to change the XML document in order to change all the
rest.
- XML documents can be modularized.
Parts can be reused.
|
5
|
- <html>
- <head><title>Example</title></head>
- <body>
- <h1>This is an example of a page.</h1>
- <h2>Some information goes here.</h2>
- </body>
- </html>
|
6
|
- <?xml version="1.0"?>
- <address>
- <name>Alice Lee</name>
- <email>alee@aol.com</email>
- <phone>212-346-1234</phone>
- <birthday>1985-03-22</birthday>
- </address>
|
7
|
- HTML tags have a fixed meaning and browsers know what it is.
- XML tags are different for different applications, and users know what
they mean.
- HTML tags are used for display.
- XML tags are used to describe documents and data.
|
8
|
- Tags are enclosed in angle brackets.
- Tags come in pairs with start-tags and end-tags.
- Tags must be properly nested.
- <name><email></name></email> is not allowed.
- <name><email></email></name> is.
- Tags that do not have end-tags must be terminated by a /.
- <br /> is an XHTML example.
|
9
|
- Tags are case sensitive.
- <address> is not the same as <Address>
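- Tag names may not begin with the letters XML in any combination of cases; such names are reserved.
- The characters < and & may not appear in tag names, and in element text they are normally written as &lt; and &amp;.
- Tag names follow rules similar to Java identifiers, except that hyphens, periods, and a colon
(used for namespace prefixes) are also allowed.
They must begin with a letter, an underscore, or a colon and may not contain white space.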
- Documents must have a single root tag that begins the document.
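- The case-sensitivity rule and the restriction on < and & can be tried out with a short Python sketch. It assumes the standard xml.etree.ElementTree and xml.sax.saxutils modules; the helper name parses and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start-tag and end-tag must match exactly, including case.
same_case = parses("<address>10 Main St</address>")
mixed_case = parses("<address>10 Main St</Address>")

# A raw & in element text is rejected; escape() rewrites it as &amp;.
raw = parses("<name>Lee & Sons</name>")
escaped = parses("<name>" + escape("Lee & Sons") + "</name>")

print(same_case, mixed_case, raw, escaped)
```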
|
10
|
- XML (like Java) uses Unicode to encode characters.
- Unicode has several encodings.
The most common one used in the West is UTF-8.
- UTF-8 is a variable length code.
Characters are encoded in 1, 2, 3, or 4 bytes.
- The first 128 characters in Unicode are ASCII, and UTF-8 encodes each of them in one byte.
- The Unicode numbers between 128 and 255 code for some of the more
common characters used in western Europe, such as ã, á, å, or ç.
- Two byte codes are used for these characters and for other alphabets such as
Greek, Cyrillic, Hebrew, and Arabic.
- Three byte codes cover the rest of the Basic Multilingual Plane, including most Asian ideographs.
- Four byte codes handle the remaining characters, including the less common ideographs.
- Those working mainly with non-western languages may want to investigate other
Unicode encodings, such as UTF-16.
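- The variable length of UTF-8 can be seen by encoding a few characters and counting the bytes. This is a minimal Python sketch; the four sample characters are chosen only for illustration.

```python
# Sample characters, written as escapes so the source stays plain ASCII.
samples = {
    "A": "ASCII letter",
    "\u00e9": "e with acute accent",
    "\u4e2d": "CJK ideograph",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {byte_counts[ch]} byte(s)")
```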
|
11
|
- An XML document is said to be well-formed if it follows all the rules.
- An XML parser is used to check that all the rules have been obeyed.
- Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML
parsers.
- Parsers are also available for free download over the Internet. One is Xerces, from the Apache
open-source project.
- Java 1.4 also supports an open-source parser.
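- As an illustration of a parser checking the rules, the sketch below uses the parser in Python's standard library rather than Xerces or a browser. The function name is_well_formed and the three sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, "") if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


good = "<address><name>Alice Lee</name><email>alee@aol.com</email></address>"
bad_nesting = "<address><name><email></name></email></address>"
two_roots = "<name>Alice Lee</name><email>alee@aol.com</email>"

for label, doc in [("good", good), ("bad nesting", bad_nesting), ("two roots", two_roots)]:
    ok, message = is_well_formed(doc)
    print(label, ok, message)
```

- The parser reports the first rule that is broken and stops; it does not try to repair the document.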
|
12
|
- <?xml version="1.0"?>
- <address>
- <name>Alice Lee</name>
- <email>alee@aol.com</email>
- <phone>212-346-1234</phone>
- <birthday>1985-03-22</birthday>
- </address>
- Markup for the data aids understanding of its purpose.
- A flat text file is not nearly so clear.
- Alice Lee
- alee@aol.com
- 212-346-1234
- 1985-03-22
- The last line looks like a date, but what is it for?
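- The difference shows up as soon as a program reads the data. In this Python sketch (the variable names are illustrative), the marked-up version is queried by name, while the flat version can only be queried by position.

```python
import xml.etree.ElementTree as ET

doc = """<address>
  <name>Alice Lee</name>
  <email>alee@aol.com</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>"""

flat = "Alice Lee\nalee@aol.com\n212-346-1234\n1985-03-22"

address = ET.fromstring(doc)
birthday = address.findtext("birthday")   # looked up by its tag name
fourth_line = flat.splitlines()[3]        # looked up by position only

# Each child element carries its own label.
record = {child.tag: child.text for child in address}
print(record)
```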
|
13
|
- <?xml version="1.0"?>
- <address>
- <name>
- <first>Alice</first>
- <last>Lee</last>
- </name>
- <email>alee@aol.com</email>
- <phone>123-45-6789</phone>
- <birthday>
- <year>1983</year>
- <month>07</month>
- <day>15</day>
- </birthday>
- </address>
|
14
|
|
15
|
- An XML document has a single root node.
- The tree is a general ordered tree.
- A parent node may have any number of children.
- Child nodes are ordered, and may have siblings.
- Preorder traversals are usually used for getting information out of the
tree.
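- A preorder traversal visits a node first and then its children from left to right. The sketch below walks the nested address document in that order using Python's ElementTree; the generator name preorder is an assumption for this example.

```python
import xml.etree.ElementTree as ET

doc = """<address>
  <name><first>Alice</first><last>Lee</last></name>
  <email>alee@aol.com</email>
  <phone>123-45-6789</phone>
  <birthday><year>1983</year><month>07</month><day>15</day></birthday>
</address>"""


def preorder(node, depth=0):
    """Yield (depth, tag) pairs: the node first, then its children in order."""
    yield depth, node.tag
    for child in node:
        yield from preorder(child, depth + 1)


root = ET.fromstring(doc)
visited = list(preorder(root))
for depth, tag in visited:
    print("  " * depth + tag)
```

- The order of visits matches the order in which the start-tags appear in the document.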
|
16
|
- A well-formed document has a tree structure and obeys all the XML rules.
- A particular application may add more rules in either a DTD (document
type definition) or in a schema.
- Many specialized DTDs and schemas have been created to describe
particular areas.
- These range from disseminating news bulletins (RSS) to chemical
formulas.
- DTDs were developed first, so they are not as comprehensive as schemas.
|
17
|
- A DTD describes the tree structure of a document and something about its
data.
- There are two data types, PCDATA and CDATA.
- PCDATA is parsed character data.
- CDATA is character data, not usually parsed.
- A DTD determines how many times a node may appear, and how child nodes
are ordered.
|
18
|
- <!ELEMENT address (name, email, phone, birthday)>
- <!ELEMENT name (first, last)>
- <!ELEMENT first (#PCDATA)>
- <!ELEMENT last (#PCDATA)>
- <!ELEMENT email (#PCDATA)>
- <!ELEMENT phone (#PCDATA)>
- <!ELEMENT birthday (year, month, day)>
- <!ELEMENT year (#PCDATA)>
- <!ELEMENT month (#PCDATA)>
- <!ELEMENT day (#PCDATA)>
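- To show what these declarations require, here is a hand-written check in Python. It is not a DTD validator; it only mirrors the child order given in the declarations, with the table CONTENT_MODEL and the function conforms invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Child sequences copied from the declarations; None stands for (#PCDATA).
CONTENT_MODEL = {
    "address": ["name", "email", "phone", "birthday"],
    "name": ["first", "last"],
    "birthday": ["year", "month", "day"],
    "first": None, "last": None, "email": None, "phone": None,
    "year": None, "month": None, "day": None,
}


def conforms(node):
    """Check that node and all its descendants match CONTENT_MODEL."""
    if node.tag not in CONTENT_MODEL:
        return False
    expected = CONTENT_MODEL[node.tag]
    children = list(node)
    if expected is None:
        return not children            # text only, no child elements
    if [child.tag for child in children] != expected:
        return False                   # wrong children or wrong order
    return all(conforms(child) for child in children)


valid = ("<address><name><first>Alice</first><last>Lee</last></name>"
         "<email>alee@aol.com</email><phone>123-45-6789</phone>"
         "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
         "</address>")
# Same data, but email now comes before name.
swapped = valid.replace("<address>", "<address><email>alee@aol.com</email>", 1) \
               .replace("</name><email>alee@aol.com</email>", "</name>", 1)

print(conforms(ET.fromstring(valid)), conforms(ET.fromstring(swapped)))
```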
|
19
|
- Schemas are themselves XML documents.
- They were standardized after DTDs and provide more information about the
document.
- They have a number of data types including string, decimal, integer,
boolean, date, and time.
- They divide elements into simple and complex types.
- They also determine the tree structure and how many children a node may
have.
|
20
|
- <?xml version="1.0" encoding="ISO-8859-1" ?>
- <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
- <xs:element name="address">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="name" type="xs:string"/>
- <xs:element name="email" type="xs:string"/>
- <xs:element name="phone" type="xs:string"/>
- <xs:element name="birthday" type="xs:date"/>
- </xs:sequence>
- </xs:complexType>
- </xs:element>
- </xs:schema>
|
21
|
- <?xml version="1.0" encoding="ISO-8859-1" ?>
- ISO-8859-1, Latin-1, is the same as UTF-8 in the first 128 characters.
- <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
- http://www.w3.org/2001/XMLSchema is the namespace that identifies the W3C schema vocabulary.
- <xs:element name="address">
- <xs:complexType>
- This states that address is a complex type element.
- <xs:sequence>
- This states that the following elements form a sequence and must come in
the order shown.
- <xs:element name="name" type="xs:string"/>
- This says that the element, name, must be a string.
- <xs:element name="birthday" type="xs:date"/>
- This states that the element, birthday, is a date. Dates are always of the form
yyyy-mm-dd.
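- The yyyy-mm-dd form can be checked with a few lines of Python. This is a simplified sketch, not a schema validator: the real xs:date type has further details (for example an optional time zone suffix) that are ignored here, and the function name parse_simple_date is made up.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_simple_date(text):
    """Return a date for yyyy-mm-dd text, or None if the text does not fit."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:      # e.g. month 13, or day 31 in a 30-day month
        return None


for text in ["1985-03-22", "1985-3-22", "1985-13-01"]:
    print(text, parse_simple_date(text))
```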
|
22
|
- XSLT is used to transform one XML document into another, often an HTML
document.
- The transformation classes (javax.xml.transform) have been part of Java since version 1.4.
- A program takes one XML document as input and produces another as
output.
- If the resulting document is in HTML, it can be viewed by a web browser.
- This is a good way to display XML data.
|
23
|
- <?xml version="1.0" encoding="ISO-8859-1"?>
- <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
- <xsl:template match="address">
- <html><head><title>Address Book</title></head>
- <body>
- <xsl:value-of select="name"/>
- <br/><xsl:value-of select="email"/>
- <br/><xsl:value-of select="phone"/>
- <br/><xsl:value-of select="birthday"/>
- </body>
- </html>
- </xsl:template>
- </xsl:stylesheet>
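- To make the effect of the template concrete, the sketch below builds the same HTML by hand in Python. It is not an XSLT processor (Python's standard library does not include one); the functions string_value and address_to_html are assumptions that imitate what xsl:value-of and the template do for the flat address document.

```python
import xml.etree.ElementTree as ET
from html import escape

doc = """<address>
  <name>Alice Lee</name>
  <email>alee@aol.com</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>"""


def string_value(node):
    """Concatenate all text inside node; an absent node gives an empty string."""
    return "" if node is None else "".join(node.itertext())


def address_to_html(address):
    """Imitate the template: one value per field, separated by <br/>."""
    fields = ["name", "email", "phone", "birthday"]
    lines = [escape(string_value(address.find(field))) for field in fields]
    return (
        "<html><head><title>Address Book</title></head>"
        "<body>" + "<br/>".join(lines) + "</body></html>"
    )


page = address_to_html(ET.fromstring(doc))
print(page)
```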
|
24
|
- Alice Lee
alee@aol.com
123-45-6789
1983-7-15
|
25
|
- There are two principal models for parsers.
- SAX: Simple API for XML
- Uses a call-back method
- Similar to Java event listeners
- DOM: Document Object Model
- Creates a parse tree
- Requires a tree traversal
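- The two models can be compared side by side using the SAX and DOM modules in Python's standard library (xml.sax and xml.dom.minidom). The class name TagCollector and the sample document are assumptions for this sketch.

```python
import xml.sax
from xml.dom import minidom

DOC = ("<address><name>Alice Lee</name><email>alee@aol.com</email>"
       "<phone>212-346-1234</phone><birthday>1985-03-22</birthday></address>")


class TagCollector(xml.sax.ContentHandler):
    """SAX: the parser calls startElement once for every start-tag it reads."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


# SAX: events arrive while the document is being read; no tree is kept.
handler = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.tags)

# DOM: the whole document becomes a tree that is searched afterwards.
dom = minidom.parseString(DOC)
email_node = dom.getElementsByTagName("email")[0]
email = email_node.firstChild.data
print(email)
```

- SAX suits large documents that are read once from start to finish; DOM suits documents that must be searched or modified after parsing.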
|
26
|
- Elliotte Rusty Harold, Processing XML with Java, Addison Wesley, 2002.
- Elliotte Rusty Harold and Scott Means, XML Programming, O'Reilly &
Associates, Inc., 2002.
- W3Schools Online Web Tutorials, http://www.w3schools.com.
|