CS835 - Data and Document Representation & Processing

Lecture 8 - Processing and Delivery XML III: DOM, SAX

 

DOM

http://xml.coverpages.org/dom.html

http://www.w3schools.com/dom/default.asp

 

The Node Interface

·       Program called an XML parser can be used to load an XML document into the memory of your computer

·       Its information can be retrieved and manipulated by accessing the DOM.

·       DOM represents a tree view of the XML document

o      documentElement is the top-level of the tree

o      it has one or many childNodes that represent the branches of the tree.

·       A Node Interface Model is used to access the individual elements in the node tree

o      e.g, the childNodes property of the documentElement can be accessed with a for/each construct to enumerate each individual node.

o      Microsoft XML parser supports all the necessary functions to:

§       traverse the node tree

§       access the nodes and their attribute values

§       insert and delete nodes

§       convert the node tree back to XML

Most commonly used node types supported by the Microsoft XML parser:

Node Type

Example

Document type

<!DOCTYPE food SYSTEM "food.dtd">

Processing instruction

<?xml version="1.0"?>

Element

<drink type="beer">Carlsberg</drink>

Attribute

type="beer"

Text

Carlsberg

 

The Microsoft XML Parser

·       XML parser required to read and update - create and manipulate - an XML document. 

·       The Microsoft XMLDOM parser features a programming model that:

·        Supports JavaScript, VBScript, Perl, VB, Java, C++ and more

·        Supports W3C XML 1.0 and XML DOM

·        Supports DTD and validation

JavaScript in IE 5.0 can create an XML document object with the following code:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")

 

Loading an XML file into the parser

The following code loads an existing XML document (note.xml) into the XML parser:

<script type="text/javascript">
 
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.load("note.xml")
// ....... processing the document goes here
 
</script>

·       The first line of the script creates an instance of the Microsoft XML parser.

·       The third line tells the parser to load an XML document called note.xml.

·       The second line assures that the parser will halt execution until the document is fully loaded.

JUST TRY IT

 

Loading pure XML text into the parser

The following code loads a text string into the XML parser:

<script type="text/javascript">
 
var text="<note>"
text=text+"<to>Tove</to><from>Jani</from>"
text=text+"<heading>Reminder</heading>"
text=text+"<body>Don't forget me this weekend!</body>"
text=text+"</note>"
 
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.loadXML(text)
// ....... processing the document goes here
 
</script>

·       the "loadXML" method (instead of the "load" method) is used to load a text string.

JUST TRY IT

 

The parseError Object

If you try to open an XML document, the XML Parser might generate an error. By accessing the parseError object, the exact error code, the error text, and even the line that caused the error can be retrieved.

The parseError object is not a part of the W3C DOM standard.

 

File Error

With this code we can try to load a non existing file, and display some of its error properties:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.load("ksdjf.xml")
 
document.write("<br>Error Code: ")
document.write(xmlDoc.parseError.errorCode)
document.write("<br>Error Reason: ")
document.write(xmlDoc.parseError.reason)
document.write("<br>Error Line: ")
document.write(xmlDoc.parseError.line)

Try it Yourself

 

XML Error

With this code we let the parser load an XML document that is not well formed.

(You can study more about Well Formed and Valid XML at our XML School)

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.load("note_error.xml")
 
document.write("<br>Error Code: ")
document.write(xmlDoc.parseError.errorCode)
document.write("<br>Error Reason: ")
document.write(xmlDoc.parseError.reason)
document.write("<br>Error Line: ")
document.write(xmlDoc.parseError.line)

Try it Yourself or just look at the XML file

 

The parseError Properties

Property

Description

errorCode

Returns a long integer error code

reason

Returns a string explaining the reason for the error

line

Returns a long integer representing the line number for the error

linePos

Returns a long integer representing the line position for the error

srcText

Returns a string containing the line that caused the error

url

Returns the url pointing the loaded document

filePos

Returns a long integer file position of the e

 

Accessing the XML DOM

Providing HTML content from XML files

·       By using an XML parser inside the browser, an HTML page can be constructed as a static document, with an embedded JavaScript to provide dynamic data.

The following JavaScript reads XML data from an XML document and writes the XML data into (waiting) HTML elements.

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.load("note.xml")
 
nodes = xmlDoc.documentElement.childNodes
 
to.innerText = nodes.item(0).text
from.innerText = nodes.item(1).text
header.innerText = nodes.item(2).text
body.innerText = nodes.item(3).text

JUST TRY IT

 

Accessing XML elements by name

·       Using names is the preferred way to extract XML elements from an XML document.

The following JavaScript reads XML data from an XML document and writes the XML data into (waiting) HTML elements.

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async="false"
xmlDoc.load("note.xml")
 
to.innerText=
xmlDoc.getElementsByTagName("to").item(0).text
from.innerText=
xmlDoc.getElementsByTagName("from").item(0).text
header.innerText=
xmlDoc.getElementsByTagName("heading").item(0).text
body.innerText=
xmlDoc.getElementsByTagName("body").item(0).text

JUST TRY IT

 

·       Important: To extract the text (Jani) from an element like this: <from>Jani</from>,

·       the syntax is: getElementsByTagName("from").item(0).text

·       NOT like this: getElementsByTagName("from").text

·       Must address the child node like getElementsByTagName("from").item(0).text is that the result returned from getElementsByTagName is an array of nodes, containing all nodes found within the XML document with the specified tag name (in this case "from").

The HttpRequest Object

The HttpRequest object provides client-side communication with a server.

Examples

readyState
How to return the state of the document. This property changes as the document is being loaded.

responseText
How to return the request as a string.

status
How to return the status of the operation, as a code.

statusText
How to return the status of the operation, as a string.

 

The HttpRequest object

With the httpRequest object you can send a request from the client to the server.

If you are using JavaScript in IE 5.0, you can create the httpRequest object with the following code:

var xmlHTTP = new ActiveXObject("Microsoft.XMLHTTP")

The httpRequest object is not a part of the W3C DOM standard.

 

Get XML

How to get an xml file from the server using the httpRequest object (works only in IE):

var xmlHttp = new ActiveXObject("Microsoft.XMLHTTP")
xmlHttp.open("GET", "note.xml", false)
xmlHttp.send()
xmlDoc=xmlHttp.responseText

Try it Yourself

Netscape compatible code:

xmlHttp = new XMLHttpRequest();
xmlHttp.open("GET", "note.xml", false);
xmlHttp.send(null);
xmlDoc = xmlHttp.responseText;

Try it Yourself

 

Send XML

You can also send an xml document to an ASP page on the server, analyze the request, and send back the result.

var xmlHttp = new ActiveXObject("Microsoft.XMLHTTP")
xmlHttp.open("POST", "demo_dom_http.asp", false)
xmlHttp.send(xmlDoc)
document.write(xmlHttp.responseText)

The ASP page, written in VBScript:

set xmldoc = Server.CreateObject("Microsoft.XMLDOM")
xmldoc.async=false
xmldoc.load(request)
 
for each x in xmldoc.documentElement.childNodes
   if x.NodeName = "to" then name=x.text
next
 
response.write(name)

You send the result back to the client using the response.write property.

Try it Yourself

 

Important Note

·       The Microsoft XMLHTTP object can only be run in the BROWSER.

·       SERVER code that attempts to use the XMLHTTP to communicate with other Web servers, may function incorrectly or perform poorly.

 

The httpRequest Properties

Property

Description

readyState

Returns the state of the document

responseBody

Returns the response as an array of unsigned bytes

responseStream

Returns the response as an IStream

responseText

Returns the response as a string

responseXML

Returns the response as an xml document

status

Returns the status code as a number

statusText

Returns the status as a string

The httpRequest Methods

Property

Description

abort()

Cancel the current http request

getAllResponseHeaders()

Returns the value of the http headers

getResponseHeader(headerName)

Returns the value of one specified http header

open(method, url, async, userid, password)

Opens http request, and specifies the information

send()

Send the http request to the server

setRequestHeader(headerName,headerValue)

Specifies the name of a http header

XML DOM NodeTypes

Example: note_special.xml

<?xml version="1.0" ?>

- <note>

  This is my text node

- <![CDATA[

 This is my CDATASection node 

  ]]>

- <!--

 This is my Comment node 

  -->

  </note>

 

 

Examples

NodeType
We traverse the file note_special.xml to get the nodeType of the nodes.

NodeName
We traverse the file note_special.xml to get the nodeName of the same nodes.

NodeValue
We traverse the file note_special.xml to get the nodeValue of the same nodes.

NodeTypeString
In IE5, you can also get the nodeType as a string, with the .nodeTypeString property.

 

Node Types

·       Nodes are separated into different types

·       Below is a list of the types and what the .nodeName and the .nodeValue properties return.

·       Internet Explorer 5 uses the .nodeTypeString property to return the nodeType as a string.

 

nodeType

nodeTypeString

nodeName

nodeValue

1

element

tagName

null

2

attribute

name

value

3

text

#text

content of node

4

cdatasection

#cdata-section

content of node

5

entityreference

entity reference name

null

6

entity

entity name

null

7

processinginstruction

target

content of node

8

comment

#comment

comment text

9

document

#document

null

10

documenttype

doctype name

null

11

documentfragment

#document fragment

null

12

notation

notation name

null

 

NodeTypes - Named Constants

NodeType

Named Constant

1

ELEMENT_NODE

2

ATTRIBUTE_NODE

3

TEXT_NODE

4

CDATA_SECTION_NODE

5

ENTITY_REFERENCE_NODE

6

ENTITY_NODE

7

PROCESSING_INSTRUCTION_NODE

8

COMMENT_NODE

9

DOCUMENT_NODE

10

DOCUMENT_TYPE_NODE

11

DOCUMENT_FRAGMENT_NODE

12

NOTATION_NODE

 

XML DOM - The Node object

Example: note.xml

<?xml version="1.0" ?>

- <note time="12:03:46">

  <to>Tove</to>

  <from>Jani</from>

  <heading>Reminder</heading>

  <body>Don't forget me this weekend!</body>

  </note>

 

Examples

nodeName
How to return the name of a node.

nodeValue
How to return the value of a node.

nextSibling
How to return the name of the nextSibling node.

Text
In IE5 you can return the text from a node and all its child nodes.

xml
In IE5 you can return the xml from a node and all its child nodes.

appendChild
How to create an element node with a text node and then append it as a child node.

insertBefore
How to create a text node and then insert it before a specified node.

 

The Node Object

·       The node object represents a node in the node tree.

·       A node can be an element node, a text node, or any other of the node types explained in the DOM nodeType chapter.

·       All of these node types have properties and methods.

·       The general properties and methods for all node types are listed below:

W3C Properties

Property

Description

attributes

Returns a NamedNodeMap containing all attributes for this node

childNodes

Returns a NodeList containing all the child nodes for this node

firstChild

Returns the first child node for this node

lastChild

Returns the last child node for this node

nextSibling

Returns the next sibling node. Two nodes are siblings if they have the same parent node

nodeName

Returns the nodeName, depending on the type

nodeType

Returns the nodeType as a number

nodeValue

Returns, or sets, the value of this node, depending on the type

ownerDocument

Returns the root node of the document

parentNode

Returns the parent node for this node

previousSibling

Returns the previous sibling node. Two nodes are siblings if they have the same parent node

W3C Methods

Method

Description

appendChild(newChild)

Appends the node newChild at the end of the child nodes for this node

cloneNode(boolean)

Returns an exact clone of this node. If the boolean value is set to true, the cloned node contains all the child nodes as well

hasChildNodes()

Returns true if this node has any child nodes

insertBefore(newNode,refNode)

Inserts a new node, newNode, before the existing node, refNode

removeChild(nodeName)

Removes the specified node, nodeName

replaceChild(newNode,oldNode)

Replaces the oldNode, with the newNode

 

IE5 Node Properties and Methods

The node object has some properties and methods that are defined in Internet Explorer 5 only:

IE5 Properties

Property

Description

basename

Returns the nodeName without the namespaces

dataType

Returns, or sets, the dataType for this node

definition

 

nodeTypeString

Returns the nodeType as a string

nodeTypedValue

 

specified

Returns whether the nodeValue is specified in the DTD/Schema or not

text

Returns, or sets, the text for this node and all its child nodes

xml

Returns, or sets, the xml for this node and all its child nodes

IE5 Methods

Method

Description

selectNodes(pattern)

 

selectSingleNode(pattern)

 

transformNode(stylesheet)

Processes the node and its childNodes with the specified XSL stylesheet, and returns the result

 

XML DOM - The NodeList object

Examples

length
How to return the number of nodes in a nodeList.

item
How to return a specific node in the nodeList.

nextNode()
IE5 allows you to return the next node in the nodeList.

reset()
IE5 allows you to reset the pointer to the first node in the nodeList.

 

The NodeList object

·       The nodeList object represents a node and its child nodes as a node tree.

·       The properties and methods of the nodeList object are listed below:

W3C Properties

Property

Description

length

Returns the number of nodes in a nodeList

W3C Methods

Method

Description

item

Returns a specific node in the nodeList

 

 

IE5 NodeList Methods

The nodeList object has some methods that are defined in Internet Explorer 5 only:

IE5 Properties

Property

Description

nextNode()

Returns the next object in the node list

reset()

Resets the pointer to the first node in the nodeList

XML DOM - The Document Object

 

Examples

documentElement
How to return the node name of the root element.

createCDATASection
How to create a CDATA node and then append it to the nodeList.

createComment
How to create a comment node and then append it to the nodeList.

createElement
How to create an element and then append it to the nodeList.

createTextNode
How to create a text node then append the text node to the nodeList.

getElementsByTagName
How to return the value of a specified node.

 

The Document object

·       The document object is the root element in the node tree.

·       All nodes in the node tree are childNodes of the document element.

·       The document element is required in all XML documents.

·       The properties and methods of the Document object are listed below:

W3C Properties

Property

Description

documentElement

Returns the root element of the document

doctype

Returns the DTD or Schema for the document. 

implementation

Returns the implementation object for this particular document

W3C Methods

Method

Description

createAttribute(attributeName)

Creates an attribute node with the specified attribute name

createCDATASection(text)

Creates a CDATASection, containing the specified text

createComment(text)

Creates a comment node, containing the specified text

createDocumentFragment()

Creates an empty documentFragment object

createElement(tagName)

Creates an element with the specified tagName

createEntityReference(referenceName)

Creates an entityReference with the specified referenceName

createProcessingInstruction(target,text)

Creates a processingInstruction node, containing the specified target and text

createTextNode(text)

Creates a text node, containing the specified text

getElementsByTagName(tagName)

Returns the specified node, and all its child nodes, as a nodeList

 

XML DOM - The Element Object

Examples

tagName
How to return the tag name of a node.

getElementsByTagName
How to return the value of a specified node.

getAttribute
How to return an attribute's value.

setAttribute
How to change an attribute's value.

setAttribute 2
How to set a new attribute and its value.

 

The Element object

The element object represents the element nodes in the document. If the element node contains text, this text is represented in a text node. The properties and methods of the Element object are listed below:

W3C Properties

Property

Description

tagName

Returns, or sets the name of the node

W3C Methods

Method

Description

getAttribute(attributeName)

Returns the value of the specified attribute

getAttributeNode(attributeName)

Returns the specified attribute node as an object

getElementsByTagName(tagName)

Returns the specified node, and all its child nodes, as a nodeList

normalize()

Puts the text nodes for this element, and its child nodes, into one text node, returns nothing

removeAttribute(attributeName)

Removes the specified attribute's value. If the attribute has a default value this value is inserted

removeAttributeNode(attributeNode)

Removes the specified attribute node. If the attribute node has a default value, this attribute is inserted

setAttribute(attributeName, attributeValue)

Inserts a new attribute

setAttributeNode(attributeNodeName)

Inserts a new attribute node

 

XML DOM - The Attr Object

Examples

name
How to return the name of an attribute.

value
How to return the value of an attribute.

specified
Returns True if the value is set in the document, False if the value is a default value in the DTD/Schema.

 

The Attr object

·       The attr object returns an attribute of an element object as an attribute node.

·       The attr object has the same properties and methods as nodes in general.

·       The properties of the Attr object are listed below:

W3C Properties

Property

Description

name

Sets or returns the name of the attribute

specified

Returns a boolean value indicating if the node's value is set in the document or not

value

Sets or returns the value of the attribute

 

XML DOM - The Text Object

Examples

splitText
The SplitText method splits a text at the given character and returns the rest of the text.

createTextNode
How to create a text node.

 

The Text object

The text object represents the text inside an element as a node. The method of the text object is listed below:

W3C Method

Method

Description

splitText(number)

Splits the text at the specified character, and returns the rest of the text

 

XML DOM - The CDATASection Object

Examples

createCDATASection
How to create a CDATASection node.

 

The CDATASection object

·       The CDATASection object represents the CDATASection nodes in a document.

·       The CDATASection node is used to escape parts of text which normally would be recognized as markup.

 

XML DOM - The Comment Object

Examples

createComment
How to create a comment node.

 

The Comment object

·       The comment object represents the comment nodes in a document.

·       The comment nodes have no nodeName, but their nodeValue is the comment text.

 

XML DOM Examples

http://www.w3schools.com/dom/dom_examples.asp

 

 

 

SAX (Simple API for XML) http://developerlife.com/saxtutorial1/default.htm

 

Source code file

 

o      • processing instructions, comments, entity declarations, are encountered in your document.

 

 

Three steps to SAX

  1. Create a own custom Java object model for the data.
  2. Create a SAX document handler to create instances of the custom object models from the information stored in the XML document.
  3. Use the SAX parser to create instances of thecustom object model based on the data stored in the XML documents.

 

<?xml version = “1.0”?>

     <addressbook>

          <person>

               <lastname>Idris</lastname>

               <firstname>Nazmul</firstname>

               <company>The Bean Factory, LLC.</company>

               <email>xml@beanfactory.com</email>

          </person>

     </addressbook>

 

 

Step1: Creating a custom object model

·       A simple Java object model to represent the information in my address book XML document.

·       Two classes, - an AddressBook class and a Person class.

·       Object model maps from the elements into classes.

·       Classes:

o      AddressBook class:

o      is a container of Person objects.

o      Is a simple adapter over the java.util.List interface.

o      has methods that add Person objects, get Person objects, and find out how many Person objects are in the AddressBook.

o      The addressbook element maps to the AddressBook class.

o      Person class holds 4 String objects: the last name, first name, email,company name.

o      This information is embedded within the <person> tag.

o      The person element maps into the Person class.

o      The firstname, lastname, company and email elements map into String class.

·        Listing of the Person class:

public class Person{

// Data Members

String fname, lname, company, email;

 

// accessor methods

     public String getCompany(){return company;}

     public String getEmail(){return email;}

     public String getFirstName(){return fname;}

     public String getLastName(){return lname;}

 

// mutator methods

     public void setLastName( String s ){lname = s;}

     public void setFirstName( String s ){fname = s;}

     public void setCompany( String s ){company = s;}

     public void setEmail( String s ){email = s;}

 

// toXML() method

     public String toXML(){

     StringBuffer sb = new StringBuffer();

 

     sb.append( "<PERSON>\n" );

     sb.append( "\t<LASTNAME>"+lname+"</LASTNAME>\n" );

     sb.append( "\t<FIRSTNAME>"+fname+"</FIRSTNAME>\n" );

     sb.append( "\t<COMPANY>"+company+"</COMPANY>\n" );

     sb.append( "\t<EMAIL>"+email+"</EMAIL>\n" );

     sb.append( "</PERSON>\n" );

 

return sb.toString();

}}

 

Note the toXML() method.

·       This method returns a String that contains the XML representation of a Person object.

·       This method can be used to save Person objects to an XML file (or other kind of XML persistence/storage engine).

 

·       The AddressBook class has a toXML() method, and that method uses the Person class’s toXML() method .

·        Listing of the AddressBook class:

 

public class AddressBook{

 

// Data Members

     List persons = new java.util.ArrayList();

 

// mutator method

     public void addPerson( Person p ){persons.add( p );}

 

// accessor methods

     public int getSize(){ return persons.size();}

     public Person getPerson( int i ){

     return (Person)persons.get( i );}

 

// toXML method

     public String toXML(){

     StringBuffer sb = new StringBuffer();

 

     sb.append( "<?xml version=\"1.0\"?>\n" );

     sb.append( "<ADDRESSBOOK>\n\n" );

 

     for(int i=0; i<persons.size(); i++) {

          sb.append( getPerson(i).toXML() );

          sb.append( "\n" );

     }

 

     sb.append( "</ADDRESSBOOK>" );

 

return sb.toString();

}}

 

 

Step2: Creating a SAX parser

Code to create a SAX parser:

import java.net.*;

import java.io.*;

import org.xml.sax.*;

...

try{

//create an InputSource from the XML document source

 

     InputStreamReader isr = new InputStreamReader(

     new URL(“http://host/AddressBook.xml”).openStream();

     //new FileReader( new File( “AddressBook.xml” ))

     );

     InputSource is = new InputSource( isr );

 

//create an documenthandler to create obj model

     DocumentHandler handler = //new YourHandler();

 

//create a SAX parser using SAX interfaces and classes

     String parserClassName = “com.sun.xml.parser.Parser”;

     org.xml.sax.Parser.parser =      org.xml.sax.helpers.ParserFactory.

     makeParser( parserClassName );

 

//create document handler to do something useful

//with the XML document being parsed by the parser.

 

     parser.setDocumentHandler( handler );

     parser.parse( is );

}

 

catch(Throwable t){

     System.out.println( t );

     t.printStackTrace();

}

 

• Replace the value of the parserClassName string with the class name of the SAX Parser class of your choice.

 

 

Step3: Creating a DocumentHandler

 

 

·       Figure shows the sequence of method calls that the SAX parser makes on the DocumentHandler interface implementation class.

 

 

org.xml.sax.Parser.parser = //create a SAX parser

DocumentHandler handler = //instantiate your implementation

parser.setDocumentHandler( handler );

 

 

ErrorHandler interface

·       The ErrorHandler interface methods - custom error handling behaviors in response to erroneous conditions in source XML document.

·       SAX parser is not required to do any error handling

·       Implemention of this interface not required.

·       Default implementation of the HandlerBase class is already provided which throws a SAXException

·       The methods in the ErrorHandler interface are:

o      error(SAXParseException e)

o      fatalError(SAXParseException e)

o      warning(SAXParseException e)

·       If custom error handling is to be performed - must tell the SAX parser to use your class as an error handler.

·        The following code snippet does this:

org.xml.sax.Parser.parser = //create a SAX parser

ErrorHandler handler = //instantiate your implementation

parser.setErrorHandler( handler );

 

DTDHandler interface

·       The DTDHandler interface methods deal with entity and notation declaration sections in the DTDs of the XML documents.

·       SAX parser is not required to do anything with this notation and entity declarations

·       An interface must be used if you wish to do something with these things then you have to write a class that implements the DTDHandler interface and tell the SAX parser to use it.

·       The DTDHandler interface has the following methods:

o      notationDecl( String name , String publicId, String systemId )

o      unparsedEntityDecl( (String name, String publicId, String systemId, String notationName).

·        The following code snippet does this:

org.xml.sax.Parser.parser = //create a SAX parser

DTDHandler handler = //instantiate your DTDHandler implementation

parser.setDTDHandler( handler );

 

EntityResolver interface

·       The EntityResolver interface allows creation of customized InputSource objects for external entities.

·       External entities could be DTDs that are located by using a URI to a remote resource or any other resource that is external to your local system.

·       EntityResolver interface can be used to creatw an InputSource given an external entity.

·       The EntityResolver interface only has one method: InputSource resolveEntity (String publicId, String systemId).

The following code snippet does this:

org.xml.sax.Parser.parser = //create a SAX parser

EntityResolver handler = //instantiate your implementation

parser.setDTDHandler( handler );

 

HandlerBase class

 

Building the actual object model (using DocumentHandler)

·       DocumentHandler is used to create Person objects that are inserted into one AddressBook object

·       The HandlerBase class is extended by using the three methods that are available in the DocumentHandler interface: startElement(...), endElement(...) and characters(...).

·       The address book contains person elements, which in turn contain name and email elements.

·       The DocumentHandler implementation must create one AddressBook object and many Person objects

·       The SAX parser reports this information to the DocumentHandler by making a sequence of method calls

·       The parser must remember “current position” in the XML document to create Person objects (and add them to the AddressBook object).

o      Use a String to remember the name of the last tag that was encountered.

·       Every time an startElement( “person” , .. ) method is called on the handler (by the SAX parser), a new Person object (and save a reference to it) is created.

o      Thereafter, when the endElement( “person” , .. ) method is called by the handler, the current  Person object is added to the AddressBook object.

·       In between the startElement( “person” , ... ) and endElement( “person” , ... ) method calls there are 4 more startElement(...) and endElement(...) method calls for the “lastname”, “firstname”, “email”and “company” tags.

 

·       Can think of this process as a document -> event -> method -> object model mapping.

·       Must provide the document and the object model and the method handler.

·       SAX only takes care of the event generation and method invocation (on your handler implementations).

·       Partial listing of the code for the DocumentHandler implementation (and HandlerBase subclass) called SaxAddressBookHandler.

java:

import java.io.*;

import org.xml.sax.*;

import org.xml.sax.helpers.ParserFactory;

import com.sun.xml.parser.Resolver;

 

public class SaxAddressBookHandler extends HandlerBase{

 

// data members

      private AddressBook ab = new AddressBook();

      private Person p = null; //temp Person ref

      private String currentElement = null; //current element name

 

// AddressBook accessor method

      public AddressBook getAddressBook(){ return ab; }

 

// HandlerBase method overrides. This is SAX.

/*

This method is called when the SAX parser encounters an open element

tag. Must remember which element tag was just opened (so that the

characters(..) method can do something useful with the data that is

read by the parser.

*/

 

public void startElement( String name , AttributeList atts ){

      if( name.equalsIgnoreCase("LASTNAME") ) {

            currentElement = "LASTNAME";

      }

      else if( name.equalsIgnoreCase("FIRSTNAME") ) {

            currentElement = "FIRSTNAME";

      }

      else if( name.equalsIgnoreCase("COMPANY") ) {

            currentElement = "COMPANY";

      }

      else if( name.equalsIgnoreCase("EMAIL") ) {

            currentElement = "EMAIL";

      }

      else if( name.equalsIgnoreCase("PERSON") ) {

            p = new Person();

      }

}

 

/*

This method is called when the SAX parser encounters a close element

tag. If the person tag is closed, then the person objec must be

added to the AddressBook (ab).

*/

 

public void endElement( String name ){

      if( name.equalsIgnoreCase("PERSON") ) {

            ab.addPerson( p );

            p = null;

            }

}

 

/*

This method is called when the SAX parser encounters #PCDATA or CDATA.

It is important to remember which element tag was just opened so that

this data can be put in the right object.

Must trim() the textual data and make sure that empty data is

just ignored.

Also the start index and length integer must be used to retrieve only

a portion of the data stored in the char[].

*/

 

public void characters( char ch[], int start, int length ){

 

//dont try to read ch[] as it will go on forever, must use the

//range provided by the SAX parser.

 

      String value = new String( ch , start , length );

      if(!value.trim().equals("")) {

            if( currentElement.equalsIgnoreCase("FIRSTNAME") ) {

                  p.setFirstName( value );

            }

            else if( currentElement.equalsIgnoreCase("LASTNAME") ) {

                  p.setLastName( value );

            }

            else if( currentElement.equalsIgnoreCase("COMPANY") ) {

                  p.setCompany( value );

            }

            else if( currentElement.equalsIgnoreCase("EMAIL") ) {

                  p.setEmail( value );

            }

      }

}