SE735 - Data and Document Representation & Processing

Lecture 4 - Models, Patterns, and Reuse


In Document Engineering, models are developed that emphasize document requirements and patterns of information exchange.


What is a Model?

•     Models are simplified descriptions of a subject that abstract from its complexity to emphasize some features or characteristics while intentionally de-emphasizing others

•     Models enable us to describe and communicate systems regardless of the specific domain or discipline that they represent

•     A model can represent a human activity, a natural system, or a designed system

•     We can model structures – objects, their characteristics, and their static relationships with each other, such as hierarchy and reference

•     We can model functions, processes, behaviors – dynamic activities that create and affect structures


Recipes as Everyday Models

•     A recipe describes both objects and structures (ingredients) and the processes (instructions) for creating a food dish

•     Important characteristics and uses of the recipe model are:

•     You can FOLLOW the recipe to create the dish

•     You can COMMUNICATE the recipe to someone else who can then create the same dish

•     You can use the recipe as a GUIDE FOR EXPERIMENTATION with the objects or the processes in the recipe to create alternative dishes
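A minimal Python sketch of the same idea, keeping the recipe's structure and its process as separate parts of one model. The ingredient names, quantities, and steps are hypothetical placeholders, not the actual wonton soup recipe.

```python
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    """Structure: an object in the recipe and one of its characteristics."""
    name: str
    quantity: str


@dataclass
class Recipe:
    """A recipe model: static structure (ingredients) plus dynamic process (steps)."""
    dish: str
    ingredients: list[Ingredient] = field(default_factory=list)
    instructions: list[str] = field(default_factory=list)

    def variant(self, dish: str, swap: dict[str, str]) -> "Recipe":
        """Experiment with the model: replace ingredients by name, keep the process."""
        swapped = [
            Ingredient(swap.get(i.name, i.name), i.quantity) for i in self.ingredients
        ]
        return Recipe(dish, swapped, list(self.instructions))


# Hypothetical example data.
soup = Recipe(
    dish="soup",
    ingredients=[Ingredient("broth", "1 l"), Ingredient("pork filling", "200 g")],
    instructions=["fill wrappers", "simmer in broth"],
)
veggie = soup.variant("vegetarian soup", {"pork filling": "mushroom filling"})
```

The original model is left untouched, so the variant can be communicated or followed independently of it.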


A Model of Wonton Soup


"Static" or "Structure" Recipe Model: The Ingredients



"Dynamic" or "Process" Recipe Model: The Instructions



The Classical Modeling Approach



1.   Find real-world artifacts and represent them in a model that describes their physical implementation. We then analyze these artifacts to create what are often called the As-Is models.

2.   The As-Is models are transformed into To-Be business process models by selecting and adapting patterns appropriate for the required context of use.

·         For documents - the As-Is model is called a document component model and the To-Be models are called document assembly models. 

3.   The conceptual view is returned to a physical view by expressing it in a technology appropriate for the contexts in which it will be used.

·         These new document implementation models and process implementation models are the As-Implemented models.


Physical Modeling for Analysis

•     The primary purpose of modeling is to better understand some existing system or environment and its entities ("the things that exist in it that contain or embody relevant information") and to describe this understanding so it can be communicated

•     Models of things as they currently exist are Physical or As-Is models. This modeling activity is usually called "systems analysis" or simply "analysis"

•      A basic task of modeling for analysis is capturing the languages and practices of the people who work with the instances and artifacts in the "real world"

•     Any system (especially business ones) has groups of stakeholders who do not fully understand the "big picture" – an analysis model can be used to prevent or repair misunderstandings

•      A physical model synthesizes different views or observations into a more complete or generic perspective that accurately accounts for all of them


Conceptual Modeling for Design or Re-Design

•     The next purpose of modeling is to assist in the design or re-design of a system or set of artifacts

•     This modeling activity is usually called "systems design" or simply "design"

•     Models of things as they could be are Conceptual or To-Be models

•     Design abstracts away or generalizes from the technology and implementation details in the physical model to create a conceptual model

•     A conceptual model that is implementation-independent is often easier to talk about than one that is encoded in a specific technology context

•     When technologies change, the implementation model may change but the conceptual model won't.

•     When implementations in different technologies are based on the same conceptual models, they can be more readily understood because of their common conceptual components.


Conceptual View of Wonton Soup


Using Conceptual Models

•     Objects can be manipulated as conceptual components without impacting the real world that they describe, or in ways that are impossible in the real world

•     This encourages the re-use of common components via standardization, patterns and libraries

•     It facilitates the rationalization of components and the removal of redundancies and inefficiencies


The Modeling Gaps

•     There is an essential difference or gap between the real world being modeled and any models of it, or else the models would serve no purpose

•     Likewise, there is always a gap between a physical model and a conceptual model, because a conceptual model is often most useful when it isn't tied to specific or feasible implementations or technologies

•     But this means we can sometimes see what the current world looks like and what we would like it to be without being able to see how to get from one to the other

•     Put a different way, this means that models can be designed that are impossible to implement


Implementing a Conceptual Model as a Physical One

•     A model is purely theoretical until it is encoded in a technology that lets it operate in the real world again.

•     This is often a two-stage process: encoding conceptual models as physical ones, and then applying transformations to create instances with desired properties (e.g., implementing / formatting / rendering in some concrete medium / syntax / technology)

•     When instances implemented in different technologies are generated or re-generated from models, they can more readily interoperate because of their common conceptual components
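The two stages can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Book` class, its field names, and the sample values are hypothetical, and XML and JSON stand in for "different technologies".

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Book:
    """Conceptual model: no commitment to any syntax or technology."""
    title: str
    author: str
    isbn: str


def to_xml(book: Book) -> str:
    """One physical encoding: an XML instance."""
    root = ET.Element("Book")
    for name, value in asdict(book).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(book: Book) -> str:
    """A second physical encoding of the same concepts: JSON."""
    return json.dumps(asdict(book), sort_keys=True)


# Hypothetical instance data.
book = Book(title="Example Title", author="Example Author", isbn="0000000000")
xml_text = to_xml(book)
json_text = to_json(book)

# Read both instances back: they carry the same conceptual components.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
from_json = json.loads(json_text)
```

Because both instances are generated from one conceptual model, a receiver of either syntax recovers the same components.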


Iteration is Inevitable



Adapting the Classical Modeling Approach to Document Engineering

•     In a Document Engineering context we can analyze a domain, its entities, and their processes or behaviors as we would in any other modeling activity

•     But what we generally care more about is the information or intangible content about the entities in the domain

•     Or put another way, we only model those entities and their properties that convey information that is relevant

•     So we are more likely to engage in "pseudo-real-world" modeling than "real-world" modeling


Real-World Modeling of a Library System



Pseudo-Real-World Modeling of a Library System




So We Model the Information, not the Things


Methodologies – Disciplines for Modeling

•     When we create a model we follow – implicitly or explicitly – some steps or techniques for analysis, design, and implementation

•     This is called the modeling methodology

•     Methodologies can be formal, prescriptive, step-by-step, documented and auditable or they can be the opposite: informal, ad hoc, "seat of the pants" with no trace other than the model artifact itself

•     A methodology can contain or define:

o   Meta-models

o   Processes / Activities / Steps

o   Notations

o   Tools

•     and how these are applied or fit together to produce:

o   Artifacts


Sequential, Iterative, and Artifact-Centered Methodologies

•     A methodology's process describes the work to be done and the order in which it is to be done (the modeling workflow)

•     Many methodologies prescribe a Sequential process -- the "waterfall" model

•     Other methodologies are more iterative or recursive -- like the "spiral" model of progressive refinement or "agile" modeling

•     Other methodologies are looser about the modeling activities but emphasize the results that must be obtained at each step or phase


Why XML for Models

•     XML provides syntactic mechanisms that capture the semantic distinctions between documents in terms of the sets of elements and attributes used to encode their content and the rules that govern their occurrence and organization.

•     Two semantically related document models like purchase order and invoice may share elements from a common library or subset, but they are distinguished by elements that occur only in one of them or that have different possible values in each.

•     So different vocabularies are used to mark up the content of purchase orders and of invoices.

•     XML’s ease of use, its expressive power, and its processability have made it attractive for Document Engineering because it can realize document models suitable for implementation in applications
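The purchase order and invoice comparison can be illustrated with sets of element names. The names below are hypothetical; a real vocabulary would be far larger and would also constrain order and occurrence.

```python
# Hypothetical element names shared through a common library.
COMMON_LIBRARY = {"Buyer", "Seller", "LineItem", "Currency"}

purchase_order = COMMON_LIBRARY | {"RequestedDeliveryDate"}
invoice = COMMON_LIBRARY | {"InvoiceNumber", "PaymentDueDate"}

shared = purchase_order & invoice
only_in_order = purchase_order - invoice
only_in_invoice = invoice - purchase_order


def classify(elements: set[str]) -> str:
    """Guess the document type from the elements that occur in an instance."""
    if elements & only_in_invoice:
        return "invoice"
    if elements & only_in_order:
        return "purchase order"
    return "unknown"
```

An instance that uses only shared elements cannot be told apart, which is why the distinguishing elements matter.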


Conceptual Views: Document Component and Assembly Models

•     We can best describe the semantics of documents using models of the concepts they contain. 

•     This conceptual view lets us distinguish one class of document from another.

•     Conceptual views are independent of the physical implementation and are not tied to any particular technology.



There are two types of conceptual models for documents that are more formal and precise than prose definitions.

•     The first is the document component model, which describes the complete set of semantic components in a domain, including their structure and their potential relationships.

•     The document component model portrays the network of associations between the components, so rather than describing a single type of document, it implicitly describes many different types of documents.

•     Such a conceptual model of information about books is shown in the notation of a class diagram.

The model also captures important rules about the relationships between classes:

•     A Publisher can be considered a reuse of the object class called Party with a special association to Book that we label as Publishes

•     The Publisher is modeled as a “Published by” Party. 

•     An Author is not a reuse of Party because the former has attributes that do not apply to the latter.

•     The model also depicts the business rules that an author can write more than one book and that books can have more than one author.

•     It also tells us that even if the book has more than one edition, it has only one ISBN.
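As a rough sketch, the rules listed above might be written down as Python classes. The attribute names (`biography`, `editions`, and so on) are assumptions made for illustration; the class diagram itself is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """Reusable object class; the single attribute is an assumption."""
    name: str


# eq=False keeps identity comparison and avoids recursive equality
# through the two-way Book/Author association.
@dataclass(eq=False)
class Author:
    """Not a reuse of Party: it has attributes that do not apply to a Party."""
    name: str
    biography: str = ""
    books: list["Book"] = field(default_factory=list)


@dataclass(eq=False)
class Book:
    title: str
    isbn: str                # exactly one ISBN per book in this model
    published_by: Party      # the Publisher is a "Published by" Party
    editions: list[str] = field(default_factory=list)
    authors: list[Author] = field(default_factory=list)


def link(book: Book, author: Author) -> None:
    """Record the many-to-many association between books and authors."""
    book.authors.append(author)
    author.books.append(book)


# Hypothetical instance data.
publisher = Party("Example Press")
a1 = Author("Author One")
a2 = Author("Author Two")
b1 = Book("First Title", "ISBN-1", publisher, editions=["1st", "2nd"])
b2 = Book("Second Title", "ISBN-2", publisher)
link(b1, a1)
link(b1, a2)
link(b2, a1)
```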


A document assembly model describes the way in which selected components are assembled into a hierarchical structure

•     Document assembly models are best visualized as tree diagrams of hierarchical structures

o   Three different document assembly models may be made by traversing the associations in different orders.

o   The resulting hierarchical document types would organize the same semantic components in different structures to impose different interpretations or contexts that emphasize the book, the author, or the publisher.
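Traversing the same associations in different orders can be sketched as follows. The association records are hypothetical, and nested dictionaries stand in for the tree diagrams.

```python
# Hypothetical flat associations from a component model: (book, author, publisher).
ASSOCIATIONS = [
    ("First Title", "Author One", "Example Press"),
    ("First Title", "Author Two", "Example Press"),
    ("Second Title", "Author One", "Other Press"),
]
ROLES = ("book", "author", "publisher")


def assemble(order: tuple[str, str, str]) -> dict:
    """Build a nested hierarchy by traversing the associations in the given order."""
    tree: dict = {}
    for record in ASSOCIATIONS:
        values = dict(zip(ROLES, record))
        node = tree
        for role in order:
            # Descend one level, creating the branch on first use.
            node = node.setdefault(values[role], {})
    return tree


by_book = assemble(("book", "author", "publisher"))
by_author = assemble(("author", "book", "publisher"))
by_publisher = assemble(("publisher", "book", "author"))
```

All three hierarchies contain the same components; only the root, and therefore the emphasis, differs.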


Figure: three document assembly models, labeled (a), (b), and (c)


•     Conceptual views of models are a better way to represent and communicate the results of analysis and design than the physical views because they are not constrained by any specific technology.

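DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Book (Title, Author, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>


XML Schema (XSD):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
	<xs:element name="Book">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="Title" type="xs:string"/>
				<xs:element name="Author" type="xs:string"/>
				<xs:element name="ISBN" type="xs:string"/>
				<xs:element name="Publisher" type="xs:string"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
</xs:schema>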



•     This technology independence also makes them easier to manipulate or revise.

•     Similarly, at the conceptual level it is easier to generalize a model to make it describe a larger set of possible or desired artifacts than the ones we happened to observe when we first created an implementation model.

•     It is also easier to specialize at the conceptual level, for example, by deriving a related model that incorporates additional characteristics or relationships; in this case, we could express a conceptual view of a model for chemistry books based on the model for a book.
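Specialization at the conceptual level can be sketched with class inheritance. The extra attribute on the chemistry book is a hypothetical example, not something taken from the book model itself.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    title: str
    author: str
    isbn: str
    publisher: str


@dataclass
class ChemistryBook(Book):
    """Specialization: inherits every Book component and adds new ones.

    The extra attribute is a hypothetical example.
    """
    safety_notes: str = ""


book_components = [f.name for f in fields(Book)]
chemistry_components = [f.name for f in fields(ChemistryBook)]
```

Every chemistry book is still a book, so anything written against the general model continues to work with the derived one.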


The Model Matrix

•     Two dimensions, model abstraction and model granularity, form a matrix for organizing the analysis and modeling approaches in Document Engineering

X-Axis: the Abstraction Dimension

•     The most abstract or context-free conceptual models are arranged on the left.

•     Moving to the right implies more physical models, and finally specific implementations of actual documents or processes are the external models.

Y-Axis: the Granularity Dimension

•     We can depict the amount of detail with which we describe the business relationships in each model.

•     From the organizational or business-to-business perspective, patterns show only the most important roles and relationships.

•     At the process level more details about the relationship are visible, and we begin to see the documents that are exchanged to carry out each process.

•     The information level is the most granular perspective, and we can see specific information components within the document models.


Metadata and Metamodels

•     Metadata is hard to define in Document Engineering because its usual definition is simply “data about data”.

•     Meta- is used to convey concern with the concepts and results of the discipline named in the suffix. 

o   e.g., a metalanguage is a language or system of symbols used to discuss another language or system

o   e.g., a metatheory is a formal system that describes the structure of some other system.

•     Metadata consists of data structures used to discuss other data structures.

•     Metadata augments the values of information (or data) with additional properties that explain its meaning, organization, cardinality, and other characteristics of interest in our models.

•     What constitutes metadata is relative.

o   Data may be metadata depending on your perspective. For example, statistics are data to some people and metadata to others.


A metamodel is required to help use metadata.

•     A metamodel is a higher abstraction of a model, used to describe the type of information in a model.

o   e.g., a model of a document: the document model’s metamodel might specify that the content of a document can be described using separate data objects, each of which has properties such as cardinality, definitions, conditional rules, and sets of legitimate values.

o   e.g., the metamodel for Book.xsd is the specification for W3C Schema (XSD), which explains how these schemas (or implementation models) are constructed.

•     By explaining what metadata are required and how they relate to each other, metamodels enable us to build consistent and robust models.
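A small sketch of a metamodel used as a consistency check. The property names follow the document-model example above (conditional rules are left out for brevity); the dictionary layout and the checking logic are assumptions made for illustration.

```python
# Metamodel: the properties every data object in a document model must declare,
# with the Python type each property is expected to have.
METAMODEL = {
    "definition": str,
    "cardinality": str,
    "legitimate_values": (list, type(None)),
}

# Hypothetical document model for a book, expressed in terms of the metamodel.
book_model = {
    "Title": {"definition": "Name of the book", "cardinality": "1",
              "legitimate_values": None},
    "Author": {"definition": "Person who wrote the book", "cardinality": "1..*",
               "legitimate_values": None},
    "Binding": {"definition": "Physical format", "cardinality": "0..1",
                "legitimate_values": ["hardcover", "paperback"]},
}


def conformance_errors(model: dict) -> list[str]:
    """List every place where a model fails to follow the metamodel."""
    errors = []
    for name, metadata in model.items():
        for prop, expected in METAMODEL.items():
            if prop not in metadata:
                errors.append(f"{name}: missing {prop}")
            elif not isinstance(metadata[prop], expected):
                errors.append(f"{name}: bad type for {prop}")
    return errors
```

A model that passes the check declares the same kinds of metadata for every component, which is what makes it consistent.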


It is difficult to recognize correspondences among models

•     While the overall purpose of metadata may be similar in various types of models, correspondences are obscured by their different terminology, syntaxes, or notations

o   e.g., Is an XML element equivalent to an SQL table? What is the relationship between elements and classes?

•     Metamodels are also useful for exchanging or comparing different models.

o   If two models share the same metamodel, it is easier to compare and align the two.
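To see why a shared metamodel helps, consider two models that both describe their components as a name with a cardinality. The component names and cardinalities below are hypothetical.

```python
# Two hypothetical models that use the same metamodel:
# component name -> cardinality.
order_model = {"Buyer": "1", "Seller": "1", "LineItem": "1..*", "DeliveryDate": "0..1"}
invoice_model = {"Buyer": "1", "Seller": "1", "LineItem": "0..*", "DueDate": "1"}


def align(left: dict[str, str], right: dict[str, str]) -> dict[str, list[str]]:
    """Compare two models component by component."""
    common = left.keys() & right.keys()
    return {
        "same": sorted(n for n in common if left[n] == right[n]),
        "conflicting": sorted(n for n in common if left[n] != right[n]),
        "left_only": sorted(left.keys() - right.keys()),
        "right_only": sorted(right.keys() - left.keys()),
    }


report = align(order_model, invoice_model)
```

The comparison is only this simple because both models record the same properties in the same way; with different metamodels a mapping step would be needed first.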


Metamodels for Processes

Business processes are inherently more abstract than documents

•     Metamodels for describing business processes have evolved that distinguish multiple levels of abstraction along with the semantic properties that are necessary to define each level.

•     ebXML Business Process metamodel

What is ebXML?

ebXML Vision:

ebXML is designed to create a global electronic marketplace where enterprises of any size, anywhere can:

Why ebXML?

Founding organizations:

ebXML is a joint initiative by OASIS and UN/CEFACT.



o   and compare business processes when they are described using the BPSS.


ebXML Architecture

By definition, the iterative life cycle of B2B collaboration includes the following steps:

The overall ebXML specifications are intended to cover almost the entire process of B2B collaboration and are designed to meet the needs described above.

The ebXML architecture, as defined by the ebXML team, provides:

Consequently, the technical architecture of ebXML is composed of five modules:

  1. Business Process Specifications
  2. Partner Profile and Agreements
  3. Registry and Repository
  4. Core Components
  5. Messaging Service

The diagram below shows a simplified architecture of ebXML.


ebXML Business Process

•     The Business Process and Information model defines how to describe the basic information elements used in business messages and to describe business processes.

•     A Business Process is something that a business does, such as buying computer parts or selling a professional service.

o   It involves the exchange of information between two or more Trading Partners in some predictable way.

•     The specification for business process definition enables an organization to express its business processes so that they are understandable by other organizations.

o   This enables the integration of business processes within a company, or between companies.

•     The ebXML Business Process Specification Schema (BPSS) provides the definition of an XML document that describes how an organization conducts its business.

o   An ebXML BPSS is a declaration of the partners, roles, collaborations, choreography and business document exchanges that make up a business process.

The following diagram gives a conceptual view of a Business Process.


Business Collaborations:

•     A Business Collaboration is a choreographed set of Business Transaction Activities, in which two or more Trading Partners exchange documents.

o   The most common one is a Binary Collaboration, in which two partners exchange documents.

o   A Multiparty Collaboration takes place when information is exchanged between more than two parties.

o   Multiparty Collaborations are actually choreographed Binary Collaborations.

o   At its lowest level, a Business Collaboration can be broken down into Business Transactions.

Business Transactions:

•     A Business Transaction is the atomic level of work in a Business Process.

o   It either succeeds or fails completely.

•     Business Transactions are transactions in which Trading Partners actually transfer Business Documents.

Business Document flows:

•     A business transaction is realized as Business Document flows between the requesting and responding roles.

o   There is always a requesting Business Document, and optionally a responding Business Document, depending on the desired transaction semantics, e.g. one-way notification vs. two-way conversation.

•     Actual document definition is achieved using the ebXML core component specifications, or by some methodology external to ebXML but resulting in a DTD or Schema that an ebXML Business Process Specification can point to.
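The transaction semantics described above can be sketched as a small Python type. This is an illustration only, not the ebXML data model: the class, the `run` helper, and the "Notify Shipment" transaction are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusinessTransaction:
    """Atomic unit of work: a required request and an optional response."""
    name: str
    requesting_document: str
    responding_document: Optional[str] = None

    @property
    def is_one_way(self) -> bool:
        """True for a notification, False for a two-way conversation."""
        return self.responding_document is None


def run(transaction: BusinessTransaction, response_received: bool) -> str:
    """Return 'success' or 'failure'; there is no partial outcome."""
    if transaction.is_one_way or response_received:
        return "success"
    return "failure"


create_order = BusinessTransaction("Create Order", "Purchase Order", "PO Acknowledgement")
notify = BusinessTransaction("Notify Shipment", "Shipment Notice")
```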


Choreography:

•     The choreography is expressed in terms of states and the transitions between them.

•     A Business Activity is known as an abstract state, with Business Collaborations and Business Transaction Activities known as concrete states.

•     The choreography is described in the ebXML Business Process Specification Schema using activity diagram concepts such as start state, completion state etc.
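A choreography of this kind behaves like a small state machine. The sketch below uses a plain transition table; the event names and the intermediate states are invented for illustration and are not taken from the BPSS.

```python
# Hypothetical choreography: (current state, event) -> next state.
TRANSITIONS = {
    ("start", "begin"): "Create Order",
    ("Create Order", "accepted"): "Notify Shipment",
    ("Create Order", "rejected"): "failure",
    ("Notify Shipment", "done"): "success",
}
COMPLETION_STATES = {"success", "failure"}


def play(events: list[str]) -> str:
    """Follow the transitions from the start state and return the final state."""
    state = "start"
    for event in events:
        if state in COMPLETION_STATES:
            raise ValueError(f"no transitions out of completion state {state!r}")
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
    return state
```

Any sequence of events that is not a path through the table is rejected, which is how a choreography constrains the order of business transactions.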

Business Documents:

•     The Business Documents are composed of Business Information Objects, or smaller chunks of information that have previously been identified.

o   These chunks, or components, don't carry any information.

o   They are merely structures, such as an XML Schema or a DTD, that define information and how it must be presented.

o   The end result is a predictable structure into which information is placed, so that the receiver of the final document can interpret it to extract the information.

Business Process Specification Example:

A partial example of Business Process Specification is given below:

<BusinessTransaction name="Create Order">
    <RequestingBusinessActivity name="">
        <DocumentEnvelope BusinessDocument="Purchase Order"/>
    </RequestingBusinessActivity>
    <RespondingBusinessActivity name="">
        <DocumentEnvelope isPositiveResponse="true"
            BusinessDocument="PO Acknowledgement"/>
    </RespondingBusinessActivity>
</BusinessTransaction>
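Because a Business Process Specification is itself an XML document, it can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a well-formed version of the partial example; the `documents` helper is a hypothetical name, and no attributes beyond those visible in the example are assumed.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the partial example; attributes that are
# not shown in the example are simply left out here.
SPEC = """
<BusinessTransaction name="Create Order">
    <RequestingBusinessActivity name="">
        <DocumentEnvelope BusinessDocument="Purchase Order"/>
    </RequestingBusinessActivity>
    <RespondingBusinessActivity name="">
        <DocumentEnvelope isPositiveResponse="true"
            BusinessDocument="PO Acknowledgement"/>
    </RespondingBusinessActivity>
</BusinessTransaction>
"""

transaction = ET.fromstring(SPEC)


def documents(activity_tag: str) -> list[str]:
    """Names of the business documents carried by one kind of activity."""
    return [
        envelope.get("BusinessDocument")
        for activity in transaction.findall(activity_tag)
        for envelope in activity.findall("DocumentEnvelope")
    ]


requests = documents("RequestingBusinessActivity")
responses = documents("RespondingBusinessActivity")
```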




A Business Process Specification:


ebXML Usage Example taken from the Technical Architecture Specification

The example shows how organizations prepare for ebXML, search for new trading partners, and then engage in electronic business.


1.   Company A browses the ebXML registry to see what is available online.

•     At best, company A can reuse all the existing business processes, documents, and core components common to its industry that are already stored in the ebXML registry. Otherwise company A designs the missing parts, stores them in the ebXML registry and makes them available for its industry partners.

2.   Company A decides to do electronic business the ebXML way and considers implementing a local ebXML compliant application.

•     An ebXML Business Service Interface (BSI) provides the link between the company and the outside ebXML world. The company has to create a Collaboration Protocol Profile (CPP) which describes the supported business process capabilities, constraints and technical ebXML information such as choice of encryption algorithms, encryption certificates and choice of transport protocols.

3.   Company A submits its CPP to an ebXML registry.

•     From that point on, company A is publicly listed in the ebXML registry and is likely to be discovered by other companies querying for new trading partners.

4.   Company B is already registered at the ebXML registry and is looking for new trading partners.

•     Company B queries the ebXML registry and receives the CPP of company A.

•     Company B then has two CPPs: Company A's CPP and its own.

•     The two companies have to come to an agreement on how to do business, which is called a Collaboration Protocol Agreement (CPA) in the ebXML terminology.

•     Company B uses an ebXML CPA formation tool to derive a CPA from the requirements of the two CPPs.

5.   In this scenario company B communicates with company A directly and sends the newly created CPA for acceptance to company A.

•      Upon agreement of the CPA by company A, both companies are ready for electronic business.

6.   The companies then use the underlying ebXML framework and exchange business documents conforming to the CPA.

•     This means that both companies follow the business processes defined in the CPA.
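The CPA formation step in the scenario above can be pictured as finding what two profiles have in common. The sketch below is a deliberate simplification and not the behavior of any real CPA formation tool: the `CPP` fields, the `form_cpa` function, and the sample capabilities are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPP:
    """Collaboration Protocol Profile, reduced to three capability sets."""
    company: str
    processes: frozenset[str]
    transports: frozenset[str]
    encryption: frozenset[str]


def form_cpa(a: CPP, b: CPP) -> dict[str, frozenset[str]]:
    """Derive an agreement from what both profiles support.

    Raises ValueError if any capability has no common option.
    """
    cpa = {
        "processes": a.processes & b.processes,
        "transports": a.transports & b.transports,
        "encryption": a.encryption & b.encryption,
    }
    for capability, options in cpa.items():
        if not options:
            raise ValueError(f"no common {capability} between {a.company} and {b.company}")
    return cpa


# Hypothetical profiles.
company_a = CPP("A", frozenset({"Create Order"}), frozenset({"HTTP", "SMTP"}),
                frozenset({"AES"}))
company_b = CPP("B", frozenset({"Create Order", "Invoice"}), frozenset({"HTTP"}),
                frozenset({"AES", "3DES"}))
cpa = form_cpa(company_a, company_b)
```

If the two profiles share no option for some capability, no agreement can be formed and the companies cannot yet do business the ebXML way.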



Patterns are models that are sufficiently general, adaptable, and worthy of imitation that we can reuse them

•     e.g., the system of government called a parliament is a pattern used by numerous countries and states.

o   The Parliament model is an abstract, conceptual pattern because a country that adopts this model does not adopt any specific politicians and bureaucrats, just the pattern describing the ways in which its elected representatives are organized to govern

•     Patterns are useful in every activity, from constructing houses to building software applications to describing human behavior.

•     Document Engineering is mostly concerned with patterns of information exchange within and between enterprises and the patterns of semantic components in the documents being exchanged


Patterns in Business

Businesses exhibit both great variety and great regularity in what they do and how they do it

Why Businesses Follow Patterns

Businesses in different industries adopt patterns specific to their activities for a number of reasons:

•             They may be affected by common laws or regulations.

•             They may follow similar trade practices and be affected by the same microeconomic factors, such as common suppliers or customers and similar opportunities or threats related to the introduction of new technologies or methods.

•             They may be affected by common external forces imposed by the overall economic and financial environment such as tax and interest rates, levels of employment and education, and consumer confidence and other macroeconomic factors.

•             They want to minimize the cost of hiring and training workers.


Advantages of Business Patterns

•             Reusing well-understood patterns makes businesses easier to start, manage, and improve.

•             Adopting common patterns can reduce development and maintenance costs, improve performance, and enhance relationships with suppliers and customers.

•             A business can more easily learn from others in its industry if it contributes to and follows industry best practices or reference models.

•             The more systematic the practices in an industry, the more a business benefits from following them because of the network effects of standardization.


Finding Patterns in the Model Matrix

•     Generic or abstract conceptual patterns become more specific or concrete by adding context.

•     Contextualization means moving from left to right in the model matrix.

•     Similarly, moving up the granularity axis in the Model Matrix gives a coarser view that hides details, which can suggest patterns and so might encourage innovation.


Using the Model Matrix as a Framework

•     The model matrix serves as a roadmap to the analysis and design activities and methods that get us to its middle.

o   Here the systematic differences in abstraction and granularity of these kinds of models in the matrix suggest that different kinds of modeling approaches are needed to create them.

•     Different models emerge from the skills and tools of the business analyst, document analyst, data analyst, and task analyst.

o   Each of these approaches looks at documents and processes differently, and while each of them is highly effective in some areas, they all have blind spots where their methods do not work well.