SE735 - Data and Document Representation & Processing
Lecture 4 - Models, Patterns, and Reuse
In Document Engineering, models are developed that emphasize document requirements and patterns of
information exchange.
What is a Model?
•
Models are simplified
descriptions of a subject that abstract from its complexity to emphasize
some features or characteristics while intentionally de-emphasizing
others
•
Models enable us
to describe and communicate systems regardless of the specific domain or
discipline that they represent
•
A model can
represent a human activity, a natural system, or a designed system
•
We can model
structures – objects, their characteristics, their static relationships with
each other like hierarchy, and reference
•
We can model
functions, processes, behaviors – dynamic activities that create and affect
structures
Recipes as Everyday
Models
•
A recipe
describes both objects and structures (ingredients) and the processes
(instructions) for creating a food dish
•
Important
characteristics and uses of the recipe model are:
•
You can FOLLOW
the recipe to create the dish
•
You can
COMMUNICATE the recipe to someone else who can then create the same dish
•
You can use the
recipe as a GUIDE FOR EXPERIMENTATION with the objects or the processes in the
recipe to create alternative dishes
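The recipe idea can be sketched in code: ingredients form the static structure, instructions form the process, and experimenting means deriving a variant from the model. This is only an illustrative sketch; the class names, the `substitute` helper, and all ingredient values are assumptions made for the example, not the recipe used in the lecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ingredient:
    """Structure: an object and one of its characteristics."""
    name: str
    quantity: str


@dataclass
class Recipe:
    """A recipe pairs a static model (ingredients) with a dynamic one (instructions)."""
    dish: str
    ingredients: List[Ingredient] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)

    def substitute(self, old: str, new: str) -> "Recipe":
        """Experiment with the model: swap one ingredient, keep the process."""
        swapped = [
            Ingredient(new, item.quantity) if item.name == old else item
            for item in self.ingredients
        ]
        return Recipe(f"{self.dish} ({new} variant)", swapped, list(self.instructions))


# Placeholder values for illustration only.
soup = Recipe(
    dish="wonton soup",
    ingredients=[Ingredient("pork", "200 g"), Ingredient("wonton wrappers", "24")],
    instructions=["Mix the filling", "Wrap the wontons", "Simmer in broth"],
)
variant = soup.substitute("pork", "shrimp")
```

Because `substitute` returns a new `Recipe`, the original model is left untouched, which mirrors using a recipe as a guide without changing it.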
A Model of Wonton
Soup
"Static" or "Structure" Recipe Model: The Ingredients
"Dynamic"
or "Process" Recipe Model: The Instructions
The Classical
Modeling Approach
1.
Find and analyze real-world
artifacts and represent the results in models that describe their physical
implementation. These are often called
the As-Is models.
2.
The As-Is models are transformed into To-Be business process models by selecting and adapting patterns
appropriate for the required context of use.
·
For documents, the As-Is model is called a document component
model and the To-Be models are called
document assembly models.
3.
The conceptual view is returned to a physical
view by expressing it in technology appropriate for the contexts in which it
will be used.
·
These new document implementation
models and process implementation models are the As-Implemented models.
Physical Modeling for
Analysis
•
The primary purpose of modeling is to better
understand some existing system or environment and its entities ("the
things that exist in it that contain or embody relevant information") and
to describe this understanding so it can be communicated
•
Models of things as they currently exist are Physical
or As-Is models
•
This modeling activity is usually called
"systems analysis" or simply "analysis"
• A basic task of modeling for analysis is capturing the languages and practices of the people who work with the instances and artifacts in the "real world"
•
Any system (especially business ones) has
groups of stakeholders who do not fully understand the "big picture"
– an analysis model can be used to prevent or repair misunderstandings
• A physical model synthesizes different views or observations into a more complete or generic perspective that accurately accounts for all of them
Conceptual Modeling
for Design or Re-Design
•
The next purpose of modeling is to assist in
the design or re-design of a system or set of artifacts
•
This modeling activity is usually called
"systems design" or simply "design"
•
Models of things as they could be are Conceptual
or To-Be models
•
Design abstracts away or generalizes from the
technology and implementation details in the physical model to create a
conceptual model
•
A conceptual model that is
implementation-independent is often easier to talk about than one that is
encoded in a specific technology context
•
When technologies change, the implementation
model may change but the conceptual model won't.
•
When implementations in different
technologies are based on the same conceptual models, they can be more readily understood because of their common conceptual components.
Conceptual View of
Wonton Soup
Using Conceptual
Models
•
Objects can be manipulated as conceptual
components without impacting the real world that they describe, or in ways that
are impossible in the real world
•
This encourages the re-use of common
components via standardization, patterns and libraries
•
It facilitates the rationalization of components
and the removal of redundancies and inefficiencies
The Modeling Gaps
•
There is an essential difference or gap
between the real world being modeled and any models of it, or else the models
would serve no purpose
•
Likewise, there is always a gap between a
physical model and a conceptual model, because a conceptual model is often most
useful when it isn't tied to specific or feasible implementations or
technologies
•
But this means we can sometimes see what the
current world looks like and what we would like it to be without being able to
see how to get from one to the other
•
Put a different way, this means that models
can be designed that are impossible to implement
Implementing a
Conceptual Model as a Physical One
•
A model is purely theoretical until it is encoded
in a technology that lets it operate in the real world again.
•
This is often a two-stage process: encoding
conceptual models as physical ones, and then applying transformations to create
instances with desired properties (e.g., implementing / formatting / rendering
in some concrete medium / syntax / technology)
•
When instances implemented in different
technologies are generated or re-generated from models, they can more readily
interoperate because of their common conceptual components
Iteration is Inevitable
Adapting the Classical Modeling Approach to Document Engineering
•
In a Document Engineering context we can
analyze a domain and its entities and their processes or behaviors as
we would in any other modeling activity
•
But what we generally care more about is the
information or intangible content about the entities in the domain
•
Or put another way, we only model those
entities and their properties that convey information that is relevant
•
So we are more likely to engage in
"pseudo-real-world" modeling than "real-world" modeling
Real-World Modeling
of a Library System
Pseudo-Real-World
Modeling of a Library System
So We Model the Information, not the Things
Methodologies – Disciplines for Modeling
•
When we create a
model we follow – implicitly or explicitly – some steps or techniques for
analysis, design, and implementation
•
This is called
the modeling methodology
•
Methodologies
can be formal, prescriptive, step-by-step, documented and auditable or they can
be the opposite: informal, ad hoc, "seat of the pants" with no trace
other than the model artifact itself
•
A methodology
can contain or define:
o
Meta-models
o
Processes /
Activities / Steps
o
Notations
o
Tools
•
and how these
are applied or fit together to produce:
o
Artifacts
Sequential,
Iterative, and Artifact-Centered Methodologies
•
A methodology's process describes the work to
be done and the order in which it is to be done (the modeling workflow)
•
Many methodologies prescribe a Sequential
process -- the "waterfall" model
•
Other methodologies are more iterative or
recursive -- like the "spiral" model of progressive refinement or
"agile" modeling
•
Other methodologies are looser about the
modeling activities but emphasize the results that must be obtained at each step
or phase
Why XML for Models
•
XML provides
syntactic mechanisms that capture the semantic distinctions between documents
in terms of the sets of elements and attributes used to encode their content
and the rules that govern their occurrence and organization.
•
Two semantically
related document models like purchase order and invoice may share elements from
a common library or subset, but they are distinguished by elements that occur
only in one of them or that have different possible values in each.
•
So different
vocabularies are used to mark up the content of purchase orders and of invoices.
•
XML’s ease of use,
its expressive power, and its processability have
made it attractive for Document Engineering because it can realize document
models suitable for implementation in applications
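The purchase order and invoice example can be made concrete with sets of element names. The sketch below assumes a small common library plus a few distinguishing elements for each document type; every element name is hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical element names; real vocabularies will differ.
COMMON_LIBRARY = {"BuyerParty", "SellerParty", "Item", "Quantity", "Currency"}

PURCHASE_ORDER = COMMON_LIBRARY | {"RequestedDeliveryDate"}
INVOICE = COMMON_LIBRARY | {"InvoiceNumber", "PaymentDueDate", "TaxAmount"}

shared = PURCHASE_ORDER & INVOICE          # elements both models reuse
only_in_order = PURCHASE_ORDER - INVOICE   # elements that mark a purchase order
only_in_invoice = INVOICE - PURCHASE_ORDER  # elements that mark an invoice


def classify(element_names):
    """Guess the document type from the elements that distinguish it."""
    names = set(element_names)
    if names & only_in_invoice:
        return "invoice"
    if names & only_in_order:
        return "purchase order"
    return "undetermined"
```

A document that uses only shared elements cannot be classified this way, which is why the distinguishing elements matter.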
Conceptual Views: Document Component and Assembly Models
•
We can best
describe the semantics of documents using models of the concepts they
contain.
•
This conceptual
view lets us distinguish one class of document from another.
•
Conceptual views
are independent of the physical implementation and are not tied to any
particular technology.
There are two types of conceptual models for documents, both more formal and precise than prose definitions.
•
The first is the
document component model, which describes the complete set of semantic
components in a domain, including their structure and their potential
relationships.
•
The document
component model portrays the network of associations between the components, so
rather than describing a single type of document, it implicitly describes many
different types of documents.
•
Such a conceptual
model of information about books is shown in the notation of a class diagram.
The model also captures important
rules about the relationships between classes:
•
A Publisher can be
considered a reuse of the object class called Party with a special association
to Book that we label as Publishes
•
The Publisher is
modeled as a “Published by” Party.
•
An Author is not a
reuse of Party because the former has attributes that do not apply to the
latter.
•
The model also
depicts the business rules that an author can write more than one book and that
books can have more than one author.
•
It also tells us
that even if the book has more than one edition, it has only one ISBN.
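The rules listed above can be approximated in code. This is a rough sketch of the described associations, not a transcription of the class diagram: the attribute names, the `add_author` helper, and all instance values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """Reusable object class for an organization or person."""
    name: str


@dataclass
class Author:
    """Modeled separately from Party, following the rule stated in the notes."""
    name: str
    books: List["Book"] = field(default_factory=list)


@dataclass
class Book:
    title: str
    isbn: str                    # a single ISBN, however many editions exist
    published_by: Party          # the Publisher is a "Published by" Party
    editions: List[str] = field(default_factory=list)
    authors: List[Author] = field(default_factory=list)

    def add_author(self, author: Author) -> None:
        """Maintain both ends of the many-to-many Author/Book association."""
        self.authors.append(author)
        author.books.append(self)


# Placeholder values; not the instance data behind the lecture's diagram.
press = Party("Example Press")
book = Book("Example Title", "ISBN-PLACEHOLDER", press, editions=["1st", "2nd"])
first, second = Author("Author One"), Author("Author Two")
book.add_author(first)
book.add_author(second)
```

Reusing `Party` for the publisher while giving `Author` its own class reflects the reuse decision described in the bullets.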
The second is the document assembly model, which describes the
way in which selected components are assembled into a hierarchical structure
•
Document assembly
models are best visualized as tree diagrams of hierarchical structures
o Three different document assembly models may be made by
traversing the associations in different orders.
o The resulting hierarchical document types would
organize the same semantic components in different structures to impose
different interpretations or contexts that emphasize the book, the author, or
the publisher.
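Traversing the same associations in a different order can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch assumes flat component data with placeholder values and builds two of the possible hierarchies, one centered on the book and one on the author; the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Flat component data with placeholder values; associations are by name.
BOOKS = [
    {"title": "Book One", "publisher": "Example Press", "authors": ["Author A", "Author B"]},
    {"title": "Book Two", "publisher": "Example Press", "authors": ["Author A"]},
]


def book_centered(books):
    """Assembly that starts at Book and nests publisher and authors under it."""
    root = ET.Element("Books")
    for b in books:
        book = ET.SubElement(root, "Book", title=b["title"])
        ET.SubElement(book, "Publisher").text = b["publisher"]
        for name in b["authors"]:
            ET.SubElement(book, "Author").text = name
    return root


def author_centered(books):
    """Assembly that starts at Author and nests each author's books."""
    by_author = {}
    for b in books:
        for name in b["authors"]:
            by_author.setdefault(name, []).append(b)
    root = ET.Element("Authors")
    for name, written in by_author.items():
        author = ET.SubElement(root, "Author", name=name)
        for b in written:
            book = ET.SubElement(author, "Book", title=b["title"])
            ET.SubElement(book, "Publisher").text = b["publisher"]
    return root


print(ET.tostring(book_centered(BOOKS), encoding="unicode"))
print(ET.tostring(author_centered(BOOKS), encoding="unicode"))
```

Both trees contain the same semantic components; only the hierarchy, and therefore the emphasis, changes. A publisher-centered assembly would follow the same approach.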
•
Conceptual views of
models are a better way to represent and communicate the results of analysis
and design than the physical views
because they are not constrained by any specific technology.
•
This technology
independence also makes them easier to manipulate or revise.
•
Similarly, at the
conceptual level it is easier to generalize a model to make it describe a
larger set of possible or desired artifacts than the ones we happened to
observe when we first created an implementation model.
•
It is also easier
to specialize at the conceptual level, for example, by deriving a related model
that incorporates additional characteristics or relationships; in this case, we
could express a conceptual view of a model for chemistry books based on the
model for a book.
The Model Matrix
•
Two dimensions of
model abstraction and model granularity form a matrix for
organizing the analysis and modeling approaches in Document Engineering
X-Axis: the abstraction dimension
•
The most abstract
or context-free conceptual models are arranged on the left.
•
Moving to the right
implies more physical models, and finally specific implementations of actual
documents or processes are the external models.
Y-Axis: the Granularity Dimension
•
We can depict the
amount of detail with which we describe the business relationships in each
model.
•
From the organizational or business-to-business
perspective, patterns show only the most important roles and relationships.
•
At the process level more details about the
relationship are visible, and we begin to see the documents that are exchanged
to carry out each process.
•
The information level is the most granular
perspective, and we can see specific information components within the document
models.
Metadata and Metamodels
•
Metadata is hard to
define in Document Engineering because its usual definition is “data about
data”.
•
The prefix meta-
is used to convey concern with the concepts and results of the discipline named
in the rest of the word.
o e.g., a metalanguage is a
language or system of symbols used to discuss another language or system
o e.g., a metatheory is a
formal system that describes the structure of some other system.
•
Metadata consists
of data structures used to discuss other data structures.
•
Metadata augments
the values of information (or data) with additional properties that explain its
meaning, organization, cardinality, and other characteristics of interest in
our models.
•
What constitutes
metadata is relative.
o Data may be metadata depending on your perspective. For
example, statistics are data to some people and metadata to others.
A metamodel is required to help use metadata.
•
A metamodel is a higher abstraction of a model, used to
describe the type of information in a model.
o e.g., a model of a document: the document model’s metamodel might specify
that the content of a document can be described using separate data objects,
each of which has properties such as cardinality, definitions, conditional
rules, and sets of legitimate values.
o e.g., the metamodel for
Book.xsd is the specification for W3C Schema (XSD), which explains how these
schemas (or implementation models) are constructed.
•
By explaining what
metadata is required and how its items relate to each other, metamodels
enable us to build consistent and robust models.
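The relationship between metamodel, model, and data can be sketched as three layers. In the sketch below, `ComponentSpec` plays the role of a very small metamodel (each data object has a definition, a cardinality, and an optional set of legitimate values), `BOOK_MODEL` is a model expressed in those terms, and `validate` checks an instance against the model. All names and rules are hypothetical and far simpler than a real schema language such as XSD.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ComponentSpec:
    """Metamodel: the properties every data object in a document model has."""
    definition: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1              # None means unbounded
    allowed_values: Optional[Set[str]] = None  # None means any value


# A document model is metadata expressed in terms of the metamodel.
BOOK_MODEL: Dict[str, ComponentSpec] = {
    "Title": ComponentSpec("Name of the book"),
    "Author": ComponentSpec("Person who wrote the book", max_occurs=None),
    "Format": ComponentSpec("Physical form", min_occurs=0,
                            allowed_values={"hardcover", "paperback"}),
}


def validate(instance: Dict[str, List[str]], model: Dict[str, ComponentSpec]) -> List[str]:
    """Check an instance (data) against a model (metadata); return the problems found."""
    problems = []
    for name, spec in model.items():
        values = instance.get(name, [])
        if len(values) < spec.min_occurs:
            problems.append(f"{name}: too few occurrences")
        if spec.max_occurs is not None and len(values) > spec.max_occurs:
            problems.append(f"{name}: too many occurrences")
        if spec.allowed_values is not None:
            problems += [f"{name}: illegal value {v!r}"
                         for v in values if v not in spec.allowed_values]
    return problems
```

Because every model is built from the same `ComponentSpec` properties, one `validate` function works for any model, which is one practical benefit of sharing a metamodel.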
It is difficult to recognize correspondences
among models
•
While the overall
purpose of metadata may be similar in various types of models, their
different terminology, syntaxes, or notations make the correspondences hard to see
o e.g., Is an XML element
equivalent to an SQL table? What is the relationship between elements and
classes?
•
Metamodels are also useful to exchange or compare different
models.
o If two models share the same metamodel,
it is easier to compare and align the two.
Metamodels for Processes
Business processes are inherently more
abstract than documents
•
Metamodels for describing business processes have evolved that
distinguish multiple levels of abstraction along with the semantic properties
that are necessary to define each level.
•
ebXML Business Process metamodel
What is ebXML?
ebXML Vision:
ebXML is designed to create a global electronic marketplace
where enterprises of any size, anywhere can:
Why ebXML?
Founding
organizations:
ebXML is a joint initiative by OASIS and UN/CEFACT.
UN/CEFACT:
OASIS:
o and compare business processes when they are described
using the BPSS.
ebXML Architecture
By
definition, the iterative life cycle of B2B collaboration includes
the following steps:
The
overall ebXML specifications are intended to cover
almost the entire process of B2B collaboration and are designed to meet the
needs described above.
The ebXML architecture as defined by the ebXML
team provides:
Consequently,
the technical architecture of ebXML is composed of
five modules:
Below
is a diagram showing the simplified architecture of ebXML.
ebXML Business Process
•
The Business Process and Information model
defines how to describe the basic information elements used in business
messages and to describe business processes.
•
A Business Process
is something that a business does, such as buying computer parts or selling a
professional service.
o It involves the exchange of information between two or
more Trading Partners in some predictable way.
•
The specification
for business process definition enables an organization to express its business
processes so that they are understandable by other organizations.
o This enables the integration of business processes
within a company, or between companies.
•
The ebXML Business Process Specification Schema
(BPSS) provides the definition of an XML document that describes how an
organization conducts its business.
o An ebXML BPSS is a
declaration of the partners, roles, collaborations, choreography and business
document exchanges that make up a business process.
The following
diagram gives a conceptual view of a Business Process.
Business
Collaborations:
•
A Business
Collaboration is a choreographed set of Business Transaction Activities, in
which two or more Trading Partners exchange documents.
o The most common one is a Binary Collaboration, in which
two partners exchange documents.
o A Multiparty Collaboration takes place when information
is exchanged between more than two parties.
o Multiparty Collaborations are actually choreographed
Binary Collaborations.
o At its lowest level, a Business Collaboration can be
broken down into Business Transactions.
Business
Transactions:
•
A Business
Transaction is the atomic level of work in a Business Process.
o It either succeeds or fails completely.
•
Business Transactions
are transactions in which Trading Partners actually transfer Business
Documents.
Business
Document flows:
•
A business
transaction is realized as Business Document flows between the requesting and
responding roles.
o There is always a requesting Business Document, and
optionally a responding Business Document, depending on the desired transaction
semantics, e.g. one-way notification vs. two-way conversation.
•
Actual document
definition is achieved using the ebXML core component
specifications, or by some methodology external to ebXML
but resulting in a DTD or Schema that an ebXML
Business Process Specification can point to.
Choreography:
•
The choreography is
expressed in terms of states and the transitions between them.
•
A Business Activity
is known as an abstract state, with Business Collaborations and Business
Transaction Activities known as concrete states.
•
The choreography is
described in the ebXML Business Process Specification
Schema using activity diagram concepts such as start state, completion state
etc.
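A choreography expressed as states and transitions can be sketched as a small transition table. The state names, event names, and the `run` helper below are hypothetical; BPSS defines its own vocabulary for these concepts, and this sketch only illustrates the start state, transitions, and completion states mentioned above.

```python
# Hypothetical state and event names for a two-step collaboration.
TRANSITIONS = {
    ("Start", "begin"): "CreateOrder",
    ("CreateOrder", "success"): "NotifyShipment",
    ("CreateOrder", "failure"): "Failure",
    ("NotifyShipment", "success"): "Success",
    ("NotifyShipment", "failure"): "Failure",
}
COMPLETION_STATES = {"Success", "Failure"}


def run(events, state="Start"):
    """Follow the choreography; reject any event the current state does not allow."""
    for event in events:
        if state in COMPLETION_STATES:
            raise ValueError(f"no transitions out of completion state {state}")
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state}") from None
    return state
```

For instance, `run(["begin", "success", "success"])` walks from the start state through both transaction activities to the `Success` completion state, while an event that has no entry in the table raises an error.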
Business
Documents:
•
The Business
Documents are composed of Business Information Objects, or smaller chunks of
information that have previously been identified.
o These chunks, or components, don't carry any information.
o They are merely structures, such as an XML Schema or a
DTD, that define information and how it must be presented.
o The end result is a predictable structure into which
information is placed, so that the receiver of the final document can interpret
it to extract the information.
Business Process
Specification Example:
A
partial example of a Business Process Specification is given below:
<BusinessTransaction name="Create Order">
  <!-- Time limits are ISO 8601 durations, e.g. P2D means two days -->
  <RequestingBusinessActivity name=""
      isNonRepudiationRequired="true"
      timeToAcknowledgeReceipt="P2D"
      timeToAcknowledgeAcceptance="P3D">
    <DocumentEnvelope BusinessDocument="Purchase Order"/>
  </RequestingBusinessActivity>
  <RespondingBusinessActivity name=""
      isNonRepudiationRequired="true"
      timeToAcknowledgeReceipt="P5D">
    <DocumentEnvelope isPositiveResponse="true"
        BusinessDocument="PO Acknowledgement"/>
  </RespondingBusinessActivity>
</BusinessTransaction>
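One way to see what such a specification declares is to read it programmatically. The sketch below embeds a well-formed copy of the fragment so that it is self-contained and uses Python's standard `xml.etree.ElementTree` to extract the requesting and responding document flows. The `days` and `summarize` helpers are hypothetical, and `days` handles only whole-day durations such as `P2D`.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """
<BusinessTransaction name="Create Order">
  <RequestingBusinessActivity name="" isNonRepudiationRequired="true"
      timeToAcknowledgeReceipt="P2D" timeToAcknowledgeAcceptance="P3D">
    <DocumentEnvelope BusinessDocument="Purchase Order"/>
  </RequestingBusinessActivity>
  <RespondingBusinessActivity name="" isNonRepudiationRequired="true"
      timeToAcknowledgeReceipt="P5D">
    <DocumentEnvelope isPositiveResponse="true" BusinessDocument="PO Acknowledgement"/>
  </RespondingBusinessActivity>
</BusinessTransaction>
"""


def days(duration):
    """Parse whole-day durations such as 'P2D'; other duration forms are not handled."""
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    return int(match.group(1))


def summarize(xml_text):
    """Return the transaction name plus its request and response document flows."""
    tx = ET.fromstring(xml_text)
    summary = {"transaction": tx.get("name")}
    for role, tag in (("request", "RequestingBusinessActivity"),
                      ("response", "RespondingBusinessActivity")):
        activity = tx.find(tag)
        if activity is None:  # a one-way notification has no response
            summary[role] = None
            continue
        summary[role] = {
            "document": activity.find("DocumentEnvelope").get("BusinessDocument"),
            "ack_receipt_days": days(activity.get("timeToAcknowledgeReceipt")),
        }
    return summary
```

Calling `summarize(FRAGMENT)` yields the transaction name together with the Purchase Order request and the PO Acknowledgement response, matching the requesting and responding roles described under Business Document flows.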
A Business Process
Specification:
ebXML Usage Example taken from the Technical Architecture Specification (http://www.tutorialspoint.com/ebxml/index.htm)
The
example shows how organizations prepare for ebXML,
search for new trading partners, and then engage in electronic business.
1.
Company A browses
the ebXML registry to see what is available online.
•
At best, company A
can reuse all the existing business processes, documents, and core components
common to its industry that are already stored in the ebXML
registry. Otherwise company A designs the missing
parts, stores them in the ebXML registry and makes
them available for its industry partners.
2.
Company A decides
to do electronic business the ebXML way and considers
implementing a local ebXML compliant application.
•
An ebXML Business Service Interface (BSI) provides the link
between the company and the outside ebXML world. The
company has to create a Collaboration Protocol Profile (CPP) which describes
the supported business process capabilities, constraints and technical ebXML information such as choice of encryption algorithms,
encryption certificates and choice of transport protocols.
3.
Company A submits
its CPP to an ebXML registry.
•
From that point on,
company A is publicly listed in the ebXML registry
and is likely to be discovered by other companies querying for new trading
partners.
4.
Company B is
already registered at the ebXML registry and is
looking for new trading partners.
•
Company B queries
the ebXML registry and receives the CPP of company A.
•
Company B then has
two CPPs: Company A's CPP and its own.
•
The two companies
have to come to an agreement on how to do business, which is called a
Collaboration Protocol Agreement (CPA) in the ebXML
terminology.
•
Company B uses an ebXML CPA formation tool to derive a CPA from the
requirements of the two CPPs.
5.
In this scenario
company B communicates with company A directly and sends the newly created CPA
for acceptance to company A.
•
Once company A agrees to the CPA, both
companies are ready for electronic business.
6.
The companies then
use the underlying ebXML framework and exchange
business documents conforming to the CPA.
•
This means that
both companies follow the business processes defined in the CPA.
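The step in which a CPA is derived from two CPPs can be pictured as intersecting what each company supports. The sketch below is a heavy simplification: real CPPs and CPAs are XML documents with far more detail, and the `CPP` class, the `form_cpa` function, and every capability value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class CPP:
    """Simplified stand-in for a Collaboration Protocol Profile."""
    company: str
    processes: FrozenSet[str]
    transports: FrozenSet[str]
    encryption: FrozenSet[str]


def form_cpa(a: CPP, b: CPP) -> Dict[str, object]:
    """Derive a draft agreement from the capabilities both profiles support."""
    agreed = {
        "processes": a.processes & b.processes,
        "transports": a.transports & b.transports,
        "encryption": a.encryption & b.encryption,
    }
    missing = [key for key, common in agreed.items() if not common]
    if missing:
        raise ValueError(f"no common ground for: {', '.join(missing)}")
    return {"parties": (a.company, b.company),
            **{key: sorted(common) for key, common in agreed.items()}}


# Capability values are placeholders, not taken from any real profile.
company_a = CPP("Company A", frozenset({"Create Order", "Notify Shipment"}),
                frozenset({"HTTPS", "SMTP"}), frozenset({"AES"}))
company_b = CPP("Company B", frozenset({"Create Order"}),
                frozenset({"HTTPS"}), frozenset({"AES", "3DES"}))
cpa = form_cpa(company_a, company_b)
```

If the two profiles share no process, transport, or encryption option, `form_cpa` raises an error, which corresponds to the case where the companies cannot reach an agreement without changing their profiles.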
Patterns
Patterns
are models that are sufficiently general, adaptable, and worthy of imitation
that we can reuse them
•
e.g., the system of
government called a parliament is a
pattern used by numerous countries and states.
o The Parliament model is an abstract, conceptual pattern
because a country that adopts this model does not adopt any specific
politicians and bureaucrats, just the pattern describing the ways in which its
elected representatives are organized to govern
•
Patterns are useful
in every activity, from constructing houses to building software applications to
describing human behavior.
•
Document
Engineering is mostly concerned with patterns of information exchange within
and between enterprises and the patterns of semantic components in the
documents being exchanged
Patterns in Business
Businesses
exhibit both great variety and great regularity in what they do and how they do
it
Why Businesses Follow Patterns
Businesses in different industries adopt
patterns specific to their activities for a number of reasons:
•
They may be
affected by common laws or regulations.
•
They may follow
similar trade practices and be affected by the same microeconomic factors, such
as common suppliers or customers and similar opportunities or threats related
to the introduction of new technologies or methods.
•
They may be
affected by common external forces imposed by the overall economic and
financial environment such as tax and interest rates, levels of employment and
education, and consumer confidence and other macroeconomic factors.
•
They want to
minimize the cost of hiring and training workers.
Advantages of Business Patterns
•
Reusing
well-understood patterns makes businesses easier to start, manage, and improve.
•
Adopting common
patterns can reduce development and maintenance costs, improve performance, and
enhance relationships with suppliers and customers.
•
A business can more
easily learn from others in its industry if it contributes to and follows
industry best practices or reference models.
•
The more systematic
the practices in an industry, the more a business benefits from following them
because of the network effects of standardization.
Finding Patterns in the Model Matrix
•
Generic or abstract
conceptual patterns become more specific or concrete by adding context.
•
Contextualization
means moving from left to right in the model matrix.
•
Similarly, moving
up the granularity axis in the Model Matrix gives a coarser granularity that can
suggest patterns that hide details and so might encourage innovation.
Using the Model Matrix as a Framework
•
The model matrix serves
as a roadmap to the analysis and design activities and methods that get us to
its middle.
o Here the systematic differences in abstraction and
granularity of these kinds of models in the matrix suggest that different kinds
of modeling approaches are needed to create them.
•
Different models
emerge from the skills and tools of the business analyst, document analyst,
data analyst, and task analyst.
o Each of these approaches looks at documents and
processes differently, and while each of them is highly effective in some
areas, they all have blind spots where their methods do not work well.