SE735 - Data and Document Representation & Processing
Lecture 7 - The Document Engineering Approach
Three key factors shape the concepts and methods of the Document
Engineering Approach:
1. End-to-end scope.
   a. Must be able to:
      i. describe the information content and processes in a document exchange
      ii. identify the context of use and its relevant requirements and constraints
      iii. analyze and design a solution
      iv. implement and deploy that solution.
   b. Must expect that the requirements and constraints will change, so our solution must be evolvable.
      i. Pointless to develop a solution that can’t be adapted to changing environments, no matter how theoretically elegant or powerful it might be.
2. The breadth of documents that must be analyzed, designed, and implemented.
   a. The Document Type Spectrum spans from narrative, publication-style documents to transactional, data-intensive ones.
   b. These contrasting types of documents have traditionally been analyzed and designed using substantially different approaches, which are unified by emphasizing what they have in common.
3. The requirement that document exchanges must be implementable in a loosely coupled, technology-independent manner.
   a. It is a fundamental principle of distributed and service-oriented architectures that the relationships between organizations or service providers must be adaptable and flexible because only the document interfaces are visible.
   b. It is neither necessary nor desirable for each party to know anything about the implementation on the other side of the exchange.
· Every modeling methodology proposes a set of modeling activities.
· Methodologies may differ in the order in which the activities are carried out or in how prescriptive they are about the activities and the descriptions of their results.
· How the models are described reflects the metamodel adopted by the methodology.
· Metamodels define the kinds of information that models contain.
· Common metamodels also provide a useful basis for libraries of reusable patterns because the models they contain can be interpreted by anyone or any application that understands the metamodel.
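
The last point can be illustrated with a minimal sketch. It assumes a deliberately tiny metamodel (a hypothetical Component class with a name and children) that is not taken from the lecture; the point is only that one generic function can interpret any model expressed in the shared metamodel, whether it describes a transactional or a narrative document.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical metamodel: every model element is a named component
    that may contain child components."""
    name: str
    children: list = field(default_factory=list)


def list_names(component, depth=0):
    """Interpret any model built from Component, whatever domain it describes."""
    lines = ["  " * depth + component.name]
    for child in component.children:
        lines.extend(list_names(child, depth + 1))
    return lines


# Two unrelated models (illustrative names) that share the same metamodel.
invoice = Component("Invoice", [Component("Buyer"), Component("Total")])
report = Component("Report", [Component("Section", [Component("Paragraph")])])

for model in (invoice, report):
    print("\n".join(list_names(model)))
```

Because both models conform to the same metamodel, list_names needs no knowledge of invoices or reports.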
· Business analysis - starts with abstract views of business models and processes.
  o This high-level analysis establishes the context for understanding the semantics of the information in the other sections of the matrix.
· Task analysis (or user analysis) - the observation of people performing the tasks or use cases when the application or system must support human interfaces and not just other applications.
  o Task analysis identifies the specific steps and information that people need to carry out a task, so it is based on actual artifacts and activities, which are represented on the right side of the matrix.
  o Task analysis reveals rules about the intent and usage of those artifacts.
  o Task analysis is important when few documents or information sources exist.
· Document analysis - starts from analysis of document instances.
  o These techniques extract or disentangle the presentational, structural, and content components of documents or other information sources.
· Data analysis (or object analysis) - often starts from a conceptual perspective about a domain and yields an abstract view of the information components revealed by document analysis.
Phases of the Document Engineering Approach
· Analyzing the Context of Use - involves identifying strategic business objectives in terms of business model requirements and the rules they must satisfy.
· Analyze the Business Process - create As-Is process models.
· Applying Patterns to Process Models - designing business processes.
· Analyze Documents - describing the actual documents needed by a business model.
  o The To-Be process model identifies the roles that documents will play; document analysis exposes the specific business rules that govern the content, structure, presentation, syntax, and semantics of the information contained in the documents.
· Analyze Document Components - starts with the harvesting task.
  o Identify the individual semantic components contained in each of the selected documents or information sources.
· Assemble Document Components - assemble sets of information components into meaningful structures to create a coherent conceptual view we call the document component model.
· Assemble Document Models - create models for new types of documents based on the components, structures, and associations in our document component model.
  o Apply the rules for assembling the information components necessary for each different type of document required for the given context of use.
· Document implementation model - the realized artifact.
  o Document implementation models realized in markup languages are more commonly known as schemas.
· For models of business processes, realization means adopting a suitable metamodel (such as the ebXML BPSS) to encode the specific rules and the requirements for our given context of use.
· Business process implementation model - the modeling artifact itself encoded as a document.
· Requirements are most often expressed as functional descriptions of what the solution must do.
· They can also include performance characteristics, quality attributes, or conformance to regulations or standards.
· Many requirements will be expressed as rules about the content, structure, and presentation of documents and their components.
  o Used to identify and design new types of documents.
· Other requirements will be expressed as usage rules or policies about access to information or control of its processing.
  o Used to formalize the definitions of the context in which the documents are used.
· Collecting requirements and rules is a heuristic and iterative exercise.
  o Archaeologist - search for artifacts and try to interpret them even though the organizations or people who created them might be extinct and no longer available to help.
    § Might discover legacy formats and paper documents whose processes have been frozen in time.
  o Anthropologist - locate people who work with the artifacts, and they may refer or link us to other people, who help us find more artifacts and people.
· Business process - a chain of related activities or events that take specified inputs, add value to them, and yield a specific service or product that can be the input to another business process.
· Two businesses might use different levels of abstraction or granularity to describe the processes they need to connect, making their process descriptions incompatible.
  o Solution:
    § Use the concepts and components provided by a business reference model, whose hierarchical organization of processes has been rigorously designed to enforce a consistent granularity.
    § Express all process models at the granularity where we can identify the documents that they produce and consume.
· The objective of document analysis is to create a conceptual model that encompasses all the information requirements within the required context of use.
· This phase begins by determining what documents and information sources we need to analyze.
· Issues:
  o Much of what must be analyzed may not be in a traditional document form.
  o Not all information requirements are necessarily recorded in the documents themselves.
  o Useful metadata about documents and their components may be in the form of document definitions, data models, and schemas.
  o Additional metadata can be found in style guides, industry standards for the domain, application interfaces, and artifacts from previous studies and analyses.
  o The inventory should include any undocumented information from the people involved in the exchange of documents.
· Need to take a representative sample of the inventory.
· Not everything in the inventory is equally valuable.
· May also want to emphasize or give more weight to documents that are especially important or authoritative.
· Harvesting the components - isolating the semantic components that the documents contain.
· Two distinct tasks are involved in harvesting:
  o Separating the underlying meaning from presentational components
    § Involves recognizing the stylistic conventions or presentational components being applied to information in its various formats.
    § Presentational structures are usually required by people; business applications don’t care about them.
    § Presentational structures are often the most salient patterns in narrative documents.
    § Identifying presentation components and presentational structures allows us to determine whether stylistic characteristics are necessary to understand the information contained in the document.
  o Disaggregating existing structures
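
The first harvesting task can be sketched in a few lines. This is only an illustration under stated assumptions: the HTML fragment, the TextHarvester class, and the "label: value" convention are hypothetical, and real documents need far more careful treatment. The sketch drops purely presentational markup (bold, italics, fonts) and keeps the candidate content components.

```python
from html.parser import HTMLParser


class TextHarvester(HTMLParser):
    """Collect text content, discarding purely presentational markup."""

    def __init__(self):
        super().__init__()
        self.lines = [""]

    def handle_starttag(self, tag, attrs):
        # A line break is treated as a structural hint; other tags are ignored.
        if tag == "br":
            self.lines.append("")

    def handle_data(self, data):
        self.lines[-1] += data


# Hypothetical document fragment with presentational markup.
sample = ('<b>Invoice Number:</b> <i>INV-001</i><br>'
          '<font size="4">Due Date:</font> 2024-01-31')

harvester = TextHarvester()
harvester.feed(sample)
harvester.close()

# Assumed convention: each line reads "label: value".
components = {}
for line in harvester.lines:
    label, _, value = line.partition(":")
    components[label.strip()] = value.strip()

print(components)
# {'Invoice Number': 'INV-001', 'Due Date': '2024-01-31'}
```

Here the bold and italic styling carried no meaning of its own, so discarding it loses nothing; in a narrative document the analyst would first have to check whether a style signals something semantic, such as emphasis or a warning.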
· Content components are given names to distinguish them and suggest their meaning.
· Naming components is a contentious, iterative, and ongoing activity.
· The primary modeling artifact from the analysis of information components is a Table of Candidate Content Components.
  o It aligns the components harvested from all the document sources so as to identify synonyms, homonyms, and semantic overlaps.
· Need to merge any synonyms (components with different names and the same meaning) by selecting a single term to replace the different ones.
· Need to split the different senses of homonyms (components with the same name but different meanings) by assigning more distinctive names to each one.
· This consolidation activity merges the separate sets of candidate components created from each source during the harvesting activity into a master or combined set.
· The modeling artifact produced is called a Consolidated Table of Content Components.
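
A rough sketch of the consolidation step follows. The candidate rows, the meanings, and the preferred names are all invented for illustration; in practice the analyst decides what each harvested component means. Keying every component on its meaning rather than its name merges synonyms and splits homonyms in a single pass.

```python
from collections import defaultdict

# Hypothetical candidate components: (source, name as found, meaning noted by analyst).
candidates = [
    ("purchase_order", "Buyer", "party placing the order"),
    ("invoice", "Customer", "party placing the order"),    # synonym of Buyer
    ("purchase_order", "Date", "date the order was issued"),
    ("invoice", "Date", "date payment is due"),             # homonym of Date
]

# Assumed analyst decisions: one preferred name per distinct meaning.
preferred_name = {
    "party placing the order": "Buyer",
    "date the order was issued": "OrderIssueDate",
    "date payment is due": "PaymentDueDate",
}


def consolidate(rows, names):
    """Build a consolidated table: preferred name -> where each component came from."""
    table = defaultdict(list)
    for source, found_name, meaning in rows:
        table[names[meaning]].append((source, found_name))
    return dict(table)


consolidated = consolidate(candidates, preferred_name)
for name, origins in consolidated.items():
    print(name, origins)
```

In the result, Buyer traces back to both Buyer and Customer (a merged synonym), while the two Date components end up under separate, more distinctive names (a split homonym).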
· The first step in creating models of documents from this set is to establish the required structures and identify any associations between them.
· More rigorous techniques for assembling structural components produce more predictable results.
· Assemble components based on the concept of functional dependency.
· These are the techniques used by database designers to yield relational models that minimize redundancy and maintain information integrity.
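
As a minimal sketch of assembly by functional dependency, assume the analyst has already recorded dependencies of the form "determinant -> dependent" (the component names below are hypothetical). Grouping dependents under their determinant yields candidate structures, and a dependent that is itself a determinant marks an association between two structures.

```python
from collections import defaultdict

# Hypothetical functional dependencies: (determinant, dependent).
# ("OrderID", "OrderDate") reads "one OrderID determines exactly one OrderDate".
dependencies = [
    ("OrderID", "OrderDate"),
    ("OrderID", "BuyerID"),
    ("BuyerID", "BuyerName"),
    ("BuyerID", "BuyerAddress"),
    ("ItemID", "ItemDescription"),
]


def group_by_determinant(fds):
    """Each determinant becomes the identifier of one structure."""
    structures = defaultdict(list)
    for determinant, dependent in fds:
        structures[determinant].append(dependent)
    return dict(structures)


structures = group_by_determinant(dependencies)

# A dependent that also identifies a structure is an association, not a plain property.
associations = [
    (determinant, dependent)
    for determinant, dependents in structures.items()
    for dependent in dependents
    if dependent in structures
]

print(structures)
print(associations)  # [('OrderID', 'BuyerID')]
```

BuyerName is stored once in the structure identified by BuyerID instead of being repeated in every order, which is the redundancy-minimizing effect that database designers aim for.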
· This modeling artifact is called a document component model, but it may be more familiar to data analysts as a domain model.
  o This model presents an overall conceptual view of all the information components required for a given context of use.
  o It is convenient to represent this model as a UML class diagram.
· From this set of associated semantic structures we can assemble all our new document models, which may span the transactional and narrative ends of the document type spectrum.
· The document component model that emerges from analysis does not describe a single document structure.
  o It defines a network of all potential document structures that might be required within the context of use.
· Specific types of documents are designed by organizing their structural components into document assembly models.
  o A document assembly model is created by defining a specific path through this network of associations.
· The first consideration in designing documents is that they are hierarchical in their structure.
· A document can be seen as a nested structure of components.
  o This is why models of documents are often expressed as tree diagrams: a hierarchy is the best way to represent them.
· But … the document component model represents a network, not a hierarchy.
  o It cannot define a document because it has no definite roots, branches, or leaves.
· To create a suitable hierarchical model of a document:
  o First select the entry point - the structural component required as the root of the hierarchy.
  o Then assemble a document model by adding the required roles and associations as dictated by the business rules and requirements of the
document’s context of use. We refer to this task as.
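
The two steps above can be sketched as follows. The component network, the set of required associations, and the assemble function are assumptions made for illustration, not the lecture's own notation. The sketch starts at a chosen entry point and follows only the associations that the context of use requires, which turns the network into a tree.

```python
# Hypothetical document component model: a network of associations (note the cycles).
component_model = {
    "Order": ["Buyer", "OrderLine"],
    "OrderLine": ["Item", "Order"],
    "Buyer": ["Address", "Order"],
    "Item": ["OrderLine"],
    "Address": [],
}


def assemble(model, entry_point, required, path=()):
    """Return the nested children of entry_point, following only required associations."""
    path = path + (entry_point,)
    children = {}
    for target in model[entry_point]:
        # Skip associations the context does not need, and never revisit an ancestor.
        if (entry_point, target) in required and target not in path:
            children[target] = assemble(model, target, required, path)
    return children


# Assumed business rules for an "order" document.
required = {
    ("Order", "Buyer"),
    ("Order", "OrderLine"),
    ("OrderLine", "Item"),
    ("Buyer", "Address"),
}

order_document = {"Order": assemble(component_model, "Order", required)}
print(order_document)
# {'Order': {'Buyer': {'Address': {}}, 'OrderLine': {'Item': {}}}}
```

Choosing a different entry point, such as Buyer, with a different set of required associations would produce a different hierarchy from the same network, for example a buyer's order history.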
· Need to create physical, computable artifacts from our models to realize model-based applications.
· The best available way to realize physical models from conceptual ones is to encode them in an XML schema language.
· Document assembly models are realized by encoding them as document implementation models.
· The implementation language influences the potential to reuse existing patterns.
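
To make the realization step concrete, here is a minimal sketch that encodes a small assembly model as a W3C XML Schema using Python's standard library. The nested-dictionary form of the assembly model and the component names are assumptions, and every leaf is typed as xs:string purely for brevity; a real implementation model would also carry data types, cardinalities, and naming rules.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def to_xsd_element(name, children):
    """Encode one component: containers become complex types, leaves become strings."""
    element = ET.Element("xs:element", {"name": name})
    if children:
        complex_type = ET.SubElement(element, "xs:complexType")
        sequence = ET.SubElement(complex_type, "xs:sequence")
        for child_name, grandchildren in children.items():
            sequence.append(to_xsd_element(child_name, grandchildren))
    else:
        element.set("type", "xs:string")  # simplifying assumption
    return element


# Hypothetical document assembly model as a nested dictionary.
assembly_model = {"Order": {"Buyer": {"Name": {}}, "OrderLine": {"Item": {}}}}

schema = ET.Element("xs:schema", {"xmlns:xs": XS})
for root_name, children in assembly_model.items():
    schema.append(to_xsd_element(root_name, children))

xsd_text = ET.tostring(schema, encoding="unicode")
print(xsd_text)
```

The hierarchy of the assembly model maps directly onto nested xs:element declarations, which is one reason a hierarchical document model is convenient to realize in an XML schema language.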
· Business process implementation models encode the To-Be process, collaboration, and transaction models defined together with any patterns adopted or adapted for our new designs.
· Business service interfaces can then interpret these documents to guide the processing of the documents they receive.
Phase | Artifact
Analyzing the Context | UML use case diagrams
Analyzing/Designing Business Processes | Business Domain View Worksheet; UML use case diagrams
Analyzing/Designing Business Collaborations | Business Process Area Worksheet; UML activity diagrams
Analyzing/Designing Business Transactions | Business Transaction View Worksheet; UML sequence diagrams
Applying Patterns to Business Processes | Document checklist
Analyzing Documents | Document inventory
Analyzing Document Components | Consolidated table of content components
Assembling Document Components | UML class diagram
Assembling Document Models | UML class diagram or spreadsheet assembly model
Implementing Model-Based Applications | XML schema for document models; XML schema for process models