SE735 - Data and Document Representation & Processing
Lecture 7 - The Document Engineering Approach
Three key factors shape the concepts and methods of the Document Engineering Approach:
1. End-to-end scope.
a. Must be able to:
i. describe the information content and processes in a document exchange
ii. identify the context of use and its relevant requirements and constraints
iii. analyze and design a solution
iv. implement and deploy that solution.
b. Must expect that the requirements and constraints will change, so our solution must be evolvable.
i. Pointless to develop a solution that can’t be adapted to changing environments, no matter how theoretically elegant or powerful it might be.
2. The breadth of documents that must be analyzed, designed, and implemented.
a. Document Type Spectrum that spans from narrative, publication-style documents to transactional, data-intensive ones.
b. These contrasting types of documents have traditionally been analyzed and designed using substantially different approaches, which are unified by emphasizing what they have in common.
3. The requirement that document exchanges must be implementable in a loosely coupled, technology-independent manner.
a. Fundamental principle of distributed and service-oriented architectures that the relationships between organizations or service providers must be adaptable and flexible because only the document interfaces are visible.
b. It is neither necessary nor desirable for each party to know anything about the implementation on the other side of the exchange.
· Every modeling methodology proposes a set of modeling activities.
· They may differ in the order in which the activities are carried out, or in how prescriptive they are about the activities and the descriptions of their results.
· How the models are described reflects the metamodel adopted by the methodology.
· Metamodels define the kinds of information that models contain.
· Common metamodels also provide a useful basis for libraries of reusable patterns because the models they contain can be interpreted by anyone or any application that understands the metamodel.
· Business analysis - starts with abstract views of business models and processes
o This high-level analysis establishes the context for understanding the semantics of the information in the other sections of the matrix.
· Task analysis (or user analysis) - the observation of people performing the tasks or use cases when the application or system must support human interfaces and not just other applications.
o Task analysis identifies the specific steps and information that people need to carry out a task, so it is based on actual artifacts and activities, which are represented on the right side of the matrix.
o Task analysis reveals rules about the intent and usage of those artifacts and activities.
o Task analysis is important when few documents or information sources exist.
· Document analysis - starts from analysis of document instances.
o These techniques extract or disentangle the presentational, structural, and content components of documents or other information sources.
· Data analysis (or object analysis) - often starts from a conceptual perspective about a domain and yields an abstract view of the information components revealed by document analysis.
Phases of the Document Engineering Approach
· Analyzing the Context of Use - involves identifying strategic business objectives in terms of business model requirements and the rules they must satisfy.
· Analyze the Business Process - create As-Is process models.
· Applying Patterns to Process Models - designing business processes.
· Analyze Documents - describing the actual documents needed by a business model
o To-Be process model identifies the roles that documents will play; document analysis exposes the specific business rules that govern the content, structure, presentation, syntax, and semantics of the information contained in the documents.
· Analyze Document Components - starts with the harvesting task.
o Identify the individual semantic components contained in each of the selected documents or information sources.
· Assemble Document Components - assemble sets of information components into meaningful structures to create a coherent conceptual view we call the document component model.
· Assemble Document Models - create models for new types of documents based on the components, structures, and associations in our document component model.
o Apply the rules for assembling the information components necessary for each different type of document required for the given context of use.
· Document implementation model - the realized artifact.
o Document implementation models realized in markup languages are more commonly known as schemas
· For models of business processes, realization means adopting a suitable metamodel (such as the ebXML BPSS) to encode the specific rules and the requirements for our given context of use.
· Business process implementation model - the modeling artifact itself encoded as a document
· They are most often expressed as functional descriptions of what the solution must do.
· Can also include performance characteristics, quality attributes, or conformance to regulations or standards.
· Many requirements will be expressed as rules about the content, structure, and presentation of documents and their components.
o Used to identify and design new types of documents.
· Other requirements will be expressed as usage rules or policies about access to information or control of its processing.
o Used to formalize the definitions of the context in which the documents are used.
· Collecting requirements and rules is a heuristic and iterative exercise
o Archaeologist - search for artifacts and try to interpret them even though the organizations or people who created them might be extinct and no longer available to help.
§ Might discover legacy formats and paper documents whose processes have been frozen in time.
o Anthropologist - locate people who work with the artifacts, and they may refer or link us to other people, who help us find more artifacts and people.
· Business process - a chain of related activities or events that take specified inputs, add value to them, and yield a specific service or product that can be the input to another business process
· Two businesses might use different levels of abstraction or granularity to describe the processes they need to connect, making their process descriptions incompatible.
§ Use the concepts and components provided by a business reference model, whose hierarchical organization of processes has been rigorously designed to reinforce granularity.
§ Express all process models at the granularity where we can identify the documents that they produce and consume.
· Objective of document analysis is to create a conceptual model that encompasses all the information requirements within the required context of use.
· Phase begins by determining what documents and information sources we need to analyze
o much of what must be analyzed may not be in a traditional document form
o not all information requirements are necessarily recorded in documents themselves
o useful metadata about documents and their components may be in the form of document definitions, data models and schemas.
o additional metadata can be found in style guides, industry standards for the domain, application interfaces, and artifacts from previous studies and analyses
o inventory should include any undocumented information from the people involved in the exchange of documents
· Need to take a representative sample of inventory
· Not everything in the inventory is equally valuable
· May also want to emphasize or give more weight to documents that are especially important or authoritative
· Harvesting the components - isolating any semantic components they contain
· Two distinct tasks involved in harvesting:
o Separating the underlying meaning from presentational components
§ Involves recognizing the stylistic conventions or presentational components being applied to information in its various formats
§ Presentational structures usually exist for the benefit of human readers; business applications do not need them.
§ Presentational structures are often the most salient patterns in narrative documents.
§ Identifying presentation components and presentational structures allows determination of whether stylistic characteristics are necessary to understand the information contained in the document
o Disaggregating existing structures
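As a rough illustration of the first harvesting task, the sketch below separates presentational markup from the content it decorates. It assumes the source happens to be an HTML fragment, and the tag set, class name, and sample text are all hypothetical; real sources such as paper forms or spreadsheets would need different handling.

```python
from html.parser import HTMLParser

# Assumption: these tags carry only styling, not meaning, in our sample source.
PRESENTATIONAL = {"b", "i", "u", "font", "center"}


class ContentHarvester(HTMLParser):
    """Collects text content and records which presentational tags were seen."""

    def __init__(self):
        super().__init__()
        self.content = []          # text with the styling stripped away
        self.styles_seen = set()   # presentational conventions applied to it

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL:
            self.styles_seen.add(tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.content.append(text)


harvester = ContentHarvester()
harvester.feed("<p><b>Invoice Number:</b> <i>INV-001</i></p>")
harvester.close()
```

Here `harvester.content` holds the candidate content, while `harvester.styles_seen` lets the analyst judge whether the styling was needed to understand it.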
· Content components are given names to distinguish them and suggest their meaning.
· Naming components is a contentious, iterative and ongoing activity.
· Primary modeling artifact from the analysis of information components is a Table of Candidate Content Components.
o Aligns the components harvested from all the document sources so as to identify synonyms, homonyms, and semantic overlaps.
· Need to merge any synonyms (components with different names and the same meaning) by selecting a single term to replace the different ones.
· Need to split the different senses of homonyms (components with the same name but different meanings) by assigning more distinctive names to each one.
· This consolidation activity merges the separate sets of candidate components created from each source during the harvesting activity into a master or combined set.
· The modeling artifact produced is called a Consolidated Table of Content Components.
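The consolidation step can be sketched in a few lines, assuming each harvested candidate has already been annotated with an agreed meaning. The component names, meanings, and the `preferred` naming choices below are hypothetical examples, not taken from the lecture.

```python
from collections import defaultdict

# Hypothetical harvest results: (source document, component name, meaning).
candidates = [
    ("purchase_order", "Cust No", "identifier of the buying party"),
    ("invoice", "Customer ID", "identifier of the buying party"),
    ("purchase_order", "Date", "date the order was issued"),
    ("invoice", "Date", "date the invoice was issued"),
]


def consolidate(rows, preferred_names):
    """Group candidates by meaning, then give each meaning one distinct name.

    Synonyms (different names, same meaning) collapse into one entry.
    Homonyms (same name, different meanings) end up in separate entries.
    """
    by_meaning = defaultdict(list)
    for source, name, meaning in rows:
        by_meaning[meaning].append((source, name))

    table = {}
    for meaning, uses in by_meaning.items():
        table[preferred_names[meaning]] = {
            "meaning": meaning,
            "harvested_as": sorted(uses),
        }
    return table


# Assumed naming decisions made by the analysts.
preferred = {
    "identifier of the buying party": "CustomerIdentifier",
    "date the order was issued": "OrderIssueDate",
    "date the invoice was issued": "InvoiceIssueDate",
}
consolidated = consolidate(candidates, preferred)
```

Keying on meaning rather than name is the design choice that makes both fixes fall out of one pass: "Cust No" and "Customer ID" merge, while the two uses of "Date" are split.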
· First step in creating models of documents from this set is to establish the required structures and identify any associations between them
· More rigorous techniques for assembling structural components produce more predictable results.
· Assemble components based on the concept of functional dependency
· Techniques used by database designers to yield relational models that minimize redundancy and maintain information integrity.
· This modeling artifact is called a document component model but it may be more familiar to data analysts as a domain model
o This model presents an overall conceptual view of all the information components required for a given context of use.
o It is convenient to represent this model as a UML class diagram.
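A minimal sketch of assembly by functional dependency follows, assuming the dependencies have already been identified by hand. The component names are hypothetical, and the grouping is a simplification of the normalization techniques database designers use.

```python
from collections import defaultdict

# Hypothetical functional dependencies: (determinant, dependent component).
dependencies = [
    ("OrderID", "OrderIssueDate"),
    ("OrderID", "CustomerID"),
    ("CustomerID", "CustomerName"),
    ("CustomerID", "CustomerAddress"),
]


def assemble_structures(fds):
    """Group components into one structure per determinant."""
    structures = defaultdict(list)
    for determinant, dependent in fds:
        structures[determinant].append(dependent)
    return dict(structures)


def find_associations(structures):
    """A structure holding another structure's determinant is associated with it."""
    return sorted(
        (owner, member)
        for owner, members in structures.items()
        for member in members
        if member in structures and member != owner
    )


structures = assemble_structures(dependencies)
links = find_associations(structures)
```

Because the customer's name and address depend on `CustomerID` rather than on `OrderID`, they are stored once in their own structure, which is how this approach reduces redundancy.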
· From this set of associated semantic structures we can assemble all our new document models that may span the transactional and narrative ends of the document type spectrum
· The document component model that emerges from analysis does not describe a single document structure.
o It defines a network of all the potential document structures that might be required within the context of use.
· Specific types of documents are designed by organizing their structural components into document assembly models.
o A document assembly model is created by defining a specific path through this network of associations.
· First consideration in designing documents is that they are hierarchical in their structure.
· A document can be seen as a nested structure of components.
o This is why models of documents are often expressed as tree diagrams: a tree is the best way to represent such a hierarchy.
· But … the document component model represents a network, not a hierarchy.
o It cannot define a document because it has no definite roots, branches, or leaves.
· To create a suitable hierarchical model of a document:
o First select the entry point - the structural component required as the root of the hierarchy.
o Then assemble a document model by adding the required roles and associations as dictated by the business rules and requirements of the document’s context of use. We refer to this task as.
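The two steps above can be sketched as a walk over the association network. This is an illustrative sketch only: the `network` contents, the `wanted` set standing in for business rules, and the function name are all assumptions.

```python
# Hypothetical document component model: a network of associations.
# It contains a cycle (Order <-> Buyer), so it has no inherent root.
network = {
    "Order": ["Buyer", "OrderLine"],
    "Buyer": ["Address", "Order"],
    "OrderLine": ["Item"],
    "Item": [],
    "Address": [],
}


def assemble_document(network, root, wanted):
    """Build a tree (nested dicts) from `root`, following only `wanted` associations.

    `wanted` stands in for the business rules of the context of use.
    Components already on the current path are skipped so the result stays a hierarchy.
    """
    def build(node, path):
        children = {}
        for child in network[node]:
            if (node, child) in wanted and child not in path:
                children[child] = build(child, path | {child})
        return children

    return {root: build(root, {root})}


# Entry point "Order": one path through the network.
order_doc = assemble_document(
    network,
    "Order",
    wanted={("Order", "Buyer"), ("Order", "OrderLine"),
            ("OrderLine", "Item"), ("Buyer", "Address")},
)

# A different entry point over the same network yields a different document.
buyer_doc = assemble_document(
    network,
    "Buyer",
    wanted={("Buyer", "Order"), ("Order", "OrderLine")},
)
```

The same network produces `order_doc` and `buyer_doc` with different roots and nesting, which illustrates why the component model alone cannot define a document.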
· Need to create physical, computable artifacts from our models to realize model-based applications.
· The best available way to realize physical models from conceptual ones is to encode them in an XML schema language.
· Document assembly models are realized by encoding them as document implementation models.
· The implementation language influences the potential to reuse existing patterns.
· Business process implementation models encode the To-Be process, collaboration, and transaction models defined together with any patterns adopted or adapted for our new designs.
· Business service interfaces can then interpret these documents to guide the processing of the documents they receive
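To make the realization step concrete, the sketch below encodes a small hierarchical model as a W3C XML Schema using Python's standard library. The `model` contents are hypothetical, and typing every leaf as `xs:string` is a simplifying assumption; a real implementation model would carry datatypes, cardinalities, and reusable type definitions.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize the schema namespace with the "xs" prefix

# Hypothetical document assembly model: nested dicts, leaves are empty dicts.
model = {"Order": {"Buyer": {"Name": {}}, "OrderLine": {"Item": {}}}}


def element_for(name, children):
    """Return an xs:element; containers get a complexType with a sequence."""
    el = ET.Element(f"{{{XS}}}element", {"name": name})
    if children:
        complex_type = ET.SubElement(el, f"{{{XS}}}complexType")
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for child_name, grandchildren in children.items():
            sequence.append(element_for(child_name, grandchildren))
    else:
        # Assumption: every leaf component is a plain string.
        el.set("type", "xs:string")
    return el


def to_schema(model):
    """Serialize the whole model as an XML Schema document (text)."""
    schema = ET.Element(f"{{{XS}}}schema")
    for root_name, children in model.items():
        schema.append(element_for(root_name, children))
    return ET.tostring(schema, encoding="unicode")


xsd_text = to_schema(model)
```

Printing `xsd_text` shows the nested `xs:element` declarations that mirror the tree of the assembly model.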
· Analyzing the Context
o UML use case diagrams
· Analyzing/Designing Business Processes
o Business Domain View Worksheet
o UML use case diagrams
· Analyzing/Designing Business Collaborations
o Business Process Area Worksheet
o UML activity diagrams
· Analyzing/Designing Business Transactions
o Business Transaction View Worksheet
o UML sequence diagrams
· Applying Patterns to Business Processes
· Analyzing Document Components
o Consolidated table of content components
· Assembling Document Components
o UML class diagram
· Assembling Document Models
o UML class diagram or spreadsheet assembly model
· Implementing Model-Based Applications
o XML schema for document models
o XML schema for process models