SE735 - Data and Document Representation & Processing

Lecture 6 - How Models and Patterns Evolve & When Models Don’t Match


Chapter 5: How Models and Patterns Evolve


The Big Ideas of Chapter 5 (and of the Information-Powered Economy)

·        Business architectures co-evolve with technology

·        Information technology has radically changed the structure of firms

·        Information about goods becomes a good (or a service?)

·        Business models are shifting from forecast/schedule-driven to demand/event-driven

·        Business relationships/architectures shifting from tightly to loosely coupled

·        Business models are shifting from proprietary to standard models with reusable components


Co-evolution of Business Models and Enabling Technologies

·        Business patterns are continuously evolving, mostly as a result of changes in information and communications technology

·        Businesses don't just select a pattern and follow it; they may have to adapt a pattern or change to a different pattern to succeed

·         New technologies pose predictable problems for the business models of incumbents (as opposed to new firms) in an industry


"The Nature of the Firm" – Coase (1937)

·        Why do firms exist at all? Why does an entrepreneur hire people instead of "renting" them in the marketplace?

·        A transaction costs analysis says that firms are created when hierarchical coordination of internal processes is more efficient than carrying out the same processes externally "in the market"

·        The marketplace sets prices and coordinates the actions of self-interested buyers and sellers through the "invisible hand" (Adam Smith), but it also imposes "transaction costs"

·         When transactions are brought inside, the administrative coordination with the "visible hand" of management and authority can reduce transaction costs


"Transaction Costs"

·        SEARCH – Discovery of potential business partners

·        INFORMATION ANALYSIS – Determining what products and services are offered and whether the partner is appropriate on other dimensions

·        BARGAINING – Proposing the terms of a business relationship

·        DECISION MAKING – Agreeing on the terms and ensuring their fit with other business processes

·        MONITORING – Ensuring that the terms and conditions are being met

·         ENFORCEMENT – Taking corrective action if they are not


"The New Industrial State"

The size of General Motors is in the service not of monopoly or the economies of scale but planning…and (thanks to) this planning - control of supply, control of demands, provision of capital, minimization of risk - there is no clear limit to the desirable size (of the company).


Size is the general servant of technology, not the special servant of profits. Small businesses have no need for technological innovations and can hardly afford to keep up with new technologies (as big businesses do) and therefore struggle to survive in the economical whirlwind of production and profit. The enemy is advanced technology, the specialization and organization of men and process that this requires and the resulting commitment of time and capital.


John Kenneth Galbraith (1967)


The Hierarchical Firm

·        The traditional industrial corporation of the mid-to-late 20th century was large, vertically integrated, and hierarchically organized to produce standardized products for mass markets

·        In 1960 all but two of the world’s largest companies were based in the US

·        General Motors earned as much in profits as the 10 biggest firms from each of France, the UK, and Germany combined (30 firms in total)

·         US firms produced 50% of world output; this amounted to more than the next 9 industrial nations combined


Example: Ford's River Rouge Plant

·        The ultimate in vertical integration - with docks on the Rouge River, 100 miles of interior railroad track, its own electricity plant, and ore processing, raw materials were turned into running vehicles within this single complex

·        1.5 miles (2.4 km) wide by 1 mile (1.6 km) long, including 93 buildings with nearly 16 million square feet (1.5 km²) of factory floor space

·        Over 100,000 workers worked in this single complex in the mid-1900s


Transaction Costs and New Technologies

·        New technologies (e.g. telephone, mainframe computer) reduce coordination costs so firms can get bigger...

·        But what if new technologies reduce the external costs proportionally more than internal costs?

·        As communication, coordination, and monitoring costs decline because of new technology and more organizational autonomy it becomes possible to outsource non-essential functions

·        It also becomes cheaper to work with new business partners in shorter-term, more ad hoc relationships

·         Technical standards for product description and document exchange can also be seen as technology that reduces transaction costs


From Hierarchy to Network

·        Today, the large vertically integrated firm of the mid-to-late 1900s has been transformed into a more "network" form, no longer driven by command-and-control

·        IBM, Cisco and other large firms are repositioning themselves as comprehensive "service networks" whose business units are both more autonomous and collaborative

·        Competition is increasingly between entire supply chains or ecosystems, not just between firms

·        This requires large amounts of formal and informal information exchange



Information About Goods Becomes a Good

·        Information about the supply chain is taking on independent value

·         Information about where products are, who uses them, and when and how they are used can be worth more than the products themselves

·         Once inventory and information are equivalent, the boundary between the physical and virtual worlds becomes blurred

·         New services are arising from the aggregation of information about business transactions



Example: UPS Supply Chain Solutions


Smart Firms Outsource Their Logistics


New Business Models for Information Goods

Toward On Demand/Event-Driven Business Models

·        No forecast can ever be as accurate as actual sales and demand information

·        The key to supply chain optimization isn't moving things faster according to plans, it is moving things smarter according to actual demand

·        "Information-driven decisions" can be made more reliably and with less latency when sensor networks collect information

·        The Internet has vastly increased the viability of direct sales for information goods

·        Two especially significant patterns are evolving for the creation and distribution of information goods and software:

o    the open access movement in scholarly and scientific publishing that seeks lawful free access to online publications 

o   the trend toward software as a service (SaaS).


From Forecast- or Schedule-Driven to Demand- or Event-Driven Models

Example: GPS & Sensor-Driven "Precision Agriculture" [1]



Example: GPS & Sensor-Driven "Precision Agriculture" [2]


Example: Mobile Telemedicine for Home Care and Patient Monitoring



Example: Mobile Telemedicine – Patient Monitor


EDF+ Data Format

·         1990 - European Data Format (EDF) - simple and flexible format for exchange and storage of multichannel biological and physical signals

·         2003 - EDF+ extension of EDF that can also contain interrupted recordings, annotations, stimuli and events.
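
The fixed-width layout that makes EDF simple to exchange can be sketched in a few lines. This is a minimal sketch, not a full reader: it assumes the standard 256-byte ASCII header record and the EDF+ convention that the 44-character reserved field starts with "EDF+C" (continuous) or "EDF+D" (interrupted). The function and field names are illustrative, not part of any library.

```python
# Field names (our own labels) and widths of the fixed 256-byte EDF header record.
EDF_HEADER_FIELDS = [
    ("version", 8), ("patient", 80), ("recording", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("num_records", 8), ("record_duration", 8),
    ("num_signals", 4),
]


def parse_edf_header(raw: bytes) -> dict:
    """Slice the fixed-width ASCII fields out of the first 256 bytes."""
    if len(raw) < 256:
        raise ValueError("an EDF header record is 256 bytes long")
    header, pos = {}, 0
    for name, width in EDF_HEADER_FIELDS:
        header[name] = raw[pos:pos + width].decode("ascii").strip()
        pos += width
    return header


def is_edf_plus(header: dict) -> bool:
    """EDF+ files mark themselves in the reserved field."""
    return header["reserved"].startswith("EDF+")


def is_interrupted(header: dict) -> bool:
    """'EDF+D' marks a recording that may contain gaps."""
    return header["reserved"].startswith("EDF+D")
```

Because every field has a fixed position, a receiver needs no schema negotiation to read the header, which is part of why the format travels well between devices and applications.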



From Tightly Coupled to Loosely Coupled Models

More flexible business models require the loosely coupled architecture of the Internet

Tight Coupling

·        "Tight coupling" between two businesses, applications or services means that their interactions and information exchanges are completely automated and optimized in performance...

·        ... by taking advantage of knowledge of their internal processes, information structures, technologies or other private characteristics that are not revealed in their public interfaces

·        ... and usually implemented with a custom program that fits only between the two of them

·        Tight coupling is most often used, and usually limited to, situations in which the same party controls both ends of the information exchange


The Integration Challenge

·        Can we have integration and loose coupling at the same time?

·        The idea of service-oriented integration says we can

·        But can we get there from here?


Co-Evolution of Business and Technology Architecture



Document- or Service-Oriented Integration

·         Loose coupling—in particular using XML documents to define interfaces—allows for the transparent scalability of business process automation as browser-based tasks are incrementally upgraded to computer-mediated ones

·        Internet protocols and XML are enabling "loosely coupled" architectures and "coarse-grained" information exchanges that make far fewer (or no) assumptions about the implementation on the "other side"

·        When integration is done with loose coupling, the two sides can make (some) changes to their implementations without affecting the other

·        This is even more true when they communicate through an "integration hub" which can further abstract their implementation by doing transport protocol/envelope/syntax translation for them

·         The particular integration technology for loose coupling is less important than the philosophy or business model that requires it – treating different organizations, applications, and devices as loosely-coupled cooperating entities regardless of where they fit within or across enterprise boundaries
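
The loose-coupling idea in the bullets above can be sketched as follows. The two senders are hypothetical and deliberately built in different ways; the receiver depends only on the shape of the document, using element names borrowed from the order schema that appears later in these notes. The function names and sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def sender_a() -> str:
    """Hypothetical partner A: assembles the document from a string template."""
    return ("<Order><BuyersID>PO-1</BuyersID><OrderLine><LineItem>"
            "<BookItem><Title>Document Engineering</Title></BookItem>"
            "<Quantity>2</Quantity></LineItem></OrderLine></Order>")


def sender_b() -> str:
    """Hypothetical partner B: builds the same document from an element tree."""
    order = ET.Element("Order")
    ET.SubElement(order, "BuyersID").text = "PO-1"
    line_item = ET.SubElement(ET.SubElement(order, "OrderLine"), "LineItem")
    ET.SubElement(ET.SubElement(line_item, "BookItem"), "Title").text = "Document Engineering"
    ET.SubElement(line_item, "Quantity").text = "2"
    return ET.tostring(order, encoding="unicode")


def receive_order(document: str) -> dict:
    """The receiver knows only the document model, not how the sender works."""
    root = ET.fromstring(document)
    return {
        "id": root.findtext("BuyersID"),
        "title": root.findtext("OrderLine/LineItem/BookItem/Title"),
        "quantity": int(root.findtext("OrderLine/LineItem/Quantity")),
    }
```

Either sender can change its internals freely; the receiver keeps working as long as the document it receives keeps the same structure.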


Service Oriented Architecture – A Conceptual Perspective and Design Philosophy

·        Business processes are increasingly global and involve widely dispersed parts of an enterprise or multiple enterprises

·        A business needs to be able to quickly and cost-effectively change how it does business and who it does business with (suppliers, business partners, customers)

·         A business also needs more flexible relationships with its partners and "assets" to handle variable demands


Web Services {and,vs} Service Oriented Architecture

·        Web services are an important PHYSICAL architectural idea and a set of standards and techniques for loose coupling

·        Service Oriented Architecture is a CONCEPTUAL architectural perspective and design philosophy for loose coupling

·         MBAs and CIOs talk about SOAs, software architects and developers talk about web services


Web Services

·        Web Services -- with a capital "S" -- generally means a particular set of specifications for doing service-oriented integration with XML documents as the "payload" that conveys the information required by the service interface

·        (Or put another way -- the interface is specified using an XML schema that defines in a formal way the information the service expects and how it should be structured)

·         The most important Web Service specifications are those for a service's public interfaces (Web Service Description Language) and for the messaging protocol used to send and receive XML documents through those interfaces (SOAP)
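
To make the "XML document as payload" idea concrete, here is a minimal sketch that wraps a business document in a SOAP 1.1 envelope and unwraps it again. It uses only the Python standard library. The `Order` payload and the helper names are assumptions for illustration, and a real service would also publish a WSDL description, which is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def wrap_in_soap(payload: ET.Element) -> str:
    """Put the business document inside soap:Envelope/soap:Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="unicode")


def unwrap_soap(message: str) -> ET.Element:
    """Return the first child of the Body, i.e. the business document."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("message has no SOAP Body payload")
    return body[0]
```

The envelope carries the document without knowing anything about its content, which is what lets the same messaging protocol serve many different service interfaces.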


The Service Discovery Myth

·        Many discussions about services highlight the concept of service discovery and a specification called UDDI (Universal Description, Discovery and Integration)

·        UDDI was proposed as a kind of services "white" and "yellow" pages directory that would enable services to be registered by their providers and discovered by potential users, all by automated means

·        But UDDI is mostly used for "internal" service directories and rarely for "public" ones

·         Most service relationships are established "offline" and then the information about how to access the service is built into the service requestor's implementation


WS-* ("star" or "splat")

·        The major platform and enterprise software vendors have developed and "standardized" a few dozen specifications for extending the basic Web Services specifications to handle issues that emerge in complex distributed applications and service systems

·        These specifications cover things like security, multi-hop addressing, process choreography, policy assertion, performance management, ...

·        Their proponents argue that these additional specifications are essential for service oriented computing to be viable for enterprise-level applications and services

·        But they've made Web Services (with a capital "S") seem needlessly complex for a great many applications where they might have been useful

·        Many services are being implemented today with simpler protocols


Web-based Services

·        This is a category coined by Erik Wilde for his courses at the I-school to mean "Web Services and any services that use any Internet protocol"

·        This includes services implemented using the basic HTTP protocol and its mechanisms for providing "better service" using content negotiation (provide different information to the client based on the type of browser, etc.)

·         This broader category makes it easier to understand and make tradeoffs in the design and implementation of services
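
Content negotiation, mentioned above, can be sketched as a choice driven by the client's HTTP Accept header. This is a simplified sketch: it honours q-values and the `*/*` wildcard but ignores partial wildcards such as `text/*`. The function names are illustrative assumptions.

```python
def parse_accept(header: str) -> list:
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so ties keep the order used in the header.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header: str, available: list):
    """Pick the representation the client prefers, or None if none fits."""
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media in available:
            return media
        if media == "*/*" and available:
            return available[0]
    return None
```

For example, a service offering both HTML and XML representations of the same resource could call `negotiate(request_accept_header, ["text/html", "application/xml"])` to decide which one to send.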



Chapter 6: When Models Don’t Match

Four Ways to Misunderstand a Document Component

Differences in Content:

·       option a. <A>USD 100</A>

·       option b. <A>One Hundred US Dollars</A>

·       option c. <A>$US100</A>


Differences in Encoding:

·       option a. <Amount>USD 100</Amount>

·       option b. USD,100

·       option c. CUR:USD|AMT:100


Differences in Structure:

·       option a. <Amount>USD 100</Amount>

·       option b. <Currency>USD</Currency><Amount>100</Amount>

·       option c. <Amount>100<Currency>USD</Currency></Amount>


Differences in Semantics:

·       option a. <Amount>USD 100</Amount>

·       option b. <PreTaxAmount>USD90</PreTaxAmount><Tax>USD10</Tax>

·       option c. <Price>USD 100</Price>
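
The three structural variants listed above carry the same information, so a receiver can in principle reduce them to one internal form. The sketch below does that for exactly those three options and nothing else. The wrapper element and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def read_amount(fragment: str):
    """Return (currency_code, amount) for structural options a, b and c."""
    # Option b has two sibling elements, so wrap the fragment in one root.
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    amount = root.find("Amount")
    currency = root.find("Currency")          # option b: sibling element
    if currency is None:
        currency = amount.find("Currency")    # option c: nested child element
    if currency is not None:
        return currency.text.strip(), Decimal(amount.text.strip())
    code, value = amount.text.split()         # option a: "USD 100" in one string
    return code, Decimal(value)
```

A semantic difference such as `PreTaxAmount` plus `Tax` versus `Price` cannot be handled this way, because the receiver would first have to know what each name means.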



The Interoperability Challenge



The Interoperability Problem

·        The vocabulary problem implies an interoperability problem

·        This means that two applications or services can't use each other's models or document instances "as is"

·        Some interoperability problems can be detected and resolved by completely automated mechanisms

·        Other problems can be detected and resolved with some human intervention

·        Other problems can be detected but not resolved

·         Some problems can go undetected


Syntactic and Semantic Interoperability

·        Syntactic interoperability is just the ability to exchange information. It requires agreement or compatibility at the transport and application layers of the communications protocol stack, with the messaging protocol and format, and with messaging choreography / sequencing

·        Syntactic interoperability is necessary but not sufficient

·        Semantic interoperability requires that the content of the message be understood by the recipient application or process


The E-Business "Standards Pyramid"



Why Semantic Interoperability Problems Are Often Inevitable

·        Each new vocabulary for a particular industry is a step forward for that community, but proliferates definitions of information models that are common to many of them

·        Since the distinctive or specialized parts of each vocabulary are the industry-specific "vertical" parts, a lot of attention gets paid to them

·        In contrast, relatively less effort is given to the "horizontal" parts that seem more familiar or understandable

·        Nevertheless, any large company – even a highly verticalized one – engages in diverse business activities that require it to understand multiple vocabularies at different times


Vertical and Horizontal Vocabularies Must Work Together


When Models Don't Match

·        Suppose you publish your web service interface description and tell the world "my ordering service requires a purchase order that conforms to this schema"

·        This says "send me MY purchase order" not "send me YOUR purchase order"

·         How likely is it that the purchase orders being used by other firms will be able to meet your interface requirement, either directly or after being transformed?
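
One way to see how demanding "send me MY purchase order" is: check an incoming document for the element paths the interface expects. The sketch below is a rough structural check, not schema validation. The required paths are a hypothetical subset borrowed from the order schema shown later in these notes, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the paths an ordering service might insist on.
REQUIRED_PATHS = [
    "BuyersID",
    "BuyerParty/ID",
    "BuyerParty/Address/CityName",
    "OrderLine/LineItem/BookItem/ISBN",
    "OrderLine/LineItem/Quantity",
]


def missing_paths(order_xml: str) -> list:
    """List the required element paths that the document does not contain."""
    root = ET.fromstring(order_xml)
    if root.tag != "Order":
        # A different root element means none of our paths can match.
        return list(REQUIRED_PATHS)
    return [path for path in REQUIRED_PATHS if root.find(path) is None]
```

A partner whose purchase order uses a different root element or different names fails every check, even if it carries exactly the same business information.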


How Bad Can the Interoperability Problem Be?




The Interoperability Target

Conceptual Model for Electronic Orders


Physical Model (XML Schema) for Electronic Orders


    <xs:complexType name="PartyNameType">
        <xs:sequence>
            <xs:element name="Name" type="xs:string" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="AddressType">
        <xs:sequence>
            <xs:element name="Room" type="xs:string"/>
            <xs:element name="BuildingNumber" type="xs:string"/>
            <xs:element name="StreetName" type="xs:string"/>
            <xs:element name="CityName" type="xs:string"/>
            <xs:element name="PostalZone" type="xs:string"/>
            <xs:element name="CountrySubentity" type="xs:string"/>
            <xs:element name="Country" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="OrderLineType">
        <xs:sequence>
            <xs:element name="LineItem" type="LineItemType"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="LineItemType">
        <xs:sequence>
            <xs:element name="BookItem" type="BookItemType"/>
            <xs:element name="BasePrice" type="xs:decimal"/>
            <xs:element name="Quantity" type="xs:int"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="BookItemType">
        <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
            <xs:element name="Author" type="xs:string"/>
            <xs:element name="ISBN" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>


The XSD Schema for the Expected Order [1]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="Order" type="OrderType"/>

<xs:complexType name="OrderType">
    <xs:sequence>
        <xs:element name="BuyersID" type="xs:string"/>
        <xs:element name="BuyerParty" type="PartyType"/>
        <xs:element name="OrderLine" type="OrderLineType"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="PartyType">
    <xs:sequence>
        <xs:element name="ID" type="xs:string"/>
        <xs:element name="PartyName" type="PartyNameType"/>
        <xs:element name="Address" type="AddressType"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="PartyNameType">
    <xs:sequence>
        <xs:element name="Name" type="xs:string" minOccurs="0"/>
    </xs:sequence>
</xs:complexType>



The XSD Schema for the Expected Order [2]

<xs:complexType name="AddressType">
    <xs:sequence>
        <xs:element name="Room" type="xs:string"/>
        <xs:element name="BuildingNumber" type="xs:string"/>
        <xs:element name="StreetName" type="xs:string"/>
        <xs:element name="CityName" type="xs:string"/>
        <xs:element name="PostalZone" type="xs:string"/>
        <xs:element name="CountrySubentity" type="xs:string"/>
        <xs:element name="Country" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="OrderLineType">
    <xs:sequence>
        <xs:element name="LineItem" type="LineItemType"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="LineItemType">
    <xs:sequence>
        <xs:element name="BookItem" type="BookItemType"/>
        <xs:element name="BasePrice" type="xs:decimal"/>
        <xs:element name="Quantity" type="xs:int"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="BookItemType">
    <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Author" type="xs:string"/>
        <xs:element name="ISBN" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

</xs:schema>


Instance of an Electronic Order that conforms to this schema

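<?xml version="1.0" encoding="UTF-8"?>
<Order>
    ...
    <BuyerParty>
        ...
        <PartyName>
            <Name>Maynard James Keenan</Name>
        </PartyName>
        <Address>
            ...
            <StreetName>Ventura Blvd.</StreetName>
            <CityName>Studio City</CityName>
            ...
        </Address>
    </BuyerParty>
    <OrderLine>
        <LineItem>
            <BookItem>
                <Title>Document Engineering</Title>
                <Author>Glushko and McGrath</Author>
                ...
            </BookItem>
            ...
        </LineItem>
    </OrderLine>
</Order>
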

Recognizing Equivalence

Variations in strategies, technology platforms, legacy applications, business processes, and terminology make it difficult to use compatible documents


Content Conflicts

·         Content conflicts occur when two parties use different sets of values for the same components

·         e.g. Order Fragment with Base Price Content Conflict



                   <Title>Document Engineering</Title>

                   <Author>Glushko and McGrath</Author>







·        The base price for the book contains a $ symbol.

·         This creates a data type conflict in the content of the component.

·         The receiving business has defined BasePrice in its XML schema as a decimal (meaning a positive or negative number with a decimal point), which leaves no room for a currency code or symbol

·         The $ symbol in the base price value sent by the affiliate may cause it to be rejected by the order system
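
The data type conflict described above can be reproduced with a small check. This is a minimal sketch that approximates the lexical form of xs:decimal with a regular expression; the sample prices used with it are hypothetical, and the function name is an assumption.

```python
import re
from decimal import Decimal

# Approximation of the xs:decimal lexical form: optional sign, digits,
# optional decimal point. No currency symbols, exponents, or letters.
XS_DECIMAL = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")


def parse_base_price(text: str):
    """Return a Decimal if the text is a plain decimal, otherwise None."""
    candidate = text.strip()
    if not XS_DECIMAL.match(candidate):
        return None  # e.g. a leading "$" makes the value unacceptable
    return Decimal(candidate)
```

The check only detects the conflict. Stripping the `$` automatically would be risky, since the symbol alone does not say which dollar currency the sender meant.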


Encoding Conflicts

A more obvious way in which information exchanges can conflict is at the level of encoding—that is, the syntax chosen for implementing the exchange or the way information is represented within that syntax.


Syntax Conflicts

·         The most apparent differences in encoding occur when two different syntaxes are chosen

e.g. [1] Order Encoded in UN/EDIFACT (ISO 9735) standard Syntax

UNH+0GMB91604004600001+ORDERS:1:911:UN+362910 04061815???:15'















·         It is not immediately compatible with the order example in XML.

·         But as UN/EDIFACT is the only internationally recognized standard for electronic order documents, the affiliate might be annoyed to be told by its trading partner that it is using an unacceptable format.


e.g. [2] Order Encoded in ANSI ASC X12 Syntax





PID*F****Document Engineering GLUSHKO MCGRATH





·         A popular EDI syntax, developed by the American National Standards Institute, is known as ANSI ASC X12.

·         During the 1990s this syntax was increasingly adopted by U.S. publishers and booksellers and built into their order processing systems



·         The components of these examples require mapping or transforming into their counterpart.

·         A one-to-one mapping of document components is not always achievable

·         Numerous mapping or translation tools exist to convert EDI and other formats to XML (and vice versa), but most of them work near the surface of the message to relate parts of one message to the other and don’t provide much support for understanding or reusing the models below the surface
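
A surface-level mapping of the kind these tools perform can be sketched for the X12 PID segment shown above. The sketch assumes `*` as the element separator and that the fifth data element of PID holds the free-form description. The `Description` target element and the function name are hypothetical and are not part of the order schema.

```python
import xml.etree.ElementTree as ET


def pid_to_xml(segment: str) -> ET.Element:
    """Map one X12 PID segment to a BookItem element with a Description."""
    elements = segment.strip().rstrip("~").split("*")
    if elements[0] != "PID":
        raise ValueError("not a PID segment")
    item = ET.Element("BookItem")
    description = ET.SubElement(item, "Description")
    # PID05 (index 5 after the segment name at index 0) is the description.
    description.text = elements[5] if len(elements) > 5 else ""
    return item
```

The mapping relates one part of the message to another, but it cannot fill the separate `Title` and `Author` elements of the XML order: the X12 description runs both together, which is an instance of a one-to-one mapping not being achievable.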


Grammatical Conflicts

·         Many XML encoding conflicts result from different uses of the element and attribute constructs

·         Encoding conflicts can be resolved if the underlying semantics and structures are compatible

o   If two parties have been creating models for the same business context, they will have similar conceptual models and assemblies of structures, so any different choices at the encoding phase should be easy to diagnose and reconcile.


Structural Conflicts

·         Conflicts arise when the models of documents or their components have different structures.

·         Even when both parties use the same encoding rules, structural conflicts can cause interoperability problems.


Component Assembly Conflicts

·         Two parties assemble components into structures in incompatible ways.

·         This may happen when they view some of the components in a different context.

o   Even if both parties have the same models for names, addresses, and other components in isolation, the differences in how they are put together result in different hierarchies and different documents

·         More significantly, the position of components in the hierarchy affects their meaning

·         The earlier in the modeling process that two parties make different decisions, the greater the possibilities for their models to be incompatible


Component Granularity Conflicts

·         Granularity conflicts derive from identifying components at different levels of detail; they are issues about the granularity of structure in a component.

·         e.g. underspecified vs. overspecified granularity

A.   BuyerParty Fragment with Underspecified Granularity

            <Name>Maynard James Keenan</Name>

            <StreetAddress>11271 Ventura Blvd. #505</StreetAddress>

            <City>Studio City 91604</City>


B.   BuyerParty Fragment with Overspecified Granularity

                  <StreetName>Ventura Blvd.</StreetName>

                  <CityName>Studio City</CityName>


·         These granularity differences result in one-way interoperability—a more granular model can be transformed into a less granular model, but not vice versa.
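
The one-way nature of granularity conversions can be sketched with the street address from the fragments above. Merging fine-grained parts into one string is trivial, while splitting depends on guessing the sender's convention. The sketch assumes that "505" is a room number and "11271" a building number, and that the only known convention is `<number> <street> #<room>`; the function names are illustrative.

```python
import re
from typing import Optional, Tuple


def merge_address(room: str, building_number: str, street_name: str) -> str:
    """More granular to less granular: always possible."""
    return f"{building_number} {street_name} #{room}"


_KNOWN_LAYOUT = re.compile(r"^(\d+)\s+(.+?)\s+#(\w+)$")


def split_address(street_address: str) -> Optional[Tuple[str, str, str]]:
    """Less granular to more granular: works only for one assumed layout."""
    match = _KNOWN_LAYOUT.match(street_address.strip())
    if match is None:
        return None  # the parts cannot be recovered reliably
    building_number, street_name, room = match.groups()
    return room, building_number, street_name
```

Any address written in another style, such as "Suite 505, 11271 Ventura Blvd.", falls outside the assumed layout and cannot be split without further rules or human help.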


Semantic Conflicts

·         The most complex issues affecting interoperability in document exchange are the result of semantic conflicts.

·         Even if we resolve the encoding and structural conflicts, we have a long way to go to ensure meaningful communication of information


Vocabulary Conflicts

·         Two modelers will often choose different names for the same component

·         Two possible solutions:

o   controlled vocabularies, a closed set of defining terms

o   ontologies, which define the meaning of terms using a formal or logic-based language.
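
A controlled vocabulary can be approximated by a lookup table that maps each party's names onto one agreed set of terms. The table below is a hypothetical example built around the address component names used in these notes; it is a sketch of the idea, not a real standard vocabulary.

```python
import xml.etree.ElementTree as ET

# The closed set of agreed terms (hypothetical subset).
CONTROLLED_TERMS = {"Address", "StreetName", "CityName", "PostalZone"}

# Names other modelers might have chosen, mapped to the agreed term.
SYNONYMS = {
    "City": "CityName",
    "Street": "StreetName",
    "Zip": "PostalZone",
    "PostCode": "PostalZone",
}


def normalize_names(root: ET.Element) -> list:
    """Rename known synonyms in place; return names that stay unrecognized."""
    unknown = []
    for element in root.iter():
        if element.tag in SYNONYMS:
            element.tag = SYNONYMS[element.tag]
        elif element.tag not in CONTROLLED_TERMS:
            unknown.append(element.tag)
    return unknown
```

A table like this resolves only naming differences. It says nothing about whether two parties mean the same thing by a term, which is the gap that ontologies try to close.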


Scoping Conflicts

·         Different document samples can lead to incompatible models

·         The decision about what information sources to analyze when developing a model—the inventory and sampling phase—occurs early in the modeling process.

·         If two parties begin with different samples, their models can diverge at a very early stage and chances are that the resulting models will be incompatible

·         The inventory will include information sources that are not in the form of traditional documents, such as databases, spreadsheets, web pages, and the people who create and use them