SE735 - Data and Document Representation & Processing
Lecture 1 - Convergence : Data, Documents,
Delivery
Brief History of
Documents
1. Clay tablets (Mesopotamia - 3rd millennium BC): Babylonia: Early
Accounting Practice
Administrative
tablet with cylinder seal impression of a male figure, hunting dogs, and boars,
3100–2900 b.c.;
Jemdet Nasr period (Uruk III script), Mesopotamia,
H. 2 in. (5.3 cm)
Metropolitan
Museum of Art, Purchase, Raymond and Beverly Sackler
Gift, 1988 (1988.433.1)
This tablet most likely documents grain
distributed by a large temple, although the absence of verbs in early texts
makes them difficult to interpret with certainty.
Drawing of a tablet from the Uruk III period (ca. 3300-3000 BC) containing an accounting
of deliveries of barley and malt from two individuals for the production of
beer.
·
The bottom row bears the name of the official in
charge.
·
The tablet is read from right-to-left and top-down.
·
Each row corresponds to an individual, with the
first two columns containing entries for malt, followed by a column for barley.
·
Subtotals are given in the third column (barley groats (top) and malt (bottom)).
·
The left-most box displays the grand total.
·
No formal language was used to express the
relationship between the signs and symbols in the tablet.
2. Sumerian Accounting Practice
A record of sources of revenue and monthly disbursements to forty-six
temple personnel by its bursar Ḫunabi for the year
1295 BCE
Table attributes:
·
Column headings
and row titles.
·
Column headings at the top of the table specify
month names.
·
Names and professions are shown in the right-hand
column (e.g. seeress, weaver, overseer, temple
servant).
·
Eighteen of the individuals listed receive no
payment for all or half the year. (Notice the blank “smooth” cells along rows.)
·
These individuals are classified as either dead or
fugitive.
·
Grid locations within the table contain numerical
information that is part of calculations, flowing first down a column and
then across a row.
·
Subtotals for each individual are given every six
months, culminating with a yearly total adjacent to row labels.
·
The table is annotated with explanatory
interpolations under columns containing totals, and a summary column at the
table's end.
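The row arithmetic described above (six-month subtotals leading to a yearly total, with blank cells for unpaid personnel) can be sketched in a few lines. This is an illustration only: the monthly figures and the name `row_totals` are invented for the example and are not readings from the tablet.

```python
def row_totals(monthly):
    """Return (first-half subtotal, second-half subtotal, yearly total).

    `monthly` holds twelve disbursements; None marks a blank cell,
    as for personnel recorded as dead or fugitive.
    """
    if len(monthly) != 12:
        raise ValueError("expected twelve monthly entries")
    paid = [0 if m is None else m for m in monthly]
    first_half = sum(paid[:6])
    second_half = sum(paid[6:])
    return first_half, second_half, first_half + second_half


# Hypothetical rows: one paid all year, one with no payment after month six.
weaver = [3] * 12
fugitive = [3] * 6 + [None] * 6
```

Calling `row_totals(fugitive)` gives a zero second-half subtotal, which corresponds to the run of blank cells along that row.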
3. Egypt: Papyrus
·
In Ancient Egypt, papyrus was used for writing - first evidence is from
the account books of King Neferirkare Kakai of the Fifth Dynasty (about 2400 BC)
·
Papyrus books were in the form of a scroll of several sheets pasted
together, for a total length of up to 10 meters or even more.
·
The spread of books, and attention to their cataloging and conservation,
as well as literary criticism developed during the Hellenistic period with the
creation of large libraries in response to the desire for knowledge exemplified
by Aristotle.
·
e.g.
o The Library of
Alexandria, a library created by Ptolemy Soter and
set up by Demetrius Phalereus contained 500,900
volumes (in the Museion section) and 40,000 at the Serapis temple (Serapeion).
o All books in the
luggage of visitors to Egypt were inspected, and could be held for copying.
·
Papyrus supported book production in Rome in the 1st century BC with
Latin literature that had been influenced by the Greek.
4. The Codex
·
Forerunner of the contemporary
book.
·
Invented by the Romans (1st century AD)
·
Originally constructed by binding
together waxed wooden writing tablets, and later sheets of papyrus and then
parchment (animal skin).
·
The codex is more practical than a scroll, given that it allows random
information access, as opposed to a scroll's sequential access; and unlike the
scroll, both sides of a sheet may be used for writing.
5. Medieval Manuscripts
Carolingian
Renaissance (Emperor Charlemagne (c. 742 – 814))
·
As the
lands under his dominion continued to grow during the eighth century, there
were insufficient literate individuals to help administer the expanding state.
·
The
decline of the Roman Empire had engendered a regionalization of Latin dialects,
the future modern Romance languages, which seriously impeded communication
across Europe.
·
During the
latter quarter of the eighth century Charlemagne executed a program of reforms
that would transform the state and become known as the Carolingian Renaissance.
·
A major
part of his program was to attract many of the leading scholars of his day to
his court.
·
With the
aid of one of these scholars, the English monk Alcuin of York (c. 735 – 804),
who arrived at his court in 782, a program of cultural revitalization and
educational transformation was undertaken to restore old schools and found new
ones throughout his empire under the guidance of a monastery, cathedral, or
noble court.
·
A standard
curriculum was developed that established the trivium
(grammar, logic, and rhetoric) and quadrivium (arithmetic, geometry, music, and
astronomy) as the basis for education, and writing of textbooks was undertaken.
·
A
standardized version of Latin was also developed that became the common
language of scholarship and supported pan-European administration of the
empire. Writing was standardized too.
·
The
Carolingian minuscule was introduced to increase the uniformity, clarity, and
legibility of handwriting.
o
It was
used between 800 and 1200 to write codices, pagan and Christian manuscripts,
and educational texts.
·
Rise of scriptoria
for writing and copying manuscripts.
6. China: The Technological Roots of Printing
·
Paper - rag paper and silk paper by
105 AD
·
Paper's migration to Europe - passed
through the Middle East by the 9th century AD and reached Europe via the
Crusades by the 13th century.
·
Large
ideographic character set (thousands of characters rather than a small alphabet)
·
Invented
movable type
7. Gutenberg and the Historical Moment in Western Europe
- Scribal
hand-copying on vellum through the 15th century
- Rise
of paper use - lowering of cost
- Increased use of paper for business practices and
government documentation
- Church
indulgences written on paper
- Growth
of scholarship at universities required textbooks
- Movable
metal type invented (1452)
- Gutenberg Bibles printed - 300 two-volume sets
- The
Protestant Reformation - large-scale publishing of Martin Luther’s Theses
(~300K copies)
- William
Caxton set up the first printing press in England and published
popular works such as The Canterbury Tales
8. Print and Modern Thought
- scientific
thinking - broad dissemination of scientific insights and discoveries by
print
- rise
of scientific community -
- easy exchange of ideas gave rise to a scientific
community that functioned without geographical constraints
- systematization
of methodologies and development of rational thought.
- expand
the collective body of knowledge
- indexes
and cross-referencing emerged to manage volumes of information and
make creative associations between ideas.
- the
rise of an intellectual class
- Move toward standardization of language
- Move
toward editing and peer review
- transformations:
oral, written and print cultures
- Writing facilitates interpretation and reflection
since memorization is no longer required for the communication and
processing of ideas
- Recorded
history could persist and be added to through the centuries.
- Written
manuscripts sparked a variation on the oral tradition of communal
storytelling -- became common for one person to read out loud to the
group.
- privacy
o
Less
expensive and more portable books lent themselves to solitary and silent
reading
9. Advances in Print Technology
- Linotype machine (1884) - movable type created by machine – poured lead
into molds.
·
Teletypewriter
(1913) - could be attached directly to a Linotype machine to control
composition by means of a perforated tape
o
Tape
was punched on a separate keyboard unit
·
Tape-reader
translated the punched code into electrical signals that could be sent by wire
to tape-punching units in many cities simultaneously
·
Xerography (1938) – uses
photoconductivity
·
Computer-based
printer technology
Desktop Publishing (DTP)
Definition: Preparation of typeset or near typeset documents on
desktop computers (personal computers). All text composition, page makeup,
manipulation of digitized graphics and integration of text and graphics are
performed on desktop computers.
Three activities of DTP
1. Pure text preparation
2. Creation and manipulation of graphic images, where text plays only a minor role
3. Complex page makeup, in which text and graphic elements are united in a harmonious way
within the confines of a single page
Assumptions:
- Desktop
publishing equipment fits on a desktop
- DTP
software will enable the integration of multi-font text and graphics and
its display in a what-you-see-is-what-you-get (WYSIWYG) form
Components of a DTP
system
- a
data input device
- data
manipulation software (Page Layout Software)
- a
CPU to run the software
- a
display device
- an
output device
Key stages in the
process of DTP
- Need
for publication: Conduct appropriate analysis
to determine need for publication
- Purpose
and audience: Consider the audience,
content, style, language, purpose.
- Create
text: Word processed, scanned, or
typed directly into the program. Proofread the text to check the content.
- Create
graphics: Graphics created with
appropriate software, scanner,
tablet or digitiser.
- Design
format: Determine grid, columns,
headers and footers, page numbers, text style, design final layout.
- Load
files and lay out publication:
Text and graphics are combined, formatted, scaled and positioned.
- Print: Choose a suitable high-resolution printer, e.g. a
laser printer or imagesetter
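The "Design format" stage involves simple grid arithmetic. As a rough sketch, assuming an equal-column grid with fixed outer margins and gutters, the column width follows directly from the page width. The function name and the sample figures (a 612-point page, 72-point margins) are assumptions made for the example.

```python
def column_width(page_width, margin, columns, gutter):
    """Width of each text column for a simple equal-column grid.

    All measurements share one unit (points here). `margin` is applied
    on both sides; `gutter` is the gap between adjacent columns.
    """
    usable = page_width - 2 * margin - (columns - 1) * gutter
    if columns < 1 or usable <= 0:
        raise ValueError("layout does not fit on the page")
    return usable / columns


# Hypothetical two-column page with an 18-point gutter.
width = column_width(612, 72, 2, 18)
```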
History
1973
- Alto - Xerox PARC
- Local
processing and memory
- High
resolution bit-mapped graphics
- Keyboard,
mouse
- LAN
connection
- Graphical
windows interface (WYSIWYG) with integrated text editor, illustration
creation, and email
1981
- Model 8010 (Star) - Xerox PARC
- Designed
for Offices
- Innovative
HCI Interface
- Direct
manipulation
- Options
or properties
- WYSIWYG
- Generic
Commands (e.g. Move, Copy, Delete)
- High
Degree of Consistency
- Not
a commercial success
- Limited
Functionality (e.g. no spreadsheet)
- Closed
architecture (difficult for 3rd parties to write programs)
- Perceived
by users as slow
- Mouse
overused
- Lessons
Learned
- Use
of Objects-and-Actions design method
- Attention
to Detail
- Participation
of Designers
1983
- Canon develops the 'engine' used in low cost laser printers
1983
- Lisa - Apple
- Steve
Jobs saw Star prototype at PARC
- $10K
- Not
networked like Star
1984 - Hewlett-Packard produces the HP LaserJet
1984
- Macintosh - Apple
- $2500
- Mac
succeeded but Star did not
- Mac
did not need to trailblaze
- Apple
learned from experience
- Mac
aggressively priced
- Mac
had powerful developer toolkit
- Mac
had excellent graphics and a 300 dpi laser printer
1984 - Adobe introduces the PostScript page description language
(PDL)
1985 - Aldus develops PageMaker for Mac
1985 - Adobe builds PostScript hardware/software interface to Apple
LaserWriter (cost $5000)
1986 - Microsoft releases Windows 1.1
Mark-up Languages
Definition: A notation for identifying the components of a document
to enable each component to be appropriately formatted, displayed, or used.
1967 - William Tunnicliffe - paper titled The
Separation of Information Content of Documents from their Format –
proposed separating content from formatting
1969 - Charles Goldfarb - GenCode project at IBM
expanded this work to develop the Generalized Markup Language (GML) – by
1980, 90% of IBM documents were formatted in GML
1973 – Joe Ossanna - text formatters for the Unix operating system
(PDP-11)
·
nroff produced text output suitable for
terminals and line printers
·
troff generated a graphical output for a
Wang typesetter
·
1979
– troff modified to work
with different output devices.
1977 - Donald Knuth – TeX – begun in 1977, evolved through the early ‘80s - detailed
layout of text and font descriptions to typeset mathematical books at
professional quality.
1980 – Brian Reid – Scribe: a
document specification language and its compiler
·
Prepare
a manuscript file using a text editor.
·
Process
this manuscript file through Scribe to generate a document file, which is then
printed on some convenient printing device.
·
Scribe
controls the words, lines, pages, spacing, headings, footings, footnotes,
numbering, tables of contents, indexes and more.
·
Scribe
has a database of document format definitions which tell it the rules for
formatting a document in a particular style.
·
Under
normal circumstances, writers need not concern themselves with the details of
formatting, because Scribe does it for them.
·
The
manuscript document an author creates has markup statements throughout.
o
These statements
describe the various components of the document to the Scribe processor.
o
The
descriptive markup the author places in the document is interpreted and
formatted by the Scribe document processor.
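The idea of descriptive markup interpreted against a database of format definitions can be sketched as follows. This is not Scribe syntax: the manuscript is modelled as a plain list of tagged components, and the two styles (`REPORT`, `MEMO`) are hypothetical stand-ins for Scribe's document format definitions.

```python
# A manuscript as (component, text) pairs: descriptive markup only,
# with no formatting instructions.
manuscript = [
    ("title", "Clay Tablets"),
    ("heading", "Early Accounting"),
    ("paragraph", "Tablets recorded grain deliveries."),
]

# Two hypothetical format definitions. Each maps a component name
# to a function that renders that component.
REPORT = {
    "title": lambda t: t.upper().center(40),
    "heading": lambda t: "\n" + t + "\n" + "-" * len(t),
    "paragraph": lambda t: "    " + t,
}
MEMO = {
    "title": lambda t: "SUBJECT: " + t,
    "heading": lambda t: "* " + t,
    "paragraph": lambda t: t,
}


def render(doc, style):
    """Format every component according to the chosen style."""
    return "\n".join(style[kind](text) for kind, text in doc)
```

The same `manuscript` produces two different layouts via `render(manuscript, REPORT)` and `render(manuscript, MEMO)`; the author never touches formatting details.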
1986 - Standard
Generalized Markup Language (SGML) extended GML and was accepted as an ISO
standard
·
1st
working document by Charles Goldfarb in 1980
·
Influenced
by Scribe
·
focused
on the structural aspects of a document and left the visual presentation of
that structure to the interpreter
·
Specifies
a syntax for including the markup in documents and a
"metalanguage" for separately describing
what the markup meant.
·
Allowed
authors to create and use any markup they wished, selecting tags that made the
most sense to them
·
Issues:
o
Generally
found to be cumbersome, a side effect of attempting to do too much and be too
flexible
o
Unknown
to the masses
o
Too
few tools to create files
o
Tools
are expensive
o
Companion
norms for style or hypertext are not ready
o
Not
well supported by the major editors of the software market
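The split between markup in the document and a separate "metalanguage" describing that markup can be illustrated with a toy content model. This mimics the role of an SGML document type definition, not its syntax; the element names and the `is_valid` helper are assumptions made for the example.

```python
# Hypothetical declarations: element name -> set of allowed child elements.
CONTENT_MODEL = {
    "memo": {"to", "body"},
    "to": set(),
    "body": {"para"},
    "para": set(),
}


def is_valid(element, model=CONTENT_MODEL):
    """Check a (name, children) tree against the declared content model."""
    name, children = element
    if name not in model:
        return False
    for child in children:
        if child[0] not in model[name] or not is_valid(child, model):
            return False
    return True


# Author-chosen tags are acceptable as long as they are declared and nested legally.
good = ("memo", [("to", []), ("body", [("para", []), ("para", [])])])
bad = ("memo", [("para", [])])  # para is not allowed directly inside memo
```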
1991 - Tim Berners-Lee
and Robert Cailliau - HyperText
Markup Language (HTML) - some SGML syntax, without the metalanguage
·
HTML
consists of a set of "known" tags that handle common formatting tasks
·
Originally
created to mark up simple scientific papers and therefore needed to be expanded in
order to offer the rich content the web has today
·
As
a result, additions often follow no logical design, although recent efforts have
attempted to address this.
·
Advantages
o
Simple to learn and to use
o
Easy to create from
scratch or by converting legacy text files
o
Easy to parse
Disadvantages
o
No strictly enforced syntax
o
Much
more a presentation language than a structural language
o
Too
limited
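Both the fixed tag set and the loosely enforced syntax show up when HTML is tokenized. The sketch below uses Python's standard `html.parser` module; the `TagCollector` class and the sample markup are made up for the illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the start tags seen, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    """Return the list of start-tag names found in `markup`."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


# The <p> elements are never closed, yet tokenizing still succeeds.
sloppy = "<h1>Title</h1><p>First<p>Second <b>bold</b>"
```

`start_tags(sloppy)` returns the four start tags without complaint, which is convenient for authors but means structural errors go unreported.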
1998 – XML – Extensible
Markup Language
·
XML
is a strict subset of SGML
·
Like
SGML, XML is a grammar (or a metalanguage) and NOT
a language
·
XML
extends SGML features
·
Out
of date SGML features are eliminated
·
Well-formed document syntax
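Well-formedness can be checked mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree` parser follows; the helper name `is_well_formed` and the sample documents are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Author-chosen tag names are fine as long as the nesting is correct.
ok = "<tablet><row official='yes'>barley</row></tablet>"
# Overlapping tags violate well-formedness.
broken = "<tablet><row>barley</tablet></row>"
```

Unlike a typical HTML processor, an XML parser rejects `broken` outright instead of guessing at the intended structure.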
Other Languages
Adobe PostScript
- Interpreted
programming language used to control typesetting machines and laser
printers
- Used
as a page description language
- Can
be used to create vector graphics
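Because PostScript is an ordinary interpreted language, a page description is just program text. The sketch below builds a small PostScript program as a Python string, using standard operators (`moveto`, `rlineto`, `stroke`, `show`, `showpage`). The function name and layout values are assumptions, and the label is assumed to contain no parentheses or backslashes.

```python
def postscript_page(label, x=72, y=720, size=72):
    """Build a one-page PostScript program: a square outline plus a text label.

    Coordinates are in PostScript points (1/72 inch) from the lower-left
    corner of the page. `label` must not contain parentheses or backslashes,
    since this sketch does not escape them.
    """
    lines = [
        "%!PS",
        "newpath",
        f"{x} {y - size} moveto",    # lower-left corner of the square
        f"{size} 0 rlineto",         # bottom edge
        f"0 {size} rlineto",         # right edge
        f"{-size} 0 rlineto",        # top edge
        "closepath",                 # left edge
        "stroke",
        "/Helvetica findfont 12 scalefont setfont",
        f"{x} {y - size - 20} moveto",
        f"({label}) show",
        "showpage",
    ]
    return "\n".join(lines) + "\n"


program = postscript_page("Lecture 1")
```

Writing `program` to a `.ps` file and sending it to a PostScript interpreter or printer should render the vector square and the label.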
Adobe PDF
· Optimized PostScript
· PDF document attributes:
·
external
links
·
article
threads
·
security
features
·
device
independent colour
·
notes
Shared Documents: Groupware and Computer-Supported Cooperative Work
(CSCW)
·
Computer-assisted
coordinated activity carried out by groups of collaborating individuals
·
e.g.
·
communication
·
problem
solving
·
co-authoring
a document
- Groupware
- information technology used to help people work together more
effectively.
- Groupware makes users aware that they are part of a group.
- Groupware
defined by Peter and Trudy Johnson-Lenz (1982) - their ideas included:
- messaging
- conferencing
- filtered exchanges
- relational structures
- voting
- decision support tools
Computer Supported Cooperative Work
(1984) - term coined by Greif and Cashman
o
Today
the term serves as a forum; collaborative/cooperative is a metaphor, since the technology could equally support
competition.
A Paradigm Shift for Computing
Transformation
from human-machine to human-human interaction
Results
from several convergent phenomena:
- Pervasive networking
- Growth of workgroup computing
- Growth of technology supporting executive and
managerial group decision making
- Merging of telecommunications and computing supporting
applications such as video conferencing
- Advancement in work-at-a-distance
- New technologies - ISDN, DSL, cable modem
Widespread
groupware:
- Email
- Computer conferencing (aka structured email)
- Teleconferencing (use of audio/video)
- Joint problem solving:
- Collaborative writing or drawing
- Group decision support systems (with electronic
meeting rooms)
CSCW
Taxonomy
- DeSanctis
and Gallupe (1987) - 2x2 matrix differentiates groupware technologies along
two dimensions - bridging time and bridging space
·
Today
systems are moving toward anytime/anyplace
Synchronous Communications
One Meeting Site: Face-to-Face Interactions
- Public Computer Displays
- Electronic Meeting Rooms
- Group Decision Support Systems
Multiple Meeting Sites: Remote Interactions
- Shared View Desktop Conferencing Systems
- Desktop Conferencing with Collaborative Editors
- Video Conferencing
- Media Spaces
Asynchronous Communications
One Meeting Site: Ongoing Tasks
- Team Rooms
- Group Displays
- Project Management
Multiple Meeting Sites: Communication and Coordination
- Vanilla email
- Async conferencing bulletin boards
- Structured messaging systems
- Workflow management
- Version Control
- Meeting Schedulers
- Cooperative hypertext, organizational memory
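The matrix above amounts to a lookup on two yes/no questions: do participants work at the same time, and in the same place? A minimal sketch follows, with the quadrant labels taken from the matrix and the function name `classify` chosen for the example.

```python
# Keys are (timing, place); values are the quadrant labels from the matrix.
QUADRANTS = {
    ("synchronous", "one site"): "Face-to-Face Interactions",
    ("synchronous", "multiple sites"): "Remote Interactions",
    ("asynchronous", "one site"): "Ongoing Tasks",
    ("asynchronous", "multiple sites"): "Communication and Coordination",
}


def classify(same_time, same_place):
    """Map the two yes/no questions of the matrix onto a quadrant label."""
    timing = "synchronous" if same_time else "asynchronous"
    place = "one site" if same_place else "multiple sites"
    return QUADRANTS[(timing, place)]
```

For instance, video conferencing is same time, different place, so `classify(True, False)` lands in the Remote Interactions quadrant.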
Asynchronous
Groupware
·
Supports
communication and problem solving among groups of individuals who contribute at
different times
- Email and computer conferencing systems
- Structured messaging systems
- Cooperative hypertext or hypermedia systems
·
Email and Computer Conferencing
- Most successful: asynchronous, fast, can be sent to
multiple receivers, has built-in external memory
- Can contain text, image, video, sound
- Organized email:
- Computer conferencing system: Messages organized by topic, emphasizing dialogue
- Electronic bulletin boards: Messages organized by
time, emphasizing broadcast of information
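The difference between the two organizing principles is easy to see on a small message store. The sketch below assumes each message is a `(timestamp, topic, text)` tuple; the sample messages and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical messages: (timestamp, topic, text).
messages = [
    (3, "budget", "Revised figures attached"),
    (1, "budget", "Draft budget"),
    (2, "printing", "Proofs are ready"),
]


def by_topic(msgs):
    """Conferencing view: group by topic, oldest first within each thread."""
    threads = defaultdict(list)
    for stamp, topic, text in sorted(msgs):
        threads[topic].append(text)
    return dict(threads)


def by_time(msgs):
    """Bulletin-board view: one stream in posting order."""
    return [text for stamp, topic, text in sorted(msgs)]
```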
Structured
Messages, Agents and Workflow
- Structured
messaging systems - better methods of organizing,
classifying, filtering, and managing messages
- Agent - creates an intelligent messaging system by delegating tasks
to a computer process
- Workflow - focus on messages
that define processes - sets of rules which create conversations
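A structured message carries typed fields, which is what lets an agent act on it by rule. The sketch below is a toy illustration, not any particular product: the field names, rules, and folder names are all hypothetical.

```python
# Hypothetical rules as (predicate, folder) pairs. The first matching rule wins.
RULES = [
    (lambda m: m["kind"] == "request", "action-needed"),
    (lambda m: m["sender"] == "bursar", "finance"),
]


def route(message, rules=RULES, default="inbox"):
    """Act as a very simple agent: file a message by its structured fields."""
    for matches, folder in rules:
        if matches(message):
            return folder
    return default


# Messages are plain dicts with the structured fields the rules rely on.
payroll = {"kind": "request", "sender": "bursar", "subject": "Approve payroll"}
totals = {"kind": "notice", "sender": "bursar", "subject": "Yearly totals"}
greeting = {"kind": "notice", "sender": "scribe", "subject": "Hello"}
```

A workflow system extends the same idea: a rule's outcome would trigger the next message in a defined process rather than just choosing a folder.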
Cooperative
Hypertext and Organizational Memory
- Applications
focus on messages or documents and their interrelationships - cooperative
hypertext systems
- Supports:
- collaborative knowledge building
- Asynchronous collaborative writing
- Creating organizational memory
- e.g. Schatz (1991-1992)
- Community systems project - build an electronic
scientific community by collecting all of the community's scientific knowledge
and making it available
- The Telesophy System - 500
researchers studying the nematode worm
- Extend support beyond document to include process.
(Conklin 1992)
- Software integrates three technologies:
- Hypertext, groupware, and a rhetorical method (improves
dialogue, the conversational record, and organizational memory)
- Most successful organizational memory - Lotus Notes
- Integrated communications and database network
application
- Designed to gather, organize, distribute information
among workgroups
- Platform for developing workgroup applications
- Used for message routing, report distribution, idea
discussion, and for tracking and managing projects
- Biggest application - Price Waterhouse (tens of
thousands of Notes licenses). Why?
- No one knew who had the knowledge
to solve a particular problem
- Constantly reinventing wheel worldwide
- Need for better communication
- Results:
- Retention of knowledge
- Support for global collaboration and global
discussion
- Enhanced communication
Synchronous
Groupware
- Software
that assists a group of individuals in working together simultaneously to
carry out a task
- Four
classes:
- Desktop conference systems (e.g. outlining, writing,
sketching, spreadsheet)
- System infrastructure for desktop conferencing
- Electronic meeting and decision rooms
- Media spaces that include computer controlled AV
networks and virtual meeting environments
- Desktop
Conferencing Systems
- NLS shared-screen conferencing system 1968 - augments
face-to-face communication
- Xerox PARC Colab Project 1987
- Tools for collaborative brainstorming, argument
development, free style sketching
- Workstation-based with large touch screen in front of
room
- WYSIWIS (what you see is what I see)
- System
Infrastructure for Desktop Conferencing
o
Two
approaches to developing groupware:
1.
Collaboration transparency - single user software made available to group
2.
Collaboration aware - rewritten software for group use
o
#1
is the simplest approach - single-user software runs on multiple workstations under control
of screen-sharing software
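Collaboration transparency can be sketched as a wrapper that replays every input event on each participant's copy of an unmodified single-user program. Both classes below are invented stand-ins for the illustration, and a real screen-sharing layer would also need to handle floor control and network delivery.

```python
class SingleUserEditor:
    """A stand-in for unmodified single-user software."""

    def __init__(self):
        self.text = ""

    def type(self, chars):
        self.text += chars


class SharedSession:
    """Collaboration-transparent wrapper: every input event is replayed
    on each participant's copy, so all copies show the same thing (WYSIWIS)."""

    def __init__(self, participants):
        self.copies = {name: SingleUserEditor() for name in participants}

    def input(self, who, chars):
        # `who` is not passed on: the application is unaware of the group.
        for editor in self.copies.values():
            editor.type(chars)


session = SharedSession(["ann", "bob"])
session.input("ann", "Hi ")
session.input("bob", "all")
```

A collaboration-aware design would instead pass `who` into the application, so it could, for example, show each participant's cursor separately.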
- Electronic
Meeting and Decision Rooms
- Decision support systems - work in electronic meeting
room
- Supports parallel or sequential activity
- Idea generation
- Idea organization
- Voting
- Media
Spaces
- Computer controlled teleconferencing system where A/V
communication and shared digital workspaces overcome physical separation
- e.g. Hiroshi Ishii - TeamWorkStation system
- Display of shared digital workspaces with displays
of drawing surfaces and desktop materials
- Implements seamlessness between individual and
workgroup by overlaying translucent
workspace images (live analog video images of computer screens and
desktop surfaces)
- Computer screen overlay is a shared screen combining
windows from individual collaborators
- Creates shared interpersonal space using small
windows displaying live video of one's collaborator
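The translucent overlay can be thought of as alpha blending. The original system mixed analog video signals, so the digital sketch below is an analogy only: images are modelled as small grids of grayscale values, and the function name, sample images, and default `alpha` are assumptions.

```python
def overlay(bottom, top, alpha=0.5):
    """Blend two equal-sized grayscale images given as lists of rows.

    Each output pixel is alpha * top + (1 - alpha) * bottom, rounded to an int.
    """
    if len(bottom) != len(top) or any(len(b) != len(t) for b, t in zip(bottom, top)):
        raise ValueError("images must have the same dimensions")
    return [
        [round(alpha * t + (1 - alpha) * b) for b, t in zip(brow, trow)]
        for brow, trow in zip(bottom, top)
    ]


# Hypothetical 2x2 images: a desktop surface and a collaborator's screen.
desk = [[0, 100], [200, 40]]
screen = [[100, 100], [0, 60]]
blended = overlay(desk, screen)
```

With `alpha=0.5` both workspaces remain visible at once, which is the effect the shared overlay relies on.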
e-Books: