CS835 - Data and Document Representation & Processing
Lecture 1 - Convergence: Data, Documents, Delivery
History of Printing
1. China: The Technological Roots
2. Gutenberg and the Historical Moment in Western Europe
- Scribal hand-copying on vellum through the 15th century
- Rise of paper use - lowering of
cost
- Increased use of paper for
business practices and government documentation
- Church indulgences written on
paper
- Growth of scholarship at
universities required textbooks
- Movable metal type invented
(1452)
- Gutenberg Bibles printed - 300
two-volume sets
- The Protestant Reformation -
large-scale publishing of Martin Luther’s Theses (~300K
copies)
- William Caxton set up 1st
printing press in England and published popular works such as
Canterbury Tales
3. Print and Modern Thought
- scientific thinking - broad dissemination of scientific insights and discoveries by print
- rise of scientific community
- easy exchange of ideas gave
rise to a scientific community that functioned without geographical
constraints
- systematization of methodologies
and development of rational thought.
- expand the collective body of
knowledge
- indexes and cross-referencing
emerged to manage volumes of information and make creative
associations between ideas.
- the rise of an intellectual
class
- Move toward standardization of
language
- Move toward editing and peer
review
- transformations: oral, written
and print cultures
- Writing facilitates
interpretation and reflection since memorization is no longer
required for the communication and processing of ideas
- Recorded history could persist
and be added to through the centuries.
- Written manuscripts sparked a variation on the oral tradition of communal storytelling: it became common for one person to read aloud to the group.
- privacy
4. Advances in Print Technology
Desktop Publishing (DTP)
Definition:
Preparation of typeset or near typeset documents
on desktop computers (personal computers). All text composition, page
makeup, manipulation of digitized graphics and integration of text
and graphics are performed on desktop computers.
Three activities of DTP
- Pure text preparation
- Creation and manipulation of graphic images, where text plays only a minor role
- Complex page makeup, in which text and graphic elements are united in a harmonious way within the confines of a single page
Assumptions:
- Desktop publishing equipment fits on a desktop
- DTP software will enable the integration of multi-font text and graphics and its display in a what-you-see-is-what-you-get (WYSIWYG) form
Components of a DTP system
- a data input device
- data manipulation software (Page
Layout Software)
- a CPU to run the software
- a display device
- an output device
Key stages in the process of DTP
- Need for publication:
Conduct appropriate analysis to determine need for publication
- Purpose and audience:
Consider the audience, content, style, language, purpose.
- Create text:
Word processed, scanned or typed directly into the program. Proofread
the text to check the content.
- Create graphics:
Graphics created with appropriate software, scanner, tablet or
digitiser.
- Design format:
Determine grid, columns, headers and footers, page numbers, text
style, design final layout.
- Load files and lay out
publication: Text
and graphics are combined, formatted, scaled and positioned.
- Print:
Choose a suitable high-resolution printer, e.g. a laser printer or
imagesetter
History
1973
- Alto - Xerox PARC
- Local processing and memory
- High resolution bit-mapped
graphics
- Keyboard, mouse
- LAN connection
- Graphical windows
interface (WYSIWYG) with integrated text editor, illustration
creation, and email
1981
- Model 8010 (Star) - Xerox PARC
- Designed for Offices
- Innovative HCI Interface
- Direct manipulation
- Options or properties
- WYSIWYG
- Generic Commands (e.g. Move,
Copy, Delete)
- High Degree of Consistency
- Not a commercial success
- Limited Functionality (e.g. no
spreadsheet)
- Closed architecture (difficult
for 3rd parties to write programs)
- Perceived by users as slow
- Mouse overused
- Lessons Learned
- Use of Objects-and-Actions
design method
- Attention to Detail
- Participation of Designers
1983
- Canon develops the 'engine' used in low-cost laser printers
1983
- Lisa - Apple
- Steve Jobs saw Star prototype at PARC
- $10K
- Not networked like Star
1984
- Hewlett-Packard produces the HP LaserJet
1984
- Macintosh - Apple
1984
- Adobe introduces the PostScript page description language (PDL)
1985
- Aldus develops PageMaker for the Mac
1985
- Adobe builds the PostScript hardware/software interface to the Apple LaserWriter (cost $5000)
1986
- Microsoft releases Windows 1.1
Mark-up Languages
Definition: A
notation for identifying the components of a document to enable each
component to be appropriately formatted, displayed, or used.
1967
- William Tunnicliffe - paper titled The Separation of Information Content of Documents from their Format - separates content from formatting
1969
- Charles Goldfarb - GenCode project at IBM expanded this work to develop the Generalized Markup Language (GML) - by 1980, 90% of IBM documents were formatted in GML
1973
- Joe Ossanna - Unix operating system (PDP-11)
- nroff produced text output suitable for terminals and line printers
- troff generated graphical output for a Wang typesetter
- 1979 - troff modified to work with different output devices
1977
- Donald Knuth - TeX - begun in 1977, evolved through the early '80s - detailed layout of text and font descriptions to typeset mathematical books in professional quality
1980
- Brian Reid - Scribe: a document specification language and its compiler
- Prepare a manuscript file using a text editor.
- Process this manuscript file through Scribe to generate a document file, which is then printed on some convenient printing device.
- Scribe controls the words, lines, pages, spacing, headings, footings, footnotes, numbering, tables of contents, indexes and more.
- Scribe has a database of document format definitions which tell it the rules for formatting a document in a particular style.
- Under normal circumstances, writers need not concern themselves with the details of formatting, because Scribe does it for them.
- The manuscript document an author creates has markup statements throughout.
1986
- Standard Generalized Markup Language (SGML) extended GML and was accepted as an ISO standard
- Influenced by Scribe
- Focused on the structural aspects of a document and left the visual presentation of that structure to the interpreter
- Specifies a syntax for including markup in documents and a "metalanguage" for separately describing what the markup means
- Allowed authors to create and use any markup they wished, selecting tags that made the most sense to them
- Issues:
- Generally found to be cumbersome, a side effect of attempting to do too much and be too flexible
- Unknown to the masses
- Too few tools to create files
- Tools are expensive
- Companion norms for style or hypertext are not ready
- Not well supported by the major software vendors
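The separation of structure from presentation described above can be sketched in a few lines of Python. This is only an illustration, not SGML itself: the tag names (`title`, `para`, `item`) and the two style mappings are hypothetical, chosen to show that the same structurally marked-up content can be handed to different "interpreters" for display.

```python
# Hypothetical structural document: each component is tagged by its role,
# not by how it should look.
document = [
    ("title", "History of Printing"),
    ("para", "Movable metal type changed publishing."),
    ("item", "Rise of paper use"),
]

# Two interchangeable "interpreters". Each one decides how a role is presented.
plain_style = {
    "title": lambda text: text.upper(),
    "para": lambda text: text,
    "item": lambda text: "- " + text,
}
html_style = {
    "title": lambda text: f"<h1>{text}</h1>",
    "para": lambda text: f"<p>{text}</p>",
    "item": lambda text: f"<li>{text}</li>",
}


def render(doc, style):
    """Format every component according to the chosen style mapping."""
    return [style[tag](text) for tag, text in doc]


print(render(document, plain_style))
print(render(document, html_style))
```

The document itself never changes; only the mapping passed to `render` does, which is the point the 1967 and 1986 entries make about content versus format.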
1991
- Tim Berners-Lee and Robert Cailliau - HyperText Markup Language (HTML) - some SGML syntax, without the metalanguage
- HTML consists of a set of "known" tags that handle common formatting tasks
- Originally created to mark up simple scientific papers and therefore needed to be expanded in order to offer the rich content the web has today
- As a result, additions often follow no logical design, although recent efforts have attempted to address this.
- Advantages
- Easy to create from scratch or by converting legacy text files
- Easy to parse
- Disadvantages
1998
- XML - Extensible Markup Language
- XML is a strict subset of SGML
- Like SGML, XML is a grammar (or a metalanguage) and NOT a language
- XML extends SGML features
- Out-of-date SGML features are eliminated
- Well-formed document syntax
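As a small illustration of well-formed document syntax, the sketch below uses Python's standard `xml.etree.ElementTree` parser to accept a properly nested document and reject one whose tags overlap. The element names (`lecture`, `title`, `topic`) are made up for the example; XML, as a metalanguage, leaves tag names to the author.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every element is closed, and elements nest properly.
good = "<lecture><title>Convergence</title><topic>Markup</topic></lecture>"
# The closing tags overlap instead of nesting, so this is not well-formed.
bad = "<lecture><title>Convergence</lecture></title>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Well-formedness is only a syntax check. It says nothing about whether the chosen tags match a particular document type.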
Other Languages
- Adobe PostScript
- Interpreted programming language used to control typesetting machines and laser printers
- Used as a page description language
- Can be used to create vector graphics
- Adobe PDF
- Optimized PostScript
- PDF document attributes: external links, article threads
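To make the PostScript points above concrete, here is a minimal Python sketch that writes out a one-page PostScript program as text. The function name `postscript_page` and its parameters are hypothetical; the emitted operators (`findfont`, `scalefont`, `setfont`, `moveto`, `show`, `rlineto`, `stroke`, `showpage`) are standard PostScript. The output is itself a small program that a printer's interpreter executes, which is what "interpreted programming language used as a page description language" means in practice.

```python
def postscript_page(message, x=72, y=720, size=14):
    """Build a one-page PostScript program as text (72 units = 1 inch)."""
    # Escape the characters that are special inside a PostScript string literal.
    escaped = message.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    lines = [
        "%!PS",
        f"/Helvetica findfont {size} scalefont setfont",  # select a scaled font
        f"{x} {y} moveto",                                # set the current point
        f"({escaped}) show",                              # paint the text
        "newpath",                                        # vector graphic: a rule under the text
        f"{x} {y - 6} moveto",
        "200 0 rlineto",
        "stroke",
        "showpage",                                       # emit the finished page
    ]
    return "\n".join(lines) + "\n"


print(postscript_page("Hello (DTP)"))
```

Saving the result to a `.ps` file and sending it to a PostScript printer or viewer should render one line of text with a rule beneath it.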
Groupware and Computer-Supported Cooperative Work (CSCW)
- communication
- problem solving
- co-authoring a document
- Groupware - information technology used to help people work together more effectively.
- Groupware makes users aware that they are part of a group.
- Groupware defined by Peter and
Trudy Johnson-Lenz (1982) - their ideas included:
- messaging
- conferencing
- filtered exchanges
- relational structures
- voting
- decision support tools
- Computer Supported Cooperative
Work (1984) coined by Greif and Cashman
A Paradigm Shift for Computing
Transformation from human-machine
to human-human interaction
Results from several convergent
phenomena:
- Pervasive networking
- Growth of workgroup computing
- Growth of technology supporting
executive and managerial group decision making
- Merging of telecommunications
and computing supporting applications such as video conferencing
- Advancement in
work-at-a-distance
- New technologies - ISDN, DSL,
cable modem
Widespread groupware:
- Email
- Computer conferencing (aka
structured email)
- Teleconferencing (use of
audio/video)
- Joint problem solving:
- Collaborative writing or
drawing
- Group decision support systems
(with electronic meeting rooms)
CSCW Taxonomy
| | One Meeting Site | Multiple Meeting Sites |
| Synchronous Communications | Face-to-Face Interactions: public computer displays; electronic meeting rooms; group decision support systems | Remote Interactions: shared-view desktop conferencing systems; desktop conferencing with collaborative editors; video conferencing; media spaces |
| Asynchronous Communications | Ongoing Tasks: team rooms; group displays; project management | Communication and Coordination: vanilla email; asynchronous conferencing bulletin boards; structured messaging systems; workflow management; version control; meeting schedulers; cooperative hypertext, organizational memory |
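The taxonomy is a two-by-two lookup on time (synchronous or asynchronous) and place (one site or multiple sites). A minimal Python sketch of that lookup follows; the function name `classify` and its boolean parameters are assumptions made for the example, while the four category names come from the table.

```python
# Time/place matrix from the taxonomy: (timing, place) -> category name.
TAXONOMY = {
    ("synchronous", "one site"): "Face-to-Face Interactions",
    ("synchronous", "multiple sites"): "Remote Interactions",
    ("asynchronous", "one site"): "Ongoing Tasks",
    ("asynchronous", "multiple sites"): "Communication and Coordination",
}


def classify(same_time, same_place):
    """Map a collaboration setting to its cell in the taxonomy."""
    timing = "synchronous" if same_time else "asynchronous"
    place = "one site" if same_place else "multiple sites"
    return TAXONOMY[(timing, place)]


# Video conferencing: same time, different places.
print(classify(same_time=True, same_place=False))
# Email: different times, different places.
print(classify(same_time=False, same_place=False))
```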
Asynchronous Groupware
- Email and computer conferencing systems
- Structured messaging systems
- Cooperative hypertext or hypermedia systems
- Most successful: email, which is asynchronous, fast, can be sent to multiple receivers, and has built-in external memory
- Can contain text, image, video, sound
- Organized email:
- Computer conferencing system: messages organized by topic, emphasizing dialogue
- Electronic bulletin boards: messages organized by time, emphasizing broadcast of information
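The difference between the two kinds of organized email comes down to how the same messages are arranged. The sketch below, using made-up messages and hypothetical function names, groups them by topic for a conferencing view and orders them by time for a bulletin-board view.

```python
from collections import defaultdict

# Hypothetical messages: (timestamp, topic, text).
messages = [
    (3, "layout", "Use a two-column grid."),
    (1, "fonts", "Which typeface for headings?"),
    (2, "layout", "How many columns?"),
    (4, "fonts", "Try a sans-serif."),
]


def conference_view(msgs):
    """Computer conferencing: group by topic so each dialogue reads as a thread."""
    threads = defaultdict(list)
    for stamp, topic, text in sorted(msgs):  # oldest first within each thread
        threads[topic].append(text)
    return dict(threads)


def bulletin_view(msgs):
    """Bulletin board: a single stream ordered by time, oldest first."""
    return [text for stamp, topic, text in sorted(msgs)]


print(conference_view(messages))
print(bulletin_view(messages))
```

The conferencing view keeps each question next to its answer, while the bulletin view interleaves topics in arrival order, which suits broadcast better than dialogue.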
Structured Messages, Agents and Workflow
Cooperative Hypertext and Organizational Memory
- Applications focus on messages
or documents and their interrelationships - cooperative hypertext
systems
- Supports:
- collaborative knowledge
building
- Asynchronous collaborative
writing
- Creating organizational memory
- e.g. Schatz (1991-1992)
- Community systems project - build an electronic scientific community by collecting all of the community's scientific knowledge and making it available
- The Telesophy System - 500 researchers studying the nematode worm
- Extend support beyond document to include process (Conklin 1992)
- Software integrates three technologies:
- Hypertext, groupware, and a rhetorical method (improves dialogue, the conversational record, and organizational memory)
- Most successful organizational
memory - Lotus Notes
- Integrated communications and
database network application
- Designed to gather, organize,
distribute information among workgroups
- Platform for developing
workgroup applications
- Used for message routing,
report distribution, idea discussion, and for tracking and
managing projects
- Biggest application -
Price-Waterhouse (tens of thousands of Notes licenses). Why?
- No one knew who had the knowledge
to solve a particular problem
- Constantly reinventing the wheel
worldwide
- Need for better communication
- Results:
- Retention of knowledge
- Support for global
collaboration and global discussion
- Enhanced communication
Synchronous Groupware
- Software that assists a group of
individuals in working together simultaneously to carry out a task
- Four classes:
- Desktop conference systems
(e.g. outlining, writing, sketching, spreadsheet)
- System infrastructure for
desktop conferencing
- Electronic meeting and decision
rooms
- Media spaces that include
computer controlled AV networks and virtual meeting environments
- Desktop Conferencing Systems
- NLS shared-screen conferencing systems (1968) - augment face-to-face communication
- Xerox PARC Colab Project (1987)
- Tools for collaborative brainstorming, argument development, free-style sketching
- Workstation-based with large touch screen in front of room
- WYSIWIS (what you see is what I see)
- System Infrastructure for Desktop Conferencing
- Collaboration transparency - single-user software made available to a group
- Collaboration aware - software rewritten for group use
- Electronic Meeting and
Decision Rooms
- Decision support systems - work
in electronic meeting room
- Supports parallel or
sequential activity
- Idea generation
- Idea organization
- Voting
- Media Spaces
- Computer controlled
teleconferencing system where A/V communication and shared digital
workspaces overcome physical separation
- e.g. Hiroshi Ishii - TeamWorkStation
system
- Display of shared digital
workspaces with displays of drawing surfaces and desktop
materials
- Implements seamlessness
between individual and workgroup by overlaying translucent
workspace images: live video
analog images of computer screens and desktop surfaces
- Computer screen overlay is a
shared screen combining windows from individual collaborators
- Creates shared interpersonal
space using small windows displaying live video of one's
collaborator
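The translucent overlay of workspace images described above can be approximated digitally by alpha blending. The sketch below is only an analogy, since the notes describe mixing live analog video: images are modelled as rows of RGB tuples, and the function names and the default `alpha` value are assumptions.

```python
def overlay(pixel_a, pixel_b, alpha=0.5):
    """Blend two RGB pixels; alpha is the weight given to pixel_a."""
    return tuple(round(alpha * a + (1 - alpha) * b) for a, b in zip(pixel_a, pixel_b))


def overlay_images(image_a, image_b, alpha=0.5):
    """Blend two equally sized images given as rows of RGB tuples."""
    return [
        [overlay(pa, pb, alpha) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Two tiny 1x2 "workspaces": one collaborator's screen and another's desk surface.
screen = [[(200, 100, 0), (0, 0, 0)]]
desk = [[(100, 200, 0), (50, 50, 50)]]
print(overlay_images(screen, desk))
```

With equal weights both workspaces stay visible at once, which is what lets collaborators see each other's marks in one shared view.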