Based on a presentation by Howard Besser
http://besser.tsoa.nyu.edu/howard/
·
Companies keep information for days, years, or
decades
·
Individuals keep things for years or a
lifetime
·
Archives, Libraries, and museums keep things for hundreds
of years
Cultural Institutions have a much
greater responsibility for preservation!
·
In the past, we
knew about history by finding written documents:
Changes between
different drafts of a scientific or literary paper
Letters and
correspondence between a scientist (or literary figure) and colleagues (that
both help contextualize the work and let us see changes in thought processes
or discovery)
·
But today, these documents are not on paper!
·
They are in the form of:
Email correspondence
Word processing files
that do not show changes between drafts/versions
·
Who will take responsibility to
save these works for future study?
| Old | New |
Physical preservation | atmospheric control | ongoing management |
What to save? | artifact | idea + ancillary material & documentation |
Cataloging | Individual work in hand | FRBR (Functional Requirements for Bibliographic Records) |
Later access | Artifact | Restaging, ancillary material, & documentation |
Conservation of Electronic Art
·
How are new works even more problematic than older forms of moving image material?
·
Issues with Digital Preservation
·
Issues with New Works
·
Technical & Conceptual Approaches to solutions
·
Efforts to watch (projects, standards)
Manuscripts,
books, paintings, sculpture
We have a good sense of what the original object is
o Objective is to make the object
itself endure (temperature/humidity control,
chemicals/pigments/fibers/adhesives, ...)
o Goal is to keep object as
close as possible to original state (though occasionally controversy arises
over whether to let aging show)
Video, audio, digital, new media
o Often difficult to
determine what the original object is
o
Difficult to make the original object endure (magnetic particle
deterioration, warping, etc.)
o Even if we could make the
original object endure, we wouldn't have the infrastructure to view it in the
future
o
Need to develop a paradigm shift from preserving the original object to
preserving info content
o
Need to pay more attention to maintaining authenticity and replicating
user experience
May include
o Moving image materials
o Multimedia
o Interactive programs
(including hypertext novels & games)
o
Computer generated art
o Most electronic art works share some common
characteristics with other strange works like Performance Art, Conceptual
Art, Site-specific installations, Experiential Art
Digital
Longevity Problems:
Dangers from:
o
Info is increasingly inter-related to other info
o
How do we make our own Info persist when it points to and integrates with Info
owned by others?
o What is the boundary of a set of information
(or even of a digital object)?
In the past, much of survival was due to redundancy
o How do we decide what to save?
o Who should save it?
o
How should they save it?
How to save
information?
o Methods for later access
o
Refreshing
o
Migration
o
Emulation
o Issues of authenticity and evidence
Content translated into new delivery devices changes meaning
Thinking of the Future
o Screens
will be different resolutions and different aspect ratios
o CRTs won't exist
o A decade or two from now,
today's user interfaces will look the way arrow-key navigation looks today
o Today's streaming media are
small windows, slow speeds
o As bandwidth increases,
viewers will expect higher quality streams
o Creators may need to
consider how they'll be able to deliver higher-bandwidth
streams
Delivery Derivatives vs. Masters encoded w/standards
May also want to re-edit the piece to take advantage of changes in
technology, viewer expectations, society
o Previous formats required little ongoing intervention (remote storage facilities, Iron Mtn)
o
Digital formats require intense ongoing
management
o Need for:
§
What is the work?
§
Complexity of rich media
§
Difficulty of making the work last
o The installation?
o Documentation
of the Installation?
o The
directions for the Installation?
o What is the goal of our
documentation and preservation?
o Works often have artistic nature
(including video games)
o
Enormous number of elements can, at times, be very important to preserve
(pacing, original artifact, elements used to construct the artifact)
o Too complex to save every
one of these aspects for every type of
material
o Importance of saving
documentation
o What really is the Work?
o Disappearing
software
o Enormous number of elements
can, at times, be very important to preserve (randomness, interactivity, pacing,
color, format, original artifact, elements used
to construct the artifact)
o Pieces and Boundaries
o Recontextualization (Postmodernism)--which
rendition to save?
o Dynamic
& Lack of
Fixity (evolving works)
o Interactivity
o Historical
context
o Difficulty of authentication
over time
What are we attempting to do?
o
Show the work the way people saw and interacted with it when it was first
created (may be impossible; in the past, the artifact and how one interacted
with it didn't change much, so preservation and documentation were relatively
straightforward)
o
Show documentation of the work and people interacting with it when it was
first created
o Reinstall/Recreate/Reenact
the work
o Works themselves may no
longer even exist; in many cases, what we can
save amounts to forensic evidence
o Enormous number of elements
can, at times, be very important to preserve
(pacing, original artifact, elements used to construct the artifact)
o Too complex to save every
one of these aspects for every type of material
o Importance
of saving pieces, representations, and documentation
o
Involve the artists to capture their intentions
o
Importance
of Standards
o Familiarize ourselves with
recent conservation developments (Who Knows?, TechArcheology,
Tate, IMAP)
Approaches to Solutions
Possible endless need
for reformatting implies
o Possible
loss with each
generation
o Requires managed
environment
o Save the Hardware &
Software
o Emulate
o Migrate
o
Refreshing always necessary due to volatility of physical strata
Impact on evidential value
o
Migration -- advantages & disadvantages
o Emulation --
advantages &
disadvantages
o And will need a long-term
managed environment
Wordstar to Word 1 to Word 3,
-Tables and complex features often get corrupted
-Need to repeat every 4-5 years (maybe forever)
+We know how to do this ourselves
+If there's a problem, we can catch it soon
o
Keep the Wordstar file format, but write emulators
to make it work in newer environments
o +A better chance of
carrying over complexity
o +Many more features can
survive
o Problems may not be caught
until it's too late
o
Specialists and a whole infrastructure of emulators required
o Serious problems (reverse engineering?)
o More than temperature &
humidity control
o Periodic monitoring of the
works
o Periodic monitoring of the
technical environment for viewing the works (software, systems, hardware)
o Trusted repositories
work
expression
manifestation
item
(group efforts with Cultural Heritage community)
o Matters
in Media Art--New Art Trust, MoMA, SFMOMA, Tate
o DOCAM (Documentation and Conservation of the Media Arts
Heritage)
o INCCA (International
Network for the Conservation of Contemporary Art)
o Past
·
Seeing Double Exhibition, & Symposium
·
Variable Media Initiative
·
Artists Interviews Project, Netherlands
Institute for Cultural Heritage 1998-1999, Modern Art: Who Cares
·
TechArcheology: A
Symposium on Installation Preservation (SFMOMA)
·
Special issues raised by non-library institutions
·
Special issues raised by images and rich media
·
What is the work (or salient points we need to preserve)?
·
Bring the arts communities (artist intent, BAVC) together with the
preservation repository communities and the preservation metadata communities
·
Specifically get Cultural Heritage communities involved with the selected
OCLC/RLG recommendations
·
Get cultural heritage groups started on working to make sure that structure
standards incorporate our works
·
What organizations will take responsibility to save today's digital
ephemeral materials (online zines, arts discussion
groups, etc.)?
Risk Management
Best Practices for Reformatting
Preservation Repositories & Metadata
Other Metadata & Standards
·
We can't say definitively that we can make every digital work persist
·
What we CAN say is that the more a digital work conforms to standards and
best practices, the greater the likelihood that we can assure persistence
·
Our preservation repositories can even accept deposits of non-conforming
works, but the less they conform, the less likely that they'll be salvageable
·
Persistence is most likely for works that
share standards, metadata, and best practices
·
Think about users (and potential users), uses, and type of
material/collection
·
Scan at the highest quality that does not exceed the likely potential
users/uses/material
·
Do not let today's delivery limitations influence your scanning file
sizes; understand the difference between digital masters and derivative files
used for delivery
·
Many documents which appear to be bitonal actually
are better represented with greyscale scans
·
Include color bar and ruler in the scan
·
Use objective measurements to determine scanner settings (do NOT attempt
to make the image good on your particular monitor or use image processing to
color correct)
·
Don't use lossy compression
·
Store in a common (standardized) file format
·
Capture as much metadata as is reasonably possible (including metadata
about the scanning process itself)
An Open Archival
Information System (or OAIS) is an archive, consisting of an
organization of people and systems, that has accepted the responsibility to preserve
information and make it available for a Designated Community.
·
The entities
within the OAIS are based on the concept of an information package: a conceptualization of the structure of information as it
moves into, through, and out of the digital archives.
·
An information
package consists of the digital information object that is the focus of
preservation, along with metadata necessary to support its long-term
preservation and access, bound into a single logical package.
The OAIS recognizes three primary types of
information packages:
1.
The Submission Information Package (SIP) is the version of the information package that is
transferred from the Producer to the digital archives when information is
ingested.
2.
The Archival Information Package (AIP) is the version of the information package that is
stored and preserved by the digital archives.
3.
The Dissemination Information Package (DIP)
is the version of the information package delivered to the Consumer in response
to an access request.
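The flow of packages through the archive can be sketched in a few lines. This is a minimal illustration only: the class name `InformationPackage`, the functions `ingest` and `disseminate`, and the `ingested` metadata key are assumptions made for this sketch, not part of the OAIS reference model, which does not prescribe any implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class InformationPackage:
    """Content plus the metadata needed to preserve and access it (hypothetical type)."""
    kind: str                      # "SIP", "AIP", or "DIP"
    content: bytes                 # the digital object that is the focus of preservation
    metadata: Dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a submitted package into the package the archive stores (illustrative step)."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    # Assumption: a real archive would add fixity, provenance, and other metadata here.
    stored_metadata = dict(sip.metadata, ingested="true")
    return InformationPackage("AIP", sip.content, stored_metadata)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Build the package delivered to a Consumer in response to an access request."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    return InformationPackage("DIP", aip.content, dict(aip.metadata))
```

Keeping the three package types distinct makes it explicit that what a Producer submits, what the archive stores, and what a Consumer receives need not be identical.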
A nonprofit, membership,
computer library service and research organization dedicated to the public purposes
of furthering access to the world's information and reducing information costs
Digital Repository Attributes
·
Administrative responsibility
·
Organizational viability
·
Financial
sustainability
·
Technological suitability
·
System security
·
Procedural
accountability
Selected
Recommendations
·
Policies, Certification processes, Risk management, Persistent ID,
Migration/Emulation experiments
·
Stakeholders meet to decide how to describe what is in a digital repository
·
Examine special properties of particular classes of digital objects
·
Technical standards for exchange and interoperability between
repositories
· Develop
projects and case studies
·
Copyright issues
·
Too complex for small institutions to manage
·
Will be done through partnering (small museum with University) or through consortia (museum
association, state-wide organization, ...)
·
Archive or museum will direct what is needed,
but digital repository will carry out the actual work (as
defined in SIP/DIP/AIP)
PREMIS Data Dictionary for
Preservation Metadata was the first comprehensive specification for
preservation metadata produced from an international, cross-domain consensus-building
process.
Entities:
Digital Object, Intellectual Entity, Event, Agent,
& Rights
Relationships are statements of
association between instances of entities
Semantic Units are the properties of an
entity, and have values
Digital Object = a discrete unit of
information
File = named and ordered sequence of bytes known by
an operating system
Bitstream = a set of bits
embedded within a file
Representation = the set of files needed for a
"complete and reasonable" rendering of an Intellectual Entity
Intellectual Entity = a coherent set of content
that can be viewed as a single unit
Event = an action involving at least one Object or
Agent known to the repository
Documents actions that modify Digital Objects, records
validity checks, etc.
Objects can be associated with any number of events
Agent = persons, organisations, or programs
associated with preservation events
Not the main focus of the data dictionary
Rights Statements = assertions of rights pertaining to Objects or Agents
WG concentrates on rights and permissions associated with preservation
activities
Relationships:
Relationships between Objects:
Structural relationships, e.g. how files combine to
make up an Intellectual Entity
Derivation relationships, e.g. resulting from format
transformations or replications
Dependency relationships, e.g. when Objects depend on
others, e.g. fonts, DTDs, etc.
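A rough sketch of how these entities and relationships might be held in memory follows. The class and field names (`DigitalObject`, `derived_from`, `depends_on`, `record_migration`) are invented for illustration and are not the semantic unit names defined by the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str                      # a person, organisation, or program


@dataclass
class Event:
    event_type: str                # e.g. "migration" or "validity check"
    agent: Agent                   # the Agent associated with this Event
    outcome: str = ""


@dataclass
class DigitalObject:
    identifier: str
    events: List[Event] = field(default_factory=list)        # any number of Events
    derived_from: List[str] = field(default_factory=list)    # derivation relationships
    depends_on: List[str] = field(default_factory=list)      # dependency relationships


def record_migration(source: DigitalObject, new_id: str, agent: Agent) -> DigitalObject:
    """Log a migration Event on the source and return the derived Object."""
    event = Event("migration", agent, outcome="success")
    source.events.append(event)
    return DigitalObject(new_id, events=[event], derived_from=[source.identifier])
```

Recording the Event on both Objects keeps the derivation traceable from either side, which matters when the evidential value of a migrated file is later questioned.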
1:1 principle
Fixity: Property that a Digital Object has not been changed between two points in time.
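One common way to check fixity in practice is to record a cryptographic digest when the Object is deposited and recompute it later. The sketch below assumes SHA-256 and uses hypothetical function names; the source does not specify a particular algorithm.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the object's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """True if the bytes still match the digest recorded at an earlier point in time."""
    return fixity_digest(data) == recorded_digest
```

A matching digest shows the bits are unchanged between the two checks; it says nothing about whether the file can still be rendered.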
·
Synchronicity between media/streams
·
Performance Archive & Retrieval Working Group
·
Performing Arts Data Service (PADS)
·
Persistent IDs
·
Website management
·
Technical Imaging Metadata
·
Structural & Administrative Metadata
·
Complexity of formats (storage & compression)
·
Crosswalking Metadata
o
A crosswalk
is a table that shows equivalent elements (or "fields") in more than
one database. It maps the elements in one metadata scheme to the equivalent
elements in another scheme.
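As a small illustration, a crosswalk can be represented as a lookup table and applied to a record. Both schemes here are made up for the example (a "museum" scheme and a "library" scheme); real crosswalks between published metadata standards are larger and often not one-to-one.

```python
from typing import Dict

# Hypothetical crosswalk: element names in a made-up "museum" scheme mapped to
# their equivalents in a made-up "library" scheme.
MUSEUM_TO_LIBRARY: Dict[str, str] = {
    "object_title": "title",
    "maker": "creator",
    "date_made": "date",
}


def apply_crosswalk(record: Dict[str, str], crosswalk: Dict[str, str]) -> Dict[str, str]:
    """Rename each element that has an equivalent; drop elements that have none."""
    return {crosswalk[name]: value for name, value in record.items() if name in crosswalk}
```

Elements with no equivalent are silently dropped in this sketch; a production crosswalk would more likely log them, since that is exactly where information gets lost between schemes.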
o
Need to separate work ID from work location
o
Becomes a business process issue when one organization maintains the
resource and another organization references it (i.e.,
licensed from vendors or managed by separate
administrative structures)
Approach for today:
PURLs
(Persistent Uniform Resource Locators)
o Web addresses that act as permanent identifiers in
the face of a dynamic and changing Web infrastructure.
o Instead of resolving directly to Web resources, PURLs provide a level of indirection that allows the
underlying Web addresses of resources to change over time without negatively
affecting systems that depend on them.
o This capability provides continuity of references to
network resources that may migrate from machine to machine for business, social
or technical reasons.
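The level of indirection described above can be sketched as a lookup table that sits between the persistent identifier and the current location. The `Resolver` class, its method names, and the example identifiers are hypothetical; this is not the API of any actual PURL service.

```python
from typing import Dict


class Resolver:
    """Toy resolver mapping a persistent identifier to its current location."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, persistent_id: str, location: str) -> None:
        """Bind a new persistent identifier to its first location."""
        self._table[persistent_id] = location

    def move(self, persistent_id: str, new_location: str) -> None:
        """Update the location; the identifier that others cite stays the same."""
        if persistent_id not in self._table:
            raise KeyError(persistent_id)
        self._table[persistent_id] = new_location

    def resolve(self, persistent_id: str) -> str:
        """Return the current location for the identifier."""
        return self._table[persistent_id]
```

Systems that cite the identifier never see the move: only the resolver's table changes, which is why keeping that table maintained becomes the long-term responsibility.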
Handles
o The Handle System is a general purpose distributed
information system that provides efficient, extensible, and secure
identifier and resolution services for use on networks such as the Internet.
o It includes an open set of protocols, a namespace,
and a reference implementation of the protocols.
o The protocols enable a distributed computer system to
store identifiers, known as handles, of arbitrary resources and resolve those
handles into the information necessary to locate, access, contact,
authenticate, or otherwise make use of the resources.
o This information can be changed as needed to reflect
the current state of the identified resource without changing its identifier,
thus allowing the name of the item to persist over changes of location and
other related state information.
o The original version of the Handle System technology
was developed with support from the Defense Advanced Research Projects Agency
(DARPA).
HTTP redirects
More
issues with referencing IDs
References for mirror sites
References for back-up sites when
main site is down or bottle-necked
References for off-site copies and
archival copies
Non-proprietary file format
supports 10-bit/pixel
no compression or lossless
compression using non-proprietary CODEC
supports multiple frame rates/frame
sizes
supports time code data in file
supports audio (multichannel)
and video in a single file
o MPEG seems to be the only
non-proprietary format
o Not enough companies
produce encoders for Motion JPEG 2000
for us to feel comfortable about its long-term sustainability
o Many quality questions
Quality of playback?
Theater experience?
o
Moving images on DVDs becoming interactive; need for more extensive
source materials
o Video installation works
o Net-based works
incorporating moving images
o Images & rich media;
new media and multi-media works
Inter-relationships between parts
For Contemporary Art: What is the Work?
Born digital--need to be kept in
digital form
Video probably; at least soon
Film- Not very soon
A guessing game; we need
more R&D, as well as education
o Library of Congress National
Digital Information Infrastructure & Preservation Program (NDIIPP)
o The InterPARES Project: (International Research on
Permanent Authentic Records in Electronic Systems)
o Electronic
Literature Organization
o Virtualization
o ERPANET - Electronic Resource Preservation and Access
Network
o
Emulation
o Open
Emulation Project: Nintendo
o Stella
- Atari 2600 Emulator
o MESS: emulates a
large variety of different systems
o nedlib
o
Producer's Intention
o
Physical Characteristics
Structure
Risks
Documentation
Recommendations
Archive-It: a subscription service from the Internet Archive that
allows institutions to build and preserve collections of born-digital content.
o Build, manage, & search
own web archive through user-friendly web application
o
No need for technical expertise or file-hosting
o
Designed for archives, museums,
libraries, educational institutions, state organizations, individual researchers
Digital Repository
Traditions & Services require
And all of these require Standards and
Metadata
From the technological point of view
Standards
offer the best hope of overcoming Impediments
o
Easier to maintain a single set of standards over long periods
of time
o
Puts all institutions that will face obsolescence and migration problems
periodically throughout the future in the same large boat
For artistic and other
challenging works:
How
Best to save these works?
o Use Standards wherever
possible
o
Be aggressive about asset mgmt -- saving component parts and ancillary
materials
o
Both creator and Archive should develop an institution-wide plan for
saving electronic works
Refreshing and either migration or emulation
Standard encoding schemes
What is the work? And prioritize what needs to be saved
Save ancillary materials and records
What can we do specific
to electronic media?
·
Works themselves may no longer even exist; in many cases, what we can
save amounts to forensic evidence
·
Enormous number of elements can, at times, be very important to preserve
(pacing, original artifact, elements used to construct the artifact)
·
Too complex to save every one of these aspects for every type of material
·
Importance of saving pieces, representations, and documentation
·
Involve creators & curators to capture intentions
·
Importance of Standards
·
Familiarize ourselves with recent conservation developments (Guggenheim's
Variable Media, Who Knows?, TechArcheology, Tate,
IMAP)
Esther Conway, Brian
Matthews, Arif Shaon, Juan Bicarregui,
Catherine Jones, Jim
Woodcock (Univ of
What is software preservation?
Storing a copy of a software product
Enabling its retrieval in the future
Enabling its reconstruction in the
future
Enabling its execution in the future
Not what most software developers and
maintainers do.
Museums and archives:
Either
supporting Hardware
E.g.
Or
in its own right
Chilton
Computing, Multics History Project
Preserving the work
E.g.
research work in Computing Science
Reproducible
Preserving the Data
Preserving
the software is necessary to preserve other data
Keep
the data live and reusable
Handling Legacy
Specialised code from the past which still needs to be used
Usually
seen as a problem!
Collect the source!
A cultural artifact
a form of literature (D. Gabriel)
beautiful programs are works of art (D. Knuth)
A view into the mind of the designer
intentions, assumptions, abstractions, mistakes, humor
little of this gets captured in any written form
This is the embryonic first 50 years of millennia of software development
The transition from cave painting to impressionism
A voluminous source repository can be analyzed to teach us about the evolution of
software engineering
architectural evolution
data structure design
use of algorithms
optimization (and premature optimization)
locality of function
information hiding
coding style and idioms
defensive programming styles
software redundancy
failures and bugs
module decomposition
joint authorship
programming language use
Collect the binaries!
For
use on restored, reconstructed or simulated old computers
Collect the documentation!
manuals,
notes, papers, email
Collect the stories
interviews,
reminiscences, websites
Adequacy:
How do we know we have captured enough?
Depends
crucially on Preservation Approach
Technical Preservation.
(techno-centric)
Maintain
the original software (binary), within the original operating
environment.
Sometimes
maintain the hardware as well
Emulation (data-centric).
Re-creating
the original operating environment by programming future platforms and operating
systems to emulate the original environment,
so that
software can be preserved in binary and run "as is".
E.g.
British Library
Migration (process-centric).
Transferring
digital information to new platforms before the earlier one becomes obsolete.
Updating
the software code to apply to a new software environment.
Reconfiguration
and recompilation Porting
An
extreme version of migration may involve rewriting the original code from the
specification.
Different preservation approaches require different significant properties
Use
a notion of Performance to assess adequacy
Test
case suites as tests of adequacy
Three
aspects to the framework:
A Performance Model for software
Determine
what it means to preserve s/w
Retrieve
Reconstruct Replay
Adequacy
of performance of s/w
Model for describing s/w artifacts
As
complex digital objects.
Versions
and variants
Properties for preservation
For
retrieve, reconstruct, replay
·
Testing data
performance to judge adequacy of the software performance.
·
Important to
maintain software test suite to assess preservation of significant properties
of the software.
Adequacy.
A software package (or any digital object) can be
said to perform adequately relative to a particular set of features (significant
properties) if, in a particular performance (that is, after it has been
subjected to a reconstruction and replay process), it preserves those
significant properties to an acceptable tolerance.
Software Category | Adequacy Factor(s) |
Scientific Data Processing Software | The adequacy of the behavior of this type of software may be measured by: running the software to process some pre-specified test input data; comparing the output of the test run with the corresponding pre-specified test result; checking whether the output exceeds the acceptable level of error tolerance for the software. For example, the NAG Software Library publishes test cases. |
Games | The adequacy of the behavior of a game may be measured by: comparing its user interface (UI) with a screen capture of its original UI; comparing its performance against some pre-defined use cases. For example, the completion time of a particular level can be compared against the average completion time for that level in the original game. For example, when playing the emulated version of the 1990s DOS-based computer game Prince of Persia, some of the operations do not always work on the emulator and the original appearance of the game is also somewhat lost, but it is still possible to play the complete game. |
Programming Language Compilers | A compiler may be said to have been preserved adequately if: it covers all features of the programming language that it supports, e.g. concurrency (i.e. threads), polymorphism, etc.; the application resulting from compiling its source code (written in a language supported by the compiler) using the compiler yields the expected behaviour. For example, some programming languages (e.g. Fortran, C, C++, etc.) have ISO standards which describe the correct behavior of software written in these languages. These standards also provide test programs that may be used to assess the adequacy of a compiler for rendering all features of the programming language that it supports. |
Word Processor | The adequacy of a word processor may be measured based on its ability to: render existing supported word documents with an acceptable level of error tolerance (for example, a word processor may be regarded as adequate as long as it clearly displays the contents (e.g. text, diagram, etc.) of a word document, even if some of the features of the document content, such as font color and size, may have been rendered incorrectly or even lost completely); enable editing (e.g. add/change/remove text, change font) and saving of existing word documents; enable creation and saving of new word documents. For example, OpenOffice Writer is adequate for viewing and editing word documents originally created using Microsoft Word with some level of error tolerance (e.g. images do not always appear as originally intended but are viewable nevertheless). |
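The procedure in the scientific data processing row (run on pre-specified inputs, compare with pre-specified results, check the error tolerance) might be sketched as follows. The function name `performs_adequately` and the use of an absolute tolerance are assumptions for this illustration; a real test suite would define its own comparison rules.

```python
import math
from typing import Callable, Sequence


def performs_adequately(
    software: Callable[[float], float],
    test_inputs: Sequence[float],
    expected_outputs: Sequence[float],
    tolerance: float,
) -> bool:
    """True if every test output is within `tolerance` of its pre-specified result."""
    if len(test_inputs) != len(expected_outputs):
        raise ValueError("each test input needs one expected output")
    return all(
        # Absolute tolerance only (an assumption); rel_tol is switched off.
        math.isclose(software(x), expected, abs_tol=tolerance, rel_tol=0.0)
        for x, expected in zip(test_inputs, expected_outputs)
    )
```

The preserved (migrated or emulated) software is passed in as `software`; the expected outputs come from runs of the original, so the test suite itself has to be preserved alongside the code.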
·
Provide
a general model of software digital objects Relate each concept in the model
with a set of significant properties
·
For
a different preservation approach, we need different significant properties to
achieve a desired level of performance.
Product
The
whole software object under consideration
Could
be single library module, or very large system (e.g. Linux)
Comes
under one authority (legal control)
Defines
gross functionality
Version
Releases
of the system
Characterized
by changes in detailed functionality
Variant
Versions
for a particular platform
Characterized
by operating system and environment
Instance
A
particular instance of a particular variant at a particular location
Ownership
An
individual license
Fixed to particular
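The Product / Version / Variant / Instance hierarchy can be sketched as nested records. The field names and the `describe` helper are assumptions chosen for this illustration and are not taken from the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    authority: str            # who has legal control of the product


@dataclass(frozen=True)
class Version:
    product: Product
    release: str              # a release of the system


@dataclass(frozen=True)
class Variant:
    version: Version
    operating_system: str     # the platform this variant targets


@dataclass(frozen=True)
class Instance:
    variant: Variant
    location: str             # where this particular copy lives
    license_holder: str       # ownership: an individual license


def describe(instance: Instance) -> str:
    """Summarize an instance by walking up the hierarchy."""
    v = instance.variant
    return f"{v.version.product.name} {v.version.release} for {v.operating_system} at {instance.location}"
```

Each level points to its parent, so a repository record for one installed copy still carries the product-level and version-level context needed to interpret it.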
What attributes do we need to take
into account?
Functionality
what
it does and what data it depends on
Environment
platform,
operating system, programming language
versions
Dependencies
Compilation dependency
graph
Standard libraries
Other software products
Specialized hardware
Software is a Composite digital object
Collection of modules
Specifications,
Configuration scripts, test suites, documentation
Architecture
Client/server, storage
system, input / output
User interaction
Command line, User
Interface
User model
·
Software
is highly complex with a lot of factors which need to be considered
·
We
need a framework to organize and express software.
Software
creator:
Has
detailed knowledge of the software
Can provide reconstruction
and replay
properties, to make it easier
to maintain software
in short and long term.
Software procurer:
Funds
the software
creator.
Software
user
Repository manager:
Collects
and curates the institution's software
References
An
Annotated Bibliography: Approaches to Software Preservation