Based on a presentation by Howard Besser
· Companies keep information for days, years, or decades
· Individuals keep things for years or a lifetime
· Archives, Libraries, and museums keep things for hundreds of years
Cultural Institutions have a much greater responsibility for preservation!
· In the past, we knew about history by finding written documents:
Changes between different drafts of a scientific or literary paper
Letters and correspondence between a scientist (or literary figure) and colleagues (that both helps contextualize the work, and lets us see changes in thought processes or discovery)
· But today, these documents are not on paper!
· They are in the form of:
Word processing files that do not show changes between drafts/versions
· Who will take responsibility to save these works for future study?
What to save?
idea + ancillary material & documentation
Individual work in hand
FRBR (Functional Requirements for Bibliographic Records)
Restaging, ancillary material, & documentation
Conservation of Electronic Art
· How are new works even more problematic than older forms of moving image material?
· Issues with Digital Preservation
· Issues with New Works
· Technical & Conceptual Approaches to solutions
· Efforts to watch (projects, standards)
Manuscripts, books, paintings, sculpture
We have a good sense of what the original object is
o Objective is to make object itself endure (temperature/humidity control, chemicals/pigments/fibers/adhesives)
o Goal is to keep object as close as possible to original state (though occasionally controversy arises over whether to let aging show)
Video, audio, digital, new media
o Often difficult to determine what the original object is
o Difficult to make the original object endure (magnetic particle deterioration, warping, etc.)
o Even if we could make the original object endure, we wouldn't have the infrastructure to view it in the future
o Need to develop a paradigm shift from preserving the original object to preserving info content
o Need to pay more attention to maintaining authenticity and replicating user experience
o Moving image materials
o Interactive programs (including hypertext novels & games)
o Computer generated art
o Most electronic art works share some common characteristics with other strange works like Performance Art, Conceptual Art, Site-specific installations, Experiential Art
Digital Longevity Problems:
o Info is increasingly inter-related to other info
o How do we make our own Info persist when it points to and integrates with Info owned by others?
o What is the boundary of a set of information (or even of a digital object)?
In the past, much of survival was due to redundancy
o How do we decide what to save?
o Who should save it?
o How should they save it?
How to save information?
o Methods for later access
o Issues of authenticity and evidence
Content translated into new delivery devices changes meaning
Thinking of the Future
o Screens will be different resolutions and different aspect ratios
o CRTs won't exist
o A decade or two from now, today's user interfaces will look the way arrow-key navigation looks today
o Today's streaming media are small windows, slow speeds
o As bandwidth increases, viewers will expect higher quality streams
o Creators may need to consider how they'll be able to deliver higher-bandwidth streams
Delivery Derivatives vs. Masters encoded w/standards
May also want to re-edit the piece to take advantage of changes in technology, viewer expectations, society
o Previous formats required little ongoing intervention (remote storage facilities, Iron Mtn)
o Digital formats require intense ongoing management
o Need for:
§ What is the work?
§ Complexity of rich media
§ Difficulty of making the work last
o The installation?
o Documentation of the Installation?
o The directions for the Installation?
o What is the goal of our documentation and preservation?
o Works often have artistic nature (including video games)
o Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
o Too complex to save every one of these aspects for every type of material
o Importance of saving documentation
o What really is the Work?
o Disappearing software
o Enormous number of elements can, at times, be very important to preserve (randomness, interactivity, pacing, color, format, original artifact, elements used to construct the artifact)
o Pieces and Boundaries
o Recontextualization (Postmodernism)--which rendition to save?
o Dynamic & Lack of Fixity (evolving works)
o Historical context
o Difficulty of authentication over time
What are we trying to do?
o Show the work the way people saw and interacted with it when it was first created (may be impossible; in the past, the artifact and how one interacted with it didn't change much, so preservation and documentation were relatively straightforward)
o Show documentation of the work and people interacting with it when it was first created
o Reinstall/Recreate/Reenact the work
o Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence
o Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
o Too complex to save every one of these aspects for every type of material
o Importance of saving pieces, representations, and documentation
o Involve the artists to capture their intentions
o Importance of Standards
o Familiarize ourselves with recent conservation developments (Who Knows?, TechArcheology, Tate, IMAP)
Approaches to Solutions
Possible endless need for reformatting implies
o Possible loss with each generation
o Requires managed environment
o Save the Hardware &
o Refreshing always necessary due to volatility of physical strata
Impact on evidential value
o Migration -- advantages & disadvantages
o Emulation -- advantages & disadvantages
o And will need a long-term managed environment
Wordstar to Word 1 to Word 3,
-Tables and complex features often get corrupted
-Need to repeat every 4-5 years (maybe forever)
+We know how to do this ourselves
+If there's a problem, we can catch it soon
o Keep the Wordstar file format, but write emulators to make it work in newer environments
o +A better chance of carrying over complexity
o +Many more features can survive
o Problems may not be caught until it's too late
o Specialists and a whole infrastructure of emulators required
o Serious problems (reverse engineering?)
o More than temperature & humidity control
o Periodic monitoring of the works
o Periodic monitoring of the technical environment for viewing the works (software, systems, hardware)
o Trusted repositories
(group efforts with Cultural Heritage community)
o Matters in Media Art--New Arts Foundation, MOMA, SFMOMA, Tate
o DOCAM (Documentation and Conservation of the Media Arts Heritage)
o INCCA (International Network for the Conservation of Contemporary Art)
· Seeing Double Exhibition, & Symposium
· Variable Media Initiative
· Artists Interviews Project, Netherlands Institute for Cultural Heritage 1998-1999, Modern Art: Who Cares
· TechArcheology: A Symposium on Installation Preservation (SFMOMA)
· Special issues raised by non-library institutions
· Special issues raised by images and rich media
· What is the work (or salient points we need to preserve)?
· Bring the arts communities (artist intent, BAVC) together with the preservation repository communities and the preservation metadata communities
· Specifically get Cultural Heritage communities involved with the selected OCLC/RLG recommendations
· Get cultural heritage groups started on working to make sure that structure standards incorporate our works
· What organizations will take responsibility to save today's digital ephemeral materials (online zines, arts discussion groups, etc.)?
Best Practices for Reformatting
Preservation Repositories & Metadata
Other Metadata & Standards
· We can't say definitively that we can make every digital work persist
· What we CAN say is that the more a digital work conforms to standards and best practices, the greater the likelihood that we can assure persistence
· Our preservation repositories can even accept deposits of non-conforming works, but the less they conform, the less likely that they'll be salvageable
· Persistence is most likely for works that share standards, metadata, and best practices
· Think about users (and potential users), uses, and type of material/collection
· Scan at the highest quality that does not exceed the likely potential users/uses/material
· Do not let today's delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
· Many documents which appear to be bitonal actually are better represented with greyscale scans
· Include color bar and ruler in the scan
· Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
· Don't use lossy compression
· Store in a common (standardized) file format
· Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
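The advice to size digital masters independently of today's delivery limits can be made concrete with a little arithmetic. The sketch below estimates the uncompressed size of a scan from its dimensions, resolution, and bit depth. The letter-size page and the 600 ppi figure are assumed values chosen for illustration, not recommendations from the presentation.

```python
def uncompressed_scan_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate the uncompressed size of a scan in bytes."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel // 8


# Hypothetical letter-size page scanned at 600 ppi (assumed values).
PAGE = (8.5, 11.0, 600)

sizes = {
    "bitonal (1 bit)": uncompressed_scan_bytes(*PAGE, 1),
    "greyscale (8 bit)": uncompressed_scan_bytes(*PAGE, 8),
    "color (24 bit)": uncompressed_scan_bytes(*PAGE, 24),
}

for mode, size in sizes.items():
    print(f"{mode}: {size / 1_000_000:.1f} MB")
```

A master of this size would normally be kept losslessly in the repository, with smaller derivative files generated from it for delivery.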
An Open Archival Information System (or OAIS) is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
· The entities within the OAIS are based on the concept of an information package: a conceptualization of the structure of information as it moves into, through, and out of the digital archives.
· An information package consists of the digital information object that is the focus of preservation, along with metadata necessary to support its long-term preservation and access, bound into a single logical package.
The OAIS recognizes three primary types of information packages:
1. The Submission Information Package (SIP) is the version of the information package that is transferred from the Producer to the digital archives when information is ingested.
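2. The Archival Information Package (AIP) is the version of the information package that is stored and preserved by the digital archives.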
3. The Dissemination Information Package (DIP) is the version of the information package delivered to the Consumer in response to an access request.
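The flow of an information package into and out of an archive can be sketched as a small data structure. This is only an illustration of the concept: the class, function, and metadata key names are hypothetical and do not come from the OAIS reference model, and the archival package is modeled simply as the copy the archive itself holds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationPackage:
    kind: str       # "SIP", "AIP", or "DIP"
    content: bytes  # the digital object that is the focus of preservation
    metadata: dict = field(default_factory=dict)


def ingest(sip):
    """Producer -> archive: turn a submitted package into an archival one."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    # Hypothetical preservation metadata added by the archive at ingest.
    return InformationPackage("AIP", sip.content, {**sip.metadata, "ingested": True})


def disseminate(aip, request_id):
    """Archive -> Consumer: build a package in response to an access request."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    return InformationPackage("DIP", aip.content, {**aip.metadata, "request": request_id})


sip = InformationPackage("SIP", b"scanned letter", {"title": "Letter to a colleague"})
aip = ingest(sip)
dip = disseminate(aip, "req-1")
print(sip.kind, "->", aip.kind, "->", dip.kind)
```

The content travels unchanged through all three packages, while the metadata bound to it grows at each stage.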
OCLC: A nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world's information and reducing information costs
Digital Repository Attributes
· Administrative responsibility
· Organizational viability
· Financial sustainability
· Technological suitability
· System security
· Procedural accountability
· Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments
· Stakeholders meet to decide how to describe what is in a digital repository
· Examine special properties of particular classes of digital objects
· Technical standards for exchange and interoperability between repositories
· Develop projects and case studies
· Copyright issues
· Too complex for small institutions to manage
· Will be done through partnering (small museum with University) or through consortia (museum association, state-wide organization)
Archive or museum will direct what is needed, but digital repository will carry out the actual work (as defined in SIP/DIP/
PREMIS Data Dictionary for Preservation Metadata was the first comprehensive specification for preservation metadata produced from an international, cross-domain consensus-building process.
Digital Object, Intellectual Entity, Event, Agent, & Rights
Relationships are statements of association between instances of entities
Semantic Units are the properties of an entity, and have values
Digital Object = a discrete unit of information
Files = named and ordered sequence of bytes known by an operating system
Bitstream = a set of bits embedded within a file
Representation = the set of files needed for a "complete and reasonable" rendering of an Intellectual Entity
Intellectual Entity = a coherent set of content that can be viewed as a single unit
Event = an action involving at least one Object or Agent known to the repository
Documents actions that modify Digital Objects, records validity checks, etc.
Objects can be associated with any number of events
Agent = persons, organisations, or programs associated with preservation events
Not the main focus of the data dictionary
Rights Statements = assertions of rights pertaining to Objects or Agents
WG concentrates on rights and permissions associated with preservation activities
Relationships between Objects:
Structural relationships, e.g. how files combine to make up an Intellectual Entity
Derivation relationships, e.g. resulting from format transformations or replications
Dependency relationships, e.g. when Objects depend on others, e.g. fonts, DTDs, etc.
Fixity: Property that a Digital Object has not been changed between two points in time.
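Fixity is commonly checked by recording a message digest when an object is deposited and recomputing it later. The sketch below uses SHA-256 from Python's standard library and returns a small record loosely modeled on the idea of a preservation Event. The dictionary keys are illustrative only and do not conform to the PREMIS schema.

```python
import hashlib
from datetime import datetime, timezone


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def fixity_check(data: bytes, recorded_digest: str) -> dict:
    """Recompute the digest and report whether the object is unchanged."""
    current = sha256_digest(data)
    return {
        "eventType": "fixity check",  # illustrative key names
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "outcome": "pass" if current == recorded_digest else "fail",
        "digest": current,
    }


original = b"born-digital manuscript, draft 3"
stored = sha256_digest(original)  # digest recorded at deposit time

print(fixity_check(original, stored)["outcome"])
print(fixity_check(original + b"!", stored)["outcome"])
```

A single changed byte produces a different digest, so the second check reports a failure even though the file would still open normally.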
· Synchronicity between media/streams
· Performance Archive & Retrieval Working Group
· Performing Arts Data Service (PADS)
· Persistent IDs
· Website management
· Technical Imaging Metadata
· Structural & Administrative Metadata
· Complexity of formats (storage & compression)
· Crosswalking Metadata
o A crosswalk is a table that shows equivalent elements (or "fields") in more than one database. It maps the elements in one metadata scheme to the equivalent elements in another scheme.
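A crosswalk of this kind can be represented as a simple lookup table. In the sketch below the local field names are hypothetical; the target names (`creator`, `title`, `date`, `format`) are Dublin Core elements. Fields with no equivalent are reported separately rather than silently dropped.

```python
# Hypothetical crosswalk: local catalogue field -> Dublin Core element.
CROSSWALK = {
    "artist": "creator",
    "work_title": "title",
    "year_made": "date",
    "medium": "format",
}


def apply_crosswalk(record, crosswalk):
    """Map a record into the target scheme; return (mapped, unmapped) fields."""
    mapped, unmapped = {}, {}
    for name, value in record.items():
        if name in crosswalk:
            mapped[crosswalk[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


local_record = {
    "artist": "Example Artist",
    "work_title": "Untitled Installation",
    "year_made": "1998",
    "gallery_room": "B2",  # no equivalent in the target scheme
}

mapped, unmapped = apply_crosswalk(local_record, CROSSWALK)
print(mapped)
print(unmapped)
```

Keeping the unmapped fields visible matters for preservation, since information lost in a crosswalk is rarely recoverable later.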
o Need to separate work ID from work location
o Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)
Approach for today:
PURLs (Persistent Uniform Resource Locators)
o Web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure.
o Instead of resolving directly to Web resources, PURLs provide a level of indirection that allows the underlying Web addresses of resources to change over time without negatively affecting systems that depend on them.
o This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons.
o The Handle System is a general purpose distributed information system that provides efficient, extensible, and secure identifier and resolution services.
o It includes an open set of protocols, a namespace, and a reference implementation of the protocols.
o The protocols enable a distributed computer system to store identifiers, known as handles, of arbitrary resources and resolve those handles into the information necessary to locate, access, contact, authenticate, or otherwise make use of the resources.
o This information can be changed as needed to reflect the current state of the identified resource without changing its identifier, thus allowing the name of the item to persist over changes of location and other related state information.
o The original version of the Handle System technology was developed with support from the Defense Advanced Research Projects Agency (DARPA).
More issues with referencing IDs
References for mirror sites
References for back-up sites when main site is down or bottle-necked
References for off-site copies and archival copies
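The level of indirection behind PURLs and handles, together with references to mirror and back-up copies, can be sketched as a small in-memory resolver. This is a toy illustration under stated assumptions: the class, the identifier `work/1234`, and the `example.org` URLs are all hypothetical, and real PURL or Handle services use their own protocols.

```python
class Resolver:
    """Maps a persistent ID to the current locations of a resource."""

    def __init__(self):
        self._locations = {}  # persistent ID -> ordered list of URLs

    def register(self, pid, urls):
        """Set (or update) the locations for an ID; the ID itself never changes."""
        self._locations[pid] = list(urls)

    def resolve(self, pid, is_up=lambda url: True):
        """Return the first reachable location, in registration order."""
        for url in self._locations[pid]:
            if is_up(url):
                return url
        raise LookupError(f"no reachable copy for {pid}")


resolver = Resolver()
resolver.register("work/1234", [
    "https://main.example.org/works/1234",
    "https://mirror.example.org/works/1234",
])

# The resource moves to a new host; citations of "work/1234" are unaffected.
resolver.register("work/1234", [
    "https://new-host.example.org/art/1234",
    "https://mirror.example.org/works/1234",
])
print(resolver.resolve("work/1234"))

# If the main site is down, fall back to the mirror.
main_down = lambda url: "new-host" not in url
print(resolver.resolve("work/1234", is_up=main_down))
```

Only the resolver's table changes when a resource migrates, which is what separates the work's ID from its location.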
Non-proprietary file format
no compression or lossless compression using non-proprietary CODEC
supports multiple frame rates/frame sizes
supports time code data in file
supports audio (multichannel) and video in a single file
o MPEG seems to be the only non-proprietary format
o Not enough companies produce encoders for Motion JPEG 2000 for us to feel comfortable about its long-term sustainability
o Many quality questions
Quality of playback?
o Moving images on DVDs becoming interactive; need for more extensive source materials
o Video installation works
o Net-based works incorporating moving images
o Images & rich media; new media and multi-media works
Inter-relationships between parts
For Contemporary Art: What is the Work?
Born digital--need to be kept in digital form
Video probably; at least soon
Film- Not very soon
A guessing game; we need more R&D, as well as education
o The InterPARES Project: (International Research on Permanent Authentic Records in Electronic Systems)
o Open Emulation Project: Nintendo
o MESS: emulates a large variety of different systems
o Producer's Intention
o Physical Characteristics
Archive-It: a subscription service from the Internet Archive, allows institutions to build and preserve collections of born digital content.
o Build, manage, & search own web archive through user-friendly web application
o No need for technical expertise or file-hosting
o Subscription service of the Internet Archive, designed for archives, museums, libraries, educational institutions, state organizations, and individual researchers
Digital Repository Traditions & Services require
And all of these require Standards and Metadata
From the technological point of view
Standards offer the best hope of overcoming Impediments
o Easier to maintain a single set of standards over long periods of time
o Puts all institutions that will periodically face obsolescence and migration problems in the same large boat
For artistic and other challenging works:
How Best to save these works?
o Use Standards wherever possible
o Be aggressive about asset mgmt -- saving component parts and ancillary materials
o Both creator and Archive should develop an institution-wide plan for saving electronic works
Refreshing and either migration or emulation
Standard encoding schemes
What is the work? And prioritize what needs to be saved
Save ancillary materials and records
What can we do specific to electronic media?
· Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence
· Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
· Too complex to save every one of these aspects for every type of material
· Importance of saving pieces, representations, and documentation
· Involve creators & curators to capture intentions
· Importance of Standards
· Familiarize ourselves with recent conservation developments (Guggenheim's Variable Media, Who Knows?, TechArcheology, Tate, IMAP)
Esther Conway, Brian Matthews, Arif Shaon, Juan Bicarregui, Catherine Jones, Jim Woodcock (Univ of
What is software preservation?
Storing a copy of a software product
Enabling its retrieval in the future
Enabling its reconstruction in the future
Enabling its execution in the future
Not what most software developers and maintainers do.
Museums and archives:
Either supporting Hardware
Or in its own right
Chilton Computing, Multics History Project
Preserving the work
E.g. research work in Computing Science
Preserving the Data
Preserving the software is necessary to preserve other data
Keep the data live and reusable
Specialised code from the past which still needs to be used
Usually seen as a problem!
Collect the source!
A cultural artifact
a form of literature (D. Gabriel)
beautiful programs are works of art (D. Knuth)
A view into the mind of the designer
intentions, assumptions, abstractions, mistakes, humor
little of this gets captured in any written form
This is the embryonic first 50 years of millennia of software development
The transition from cave painting to impressionism
A voluminous source repository can be analyzed to teach us about the evolution of software engineering
data structure design
use of algorithms
optimization (and premature optimization)
locality of function
coding style and idioms
defensive programming styles
failures and bugs
programming language use
Collect the binaries!
For use on restored, reconstructed or simulated old computers
Collect the documentation!
manuals, notes, papers, email
Collect the stories
interviews, reminiscences, websites
Adequacy: How do we know we have captured enough?
Depends crucially on Preservation Approach
Technical Preservation. (techno-centric)
Maintain the original software (binary), within the original operating environment.
Sometimes maintain the hardware as well
Re-creating the original operating environment by programming future platforms and operating systems to emulate the original environment,
so that software can be preserved in binary and run "as is".
E.g. British Library
Transferring digital information to new platforms before the earlier one becomes obsolete.
Updating the software code to apply to a new software environment.
Reconfiguration and recompilation Porting
An extreme version of migration may involve rewriting the original code from the specification.
Different preservation approaches require different significant properties
Use a notion of Performance to assess adequacy
Test case suites as tests of adequacy
Three aspects to the framework:
A Performance Model for software
Determine what it means to preserve s/w
Retrieve Reconstruct Replay
Adequacy of performance of s/w
Model for describing s/w artifacts
As complex digital objects.
Versions and variants
Properties for preservation
For retrieve, reconstruct, replay
· Testing data performance to judge adequacy of the software performance.
· Important to maintain software test suite to assess preservation of significant properties of the software.
A software package (or any digital object) can be said to perform adequately relative to a particular set of features (significant properties) if, in a particular performance (that is, after it has been subjected to a reconstruction and replay process), it preserves those significant properties to an acceptable tolerance.
Scientific Data Processing Software
The adequacy of the behavior of this type of software may be measured by:
Running the software to process some pre-specified test input data
Comparing the output of the test run with the corresponding pre-specified test result;
Checking if the output exceeds the acceptable level of error tolerance for the software. For example, the NAG Software Library publishes test cases.
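The three steps above can be sketched as a small test harness. This is a minimal illustration, assuming the preserved software can be called as a function on numeric inputs. The square-root routines stand in for a migrated and a degraded copy of some scientific code, and the tolerance value is an arbitrary choice.

```python
import math


def performs_adequately(software, test_inputs, expected, rel_tol=1e-9):
    """Run the software on test inputs and compare with expected outputs."""
    actual = [software(x) for x in test_inputs]
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(actual, expected)
    )


def migrated_sqrt(x):
    """Hypothetical re-implementation (Newton's method) of an older routine."""
    guess = x
    for _ in range(30):
        guess = 0.5 * (guess + x / guess)
    return guess


def degraded_sqrt(x):
    """Hypothetical port that lost precision during migration."""
    return round(math.sqrt(x), 3)


# Pre-specified test inputs and the results recorded from the original software.
TEST_INPUTS = [1.0, 2.0, 9.0]
EXPECTED = [1.0, 1.4142135623730951, 3.0]

print(performs_adequately(migrated_sqrt, TEST_INPUTS, EXPECTED))
print(performs_adequately(degraded_sqrt, TEST_INPUTS, EXPECTED))
```

The tolerance is itself a significant property: the same degraded routine would pass if the repository decided three decimal places were acceptable.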
The adequacy of the behavior of a game may be measured by:
Comparing its user interface (UI) with a screen capture of its original UI.
Comparing its performance against some pre-defined use cases. For example, the completion time of a particular level can be compared against the average completion time for that level in the original game.
For example, when playing the emulated version of the 1990s DOS-based computer game Prince of Persia, some of the operations do not always work on the emulator and the original appearance of the game is also somewhat lost, but it is still possible to play the complete game.
Programming Language Compilers
A compiler may be said to have been preserved adequately, if:
it covers all features of the programming language that it supports, e.g. concurrency (i.e. threads), polymorphism, etc.
the application resulting from compiling its source code (written in a language supported by the compiler) using the compiler yields the expected behaviour.
For example, some programming languages (e.g. Fortran, C, C++, etc.) have ISO standards which describe the correct behavior of software written in these languages. These standards also provide test programs that may be used to assess the adequacy of a compiler for rendering all features of the programming language that it supports.
The adequacy of a word processor may be measured based on its ability to:
render existing supported word documents with an acceptable level of error tolerance. For example, a word processor may be regarded as adequate as long as it clearly displays the contents (e.g. text, diagram, etc.) of a word document, even if some of the features of the document content, such as font color and size, may have been rendered incorrectly or even lost completely.
enable editing (e.g. add/change/remove text, change font) and saving existing word documents
enable creation and saving of new word documents
For example, OpenOffice Writer is adequate for viewing and editing word documents originally created using Microsoft Word with some level of error tolerance (e.g. images do not always appear as originally intended but are viewable nevertheless).
· Provide a general model of software digital objects
· Relate each concept in the model with a set of significant properties
· For a different preservation approach, we need different significant properties to achieve a desired level of performance.
The whole software object under consideration
Could be single library module, or very large system (e.g. Linux)
Comes under one authority (legal control)
Defines gross functionality
Releases of the system
Characterized by changes in detailed functionality
Versions for a particular platform
Characterized by operating system and environment
A particular instance of a particular variant at a particular location
An individual license
Fixed to particular
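The four levels described above can be sketched as nested records. The level names used here (`Product`, `Version`, `Variant`, `Instance`) and all field names are assumptions made for illustration, since the outline gives only the descriptions; the sample data is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A particular copy of a variant at a particular location."""
    location: str
    license_id: str


@dataclass
class Variant:
    """A version built for a particular platform."""
    operating_system: str
    environment: str
    instances: list = field(default_factory=list)


@dataclass
class Version:
    """A release, characterized by changes in detailed functionality."""
    release: str
    changes: str
    variants: list = field(default_factory=list)


@dataclass
class Product:
    """The whole software object, under one authority."""
    name: str
    authority: str
    functionality: str
    versions: list = field(default_factory=list)


def count_instances(product):
    """Count every recorded instance across all versions and variants."""
    return sum(len(var.instances) for ver in product.versions for var in ver.variants)


product = Product(
    name="ExampleSim",
    authority="Example Lab",
    functionality="numerical simulation",
    versions=[
        Version("1.0", "initial release", [
            Variant("Linux", "x86-64, gcc", [
                Instance("archive.example.org/sim-1.0-linux", "LIC-001"),
            ]),
            Variant("Windows", "x86-64, msvc"),
        ]),
    ],
)
print(count_instances(product))
```

Attaching significant properties at the right level (functionality to the product, platform details to the variant) keeps the description of a large system manageable.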
What attributes do we need to take into account?
what it does and what data it depends on
platform, operating system, programming language
Compilation dependency graph
Other software products
Software is a Composite digital object
Collection of modules
Specifications, Configuration scripts, test suites, documentation
Client/server, storage system, input / output
Command line, User Interface
· Software is highly complex with a lot of factors which need to be considered
· We need a framework to organize and express software.
Has detailed knowledge of the software
Can provide reconstruction and replay properties, to make it easier to maintain software in short and long term.
Funds the software creator.
Collects and curates the institution's software