
Chapter 4

Types of Digital Objects

There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy. (William Shakespeare, Hamlet)

There are many types of digital objects which we may come across, and we need to recognise the extent of their diversity; otherwise we will aim too low when we design our tools and techniques for digital preservation. It is impossible to give an exhaustive list of types of digital objects, yet it is useful to remind ourselves of at least some of the great variety that we must be able to deal with. By types we mean not just different formats, but rather different classifications.

One reason for being interested in the variety of types is that, unless one is aware of the distinctions, it is very easy to assume that everything is the same and that the same tools can be used. For example, if one normally deals with the preservation of documents such as Word or PDF, then one might assume that all digitally encoded information can be preserved using the same tools. Unfortunately this is not true, as we will see. The next sections present a brief overview of some of the distinctions which can be made, without any claim of being exhaustive.

4.1 Simple vs. Composite


One way to classify digital objects is by whether they are normally treated as a whole, for example an image such as Fig. 4.1, or whether they are normally treated as a collection of simpler parts, for example a FITS file which has several images and tables, as in Fig. 4.2. The latter we will call Composite Objects (or sometimes Complex Objects). It is important to make this distinction because, if we can break the preservation challenge of a composite object into smaller components, the preservation task becomes easier. On the other hand, if we treat the composite object as if it were a simple one, then we could run into a great deal of trouble in future.

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_4, © Springer-Verlag Berlin Heidelberg 2011


Fig. 4.1 A simple image: face.jpg


Fig. 4.2 FITS file as a composite object (components shown: Header, Image 1, Table 1, Image 2, Table 2)

However it is never completely clear cut because whether a digital object is simple or composite often depends upon the eye of the beholder. Nevertheless this is a useful distinction to draw.

A Word document may normally be treated as a simple object. In actual fact it is, internally, very complex, containing information about styles, page layout and so on. However one normally disregards this because the software we use deals with the Word file as a whole. On the other hand some Word files have embedded spreadsheets and drawing objects which can be edited separately; in this case one might often treat such an object as a collection of parts.

The FITS file (Fig. 4.2) is a whole digital object but the analysis is normally done on a component by component basis. In other words Image 1 is displayed and processed, and then the same thing, or something different, is done with Image 2.

A particular format may allow many possibilities, and such formats may evolve and increase in complexity over time [20]. The original FITS format allowed only simple images; the current definition allows much greater complexity but can still contain a single image if that is what is wanted. Thus we need to be concerned with the particular digital object, not the format, when we look at whether it is simple or composite. Further details for FITS are given in Sect. 7.3.2.1.

In some ways one can regard a composite object as a container of simpler things, as with the Word example above, and it may be represented in general as in Fig. 4.3.
Fig. 4.3 Composite object as a container
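
The container view can be made concrete with a small data structure. The following is only a minimal sketch, not a FITS reader: the class names, the `kind` labels and the file name example.fits are all assumptions introduced for illustration. It shows how the layout of Fig. 4.2 might be broken into simple components which can then be handled one at a time.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleObject:
    """A digital object that is normally treated as a whole."""
    name: str
    kind: str  # hypothetical label, e.g. "header", "image", "table"


@dataclass
class CompositeObject:
    """A container of simpler (or further composite) objects."""
    name: str
    parts: List[Union["SimpleObject", "CompositeObject"]] = field(default_factory=list)


def flatten(obj):
    """Return the simple components of an object, in order."""
    if isinstance(obj, SimpleObject):
        return [obj]
    simple_parts = []
    for part in obj.parts:
        simple_parts.extend(flatten(part))
    return simple_parts


# The components of Fig. 4.2 expressed as a container (file name is made up).
fits_file = CompositeObject(
    name="example.fits",
    parts=[
        SimpleObject("Header", "header"),
        SimpleObject("Image 1", "image"),
        SimpleObject("Table 1", "table"),
        SimpleObject("Image 2", "image"),
        SimpleObject("Table 2", "table"),
    ],
)

component_names = [part.name for part in flatten(fits_file)]
```

Because `flatten` recurses, the same sketch would also cover a container nested inside another container, such as a spreadsheet embedded in a Word file.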

4.2 Rendered vs. Non-rendered


Another way to divide the digital world is as follows. There are digital objects which are usually processed by some software to produce a rendering which is presented to a human user who can then interpret what he/she sees/hears/feels/tastes, and this is normally regarded as adequate. This can include documents, pictures, videos and sounds. These we will refer to as Rendered Digital Objects.


On the other hand one can have a digital object for which it is not enough to simply render it, but for which one needs to know what the contents mean in order to be able to process it further. It is useful to make this distinction because it is easy to think that every digital object is simply rendered; that every digital object need only be displayed. Indeed one could argue that the ultimate user of a digital object is a human who needs to see or hear (or perhaps in future to feel, taste or smell) the result. For example even a FITS image is (often) displayed. However displaying a FITS image is rarely the ultimate aim. Instead an astronomer might want to make measurements which require an understanding of the units and coordinate systems. He/she might also reasonably want to combine this piece of data with another. In other words what is wanted is to do more than render it in one particular way; instead there is an enormous variety of ways users may want to deal with the object.

When we are thinking about digital preservation one must look to the future, not in order to guess what it may be but rather to recognise that it may be different from today. Therefore we need to identify what someone (at least the Designated Community) needs in order to understand and use a non-rendered digital object in any number of different ways.

For example consider two text files. In one case one can have some English text, say a recipe for a cake in a file recipe.txt (see Fig. 4.4). Using a Windows PC the file is easily readable because the .txt part of the name lets the machine try an application which can display an ASCII encoded file, which is what this is. Normally one would say that no special knowledge is needed to understand this; it simply needs to be read. However there is a requirement to be able to read English, to know what the various measures are (for example what size is a cup?) and to know what the ingredients are (for example what is lemon zest?); without such knowledge the recipe is neither understandable nor usable.

Take 2 eggs
Add 3 cups of gram flour
Add 2 tsp lemon zest
......

Fig. 4.4 Text file recipe.txt


Consider now another text file (table.txt, see Fig. 4.8) which, as a simple .txt file, is easily readable on a PC; again the .txt usually lets us guess, correctly in this case, that this is an ASCII encoded file. In this case we are more obviously in some trouble because, although we can see something which we can reasonably assume are numbers, we do not know what the numbers mean. If we are told that the numbers under the headings X, Y and Z provide us with the sides of a rectangular cuboid, then we can calculate the volume of that shape using the formula X × Y × Z for each row, namely 14.742, 31.8 and 114.034. On the other hand we might be told that X is the longitude on Earth and Y the latitude, both measured in degrees, and Z is the concentration of a certain chemical in parts per billion. We see that the format alone is insufficient; one needs to know what the contents (e.g. the numbers) mean.
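
The point that the same numbers support different readings can be illustrated with a short sketch. The values are those of table.txt; the function names and the dictionary keys (`longitude_deg` and so on) are hypothetical, chosen only for this example, and the parser simply assumes a heading row followed by rows of numbers separated by commas or spaces.

```python
# Contents of table.txt, embedded here so that the sketch is self-contained.
TABLE_TXT = """X    Y    Z
1.3, 2.7, 4.2
2.4, 5.3, 2.5
7.4, 2.3, 6.7
"""


def parse_table(text):
    """Read a heading row and numeric rows into a list of dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    headings = lines[0].replace(",", " ").split()
    rows = []
    for line in lines[1:]:
        values = [float(value) for value in line.replace(",", " ").split()]
        rows.append(dict(zip(headings, values)))
    return rows


def cuboid_volumes(rows):
    """Interpretation 1: X, Y and Z are the sides of a rectangular cuboid."""
    return [row["X"] * row["Y"] * row["Z"] for row in rows]


def as_measurements(rows):
    """Interpretation 2: longitude, latitude (degrees) and concentration (ppb)."""
    return [
        {
            "longitude_deg": row["X"],
            "latitude_deg": row["Y"],
            "concentration_ppb": row["Z"],
        }
        for row in rows
    ]


rows = parse_table(TABLE_TXT)
volumes = [round(volume, 3) for volume in cuboid_volumes(rows)]
measurements = as_measurements(rows)
```

Both functions read exactly the same bits; nothing in the file itself says which of them, if either, is the right one to apply.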

By Non-Rendered Digital Object we mean things which, like table.txt, are not simply rendered but rather are to be processed to produce any number of possible outputs. For example table.txt could be plotted, displayed as a pie-chart or histogram. Alternatively the information in the columns of table.txt could be used to calculate the density of chlorophyll in the Amazon rain forest (if that is the sort of information there is in table.txt). As another example one can take a digital object from the GOME instrument [21], which might be as shown in Figs. 4.5, 4.6, and 4.7.

Fig. 4.5 GOME data: binary

Fig. 4.6 GOME data: as numbers/characters


Fig. 4.7 GOME data processed to show ozone data with particular projection

We can also have two files of the same format, say a sound file such as MP3, the first of which (music.mpg) is indeed something that can be used to play music, but a second, also an MP3 file (config.mpg), which contains numbers which are configuration parameters for setting up some software. If we click on the first on a home computer then it will play some music because the .mpg causes the computer to try to use a music application. Clicking on the second will cause the computer to try to use that same application, but it may produce only a brief grating sound, or perhaps nothing audible at all.

The important points are that we currently rely on many clues, such as having a file name ending in .txt or .mpg, which many computers use to choose an application for displaying or playing the file. On the other hand, even now these clues are insufficient, as with table.txt (Fig. 4.8). Of course computers are not intelligent; in fact they have been instructed which applications to use for which file extensions, for example Notepad for files with names ending in .txt. Sometimes this does not do what is expected, as with config.mpg. In other cases we can do something with the file but not very much, as with table.txt.

X    Y    Z
1.3, 2.7, 4.2
2.4, 5.3, 2.5
7.4, 2.3, 6.7

Fig. 4.8 Text file table.txt

Some others mentioned in the introduction, such as family photographs (face.jpg, Fig. 4.1), are very similar in that what one expects is to display or play the contents of the file, and then it is up to the viewer, or listener, to understand it. Of course one is not listening to the bits; what we mean is that there is an application which is used to convert the bits to an image or a sound. The application may also allow one to zoom in to part of an image, search for a piece of text, or copy a piece of music and insert it in a separate file. But even without these extra functions one can make use of the file, by which we mean we can look at or hear the output of the application, and we would be quite happy if that was all we could do. These types of files (let us use the term Digital Object as a more general term instead of file) we will refer to as Rendered Digital Objects. For these types of objects it is (currently) normally regarded as sufficient if in future one can simply display it, if it is an image or movie, or play it, if it is a sound.

These are the types of digital objects which one commonly deals with in everyday life: documents, images, web pages etc. There are many books which talk about the preservation of these kinds of objects: word processor documents, financial files, spreadsheets, databases of various sorts, and so on.
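
The reliance on file-name endings described above can be sketched in a few lines. This is an illustration only: the association table is hypothetical (apart from Notepad for .txt, which the text mentions), and real operating systems keep such associations in their own registries rather than in a dictionary like this one.

```python
import os

# Hypothetical association table from file-name extension to application.
DEFAULT_APPLICATIONS = {
    ".txt": "Notepad",
    ".mpg": "music player",
    ".jpg": "image viewer",
}


def choose_application(file_name):
    """Pick an application from the extension alone; None if it is unknown."""
    _, extension = os.path.splitext(file_name)
    return DEFAULT_APPLICATIONS.get(extension.lower())


file_names = ["recipe.txt", "table.txt", "music.mpg", "config.mpg", "face.jpg"]
choices = {name: choose_application(name) for name in file_names}
```

The lookup never inspects the contents, so music.mpg and config.mpg are handed to the same application, and recipe.txt and table.txt are both merely displayed, whatever the numbers in table.txt may mean.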

Throughout this book we will also look at examples from a variety of disciplines including science, cultural heritage and contemporary performing arts.

Science: observations of the Earth from space, including multi-spectral images and synthetic aperture radar images; measurements of the atmosphere, chemical or electrical composition; software for processing raw data to data which is scientifically useful.

Cultural Heritage: laser scans of buildings and artefacts; plans of buildings; 3-D virtual reality models.

Performing Arts: patch files for processing what the performer plays; configuration files which map video capture of movement to musical performance.

All the above are just some of the examples of non-rendered data which are of importance to society.


4.3 Static vs. Dynamic


Digital objects do (usually) need software and hardware to extract information from the bits, as discussed in Sect. 1.1. Static objects are ones which, unless they are transformed, are unchanged as bit sequences. These we will refer to as Static Digital Objects.

On the other hand we can think about database files which naturally change over time as entries are changed. Alternatively we can consider a whole collection of files as the data object. Such a collection might change as additional files are added to the collection over time. Such digital objects we will refer to as Dynamic Digital Objects. Of course at any particular time the Dynamic Digital Object is a particular Static Digital Object which we may preserve. On the other hand it may be of interest, in the case of a Dynamic Digital Object, to know what the state of the object was at any particular time. In fact some would argue that most datasets change over time and the state at each particular moment in time may be important.

This is an important area requiring further research; however from the point of view of this book it may be useful to break the issue into separate parts. At each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Network. Efficient storage of a series of snapshots may lead one to store differences or include time tags in the data. Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.
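
One way to picture the storage of differences with time tags is the sketch below. It is an assumption-laden illustration, not a method prescribed by the text: the dynamic object is modelled as a plain dictionary (say, rows of a database keyed by an identifier), the class and function names are invented, and snapshots are assumed to be recorded in increasing time order. The `state_at` method plays the role of the description of how to get from the efficiently encoded version back to a particular time's snapshot.

```python
def diff(old, new):
    """Return the changes needed to turn the state `old` into `new`."""
    changed = {key: value for key, value in new.items()
               if key not in old or old[key] != value}
    removed = [key for key in old if key not in new]
    return {"changed": changed, "removed": removed}


def apply_diff(state, delta):
    """Return a new state with the recorded changes applied."""
    result = dict(state)
    result.update(delta["changed"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


class SnapshotSeries:
    """A first full snapshot plus time-tagged differences."""

    def __init__(self, time_tag, initial_state):
        self.base_time = time_tag
        self.base = dict(initial_state)
        self.deltas = []  # (time_tag, delta) pairs, in increasing time order
        self._latest = dict(initial_state)

    def record(self, time_tag, state):
        """Store only what changed since the previously recorded state."""
        self.deltas.append((time_tag, diff(self._latest, state)))
        self._latest = dict(state)

    def state_at(self, time_tag):
        """Rebuild the snapshot that was current at `time_tag`."""
        if time_tag < self.base_time:
            raise ValueError("no snapshot exists before the first recorded time")
        state = dict(self.base)
        for delta_time, delta in self.deltas:
            if delta_time > time_tag:
                break
            state = apply_diff(state, delta)
        return state


series = SnapshotSeries(0, {"row1": "a"})
series.record(10, {"row1": "a", "row2": "b"})
series.record(20, {"row2": "c"})
```

With this series, `series.state_at(15)` rebuilds the state recorded at time 10, since that was still current at time 15. A future user holding only `base` and `deltas` could not do this without also knowing the rules encoded in `apply_diff` and `state_at`, which is why that knowledge has to be preserved alongside the data.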

4.4 Active vs. Passive


One other useful distinction is between what may be called active and passive digital objects. By Passive Digital Object we mean something with which things are done, for example something used by other applications (software) to do something. For example a document file is used by a word processing programme to print the document or display it on the screen, or an astronomical image in a FITS file would be used by astronomical analysis software to do scientific research. Such digital objects are often referred to as "data", but since the term "Data Object" is already used by OAIS we prefer the term Passive Digital Object.

An Active Digital Object on the other hand does something. For example the word processing application or the astronomical analysis software mentioned in the previous paragraph might be the digital objects to be preserved.

Once again there will always be fuzzy boundaries, so one could consider an Access™ database as a Passive Digital Object used by the Access software, but it could easily itself contain software (for example some form of BASIC), which would mean that it could be considered to be an Active Digital Object.


4.5 Multiple-Classifications
The classifications are not mutually exclusive, and in fact one can think of a simple-rendered-static-passive object; the image face.jpg is an example of this. One can also have a composite-non-rendered-dynamic-active object, such as a database with built-in queries into which new rows are being inserted. The Word.exe executable file may be thought of as a composite-non-rendered-static-active object. Figure 4.9 shows a representation of multiple classifications, although we are limited to drawing in 3 dimensions!
Fig. 4.9 Types of digital objects (axes: Simple/Complex, Static/Dynamic, Rendered/Non-rendered)
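
The four independent classifications can be written down as four small enumerations, which avoids the limitation of a three-dimensional drawing. This is a sketch only; the class and attribute names are assumptions made for the example, while the three classified objects are the ones named in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    SIMPLE = "simple"
    COMPOSITE = "composite"


class Rendering(Enum):
    RENDERED = "rendered"
    NON_RENDERED = "non-rendered"


class Change(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Role(Enum):
    PASSIVE = "passive"
    ACTIVE = "active"


@dataclass(frozen=True)
class Classification:
    """One value on each of the four axes discussed in this chapter."""
    structure: Structure
    rendering: Rendering
    change: Change
    role: Role

    def label(self):
        axes = (self.structure, self.rendering, self.change, self.role)
        return "-".join(axis.value for axis in axes)


# The three examples given in the text.
face_jpg = Classification(
    Structure.SIMPLE, Rendering.RENDERED, Change.STATIC, Role.PASSIVE)
live_database = Classification(
    Structure.COMPOSITE, Rendering.NON_RENDERED, Change.DYNAMIC, Role.ACTIVE)
word_exe = Classification(
    Structure.COMPOSITE, Rendering.NON_RENDERED, Change.STATIC, Role.ACTIVE)

labels = [obj.label() for obj in (face_jpg, live_database, word_exe)]
```

Since each axis has two values, the sketch admits sixteen combinations in all, of which the text names three.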

4.6 Summary
The purpose of this chapter has been to provide a partial view of the variety of types of digital objects which exist in the wild and which one might be required to preserve. The reason has been to ensure that the reader can at least recognise the possibilities when confronted with the challenge of preserving a digital object. Later chapters will discuss preservation techniques for some of this multitude of possibilities.
