Robert Gillesse, Judith Rog and Astrid Verheusen; National Library of the Netherlands/Koninklijke Bibliotheek; The Netherlands
The TIFF uncompressed file format is a widely accepted standard for storing master images of digitisation projects. As the scale of these projects is rising rapidly in the Koninklijke Bibliotheek, The National Library of the Netherlands (KB), the need for an alternative, compressed file format is felt. This paper contains a summary of a research project in which four alternative file formats - JPEG 2000 Part 1 lossless and lossy, PNG, JPEG and TIFF LZW - have been described and tested. Four consequences of a choice for either of the formats have been described: consequences for storage, image quality, long term durability and functionality. In the final recommendations these consequences were weighed against three reasons the KB distinguishes for wanting to store master images:
1. Substitution (JPEG 2000 Part 1 lossless or PNG)
2. Redigitisation is no option (Visual lossless JPEG 2000 Part 1 lossy or JPEG)
3. Master file is the basis for access (JPEG and JPEG 2000 lossy with higher degrees of compression)
Introduction/Background
The storage of image master files for digitisation projects as uncompressed TIFFs has become a good custom, but also a truism, and even a cliche. As mass digitisation projects are taking off in the Koninklijke Bibliotheek, The National Library of the Netherlands (KB), a need for a new strategy for storage of images has emerged. Millions of high resolution, RGB master image files will be made and will have to be (permanently) stored. If all these images - 40 million in total - are stored as uncompressed TIFFs, the estimate is that the KB will need no less than 650 TB of storage space by 2011. A capacity challenge indeed, and one that can only be mastered with a sound storage strategy.
Currently, the main question the KB faces revolves around necessity: do all master image files really need to be stored as uncompressed TIFF files and added to our long term repository, the e-Depot? Can we not make a distinction between digitisation projects in which images have to be stored for "eternity" (as originals are swiftly decaying and/or costs of digitisation have been so high that scanning anew is not an option) and projects which focus on access, thus allowing more pragmatic and economical storage choices?
In her IS&T Archiving 2007 presentation, author Judith Rog reevaluated the possibility of using compression on master image files. She concluded that in certain cases compression might be used, although further research and testing was needed [1]. In October 2007, Judith Rog and Robert Gillesse of the KB began a research project focusing on finding alternative file formats for the storage of master image files. This has resulted in the final report Alternative File Formats for Storing Master Images of Digitisation Projects in March 2008 [2]. An earlier version of the report was reviewed by a large, international group of digital preservation, digitisation and image specialists and their comments have been included in the final version. Although a need for further research was felt, the KB has, on the basis of the conclusions of this report, decided to stop creating and storing uncompressed TIFF files. For new digitisation projects, depending on the reason for storing master files (see below), JPEG 2000 Part 1 or JPEG will be used. This paper contains a summary of the report Alternative File Formats for Storing Master Images of Digitisation Projects.
Definition and Exclusions
Master images are defined as follows: raster images that are a high quality (in either colour, tonality or resolution) copy from the original source from which in most cases derivates are made for access purposes. The following images were excluded from the scope of the study:
• Vector images
• 3D images
• Moving images
• Images in various editing layers (not identical to multiresolution images)
• Multipage files (PDF, multipage TIFF are dropped from consideration)
• Multispectral, hyperspectral images
A further restriction was the focus on digitised low contrast material (e.g. older printed text, engravings, photographs and paintings). Higher contrast materials - read: (relatively) modern, non illustrated printed material - were out of the scope of this study.
Four Alternative Master Image Formats
In the research project four alternative file formats were selected:
1. JPEG 2000 Part 1 (lossless and lossy)
2. PNG 1.2
3. Basic JFIF 1.02 (JPEG)
4. TIFF LZW
The arguments for selecting precisely these four file formats resided in the following KB requirements for an alternative master file:
Archiving 2008 Final Program and Proceedings
• Software support (very new or rarely used formats such as Windows Media Photo/JPEG XR and JPEG-LS are dropped from consideration).
• Sufficient bit depth: a minimum of 8 bits greyscale or 24 bits colour (bitonal, 1 bit, TIFF G4/JBIG files are dropped from consideration, as well as GIF due to its 8 bits, limited colour palette).
• Possibility for lossless or high-end lossy compression (EMF excluded).
Three reasons for long term storage of Master Image Files
As said above the master file is a high quality copy from the original source from which in most cases derivates are made for access purposes. It is possible to delete the master files after the access derivates have been made. In that case, when other, more demanding use of the files is needed, digitisation will have to be performed again. The KB distinguished three main reasons for wanting to store the master files for a long or even indefinite period:
1. Substitution (the original is susceptible to deterioration and another alternative, high-quality carrier - preservation microfilm - is not available).
2. Digitisation has been so costly and time consuming that redigitisation is no option.
3. The master file is the basis for access, or in other words: the master file is identical to the access file.
These three reasons form the basis on which recommendations for the different file formats were made.
A comparison between the four file formats
Format description and consequences
In the research project, the four file formats were first described in general terms, concentrating on the history of the format, the standardization process (ISO or otherwise), the structure, and the encoding and decoding process. Next, four consequences of a choice for either format were outlined:
1. Consequences for necessary storage capacity: how much storage space is saved in comparison to uncompressed TIFF storage? Lossless and lossy compression, and gradations of lossy compression were all taken into account.
2. Consequences for image quality: when using lossy compression, image degradation was evaluated by, among other things, measuring MTF and the effects of successive compression.
3. Consequences for long term durability: for this purpose, a recent quantifiable method for file format risk assessment, developed by the KB, was used. This method 'weighs' file formats on the basis of seven durability criteria: Openness, Adoption, Complexity, Technical Protection Mechanism (DRM), Self-documentation, Robustness and Dependencies. Every file format receives a durability score (0-100). The working of, and the considerations behind, the File Format Assessment Method are given in Appendix three of the final report.
4. Consequences for functionality: what are the possibilities for multi-resolution, technical and descriptive metadata, and the Library of Congress' quality and functionality factors for still images (normal rendering, clarity, color maintenance, support of graphical effects and typography and functionality beyond normal rendering).
Description of Formats
JPEG 2000 Part 1
• Standardization: JPEG 2000 Part 1 has been ISO/IEC standardized since 2000. Other parts were standardized as well [The JPEG 2000 standard consists of twelve parts. A full description can be found at http://www.jpeg.org/jpeg2000/].
• Objective: Offer alternatives for the limited JPEG/JFIF format by using more efficient compression techniques, an option for lossless compression and multiresolution.
• Structure: The basis is a box structure which stores both the header as well as image information.
• Encoding: A six-step process. The most conspicuous are wavelet transformation (step 3) and packetizing (step 6), whereby the codestream is divided into packets and is sorted by either resolution, quality, colour or position.
PNG
• Standardization: PNG 1.2 has been ISO/IEC standardized since 2003.
• Objective: Follow up of the patented and limited GIF format, with a wealth of options as regards progressive structure, transparency, lossless compression and expansion of the standard.
• Structure: Chunks are the basis, which store both the header as well as image information.
• Encoding: A five-step process. What is notable is the option to apply separate filtering per scan line (thus increasing the effectiveness of the compression).
JPEG
• Standardization: The JPEG standard has been ISO/IEC (10918-1) standardized since 1994. An extension of Annex B of the standard - JFIF - has become the de facto standard and is simply designated as JPEG.
• Objective: To create a standard for the compression of continuous tone greyscale and colour images; broadly accepted, simple and clear usage.
• Structure: Topic of investigation.
• Encoding: A six-step process. Most noteworthy is the use of the DCT compression technique.
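The per-scan-line filtering noted for PNG can be illustrated with a small sketch. It is not taken from the report: it applies only the simplest PNG filter type (Sub, which stores each byte as the difference from the byte one pixel to its left) to a synthetic greyscale gradient, and compresses both versions with zlib, the deflate implementation in Python's standard library. The test image and the function names are invented for illustration, and a real PNG encoder additionally chooses a filter type per scan line and prefixes each line with a filter-type byte.

```python
import zlib


def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG filter type 1 (Sub): each byte minus the byte `bpp` positions to its left, modulo 256."""
    out = bytearray(len(row))
    for i, value in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out[i] = (value - left) % 256
    return bytes(out)


def unsub_filter(filtered: bytes, bpp: int = 1) -> bytes:
    """Inverse of sub_filter: rebuild the original scan line from the differences."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) % 256
    return bytes(out)


# Hypothetical 8-bit greyscale image: 64 scan lines, each a smooth ramp with a different start value.
rows = [bytes((start + x) % 256 for x in range(256)) for start in range(64)]

raw = b"".join(rows)
filtered = b"".join(sub_filter(row) for row in rows)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))

print("deflate without filtering:", raw_size, "bytes")
print("deflate after Sub filtering:", filtered_size, "bytes")
```

On smooth, continuous-tone data the filtered scan lines consist mostly of small repeated differences, which is why deflate does better on them; on noisy data the gain can be small or absent.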
TIFF LZW
• Standardization: The baseline TIFF 6.0 is not an ISO/IEC standard. The description of the baseline TIFF 6.0 (1992) is freely available on the Adobe website. LZW compression has been a part of the (extended) TIFF standard since version 5.0 (1988).
• Objective: Creation of a rich and extensible file format for raster images.
• Structure: The basis of the file format is formed by the so-called tags located both in the header (IFH) and in the image file directories (IFD).
• LZW is the compression algorithm embedded within the TIFF file.
Consequences for the Storage Capacity
On the storage test two limitations have been placed:
• Only 24 bit, RGB (8 bit per colour channel) files have been tested.
• Only two sets of (about 100) originals have been tested: a set of low contrast text material and a set of photographs.

File Format                  Storage Gain Compared to the Uncompressed TIFF File
JPEG 2000 Part 1 lossless    52%
JPEG 2000 Part 1 lossy       Variable between 91% and 98%
PNG lossless                 43%
JPEG lossy                   Variable between 89% and 96%
TIFF LZW lossless            30%

JPEG 2000 Part 1 is obviously the most effective for lossless and lossy compression. However, JPEG is not really much inferior to lossy JPEG 2000 compression other than that compression artefacts occur earlier than with JPEG 2000 (see below). Between the two sets of originals no obvious differences in storage gain were found. It is clear however that high contrast, textual material will yield higher compression profits - this is part of further, future research.
Consequences for Image Quality
Naturally, no loss of image quality occurs with the lossless formats JPEG 2000 Part 1 lossless, PNG and TIFF LZW. The lossy formats JPEG 2000 Part 1 lossy and JPEG degrade when compression levels are rising.
• The detail reproduction of JPEG degrades gradually when compression increases. In JPEG 2000, some loss of detail occurs only with extreme compression.
• No measurable loss of greyscale and colour (colour shift and Delta E) is observed for both JPEG and JPEG 2000. However, with increasing compression excessive "simplification" of the colour subtleties occurs which in the most extreme case results in unnatural tone and colour transitions (banding). This is caused by the quantification step in the encoding process.
The artefacts that occur with increasing compression in JPEG 2000 and JPEG resemble each other a lot. The following artefacts become visible with mounting compression:
  o Banding (rough colour or tone transitions)
  o Visible tiles (the tiles into which the files are divided become visible)
  o Woolly effect around elements rich in contrast.
What is important to note is that the visibility of these artefacts occurs much earlier in JPEG than in JPEG 2000. A remaining topic of investigation is the expression of PSNR (Peak Signal-to-Noise Ratio) which gives an objective figure (expressed in dB) for the degradation that occurs during lossy compression.
Consequences for the Long-Term Sustainability
Application of the previously discussed File Format Assessment Method to the image formats discussed in this report, plus the uncompressed TIFF format that has been used until now, results in the following order from most to least suitable for long-term storage:

Ranking  Format                             Score
1        Baseline TIFF 6.0 uncompressed     84.8
2        PNG 1.2                            78.0
3        JP2 (JPEG 2000 Part 1) lossless    74.7
4        JP2 (JPEG 2000 Part 1) lossy       66.1
5        Basic JFIF (JPEG) 1.02             65.4
6        TIFF 6.0 with LZW compression      65.3

In appendix 2 of the final study the above scores are further specified. As the above table indicates, the choice for "Baseline TIFF 6.0 uncompressed" is the safest one from the perspective of long-term sustainability. In practice it appears that this is not a viable option due to the large size of the files and the associated high storage costs. If an alternative format has to be selected, we see that "PNG 1.2" and "JP2 (JPEG 2000 Part 1) lossless" - both lossless compressed formats - are the alternatives.
The "File Format Assessment Method" is still in its infancy; there is not much experience with the application of this method in practice. Feedback is being awaited from colleague institutions regarding this method. Additionally, the results of the method will be tested against previous knowledge and experiences. Based on the experiences gained in the research project it appears necessary to adapt the method. It is therefore too early to entirely ascribe the choice of a durable format to this method. The main thing is that from the perspective of long-term sustainability the choice for "Baseline TIFF 6.0 uncompressed" is the safest one.
Here we reach a point where the applied method may fall short. In the method, the characteristic "Usage in the cultural heritage sector as master image file" of the Adoption criterion makes a valuable contribution to the total score. However, what is not included in the method at the moment are the prospects for the future of this criterion. Although neither format is currently used on a large scale as a preservation master file in the cultural sector, JPEG 2000 has more potential.
PNG has been in existence since 1996 and JP2 since 2000. The preference, for lossless formats, is thus for JPEG 2000.
Another issue that is neglected by the method is the loss of image quality caused by applying lossy compression methods. From a long-term sustainability perspective the use of lossy compression algorithms is discouraged. Although a file that is a qualitatively worse representation of the original can also be stored for the long-term, it is important - certainly if the original cannot be rescanned - to consider the use of the digitalized material in the long term. What must be considered in this respect is that a loss of quality which may be deemed acceptable today may no longer be acceptable in the future. For example, you might consider the use of alternative "display" hardware with a better resolution or different scope. This certainly applies when the objective of digitisation is to replace the original (objective 1, substitution). If a lossy compression method is selected nevertheless, the use of "basic JFIF (JPEG) 1.02" is recommended due to the more certain future of this format as compared to the lossy JPEG 2000 Part 1 variant.
The ultimate advice, rendered exclusively from the perspective of long-term sustainability and the File Format Assessment Method, for an alternative image format for uncompressed TIFFs comes down to the following list, sorted from most to least suitable:
1. JP2 (JPEG 2000 Part 1) lossless
2. PNG 1.2
3. Basic JFIF (JPEG) 1.02
4. JP2 (JPEG 2000 Part 1) lossy
5. TIFF 6.0 with LZW compression
Consequences for the Functionality
Only the most relevant functions (for master storage) are listed in the table below.

Functionality                                   File Format
Lossless compression option                     JPEG 2000 Part 1, PNG, TIFF LZW
Lossy compression option                        JPEG 2000 Part 1, JPEG
Lossy and lossless compression option           JPEG 2000 Part 1
Option to add bibliographic metadata            JPEG 2000 Part 1, JPEG, PNG, TIFF LZW
Standard way to embed EXIF metadata             JPEG, TIFF LZW
Browser support                                 JPEG, PNG
Multiresolution options (suitability of the     JPEG 2000 Part 1; to a very slight degree: JPEG
file as a high-resolution access master)
Maximum size                                    JPEG 2000 Part 1: unlimited (2^64)
                                                JPEG: Topic of investigation
                                                PNG: Topic of investigation
                                                TIFF LZW: 4 GB
Bit depths                                      JPEG 2000: 1 to max. 38 bits per channel (Compliance class 2: 16 bits per channel)
                                                JPEG: 8 bits per channel
                                                PNG: 1 to 16 bits per channel
                                                TIFF LZW: 1 to 16 bits per channel (theoretically to 32 bits per channel)
Standard support of colour spaces               JPEG 2000 Part 1: bitonal, greyscale, sRGB, palletized/indexed colour space
                                                PNG: bitonal, greyscale, sRGB, palletized/indexed colour space
                                                JPEG: greyscale, RGB
                                                TIFF LZW: bitonal, greyscale, RGB, CMYK, YCbCr, CIE L*a*b
Option to use ICC profiles                      JPEG 2000 Part 1, JPEG, PNG, TIFF LZW (although not in a standard manner)
Multipage support                               TIFF LZW
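To put the reported storage gains in perspective against the 650 TB estimate from the introduction, the arithmetic can be sketched as follows. This is an illustration under assumptions, not a calculation from the report: it assumes that the gains measured on the two small test sets (24 bit RGB only) carry over unchanged to the full collection of 40 million masters, and it uses decimal units (1 TB = 1,000,000 MB). The names `STORAGE_GAIN` and `projected_tb` are hypothetical.

```python
# Baseline from the introduction: 650 TB for 40 million uncompressed TIFF masters.
UNCOMPRESSED_TB = 650.0
IMAGE_COUNT = 40_000_000

# Storage gain compared to uncompressed TIFF, as (lowest, highest) fractions from the storage test.
STORAGE_GAIN = {
    "JPEG 2000 Part 1 lossless": (0.52, 0.52),
    "JPEG 2000 Part 1 lossy": (0.91, 0.98),
    "PNG lossless": (0.43, 0.43),
    "JPEG lossy": (0.89, 0.96),
    "TIFF LZW lossless": (0.30, 0.30),
}


def projected_tb(gain_low, gain_high, baseline_tb=UNCOMPRESSED_TB):
    """Return (smallest, largest) projected storage in TB for a range of storage gains."""
    return baseline_tb * (1 - gain_high), baseline_tb * (1 - gain_low)


# Average size of one uncompressed master implied by the estimate (decimal MB).
average_mb = UNCOMPRESSED_TB * 1_000_000 / IMAGE_COUNT

print(f"average uncompressed master: {average_mb:.2f} MB")
for name, (low, high) in STORAGE_GAIN.items():
    smallest, largest = projected_tb(low, high)
    if low == high:
        print(f"{name}: about {largest:.1f} TB")
    else:
        print(f"{name}: between {smallest:.1f} and {largest:.1f} TB")
```

Because the test sets contained only low contrast text material and photographs, the projection should be read as an order of magnitude rather than a forecast.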
Summary
The table below summarizes all the above information in a matrix. The figures only indicate the order of success in the various parts.

                            JP2 Part 1   JP2 Part 1   PNG        JPEG    TIFF LZW
                            lossless     lossy        lossless   lossy   lossless
Standardization             5            5            5          5       5
Storage Savings             3            5            2          4       1
Image Quality               5            4            5          3       5
Long-term Sustainability    5            2            4          3       1
Functionality               5            5            4          3       4
Score                       23           21           20         18      16

It is noteworthy that JPEG 2000 comes out on top in both the lossless as well as the lossy versions. The table above does not make a distinction between the three reasons for the long-term storage of master files as mentioned in the introduction. Some of the criteria on the left hand side of the table are less relevant depending on these reasons. In the recommendations below the importance of each of the five criteria is taken into account.
Recommendations
The recommendations for an alternative master image follow the three reasons for long term storage of these files described above.
Reason 1: Substitution
The criteria "Long-term sustainability", "Standardisation" and "Image Quality" are considered the most important when substitution of the original is the main reason for the long-term storage of the master file. JPEG 2000 Part 1 lossless, closely followed by PNG, are the most obvious choices from the perspective of long-term sustainability. When the storage savings (PNG 40%, JPEG 2000 lossless 53%) and the functionality are factored in, the scale tips in favour of JPEG 2000 lossless. The lossless TIFF LZW is not a viable option due to the slight storage gain (30%) and the low score in the File Format Assessment Method (especially due to patents, resulting in a low score on the "Restrictions on the interpretation of the file format" characteristic). Due to the irreversible loss of image information, lossy compression is a much less obvious choice for this objective. The creation of visual lossless images might be considered though. In the latter case it must be understood that visual lossless is a relative term - it is based on the current generation of monitors and the subjective experience of individual viewers.
Reason 2: Redigitisation Is Not Desirable
The criteria "Storage savings" and "Image Quality" are considered the most important when the main reason for the long-term storage of the master files is not wanting to do redigitisation. In this case lossy compression, in the visual lossless mode, is a more viable option. The small amount of information loss can be defended more easily in this case because there is no substitution. The above mentioned JPEG 2000 lossy and JPEG visual lossless versions are the obvious choices. Both JPEG 2000 Part 1 (compression ratio 10, storage gain about 90%) and JPEG (PSD 10 and higher, storage savings about 89%) offer options in this respect. A big advantage of the JPEG file format is the enormous distribution and the comprehensive software support, including browsers, for example. When selecting the amount of compression, the type of material must be taken into account. Compression artefacts will be more visible in text files than in continuous tone originals such as photos. However, if absolutely no image information may be lost, then the above-mentioned JPEG 2000 lossless and PNG formats are the two recommended options.
Reason 3: Master File is Access File
The criteria "Storage savings" and "Functionality" are considered the most important when access is the main reason for the long-term storage of the master file. In this case a larger degree of lossy compression is self-evident. The two options are then JPEG 2000 Part 1 lossy and JPEG with a higher level of compression. The advanced JPEG 2000 compression technique enables more storage reduction without much loss of quality (superior to JPEG). However, the question is whether the more efficient compression and extra options of JPEG 2000 outweigh the JPEG format for this purpose, which is comprehensively supported by software (including browsers) and is widely distributed.
References
[1] Judith Rog, Compression and digital preservation, do they go together? (IS&T Archiving 2007 Final program and proceedings, 2007) pg 80-83.
[2] The final report is published on the KB website: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/Alternative%20File%20Formats%20for%20Storing%20Masters%202%201.pdf.
[3] Judith Rog, Caroline van Wijk, Evaluating File Formats for Longterm Preservation (KB 2007).
[4] http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/Alternative%20File%20Formats%20for%20Storing%20Masters%202%201.pdf#page=50.
[5] http://www.digitalpreservation.gov/formats/content/still_quality.shtml.
[6] MTF (Modulation Transfer Function) is a measurement of detail reproduction of an optical system. Output is in reproduced line pairs (or cycles) per millimeter.
[7] Other compression schemes that can be used within the TIFF file are ITU G4, JPEG and ZIP. TIFF with lossless ZIP compression was dropped from the study out of sheer shortage of time.
[8] Judith Rog and Robert Gillesse, Alternative File Formats for Storing Master Images of Digitisation Projects (KB 2008) pg 10.
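As a cross-check on the Summary matrix and the way the recommendations lean on different criteria per reason, the figures can be tabulated in a short sketch. The per-format figures are those of the matrix; the idea of simply adding up the criteria named as most important for each reason is an assumption made here for illustration, since the recommendations themselves weigh the criteria qualitatively. The names `SCORES`, `EMPHASIS`, `total` and `rank` are hypothetical.

```python
CRITERIA = ("standardization", "storage_savings", "image_quality",
            "sustainability", "functionality")

# Figures from the Summary matrix, in the order given by CRITERIA.
SCORES = {
    "JP2 Part 1 lossless": (5, 3, 5, 5, 5),
    "JP2 Part 1 lossy": (5, 5, 4, 2, 5),
    "PNG lossless": (5, 2, 5, 4, 4),
    "JPEG lossy": (5, 4, 3, 3, 3),
    "TIFF LZW lossless": (5, 1, 5, 1, 4),
}

# Criteria the recommendations name as most important for each reason.
EMPHASIS = {
    "Reason 1: substitution": ("sustainability", "standardization", "image_quality"),
    "Reason 2: redigitisation is not desirable": ("storage_savings", "image_quality"),
    "Reason 3: master file is access file": ("storage_savings", "functionality"),
}


def total(fmt):
    """Overall score of a format: the plain sum of its five figures."""
    return sum(SCORES[fmt])


def rank(criteria):
    """Rank formats by the unweighted sum of the selected criteria (an illustrative assumption)."""
    indexes = [CRITERIA.index(name) for name in criteria]
    subtotals = {fmt: sum(values[i] for i in indexes) for fmt, values in SCORES.items()}
    # Highest subtotal first; ties are broken alphabetically for a stable order.
    return sorted(subtotals.items(), key=lambda item: (-item[1], item[0]))


for fmt in SCORES:
    print(f"{fmt}: total {total(fmt)}")
for reason, criteria in EMPHASIS.items():
    print(reason, rank(criteria))
```

The totals reproduce the Score row of the matrix. The per-reason ranking is only a rough aid: arguments such as software support and distribution, which the recommendations use in favour of JPEG, are not captured by these figures.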