IMS2603 Information Management in Organisations
Lecture 19
Media Formats

Revision
- Last week's lectures looked at MARC as a specific instance of complex metadata representation, and at Content Management Systems as systems that both generate metadata and rely on metadata for administration, retrieval, identification and description of objects.

This week
- The two lectures this week raise issues about the different formats used for [analog and] computer representation of media, and about long-term preservation of materials, both analog and digital.

Outline
- Hard copy
- Bits 'n' Bytes
- ASCII as coded characters and things
- Proprietary or Open Source Formats?
- Different formats - Word Processing & Spreadsheet wars
- PostScript and PDF
- Presentation formats [overhead projection images]
- Graphics images
- Moving images
- Sound files

Hard copy
- Generally geared to humans
- Bound and unbound
- Consider newspaper vs finest bound book
- Microformats, such as film and fiche
- Character sets needed for moveable character fonts
- Images are an issue?
- Specialist formats, such as ledgers, diaries, dictionaries, poetry, etc.

Bits 'n' Bytes
- Inside a computer, all processing is carried out in binary machine coding that is not human-readable. The coding is a representation of other values, depending on the application.

What is a computer?
- Two questions for discussion.
- What are the key characteristics of a computer?
- What are the implications of those characteristics for file formats, for example?

Characteristics of Computer -> File types
- General purpose
- Less efficient than single purpose
- Needs an operating system to even start to do almost anything
- Software needed to make it do something useful [but the applications are often pretty general purpose]
- File formats are linked to applications to try to gain efficiencies

ASCII as coded characters and things
- One of the earliest coding schemes.
- The first 128 characters [7 bits] were established as a coding set.
- Roman characters [U/C & L/C], Arabic numerals.
- And a set of characters for primitive formatting [TAB, LF, CR, FF], handshaking [ACK, NAK, SYN], and cursor control [Left, Right, Up, Down]
- Even a sound [ASCII 7 = BELL]

ASCII extended
- 8 bit encoding [codes 128 to 255]
- Despite the name, not a standard, i.e. not all devices used the codes for the same characters.
- Not a logical extension set to the first 128 characters
- Used for simple graphics [200 = ╚, 210 = ╥], common European language symbols [e.g. 129 = ü, 147 = ô]
- Some maths symbols [172 = ¼, 227 = π] and currency symbols [156 = £, 157 = ¥]
- [The values shown match IBM PC code page 437.]

Proprietary or Open Source Formats?
- Alternatives to public standards
- Proprietary formats may or may not be supported long-term.
- Open source formats may be subject to change, to uncontrolled development.

Different formats
- There have been strongly competitive different formats at different times in the history of computing.
- MS Word vs WordPerfect was classic.
- Spreadsheet formats were another example. Whatever happened to Lotus 1-2-3?
- Is opening the format a way to actually beat the competition?
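
The point on the ASCII extended slide, that the same code can stand for different characters on different devices, can be checked with a short sketch. It assumes the slide's example values come from IBM PC code page 437 (Python's `cp437` codec) and uses Latin-1 as a second, arbitrarily chosen 8-bit code page for comparison. The function names are illustrative only.

```python
# Illustrative sketch: 7-bit ASCII versus two 8-bit "extended ASCII" code pages.
# Assumption: the slide's extended codes are IBM PC code page 437 ("cp437").
# "latin-1" is just a second 8-bit code page chosen for comparison.

def ascii_char(code: int) -> str:
    """Return the character for a 7-bit ASCII code (0-127)."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII defines only codes 0-127")
    return bytes([code]).decode("ascii")


def extended_char(code: int, codepage: str) -> str:
    """Decode one byte value (0-255) under the given 8-bit code page."""
    return bytes([code]).decode(codepage)


if __name__ == "__main__":
    # Codes below 128 mean the same thing everywhere.
    print(65, repr(ascii_char(65)))  # upper-case A
    print(7, repr(ascii_char(7)))    # BEL, the "sound" control character

    # Codes 128-255 depend on which code page the device uses.
    for code in (129, 147, 172, 227, 156, 157):
        print(code,
              repr(extended_char(code, "cp437")),
              repr(extended_char(code, "latin-1")))
```

Under `cp437` the byte 227 decodes to π, while under Latin-1 the same byte is ã, which is the practical meaning of "not a standard".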

PostScript and PDF
- Print-ready formats
- Adobe proprietary formats
- PS was/is common in UNIX environments, where the application is part of the O/S.
- GhostScript application available in PC environment.

PostScript and PDF
- PDF is an extremely common format now. Is that because the reader was made available for free?
- Is that a marketing ploy for the writing software?
- Rights, and packages such as FinePrint PDF Factory.
- Long term future for PDF.

Presentation formats
- How overhead slides are delivered.
- PowerPoint is now pretty ubiquitous.

Graphics images
- BMP: Bit-Mapped image [Microsoft proprietary]
- TIFF: Tagged Image File Format [Adobe]
- GIF [Graphics Interchange Format] is proprietary to CompuServe.
- JPEG [Joint Photographic Experts Group] is an international standard [ISO/IEC 10918] rather than a single vendor's proprietary format.
- PNG [Portable Network Graphics]: a non-proprietary alternative to GIF.
- For a brief explanation see http://academ.hvcc.edu/~kantopet/graphics/index.php?page=image+files&parent=image+formats&printme=true

How big is an uncompressed full colour image?
- Say a standard screen of 1280 * 1024 = 1,310,720 pixels
- In 24 bit colour that is 3,932,160 bytes.
- And what if you wanted full 25 fps movie quality?
- 98,304,000 bytes/sec, or about 786 Mbits/sec! [a good argument for compression of some sort]

GIF
- 8 bit colour = 256 colours
- Can be as few as 1 bit [B&W, absolute contrast]
- No lossy compression [GIF uses lossless LZW compression]
- Good for Web images [most devices handle a basic 256 colours OK]
- At most 1 byte per pixel = small files.
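
The arithmetic behind "How big is an uncompressed full colour image?" can be worked through in a few lines. This is a minimal sketch: the function names are hypothetical, and it counts 1 Mbit as 1,000,000 bits.

```python
# Worked arithmetic for uncompressed image and video sizes.
# Assumption: 1 Mbit = 1,000,000 bits; function names are illustrative.

def image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed image."""
    return width * height * bits_per_pixel // 8


def video_bytes_per_second(width: int, height: int,
                           bits_per_pixel: int, fps: int) -> int:
    """Bytes per second for uncompressed video at the given frame rate."""
    return image_bytes(width, height, bits_per_pixel) * fps


if __name__ == "__main__":
    pixels = 1280 * 1024
    frame = image_bytes(1280, 1024, 24)
    rate = video_bytes_per_second(1280, 1024, 24, 25)

    print(f"pixels per screen:   {pixels:,}")
    print(f"24 bit frame:        {frame:,} bytes")
    print(f"25 fps video:        {rate:,} bytes/sec")
    print(f"                     {rate * 8 / 1_000_000:.1f} Mbits/sec")
    print(f"8 bit frame (GIF):   {image_bytes(1280, 1024, 8):,} bytes")
    print(f"1 bit frame (B&W):   {image_bytes(1280, 1024, 1):,} bytes")
```

Working the figures through gives 3,932,160 bytes per 24 bit frame and 98,304,000 bytes per second at 25 fps, which is roughly 786 Mbits/sec before any compression.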

JPEG
- Millions of colours [24 bit]
- Compressed [and a lossy compression]
- Suitable for presenting photographic images on the Web, but not for commercial printing.

Examples of GIF & JPEG
[Figure: example GIF and JPEG images]

PNG
- Pronounced "PING"
- Open standard
- Can be used like GIF, but supports millions of colours
- Very suitable for the Web, BUT IE6 may not support it fully.

Moving images
- Simple images can be handled by animated GIFs
- More complexity handled by Flash
- MPEG [Moving Picture Experts Group]
- Uses JPEG-type compression of colours and image [spatial compression]
- Also compresses over time by using a key frame + changes for the next n [5?, 7?] frames.

MPEG-1
- Delivers around 2 Mbps of video and 250 Kbps of 2-channel audio from CD-ROM
- 352x240 pixels at 30 fps on a standard PC
- 40 minutes of video on CD-ROM (9 hours on DVD)

MPEG-2
- Needs higher data rates of 3-15 Mbps, but delivers better picture quality, interlaced video formats and multichannel audio
- Designed for broadcast digital video
- Typically standard for commercial DVDs
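
The "key frame + changes" idea from the Moving images slide can be illustrated with a toy sketch. It is not how MPEG is actually implemented: real MPEG encoders use motion-compensated prediction and transform coding. Here each frame is simply a list of pixel values, every `key_interval`-th frame is stored whole, and the frames in between store only the pixels that changed. All names are hypothetical.

```python
# Toy illustration of "key frame + changes" (temporal compression).
# Assumption: a frame is a flat list of pixel values; this is NOT real MPEG.
from typing import List, Tuple, Union

Frame = List[int]
Delta = List[Tuple[int, int]]          # (pixel index, new value)
Encoded = List[Tuple[str, Union[Frame, Delta]]]


def encode(frames: List[Frame], key_interval: int = 5) -> Encoded:
    """Store every key_interval-th frame whole, others as changed pixels."""
    encoded: Encoded = []
    previous: Frame = []
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            encoded.append(("key", list(frame)))
        else:
            delta = [(j, new) for j, (old, new)
                     in enumerate(zip(previous, frame)) if old != new]
            encoded.append(("delta", delta))
        previous = frame
    return encoded


def decode(encoded: Encoded) -> List[Frame]:
    """Rebuild every frame from key frames plus recorded changes."""
    frames: List[Frame] = []
    current: Frame = []
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for j, value in payload:
                current[j] = value
        frames.append(current)
    return frames


if __name__ == "__main__":
    clip = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 2, 0],
            [0, 1, 2, 3], [9, 1, 2, 3], [9, 9, 2, 3]]
    packed = encode(clip, key_interval=5)
    for item in packed:
        print(item)
    print("round trip OK:", decode(packed) == clip)
```

When little changes between frames the delta entries are much shorter than a full frame, which is where the saving comes from.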

MPEG-4
- MPEG-4 is a video standard defined by the Moving Picture Experts Group (MPEG). The format became an international standard in 2000.
- Apple has backed MPEG-4 strongly, and MPEG-4 plays well in the QuickTime player.
- Microsoft was involved in the early development of MPEG-4 but is no longer actively supporting the standard, choosing instead to concentrate on the Windows Media format. MPEG-4 can only be played in Windows Media Player with a special MPEG-4 decoder pack.

AVI
- Audio Video Interleave: a common video file format (.avi).
- Quality can be very good at smaller resolutions, but files tend to be rather large.

Sound files
- WAV: uncompressed audio format developed by IBM and Microsoft. Standard audio file used on PCs.
- AU: short for audio, a common format for sound files on UNIX machines. It is also the standard audio file format for the Java programming language.
- RA [RAM]: "Real Audio" file type from Real Networks. Usually produced by any of Real Networks' proprietary software. RAM is used for links to streaming audio.

MP3
- A lossy compressed [1:12?] audio format.
- Because of its compression it is popular for downloading across the internet and for use with memory-based audio players.
- Sound quality varies as a function of the recording/encoding bit rate.
- Anything recorded at less than a 128 kbps data rate will sound degraded.
- For a high fidelity home audio system, use no less than a 256 kbps data rate. [Anything less would mean too much compression was used, at the cost of sound fidelity.]
[Figure labels: WAV, MP3]

Ogg Vorbis
- What is Ogg Vorbis? Ogg Vorbis is an audio compression format. It is roughly comparable to other formats used to store and play digital music, such as MP3. It is different from these other formats because it is completely free, open, and unpatented.
- What do all the names mean?
- Ogg: Ogg is the name of Xiph.org's container format for audio, video, and metadata.
- Vorbis: Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg. Note that other formats are capable of being embedded in Ogg.

Shorten and FLAC format files
- What are SHN files? SHN stands for Shorten. It is a lossless compression algorithm for digital music. It was developed by SoftSound and it compresses music files to 50-60% of their original size, with no loss in quality. See this FAQ.
- What are FLAC files? FLAC stands for Free Lossless Audio Codec. It is an open source, lossless compression algorithm for digital music. It compresses music files to 50-60% of their original size, with no loss in quality. More FLAC information can be found on the FLAC SourceForge site and in this etree FAQ.
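
The compression figures quoted for MP3, Shorten and FLAC can be put side by side with some simple arithmetic. The sketch below assumes the uncompressed source is CD-quality audio (44,100 samples per second, 16 bits, 2 channels), which the slides do not state, and treats 1 kbps as 1,000 bits per second. The function names and the 4-minute track are illustrative.

```python
# Rough size comparison for one track in WAV, MP3 and FLAC/Shorten.
# Assumptions (not from the slides): CD-quality PCM source,
# 1 kbps = 1,000 bits/sec, a 4-minute example track.

CD_SAMPLE_RATE = 44_100   # samples per second
CD_BITS = 16              # bits per sample
CD_CHANNELS = 2           # stereo


def pcm_bit_rate(sample_rate: int = CD_SAMPLE_RATE,
                 bits: int = CD_BITS,
                 channels: int = CD_CHANNELS) -> int:
    """Bits per second of uncompressed PCM audio (as stored in WAV)."""
    return sample_rate * bits * channels


def size_bytes(bit_rate: int, seconds: int) -> int:
    """File size in bytes for a constant bit rate and duration."""
    return bit_rate * seconds // 8


def compression_ratio(source_rate: int, target_rate: int) -> float:
    """How many times smaller the target stream is than the source."""
    return source_rate / target_rate


if __name__ == "__main__":
    seconds = 4 * 60
    wav_rate = pcm_bit_rate()
    wav_size = size_bytes(wav_rate, seconds)

    print(f"WAV:          {wav_size:,} bytes")
    for kbps in (128, 256):
        mp3_rate = kbps * 1000
        print(f"MP3 {kbps} kbps: {size_bytes(mp3_rate, seconds):,} bytes "
              f"(about {compression_ratio(wav_rate, mp3_rate):.1f}:1)")
    # Lossless formats at 50-60% of the original size.
    print(f"FLAC/SHN:     {wav_size * 50 // 100:,} to "
          f"{wav_size * 60 // 100:,} bytes")
```

Under these assumptions a 128 kbps MP3 comes out at about 11:1, in line with the slide's tentative "1:12?", whereas the lossless formats only reach roughly 2:1.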

CODECs
- All compressed files rely on a COder/DECoder to undertake the initial compression and the later reconstruction of the image and/or sound.
- Some of the CODECs are public, some are proprietary.

Next
- This shopping list has been a way to lead into the next lecture.
- Preservation of materials, with particular emphasis on digital objects
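
As a closing illustration of the COder/DECoder idea from the CODECs slide, the sketch below uses Python's standard `zlib` module as a stand-in codec. `zlib` is a general-purpose lossless compressor, not an image or audio codec, so this only shows the code-then-decode round trip. A lossy codec such as JPEG or MP3 would not return the original bytes exactly. The function names are hypothetical.

```python
# Minimal coder/decoder round trip using zlib as a stand-in lossless codec.
# Assumption: zlib is used only for illustration; media codecs differ.
import zlib


def encode_lossless(raw: bytes) -> bytes:
    """The 'coder' half: compress the raw bytes."""
    return zlib.compress(raw, 9)


def decode_lossless(packed: bytes) -> bytes:
    """The 'decoder' half: reconstruct the original bytes."""
    return zlib.decompress(packed)


if __name__ == "__main__":
    # A highly repetitive "image": alternating black and white pixel pairs.
    raw = bytes([0, 0, 255, 255] * 1000)
    packed = encode_lossless(raw)

    print("original size:  ", len(raw), "bytes")
    print("compressed size:", len(packed), "bytes")
    print("identical after decoding:", decode_lossless(packed) == raw)
```

Without the matching decoder the compressed bytes are unreadable, which is why the availability of a codec matters for long-term preservation.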