Standard Alphanumeric Formats
  BCD
  ASCII
  EBCDIC
  Unicode (next slides)

Unicode
  16-bit standard
  Developed by a consortium
  Intended to supersede older 7- and 8-bit codes

Unicode Version 2.1
  1998
  Improves on version 2.0
  Includes the Euro sign (20AC₁₆ = €)
  From the standard: "contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica."
  http://www.unicode.org
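As a small illustration of the Euro sign entry above, the sketch below looks up a character's code point and counts how many 16-bit units it needs. The helper names `code_point_hex` and `utf16_code_units` are made up for this example, and UTF-16 is assumed as the 16-bit encoding.

```python
def code_point_hex(ch: str) -> str:
    """Return the code point of a single character as 4+ hex digits."""
    return format(ord(ch), "04X")


def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit units the character needs in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


euro = "\u20ac"  # the Euro sign from the slide
print(code_point_hex(euro), utf16_code_units(euro))
```

Characters in the basic 16-bit range, such as the Euro sign or the letter A, fit in a single 16-bit unit.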
Category                                 V2.0 (1996)  V2.1 (1998)  V3.0 (2000)  V3.1 (2001)  V3.2 (2002)
Alphabetic Characters & Other Symbols             09           11        10236        11929        12945
CJK Ideographs (1)                             21204        21204        27786        71039        71039
Hangul Symbols
Total Graphic Characters (2)                   38885        38887        49194        94140        95156
Private-Use Characters
  BMP (Plane 0)
  Supplementary Planes 15 & 16                     0            0            0       131072       131072
Surrogates Code Points (3)
Control Characters
Non-Characters                                     2            2           34           66           66
Total Assigned Characters (3)                  47400        47402        57709         1025       103671
Unassigned Characters (4)                      18136        18134         7827      1011456      1010440
http://www.agfamonotype.com/software/uni_characset.asp

Keyboard Input
  Key ("scan") codes are converted to ASCII
  ASCII code sent to host computer
    Received by the host as a "stream" of data
    Stored in buffer
    Processed
    Etc.

Keyboard to binary
  Shift Key inhibits bit 5 in the ASCII code

  Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
  a          1 1 0 0 0 0 1                     a
  Shift a    1 0 0 0 0 0 1                     A

  Figure 3.7 Keyboard operation
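The Shift behavior in the table above can be sketched as a bit mask on the 7-bit ASCII code. The same masking idea extends to the Control key, which clears bits 5 and 6. The names `SHIFT_MASK`, `CTRL_MASK`, `apply_shift`, `apply_ctrl`, and `bits7` are illustrative only, and the model is assumed to apply to lowercase letters; real keyboards handle digits and punctuation differently.

```python
SHIFT_MASK = 0b1011111  # keeps every bit except bit 5
CTRL_MASK = 0b0011111   # keeps every bit except bits 5 and 6


def bits7(ch: str) -> str:
    """Show a character's ASCII code as 7 bits, bit 6 first."""
    return format(ord(ch), "07b")


def apply_shift(ch: str) -> str:
    """Model Shift on a lowercase letter by clearing bit 5."""
    return chr(ord(ch) & SHIFT_MASK)


def apply_ctrl(ch: str) -> str:
    """Model Ctrl on a lowercase letter by clearing bits 5 and 6."""
    return chr(ord(ch) & CTRL_MASK)


print(bits7("a"), "->", bits7(apply_shift("a")), apply_shift("a"))
print(bits7("c"), "->", bits7(apply_ctrl("c")), "code", ord(apply_ctrl("c")))
```

Under these assumptions, Shift turns a (1100001) into A (1000001), and Ctrl turns c (1100011) into the control code ETX (0000011).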
Control Key inhibits bits 5 & 6 in the ASCII code

  Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
  c          1 1 0 0 0 1 1                     c
  Ctrl c     0 0 0 0 0 1 1                     ETX (control code)

Data Input Devices
  OCR (optical character recognition)
  Bar code readers
  Voice/audio input
  Punched cards
  Images / objects
  Pointing devices

OCR
  Page of text ("Hello, world") -> Optical scan -> Computer file (10110110)
Bar Codes
  An automatic identification ("Auto ID") technology that streamlines identification and data collection
  See:
    http://www.howstuffworks.com/upc.htm
    http://www.barcodegraphics.com/info_center/upc.htm

Meaning of the first digit of a UPC number
  0  Standard UPC number (must have a zero to do zero-suppressed numbers)
  1  Reserved
  2  Random-weight items (fruits, vegetables, meats, etc.)
  3  Pharmaceuticals
  4  In-store marking for retailers (a store can set up its own codes, but no other store will understand them)
  5  Coupons
  6  Standard UPC number
  7  Standard UPC number
  8  Reserved
  9  Reserved

Computing the check digit
  1. Add all of the digits in odd positions: 6 + 9 + 8 + 0 + 0 + 9 = 32
  2. Multiply by 3: 32 * 3 = 96
  3. Add all of the digits in even positions: 3 + 3 + 2 + 0 + 3 = 11
  4. Add this sum to the value in step 2: 96 + 11 = 107
  5. Determine the number that, when added to the number in step 4, gives a multiple of 10: 107 + 3 = 110
  The check digit is therefore 3.
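The check digit steps above can be sketched as a short function. The name `upc_check_digit` is hypothetical, and the 11-digit string 63938200039 is an assumption: it was reconstructed by interleaving the odd- and even-position digits shown in the sums, since the slide does not print the full number.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Compute the 12th (check) digit from the first 11 digits of a UPC."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 decimal digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])    # positions 1, 3, 5, ... (1-based)
    even_sum = sum(digits[1::2])   # positions 2, 4, 6, ... (1-based)
    total = odd_sum * 3 + even_sum
    # Smallest non-negative number that brings the total to a multiple of 10.
    return (10 - total % 10) % 10


print(upc_check_digit("63938200039"))
```

For the reconstructed number the intermediate values are 32, 96, 11 and 107, matching the slide, and the result is 3.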
Can I Decode the Bars?
  Bar and space widths for each digit (Sum = 7)
    0 = 3-2-1-1
    1 = 2-2-2-1
    2 = 2-1-2-2
    3 = 1-4-1-1
    4 = 1-1-3-2
    5 = 1-2-3-1
    6 = 1-1-1-4
    7 = 1-3-1-2
    8 = 1-2-1-3
    9 = 3-1-1-2

  Example
    Start:  1-1-1
    0 = 3-2-1-1
    4 = 1-1-3-2
    3 = 1-4-1-1
    000 = (3-2-1-1) x 3
    Middle: 1-1-1-1-1
    1 = 2-2-2-1
    8 = 1-2-1-3
    1 = 2-2-2-1
    7 = 1-3-1-2
    0 = 3-2-1-1
    6 = 1-1-1-4
    Stop:   1-1-1

Pointing Devices
  Originally used for specifying coordinates (x, y) for graphical input
  Today used as a general-purpose device for graphical user interfaces (GUIs)
  pp. 69-86
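Going back to the bar-width table under "Can I Decode the Bars?", the decoding can be sketched as a table lookup over groups of four widths. This is a minimal sketch that assumes the widths have already been measured in whole units and that the layout is start (3 widths), six digits, middle (5 widths), six digits, stop (3 widths), as in the example. The function names are made up for illustration.

```python
DIGIT_WIDTHS = {
    0: (3, 2, 1, 1), 1: (2, 2, 2, 1), 2: (2, 1, 2, 2), 3: (1, 4, 1, 1),
    4: (1, 1, 3, 2), 5: (1, 2, 3, 1), 6: (1, 1, 1, 4), 7: (1, 3, 1, 2),
    8: (1, 2, 1, 3), 9: (3, 1, 1, 2),
}
WIDTHS_TO_DIGIT = {widths: digit for digit, widths in DIGIT_WIDTHS.items()}
START = STOP = (1, 1, 1)
MIDDLE = (1, 1, 1, 1, 1)


def encode_widths(upc: str) -> list:
    """Turn a 12-digit UPC string into its sequence of bar/space widths."""
    digits = [int(c) for c in upc]
    out = list(START)
    for d in digits[:6]:
        out.extend(DIGIT_WIDTHS[d])
    out.extend(MIDDLE)
    for d in digits[6:]:
        out.extend(DIGIT_WIDTHS[d])
    out.extend(STOP)
    return out


def decode_widths(widths: list) -> str:
    """Recover the 12 digits from 59 measured widths."""
    if len(widths) != 59:
        raise ValueError("expected 59 widths")
    if (tuple(widths[:3]) != START or tuple(widths[27:32]) != MIDDLE
            or tuple(widths[56:]) != STOP):
        raise ValueError("guard pattern not found")
    body = widths[3:27] + widths[32:56]  # 12 groups of 4 widths
    digits = [WIDTHS_TO_DIGIT[tuple(body[i:i + 4])] for i in range(0, 48, 4)]
    return "".join(str(d) for d in digits)


print(decode_widths(encode_widths("043000181706")))
```

Every digit pattern sums to 7 units, so a full code spans 3 + 42 + 5 + 42 + 3 = 95 units.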
Punched Cards
  Invented by Herman Hollerith (founder of the company that later became IBM)
  Each card holds 80 characters
  pp. 69-86

Image data
  Typically images are pictures that are optically scanned and saved as a bitmap or in some other format
  Many formats: gif, jpeg, ...
  Note: animated gifs often used on www
  Typical Save As dialog
Types of images
  Bitmaps (raster images)
    Examples: photographs, paintings
    Continuous variation of color, shape, texture
    Entered via a scanner or video camera
  Object images
    Created with specialized drawing programs
    Set of graphical objects (lines, rectangles, etc.)

Bitmap images
  Made of pixels
  Require a lot of memory (600 x 800 pixels x 3 bytes = 1.44 MB)
  Resolution defines the detail level of the image
  Involve little processing
  Formats
    GIF (limited to 256 colors)
    JPEG (up to 16 million colors; uses compression)

GIF image format
  Figure 3.10 GIF screen layout
  Figure 3.11 GIF file format layout
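The memory estimate on the Bitmap images slide can be checked with a one-line calculation. The sketch below assumes an uncompressed image with 3 bytes per pixel (one each for red, green and blue). The function name `bitmap_bytes` is made up for this example.

```python
def bitmap_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Storage needed for an uncompressed bitmap, in bytes."""
    return width * height * bytes_per_pixel


size = bitmap_bytes(600, 800)
print(size, "bytes =", size / 1_000_000, "MB")

# With a 256-color palette, one byte per pixel is enough to index the palette.
print(bitmap_bytes(600, 800, bytes_per_pixel=1), "bytes")
```

The second call suggests why a 256-color limit helps: each pixel then needs only one byte before any compression is applied.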
Object images
  Images made of geometrically definable shapes
    Example: MS Paint software
  Efficient, can be manipulated, flexible, small size, etc.

Object images: Postscript, PDF
  Postscript
    Set of graphical statements
    Includes scalable fonts
    Advantages
  PDF

Video images
  Require large amounts of data
    Example: 640 x 480 pixels x 3 bytes of color x 30 frames/s = about 27.6 MB/s = about 1.66 GB / minute
  Solutions:
    Reduce: size, colors, sampling frequency
    Compress
  Complication: real-time streaming

Voice/audio data
  Input device: microphone
  Sound can be stored, manipulated
  Analog to digital format conversion (sampling)
  Must be represented numerically
  Sampling rate (usually 50 kHz)
  Digitize -> 10110010
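The digitizing step above can be sketched by sampling a waveform at regular intervals and mapping each sample to an 8-bit number. This is an illustration under stated assumptions: the input is a pure sine tone, the sampling rate is the 50 kHz figure from the slide, and each sample is quantized to 8 bits. The function name `sample_sine` and the 1 kHz tone are made up for the example.

```python
import math


def sample_sine(freq_hz: float, sample_rate_hz: float, n_samples: int, bits: int = 8) -> list:
    """Sample a sine wave and quantize each sample to an unsigned integer code."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                              # time of the n-th sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)     # value in [-1, 1]
        code = round((amplitude + 1) / 2 * (levels - 1))    # map to 0 .. levels-1
        codes.append(code)
    return codes


samples = sample_sine(freq_hz=1000, sample_rate_hz=50_000, n_samples=50)
for code in samples[:5]:
    print(format(code, "08b"))
```

Each printed line is one sample as an 8-bit pattern, the same kind of value as the 10110010 shown on the slide. A higher sampling rate or more bits per sample gives a closer approximation at the cost of more data.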
Sampling
  Figure 3.15 Digitizing an audio waveform

Audio data formats
  MIDI
    Used for storing instrument sound
  WAV
    Used to store sound snippets
  MP3
    Derived from MPEG-2
    High quality

WAV data format
  Figure 3.16 WAV sound format

Data compression
  Many algorithms
  Types: lossless, lossy
  Example of algorithm (compression 35%, not good for streaming)
    Store repeated characters as (char, # of occurrences)
    Replace repeated sequences by one value
    Examples: ZIP, GIF (lossless)
  Lossy algorithms
    Can reduce the size by 10 times
    Example: MPEG2 (compression ratio 100:1)
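The idea of storing repeated characters as (char, # of occurrences) is usually called run-length encoding, and it can be sketched in a few lines. This is a minimal illustration of that one idea. ZIP and GIF rely on more elaborate dictionary-based schemes, so it should not be read as how those formats work. The function names are made up for the example.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list) -> str:
    """Rebuild the original text from (char, count) pairs."""
    return "".join(ch * count for ch, count in pairs)


original = "AAAABBBCCD"
encoded = rle_encode(original)
print(encoded)
print(rle_decode(encoded) == original)
```

Decoding reproduces the input exactly, which is what makes the scheme lossless. It only saves space when the data contains long runs; text with few repeats can come out larger than it went in.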