Encoding and Encrypting Information

Information is often encoded numerically. By encoding information mathematically in an appropriate way, we can overcome problems that arise in handling it. Different situations pose different problems. We will consider three ways to encode information, each of which overcomes a different type of problem, and we will introduce each one with a motivating situation.

1. How can one assign a code (such as a bar code) to a grocery store item so as to eliminate, or at least minimize, the chance that the item's price is registered incorrectly? You want to make sure the item is not mistaken for another.
2. How can one encode music or video on a CD or DVD in such a way that small imperfections (such as scratches) do not result in a loss of data?
3. How can you encrypt data, such as credit card numbers, so that you can transmit the data and have the intended recipient recover it, while nobody else can?

Identification Numbers

Today we will talk about the first of the three situations described above. We will focus on two examples, UPCs and ISBNs. UPC stands for universal product code and ISBN stands for international standard book number.

The UPC

Commercial products, such as grocery store items, are identified with a universal product code. This 12-digit numerical code uniquely identifies an item. It appears on the item both as a number and as a bar code.

The first block of digits (after the leading digit) identifies the manufacturer, and the second block identifies the item. The last digit is the one we will focus on. It is called the check digit, and its purpose is to detect errors. We shall see that if any single digit in the UPC is misread, then the resulting sequence will be recognized as invalid.

To see whether a 12-digit sequence is a valid UPC, perform the following computation: multiply the first, third, fifth, ... digits by 3, and the others by 1. Add up the resulting numbers. If the sum is evenly divisible by 10, then the sequence is a valid UPC. For example, consider the sequence 7 18908 14447 3.

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     7   1   8   9   0   8   1   4   4   4   7   3
products  21   1  24   9   0   8   3   4  12   4  21   3   110

The sum, 110, is evenly divisible by 10 since its last (right-most) digit is a 0. This shows that the sequence is a valid UPC. When an item is scanned, the scanner performs this computation. If the sequence is determined to be valid, then the item's price is retrieved from the store's computer.
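The validity check described above can be sketched in a few lines of Python. This is only an illustration, not the code a scanner actually runs; the function name `is_valid_upc` is a hypothetical choice, and the sketch assumes the UPC is given as a string in which spaces may separate the blocks.

```python
def is_valid_upc(code: str) -> bool:
    """Return True if the 12 digits in `code` pass the UPC weighted-sum check.

    `is_valid_upc` is an illustrative name, not part of any standard library.
    """
    digits = [int(ch) for ch in code if ch.isdigit()]  # ignore spaces
    if len(digits) != 12:
        return False
    # The 1st, 3rd, 5th, ... digits (indices 0, 2, 4, ...) get weight 3;
    # the others get weight 1.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


print(is_valid_upc("7 18908 14447 3"))  # True: the weighted sum is 110
```

Checking `total % 10 == 0` is the same as checking that the last digit of the sum is 0.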

How is the check digit determined? When a manufacturer produces an item and assigns it a UPC, the first block of the sequence is the code for the manufacturer. The manufacturer then selects the second block to identify the item. Finally, the check digit has to be determined. We will see how through an example.

Suppose 0 25192 59452 ? is to be a UPC. What should the check digit be? We repeat the validity calculation, leaving the check digit as an unknown.

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     0   2   5   1   9   2   5   9   4   5   2   ?
products   0   2  15   1  27   2  15   9  12   5   6   ?   94 + ?

This sum needs to be divisible by 10, and ? must be a digit. The only digit that works is 6; the sum is then 100. So the check digit is 6, and the full UPC is 0 25192 59452 6.

Another way to find the check digit is to divide the sum, 94, by 10: 10 goes into 94 nine times with 4 left over. Subtract the leftover from 10 to get the check digit: 10 - 4 = 6.

The numbers 3 and 1 used in the calculation are called weights. We shall see other weights in other identification number schemes.

To Find the Check Digit for a UPC

To summarize, to find the check digit for a UPC, take the first 11 digits, multiply each digit by the corresponding weight (3 or 1), and add up all the terms. Then either divide the result by 10 and subtract the remainder from 10 to get the check digit (if the remainder is 0, the check digit is 0), or find the digit (0 through 9) which, when added to the sum, results in a number evenly divisible by 10. Exactly one digit makes the calculation work.
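The summary above translates directly into a short function. The following is a minimal sketch under the assumption that the first 11 digits are supplied as a string; `upc_check_digit` is a hypothetical name chosen for illustration.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Compute the check digit for the first 11 digits of a UPC.

    `upc_check_digit` is an illustrative name, not a library function.
    """
    digits = [int(ch) for ch in first_eleven if ch.isdigit()]  # ignore spaces
    if len(digits) != 11:
        raise ValueError("expected exactly 11 digits")
    # Weight 3 for the 1st, 3rd, 5th, ... digits; weight 1 for the others.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))
    # Subtract the remainder from 10; the outer % 10 turns a result of 10
    # (remainder 0) into the check digit 0.
    return (10 - total % 10) % 10


print(upc_check_digit("0 25192 59452"))  # 6, since the weighted sum is 94
```

The final `% 10` covers the case in which the weighted sum is already divisible by 10, where subtracting the remainder from 10 would otherwise give 10 rather than a single digit.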

Questions

1. Is 0 85391 77372 6 a valid UPC?
2. If 0 12569 50162 x is to be a valid UPC, what should be the value of x?

Answers

1. Is 0 85391 77372 6 a valid UPC?

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     0   8   5   3   9   1   7   7   3   7   2   6
products   0   8  15   3  27   1  21   7   9   7   6   6   110

The sum, 110, is evenly divisible by 10, so the sequence is a valid UPC.

2. If 0 12569 50162 x is to be a valid UPC, what should be the value of x?

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     0   1   2   5   6   9   5   0   1   6   2   x
products   0   1   6   5  18   9  15   0   3   6   6   x   69 + x

The value of x must be 1 in order for the sum to be divisible by 10. The UPC is then 0 12569 50162 1.

The purpose of the check digit is to detect an error in one of the digits. If a UPC is read and any single digit is read incorrectly, then the resulting sequence will not be valid. For example, if 7 18908 14447 3 is read as 7 19908 14447 3 by misreading the third digit as 9 rather than 8, then the calculation gives

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     7   1   9   9   0   8   1   4   4   4   7   3
products  21   1  27   9   0   8   3   4  12   4  21   3   113

The resulting sum, 113, is not evenly divisible by 10, so the sequence is not valid.

A similar result will occur if any single digit is changed. However, this scheme does not always detect multiple errors. For example, if 7 18908 14447 3 is read as 7 19608 14447 3, then the calculation yields

weights    3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     7   1   9   6   0   8   1   4   4   4   7   3
products  21   1  27   6   0   8   3   4  12   4  21   3   110

and the sequence would be considered a valid UPC. This sort of scheme is therefore useful only when multiple errors are very unlikely.
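Both claims can be checked by brute force for the example UPC. The sketch below, with the hypothetical helper `is_valid`, tries every possible single-digit misreading of 7 18908 14447 3 and then tests the double error from the text. It illustrates the behavior for this one code and is not a proof for all UPCs.

```python
def is_valid(digits):
    """UPC check: weights 3, 1, 3, 1, ... and a sum divisible by 10."""
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


original = [7, 1, 8, 9, 0, 8, 1, 4, 4, 4, 7, 3]

# Replace each digit in turn by every other possible digit and count
# how many of the corrupted sequences still pass the check.
undetected = 0
for pos in range(12):
    for wrong in range(10):
        if wrong == original[pos]:
            continue
        corrupted = original.copy()
        corrupted[pos] = wrong
        if is_valid(corrupted):
            undetected += 1

print(undetected)  # 0: every single-digit error is caught

# The double error from the text (8 -> 9 and 9 -> 6) slips through.
double_error = [7, 1, 9, 6, 0, 8, 1, 4, 4, 4, 7, 3]
print(is_valid(double_error))  # True
```

The reason behind the first result is that changing one digit changes the sum by either the difference of the digits or 3 times that difference, and neither can be a multiple of 10 when the difference is between 1 and 9.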

The ISBN

An ISBN, or international standard book number, is a sequence of digits designed to uniquely identify a book. There are two types, ISBN-10 and ISBN-13. The former is being replaced by ISBN-13. The 10 and 13 refer to how many digits are used to represent a book.

ISBN-10

An example of an ISBN-10 is 0-387-94753-1. The first digit refers to the language of the book (0 = English). The second block refers to the publisher, the third block to the book itself, and the last digit is the check digit. Each book is then encoded with a 10-digit sequence. The purpose of the check digit is the same as for the UPC: to detect any error in a single digit of the ISBN.

To verify whether a sequence is a valid ISBN, multiply the first digit by 10, the second digit by 9, the third by 8, and so on, and then add all the terms. The sequence is valid if the sum is evenly divisible by 11. We illustrate the calculation with the ISBN 0-387-94753-1.

weights   10   9   8   7   6   5   4   3   2   1   Sum
digits     0   3   8   7   9   4   7   5   3   1
products   0  27  64  49  54  20  28  15   6   1   264

Since 264 / 11 = 24, a whole number, the sequence is indeed valid.
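The ISBN-10 check can be sketched in the same style as the UPC check. The function name `is_valid_isbn10` is a hypothetical choice, and the sketch assumes the ISBN arrives as a string that may contain hyphens. It also treats a final X as the value 10, which is how the ISBN-10 scheme writes a check digit of 10.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` passes the ISBN-10 check (weights 10 down to 1).

    `is_valid_isbn10` is an illustrative name, not a library function.
    """
    chars = [ch for ch in isbn.upper() if ch.isdigit() or ch == "X"]
    if len(chars) != 10 or "X" in chars[:9]:
        return False  # X may only appear in the check-digit position
    values = [10 if ch == "X" else int(ch) for ch in chars]
    # Multiply the 1st value by 10, the 2nd by 9, ..., the 10th by 1.
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


print(is_valid_isbn10("0-387-94753-1"))  # True: the weighted sum is 264
```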

Use of the check digit allows us to detect single errors in an ISBN. For example, if we take the ISBN 0-387-94753-1 and change the 8th digit from 5 to 6, obtaining 0-387-94763-1, and perform the calculation to check validity, we get

weights   10   9   8   7   6   5   4   3   2   1   Sum
digits     0   3   8   7   9   4   7   6   3   1
products   0  27  64  49  54  20  28  18   6   1   267

Dividing 267 by 11 gives about 24.27, not a whole number. So 0-387-94763-1 is not a valid ISBN.

Finding the check digit for an ISBN is similar to finding it for a UPC. However, using 11 rather than 10 forces one difference. For example, suppose that 0-14-010867 is to be the first part of an ISBN. What is the check digit? We compute

weights   10   9   8   7   6   5   4   3   2   1   Sum
digits     0   1   4   0   1   0   8   6   7   ?
products   0   9  32   0   6   0  32  18  14   ?   111 + ?

If we divide 111 by 11, we see that 11 goes into 111 ten times with a remainder of 1. Subtracting the remainder from 11 should give the check digit, but here it gives 10, which is not a single digit. To handle this case, the check digit is written as X. So the full ISBN is 0-14-010867-X.
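The check-digit calculation, including the special case X, can be sketched as follows. The name `isbn10_check_digit` is hypothetical, and the sketch assumes the first nine digits are given as a string that may contain hyphens.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Compute the ISBN-10 check character for the first nine digits.

    `isbn10_check_digit` is an illustrative name, not a library function.
    """
    digits = [int(ch) for ch in first_nine if ch.isdigit()]  # ignore hyphens
    if len(digits) != 9:
        raise ValueError("expected exactly 9 digits")
    # Weights 10, 9, ..., 2 for the first nine digits; the check digit
    # itself has weight 1.
    total = sum(w * d for w, d in zip(range(10, 1, -1), digits))
    # Subtract the remainder from 11; the outer % 11 maps 11 to 0.
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


print(isbn10_check_digit("0-14-010867"))  # X, since the weighted sum is 111
```

The function returns a string rather than a number because the result may be the letter X.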

The somewhat more complicated scheme for the ISBN, which is not difficult for a computer to perform, does more than the UPC scheme. Besides detecting single errors, it also detects transposition errors. These are errors in which two digits are interchanged. For example, given the ISBN 0-387-94753-1, if we interchange the 5th and 6th digits, we get 0-387-49753-1. If we perform the check to see whether this is valid, we get

weights   10   9   8   7   6   5   4   3   2   1   Sum
digits     0   3   8   7   4   9   7   5   3   1
products   0  27  64  49  24  45  28  15   6   1   259

and since 259 / 11 is about 23.55, not a whole number, the sequence is not valid.
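The transposition claim can be checked by brute force for the example ISBN. The sketch below swaps every pair of unequal digits in 0-387-94753-1, adjacent or not, and records any swap that still passes the check. The helper name `isbn10_sum` is hypothetical, and the experiment covers only this one ISBN.

```python
from itertools import combinations


def isbn10_sum(values):
    """Weighted sum with weights 10, 9, ..., 1."""
    return sum(w * v for w, v in zip(range(10, 0, -1), values))


isbn = [0, 3, 8, 7, 9, 4, 7, 5, 3, 1]  # 0-387-94753-1

missed = []
for i, j in combinations(range(10), 2):
    if isbn[i] == isbn[j]:
        continue  # swapping equal digits changes nothing
    swapped = isbn.copy()
    swapped[i], swapped[j] = swapped[j], swapped[i]
    if isbn10_sum(swapped) % 11 == 0:
        missed.append((i, j))

print(missed)  # []: every transposition is detected
```

The underlying reason is that swapping two digits changes the sum by the product of the difference of their weights and the difference of the digits. Both factors lie between 1 and 10 in size, and since 11 is prime, their product cannot be a multiple of 11.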

ISBN-13

Since 2007, books have been given a 13-digit ISBN, which is slowly replacing the 10-digit ISBN. For example, the 13-digit ISBN for the book whose 10-digit ISBN is 0-387-94753-1 is 978-0387947532. This scheme is very much like the UPC. To confirm that this 13-digit sequence is valid, we perform the following computation.

weights    1   3   1   3   1   3   1   3   1   3   1   3   1   Sum
digits     9   7   8   0   3   8   7   9   4   7   5   3   2
products   9  21   8   0   3  24   7  27   4  21   5   9   2   140

If the sum is evenly divisible by 10, then the sequence is valid. Here the sum is 140, so the sequence is valid. This scheme does not detect all transposition errors as the older ISBN scheme did. Its use started because more ISBNs were needed than the old scheme could provide. Perhaps the method to create them was changed to make it more compatible with other identification number schemes, such as the UPC.
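The ISBN-13 check, and an example of a transposition it misses, can be sketched as follows. The name `is_valid_isbn13` is hypothetical, and the swapped number below is constructed purely for illustration; it is not claimed to belong to any real book.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` passes the ISBN-13 check (weights 1, 3, 1, 3, ...).

    `is_valid_isbn13` is an illustrative name, not a library function.
    """
    digits = [int(ch) for ch in isbn if ch.isdigit()]  # ignore hyphens
    if len(digits) != 13:
        return False
    # The 1st, 3rd, 5th, ... digits get weight 1; the others get weight 3.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


print(is_valid_isbn13("978-0387947532"))  # True: the weighted sum is 140

# Swap the 5th and 6th digits (3 and 8). The sum becomes 130, which is
# still divisible by 10, so this transposition is not detected.
print(is_valid_isbn13("978-0837947532"))  # True
```

Swapping two adjacent digits changes the sum by twice their difference, so the swap goes unnoticed exactly when the two digits differ by 5, as 3 and 8 do here.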