Sources of Evidence Part I. Foundations of Digital Forensics CSF: Forensics Cyber-Security Fall 2015 Nuno Santos
Summary
- Reasoning about sources of evidence
- Data representation and interpretation
  - Number systems
  - Endianness
  - Text systems
  - Data structures
  - Abstraction layers
Today: finding and interpreting data
Remember where we are
- Last class: Evidence acquisition
- Today's class: Sources of data
Reasoning about sources of evidence
There are many places to get evidence from
- Main transaction records: all purchases, sales and other contractual arrangements at the heart of the business
- Main business records: all of the above, plus all documents and data likely to be needed to comply with legal and regulatory requirements
- Email traffic: emails potentially provide important evidence of formal and informal contacts
- Selected personal computers (PCs): the organization will need to be able to seize these PCs and make a proper forensic image
- Selected mobile phones, tablets, PDAs, etc.: these devices can hold substantial amounts of data
- Back-up media: back-up archives are extremely important sources of evidence, as they can show whether live files have been tampered with; they can also provide data that has been deleted from the live system
- Telephone recordings: many companies routinely record conversations between their staff and customers
- Selected data media: most computer users archive all or part of their activities on external storage media
- Access control logs: access control systems can be configured to record when usernames and passwords were issued, when passwords were changed, and when access rights were changed or terminated
- Configuration, event, error and other internal files and logs: all computers contain files that help define how the operating system and individual programs are supposed to work
- Internet activity logs: individual PCs keep records of recent web access in the history file and in the cache held in the temporary internet files folder
- Anti-virus logs: these record the detection and destruction of viruses and trojans
- Intrusion detection logs: larger computer systems often use intrusion detection systems as part of their security measures
Lots of different technologies to master
- Networked systems
- Diversity of hardware components
- Variety of technologies for similar platforms
- Multiple generations of hardware
A simple way to reason about evidence sources
- Data is stored and processed in computers
- Data can be exchanged between computers through networks
A simple way to reason about evidence sources
- Data are groups of 1s and 0s
- Typical data abstractions:
  - In computers: the file
  - In networks: the message
- Data can be stored in persistent or volatile memory
Data representation and interpretation
We found a piece of digital evidence: what's this?
- A JPEG image?
- A TCP/IP packet?
- The master boot record of a hard disk?
- An encrypted document?
- A piece of an application-specific log?
- The dump of a FAT file system?
From a piece of data to information
- In digital forensics, we want to extract information out of the observed data
- Data: the plural of the word datum; data are basically just facts that have not been processed and are in their rawest form
- Information: the knowledge communicated or received concerning a particular fact or circumstance; it is usually the product of analyzing data
- Example:
  - Data: disk image
  - Information: deleted files
Need to understand how data is represented
- The computer stores everything as 1s and 0s: the way we interpret groups of bits depends on the context
- As a forensic analyst, you will be working with different data representation schemes
- To properly interpret evidence, we need to understand the fundamentals of how data is represented
Representation of data
1. Number systems
2. Endianness
3. Text systems
4. Data structures
5. Abstraction layers
Number systems
- Four number systems are most relevant:
  - Decimal: base 10
  - Binary: base 2
  - Octal: base 8
  - Hexadecimal: base 16
Number system representations
- Binary: 01001101b or 01001101 (base 2)
- Octal: 115o or 115 (base 8); note: the trailing character is a lowercase letter "o"
- Hexadecimal: 0x4D, 4Dh or 4D (base 16); note: the leading character of 0x4D is a zero
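The notations above all describe the same value. A minimal sketch in Python (chosen here only for illustration; the variable names are hypothetical) that parses each spelling and formats the number back into each base:

```python
# The same value written in the three notations shown on the slide.
binary_text = "01001101"
octal_text = "115"
hex_text = "4D"

# int(text, base) interprets the digit string in the given base.
from_binary = int(binary_text, 2)
from_octal = int(octal_text, 8)
from_hex = int(hex_text, 16)
print(from_binary, from_octal, from_hex)

# Python's own literal prefixes: 0b (binary), 0o (octal), 0x (hexadecimal).
same_value = (0b01001101 == 0o115 == 0x4D)
print(same_value)

# Formatting the decimal value back into each base.
as_binary = format(from_hex, "08b")
as_octal = format(from_hex, "o")
as_hex = format(from_hex, "X")
print(as_binary, as_octal, as_hex)
```

Hex editors and forensic tools differ in which prefix or suffix they display, so it helps to recognize all of them.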
Decimal number system
- Base 10
- Uses digits 0-9
- Based on powers of 10

Power   10^5      10^4     10^3    10^2   10^1   10^0
Value   100,000   10,000   1,000   100    10     1
Digit   3         2        7       1      9      4

3 * 10^5 = 300,000
2 * 10^4 =  20,000
7 * 10^3 =   7,000
1 * 10^2 =     100
9 * 10^1 =      90
4 * 10^0 =       4
-------------------
TOTAL    = 327,194
Binary number system
- Base 2
- Uses digits 0-1
- Based on powers of 2

Power   2^5   2^4   2^3   2^2   2^1   2^0
Value   32    16    8     4     2     1
Digit   1     1     0     1     0     1

1 * 2^5 = 32
1 * 2^4 = 16
0 * 2^3 =  0
1 * 2^2 =  4
0 * 2^1 =  0
1 * 2^0 =  1
-------------
110101 (base 2) = 53 (base 10)

Base 10   Base 2
0         0
1         1
2         10
3         11
4         100
5         101
6         110
7         111
8         1000
9         1001
10        1010
11        1011
12        1100
13        1101
14        1110
15        1111
Octal number system
- Base 8
- Uses digits 0-7
- Based on powers of 8

Power   8^4     8^3   8^2   8^1   8^0
Value   4,096   512   64    8     1
Digit   7       0     2     6     5

7 * 8^4 = 28,672
0 * 8^3 =      0
2 * 8^2 =    128
6 * 8^1 =     48
5 * 8^0 =      5
-----------------
70265 (base 8) = 28,853 (base 10)

Base 10   Base 8
0         0
1         1
2         2
3         3
4         4
5         5
6         6
7         7
8         10
9         11
10        12
11        13
12        14
13        15
14        16
15        17
Hexadecimal number system
- Base 16
- Uses digits 0-9 and A, B, C, D, E, F
- Based on powers of 16

Power   16^5        16^4     16^3    16^2   16^1   16^0
Value   1,048,576   65,536   4,096   256    16     1
Digit   3           F        7       A      0      E

3 * 16^5 = 3,145,728
F * 16^4 =   983,040
7 * 16^3 =    28,672
A * 16^2 =     2,560
0 * 16^1 =         0
E * 16^0 =        14
---------------------
3F7A0E (base 16) = 4,160,014 (base 10)

Base 10   Base 16
0         0
1         1
2         2
3         3
4         4
5         5
6         6
7         7
8         8
9         9
10        A
11        B
12        C
13        D
14        E
15        F
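The positional expansions worked through on the last four slides follow one rule: multiply each digit by the base raised to its position and add everything up. A minimal sketch of that rule, assuming Python; the function name `positional_value` is hypothetical, and Python's built-in `int(text, base)` does the same job:

```python
DIGITS = "0123456789ABCDEF"

def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string by summing digit * base**position."""
    total = 0
    # Position 0 is the rightmost (least significant) digit.
    for position, char in enumerate(reversed(digits.upper())):
        digit = DIGITS.index(char)
        if digit >= base:
            raise ValueError(f"digit {char!r} is not valid in base {base}")
        total += digit * base ** position
    return total

# The example numbers from the decimal, binary, octal and hexadecimal slides.
for text, base in [("327194", 10), ("110101", 2), ("70265", 8), ("3F7A0E", 16)]:
    print(f"{text} (base {base}) = {positional_value(text, base):,} (base 10)")
```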
Number system comparison

Decimal   Binary   Octal   Hexadecimal
0         0        0       0
1         1        1       1
2         10       2       2
3         11       3       3
4         100      4       4
5         101      5       5
6         110      6       6
7         111      7       7
8         1000     10      8
9         1001     11      9
10        1010     12      A
11        1011     13      B
12        1100     14      C
13        1101     15      D
14        1110     16      E
15        1111     17      F
Endianness
- Numbers can be stored as a sequence of one or more bytes
- Endianness deals with the order in which the sequence of bytes is stored
- Two different methods for storing data have appeared:

Endian   First byte          Last byte           Notes                                     Arch
Big      Most significant    Least significant   Similar to a number written on paper      SPARC, Power, PowerPC, MIPS
Little   Least significant   Most significant    Similar to arithmetic calculation order   x86, ARM
Big-endian vs. little-endian
- The illustration uses the data word "0A 0B 0C 0D", a set of four bytes written in left-to-right positional, hexadecimal notation, and the four memory locations with addresses a, a+1, a+2 and a+3
- Big-endian: a holds 0A, a+1 holds 0B, a+2 holds 0C, a+3 holds 0D
- Little-endian: a holds 0D, a+1 holds 0C, a+2 holds 0B, a+3 holds 0A
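The two memory layouts for the word 0A 0B 0C 0D can be reproduced with Python's standard `struct` module. This is only an illustrative sketch; the variable names are hypothetical:

```python
import struct

word = 0x0A0B0C0D

# ">I" packs a 4-byte unsigned integer most significant byte first,
# "<I" packs it least significant byte first.
big = struct.pack(">I", word)
little = struct.pack("<I", word)

for name, data in (("big-endian", big), ("little-endian", little)):
    # Show which byte ends up at addresses a, a+1, a+2 and a+3.
    layout = "  ".join(f"a+{i}: {byte:02X}" for i, byte in enumerate(data))
    print(f"{name:13} {layout}")
```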
Why is it important for digital forensics?
- For values stored in a single byte, the issue of endianness does not arise: the values are the same in both systems
- But to correctly evaluate a number from a sequence of bytes, we must know which system was used to store it
- Otherwise, we can obtain wrong results!
Different interpretations of a 16-bit number
- In the example sequence, the two highlighted bytes (01 23) represent a 16-bit integer (2 bytes x 8 bits = 16 bits)
- In a big-endian system, the value would be calculated as: 0x0123 = 291
- In a little-endian system, the value would be calculated as: 0x2301 = 8961
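The two readings of the same pair of bytes can be checked with `int.from_bytes`. A minimal sketch, assuming the highlighted bytes are 01 23 as implied by the two calculations:

```python
# The two bytes as they appear in the dump, in storage order.
raw = bytes([0x01, 0x23])

# Same bytes, two interpretations.
as_big = int.from_bytes(raw, byteorder="big")        # read as 0x0123
as_little = int.from_bytes(raw, byteorder="little")  # read as 0x2301

print(f"big-endian:    0x{as_big:04X} = {as_big}")
print(f"little-endian: 0x{as_little:04X} = {as_little}")
```

An analyst who picks the wrong byte order here would report 8961 where the system stored 291, or the reverse.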
Another example
- The first two lines of a tcpdump file differ depending on whether the file was created on an Intel or on a Sun computer
- The table on the slide shows these lines for both architectures; they contain the date Sat, 10 May 2003 08:37:01 GMT
- The different byte order on the two systems is clearly visible
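The slide's table is not reproduced here, so the following is a sketch built on synthetic data rather than on the original capture. It assumes the classic libpcap file layout: a 24-byte global header that starts with the magic number 0xA1B2C3D4 written in the capturing host's byte order, followed by a packet record whose first field is a 4-byte timestamp in seconds. The helper names and the header values other than the magic number (version 2.4, snapshot length 65535, link type 1) are illustrative assumptions.

```python
import struct
from datetime import datetime, timezone

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number

def build_sample(byte_order: str, when: datetime) -> bytes:
    """Build a synthetic global header plus the ts_sec field of one record."""
    # magic, version major/minor, thiszone, sigfigs, snaplen, link type
    global_header = struct.pack(byte_order + "IHHiIII",
                                PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    ts_sec = struct.pack(byte_order + "I", int(when.timestamp()))
    return global_header + ts_sec

def read_first_timestamp(data: bytes):
    """Detect the byte order from the magic number, then decode ts_sec."""
    for order, name in (("<", "little-endian"), (">", "big-endian")):
        if struct.unpack(order + "I", data[:4])[0] == PCAP_MAGIC:
            seconds = struct.unpack(order + "I", data[24:28])[0]
            return name, datetime.fromtimestamp(seconds, tz=timezone.utc)
    raise ValueError("not a classic pcap file")

when = datetime(2003, 5, 10, 8, 37, 1, tzinfo=timezone.utc)
for order in ("<", ">"):
    sample = build_sample(order, when)
    name, decoded = read_first_timestamp(sample)
    print(f"{sample[:4].hex(' ')}  {name:13} {decoded:%a, %d %b %Y %H:%M:%S} GMT")
```

Reading the magic number first is what lets a tool open a capture written on either kind of machine and still decode the same date.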
Where does the name come from?
- The term endian comes from the novel Gulliver's Travels by Jonathan Swift
- In this fictitious world there were two island nations, Lilliput and Blefuscu. They were mortal enemies because the emperor of Lilliput had decreed that boiled eggs were to be cracked at the "little end", whereas on Blefuscu they had always cracked their eggs at the "big end"
- It illustrates the fact that something quite simple can be done in two completely different ways
Text representations
- Text values stored in a computer can be in several formats
- Most common ones:
  - ASCII
  - Unicode (various types)
- By far, the most common is ASCII
ASCII encoding
- ASCII ("ask-key") is the common code for text representation: American Standard Code for Information Interchange
- Proposed by ANSI in 1963, and finalized in 1968
- Assigns a numerical value to characters in American English
  - E.g., the letter 'A' is equal to 0x41, and '&' is equal to 0x26
  - Some values are control characters, such as 0x07, the bell sound
- The largest defined value is 0x7F (the DEL control character; 0x7E, '~', is the last printable one), which means that one character is encoded in 7 bits
- When the 8-bit byte became the norm, the eighth bit was often used as a parity bit alongside the 7-bit ASCII character to detect transmission errors
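The code values quoted above can be checked directly. A minimal sketch, assuming Python; the helper name `ascii_info` is hypothetical:

```python
def ascii_info(char: str):
    """Return the ASCII code of a character and whether it is printable."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError(f"{char!r} is not an ASCII character")
    # 0x20 (space) through 0x7E ('~') are printable; the rest are controls.
    kind = "printable" if 0x20 <= code <= 0x7E else "control"
    return code, kind

for char in ("A", "&", "\a", "~"):
    code, kind = ascii_info(char)
    # Show the character, its hex code, its 7-bit pattern and its class.
    print(f"{char!r:<8} 0x{code:02X}  {code:07b}  {kind}")

# Every ASCII code fits in 7 bits, so the top bit of each byte is 0.
top_bit_clear = all(byte < 0x80 for byte in "Hello World".encode("ascii"))
print(top_bit_clear)
```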
Extended ASCII table
- Over time, the limitations of this table became apparent: e.g., there was a need to accommodate European languages or mathematical symbols
- The Extended ASCII Character Set appeared: an 8-bit character encoding scheme that includes the standard 7-bit ASCII characters as well as others representing additional special, mathematical, graphic, and foreign characters
Unicode
- ASCII is nice and simple if you use American English, but it is quite limited for the rest of the world: the native symbols of many other languages cannot be represented
- Unicode helps solve this problem by using more than 1 byte to store the numerical version of a symbol
- Version 4.0 of the Unicode standard supports over 96,000 characters, which requires up to 4 bytes per character instead of the 1 byte that ASCII requires
Tradeoffs in Unicode encoding
- There are three ways of storing a Unicode character:
  - UTF-32: uses a 4-byte value for each character
  - UTF-16: stores the most used characters in a 2-byte value and lesser-used ones in 4 bytes
  - UTF-8: uses 1, 2, 3, or 4 bytes (the most frequently used characters take 1 byte)
- There is a tradeoff between the number of characters that can be represented, and space and processing efficiency
- UTF-8 is frequently used because it has the least amount of wasted space and because ASCII is a subset of it
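The space tradeoff can be seen by encoding a few characters in each scheme and counting bytes. A sketch under the assumption that the big-endian codec variants are used so that no byte order mark is added; the sample characters are arbitrary choices:

```python
# A Latin letter, an accented letter, the euro sign, and a musical symbol
# that lies outside the 16-bit range (U+1D11E).
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]
encodings = ("utf-8", "utf-16-be", "utf-32-be")

sizes = {}
for char in samples:
    # Number of bytes each encoding needs for this single character.
    sizes[char] = tuple(len(char.encode(enc)) for enc in encodings)
    print(f"U+{ord(char):06X}", dict(zip(encodings, sizes[char])))

# ASCII text is byte-for-byte identical when encoded as UTF-8.
ascii_is_subset = "Hello World".encode("ascii") == "Hello World".encode("utf-8")
print(ascii_is_subset)
```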
String representation
- Text: Hello World
- Binary (ASCII): 01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100
- Hex (ASCII): 48 65 6C 6C 6F 20 57 6F 72 6C 64

Text      Binary     Octal   Hex
H         01001000   110     48
e         01100101   145     65
l         01101100   154     6C
l         01101100   154     6C
o         01101111   157     6F
(space)   00100000   040     20
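The three views of the string in the table can be generated from its ASCII bytes. A minimal sketch, assuming Python; the variable names are hypothetical:

```python
text = "Hello World"
data = text.encode("ascii")  # one byte per character

# Render every byte in binary, octal and hexadecimal.
binary_view = " ".join(f"{byte:08b}" for byte in data)
octal_view = " ".join(f"{byte:03o}" for byte in data)
hex_view = " ".join(f"{byte:02X}" for byte in data)

print(binary_view)
print(octal_view)
print(hex_view)

# Going the other way: a hex dump back to text.
recovered = bytes.fromhex(hex_view).decode("ascii")
print(recovered)
```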
Data structures
- A data structure describes how data are laid out: the data are broken up into fields, and each field has a size and a name
Pointers in data structures
- Data structures may have pointers
- Example: detail of a basic disk with four partitions; the partition table entries are shown below
- In the first entry, the "relative sectors" field is 3F 00 00 00 and the "total sectors" field is 4B F5 7F 00

000001B0:                                           80 01
000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00 00 00
000001D0: 81 0A 07 FE FF FF 8A F5 - 7F 00 3D 26 9C 00 00 00
000001E0: C1 FF 05 FE FF FF C7 1B - 1C 01 D6 96 92 00 00 00
000001F0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00
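The dump above can be decoded field by field. The sketch below assumes the conventional 16-byte MBR partition entry layout (status, CHS start, type, CHS end, then the little-endian "relative sectors" and "total sectors" fields) starting at offset 0x1BE; the field names in the code are informal labels, not taken from the slide. The relative sectors value acts as the pointer: it says where on the disk the partition begins.

```python
import struct

# Bytes 0x1BE to 0x1FD of the sector, copied from the dump above.
table = bytes.fromhex(
    "80 01 01 00 07 FE BF 09 3F 00 00 00 4B F5 7F 00 "
    "00 00 81 0A 07 FE FF FF 8A F5 7F 00 3D 26 9C 00 "
    "00 00 C1 FF 05 FE FF FF C7 1B 1C 01 D6 96 92 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

entries = []
for index in range(4):
    # "<" selects little-endian: 1-byte status, 3-byte CHS start,
    # 1-byte type, 3-byte CHS end, two 4-byte unsigned integers.
    status, _chs_first, ptype, _chs_last, first_sector, total_sectors = \
        struct.unpack_from("<B3sB3sII", table, index * 16)
    entries.append((status, ptype, first_sector, total_sectors))
    print(f"entry {index + 1}: status=0x{status:02X} type=0x{ptype:02X} "
          f"relative sectors={first_sector} total sectors={total_sectors}")
```

In this dump the second partition starts exactly where the first one ends (63 + 8,385,867 sectors), which is the kind of consistency check an analyst can run on a partition table.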
Exercise: Reverse engineering an IP packet
- Identify the IP header fields of the following IP packet:

0000: 45 00 00 1d 7b bd 00 00 - 80 11 3a e5 c0 a8 01 a6
0010: c0 a8 01 37 23 82 23 83 - 00 09 33 a9 01
Exercise: Reverse engineering an IP packet (IP header fields)
- 45 00: 4 says we are using IPv4; 5 is the number of 32-bit words in the header (the options field is not used); 00 shows that we are not using Differentiated Services
- 00 1d: the length of the entire datagram (29 bytes); it includes the IP header, the UDP header and the data. The IP header is 20 bytes long when the options field is not used
- 7b bd: identification of the fragments of an original IP datagram
- 00 00: these values correspond to the Flags field and the Fragment Offset field
- 80 11: 80 is the TTL (128); 11 shows that the datagram carries UDP (17 in decimal)
- 3a e5: the checksum of the IP header
- c0 a8 01 a6: source IP address (192.168.1.166)
- c0 a8 01 37: destination IP address (192.168.1.55)
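The manual decoding of the IP header can be cross-checked in code. This is a sketch that assumes the standard 20-byte IPv4 header layout in network (big-endian) byte order; the function and key names are hypothetical.

```python
import ipaddress
import struct

# The packet bytes from the exercise.
packet = bytes.fromhex(
    "45 00 00 1d 7b bd 00 00 80 11 3a e5 c0 a8 01 a6 "
    "c0 a8 01 37 23 82 23 83 00 09 33 a9 01"
)

def parse_ipv4_header(data: bytes) -> dict:
    """Split the first 20 bytes into the fixed IPv4 header fields."""
    (ver_ihl, dscp_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack_from("!BBHHHBBH4s4s", data)
    return {
        "version": ver_ihl >> 4,        # high nibble of the first byte
        "ihl_words": ver_ihl & 0x0F,    # low nibble: header length in 32-bit words
        "dscp_ecn": dscp_ecn,
        "total_length": total_length,
        "identification": identification,
        "flags_fragment": flags_frag,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }

def header_checksum_ok(header: bytes) -> bool:
    """One's-complement sum of all 16-bit header words must be 0xFFFF."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return total == 0xFFFF

fields = parse_ipv4_header(packet)
for name, value in fields.items():
    print(f"{name:15} {value}")
print("checksum valid:", header_checksum_ok(packet[:fields["ihl_words"] * 4]))
```

Note that network protocols use big-endian byte order regardless of the host architecture, which is why the format string starts with "!".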
Exercise: Reverse engineering an IP packet
- Identify the UDP header fields and the payload:

0000: 45 00 00 1d 7b bd 00 00 - 80 11 3a e5 c0 a8 01 a6
0010: c0 a8 01 37 23 82 23 83 - 00 09 33 a9 01
Exercise: Reverse engineering an IP packet (UDP header fields and payload)
- 23 82: the source port (9090)
- 23 83: the destination port (9091)
- 00 09: these two bytes give the length of the UDP datagram (9 bytes: 8 bytes of header plus 1 byte of data)
- 33 a9: the UDP checksum
- 01: the data
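The UDP part of the same packet can be decoded the same way. A sketch assuming the standard 8-byte UDP header (source port, destination port, length, checksum, all big-endian) that starts right after the IP header; the variable names are hypothetical.

```python
import struct

# The packet bytes from the exercise.
packet = bytes.fromhex(
    "45 00 00 1d 7b bd 00 00 80 11 3a e5 c0 a8 01 a6 "
    "c0 a8 01 37 23 82 23 83 00 09 33 a9 01"
)

# The low nibble of the first byte gives the IP header length in 32-bit words,
# which tells us where the UDP header begins.
udp_offset = (packet[0] & 0x0F) * 4

src_port, dst_port, udp_length, udp_checksum = struct.unpack_from(
    "!HHHH", packet, udp_offset)

# The UDP length covers the 8-byte header plus the payload.
payload = packet[udp_offset + 8 : udp_offset + udp_length]

print(f"source port:      {src_port}")
print(f"destination port: {dst_port}")
print(f"UDP length:       {udp_length}")
print(f"UDP checksum:     0x{udp_checksum:04X}")
print(f"payload:          {payload.hex(' ')}")
```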
Abstraction layers
- Data can be organized in different abstraction layers
  - Data encapsulation in the TCP/IP protocol stack
  - Abstraction layers of a typical storage stack: file, file system, partition, block device
- When performing a forensic investigation:
  - We can focus on abstraction layers independently
  - We can get information about higher layers by looking at lower layers
Helper tools
Conclusions
- A simple way to reason about sources of evidence is to model them as networked computers
- To properly interpret digital data, it is fundamental to understand how computers represent the data
- Several aspects need to be considered when interpreting data: number system, endianness, text encoding, data structure format, and abstraction layer
References
- Primary bibliography: [Casey11] Section 15.3
- To learn more: Brian Carrier, File System Forensic Analysis, 2005, Chapter 2
Next class
- File systems