Sources of Evidence. CSF: Forensics Cyber-Security. Part I. Foundations of Digital Forensics. Fall 2015 Nuno Santos


Summary: reasoning about sources of evidence; data representation and interpretation (number systems, endianness, text systems, data structures, abstraction layers).

Today: finding and interpreting data

Remember where we are. Last class: evidence acquisition. Today's class: sources of data.

Reasoning about sources of evidence

There are many places from which to obtain evidence:
- Main transaction records: all purchases, sales and other contractual arrangements at the heart of the business.
- Main business records: all of the above, plus all documents and data likely to be needed to comply with legal and regulatory requirements.
- Email traffic: emails can provide important evidence of formal and informal contacts.
- Selected personal computers (PCs): the organization needs to be able to seize these PCs and make a proper forensic image of each.
- Selected mobile phones, tablets, PDAs, etc.: these devices can hold substantial amounts of data.
- Back-up media: back-up archives are extremely important sources of evidence, as they can show whether live files have been tampered with. They can also provide data that has been deleted from the live system.
- Telephone recordings: many companies routinely record conversations between their staff and customers.
- Selected data media: most computer users archive all or part of their activities on external storage media.
- Access control logs: access control systems can be configured to record when usernames and passwords were issued, when passwords were changed, and when access rights were changed or terminated.
- Configuration, event, error and other internal files and logs: all computers contain files that define how the operating system and the individual programs are supposed to work.
- Internet activity logs: individual PCs keep records of recent web access in the history file and in the cache held in the temporary internet files folder.
- Anti-virus logs: these record the detection and destruction of viruses and trojans.
- Intrusion detection logs: larger computer systems often use intrusion detection systems as part of their security measures.

Lots of different technologies to master: networked systems, a diversity of hardware components, a variety of technologies for similar platforms, and multiple generations of hardware.

A simple way to reason about evidence sources: data is stored and processed in computers, and data can be exchanged between computers through networks.

A simple way to reason about evidence sources (continued): data are groups of 1s and 0s, and can be stored in persistent or volatile memory. Typical data abstractions are the file (in computers) and the message (in networks).

Data representation and interpretation

We found a piece of digital evidence: what is this? A JPEG image? A TCP/IP packet? The master boot record of a hard disk? An encrypted document? A piece of an application-specific log? The dump of a FAT file system?

From a piece of data to information. In digital forensics, we want to extract information from the observed data.
- Data: the plural of the word datum; data are basically just facts in their rawest form, which have not yet been processed or dealt with.
- Information: knowledge communicated or received concerning a particular fact or circumstance; it is usually the product of analyzing data.
Example: the data is a disk image; the information is the deleted files found in it.

We need to understand how data is represented. The computer stores everything as 1s and 0s, and the way we interpret a group of bits depends on the context. As a forensic analyst, you will work with many different data representation schemes, so to interpret evidence properly you need to understand the fundamentals of how data is represented.

Representation of data: 1. number systems, 2. endianness, 3. text systems, 4. data structures, 5. abstraction layers.

Number systems. Four number systems are the most relevant: decimal (base 10), binary (base 2), octal (base 8), and hexadecimal (base 16).

Number system representations of the same value (77 in decimal):
- Binary: 01001101b or 01001101₂
- Octal: 115o (note: the trailing character is a lowercase letter o) or 115₈
- Hexadecimal: 0x4D (note: the leading character is a zero), 4Dh, or 4D₁₆
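These notations map directly onto Python's integer literals (0b, 0o and 0x prefixes) and its built-in int() and format() functions. The following minimal sketch, assuming Python 3, checks that all of these spellings denote the same value:

```python
# The same value written in four bases; Python accepts 0b, 0o and 0x prefixes.
value_bin = 0b01001101
value_oct = 0o115
value_hex = 0x4D
value_dec = 77

# int() parses a digit string in an explicit base (no prefix needed).
parsed = [int("01001101", 2), int("115", 8), int("4D", 16), int("77", 10)]

# format() renders a value back into each base's digit string.
renderings = {
    "binary": format(value_dec, "08b"),  # padded to 8 bits
    "octal": format(value_dec, "o"),
    "hex": format(value_dec, "X"),       # uppercase hex digits
}

print(value_bin == value_oct == value_hex == value_dec)  # True
print(parsed)        # [77, 77, 77, 77]
print(renderings)
```

The trailing-letter forms (01001101b, 115o, 4Dh) are assembler-style conventions. Python does not accept them as literals, so they have to be parsed with int() after stripping the suffix.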

Decimal number system: base 10, uses digits 0-9, based on powers of 10 (10^5 = 100,000, 10^4 = 10,000, 10^3 = 1000, 10^2 = 100, 10^1 = 10, 10^0 = 1). Example:
$$327194_{10} = 3\cdot10^5 + 2\cdot10^4 + 7\cdot10^3 + 1\cdot10^2 + 9\cdot10^1 + 4\cdot10^0 = 300000 + 20000 + 7000 + 100 + 90 + 4 = 327194$$

Binary number system: base 2, uses digits 0 and 1, based on powers of 2 (2^5 = 32, 2^4 = 16, 2^3 = 8, 2^2 = 4, 2^1 = 2, 2^0 = 1). Example:
$$110101_2 = 1\cdot2^5 + 1\cdot2^4 + 0\cdot2^3 + 1\cdot2^2 + 0\cdot2^1 + 1\cdot2^0 = 32 + 16 + 0 + 4 + 0 + 1 = 53_{10}$$
Decimal to binary: 0 = 0, 1 = 1, 2 = 10, 3 = 11, 4 = 100, 5 = 101, 6 = 110, 7 = 111, 8 = 1000, 9 = 1001, 10 = 1010, 11 = 1011, 12 = 1100, 13 = 1101, 14 = 1110, 15 = 1111.

Octal number system: base 8, uses digits 0-7, based on powers of 8 (8^4 = 4096, 8^3 = 512, 8^2 = 64, 8^1 = 8, 8^0 = 1). Example:
$$70265_8 = 7\cdot8^4 + 0\cdot8^3 + 2\cdot8^2 + 6\cdot8^1 + 5\cdot8^0 = 28672 + 0 + 128 + 48 + 5 = 28853_{10}$$
Decimal to octal: 0-7 are unchanged, 8 = 10, 9 = 11, 10 = 12, 11 = 13, 12 = 14, 13 = 15, 14 = 16, 15 = 17.

Hexadecimal number system: base 16, uses digits 0-9 and A, B, C, D, E, F (A = 10, B = 11, C = 12, D = 13, E = 14, F = 15), based on powers of 16 (16^5 = 1,048,576, 16^4 = 65,536, 16^3 = 4096, 16^2 = 256, 16^1 = 16, 16^0 = 1). Example:
$$3F7A0E_{16} = 3\cdot16^5 + 15\cdot16^4 + 7\cdot16^3 + 10\cdot16^2 + 0\cdot16^1 + 14\cdot16^0 = 3145728 + 983040 + 28672 + 2560 + 0 + 14 = 4160014_{10}$$
Decimal to hexadecimal: 0-9 are unchanged, 10 = A, 11 = B, 12 = C, 13 = D, 14 = E, 15 = F.
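The four worked examples all apply the same rule: each digit is multiplied by the base raised to its position, counting from 0 on the right, and the products are summed. The helper below is a hypothetical sketch of that positional expansion. Python's built-in int(digits, base) already does the same job, so the sketch is only meant to make the arithmetic explicit.

```python
DIGITS = "0123456789ABCDEF"

def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string as sum(d_i * base**i), with i counted from the right."""
    total = 0
    for position, char in enumerate(reversed(digits.upper())):
        digit = DIGITS.index(char)  # raises ValueError for non-digit characters
        if digit >= base:
            raise ValueError(f"digit {char!r} is not valid in base {base}")
        total += digit * base ** position
    return total

examples = [("327194", 10), ("110101", 2), ("70265", 8), ("3F7A0E", 16)]
for digits, base in examples:
    value = positional_value(digits, base)
    # Cross-check against Python's own parser.
    assert value == int(digits, base)
    print(f"{digits} (base {base}) = {value:,}")
```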

Number system comparison:
Decimal  Binary  Octal  Hexadecimal
0        0       0      0
1        1       1      1
2        10      2      2
3        11      3      3
4        100     4      4
5        101     5      5
6        110     6      6
7        111     7      7
8        1000    10     8
9        1001    11     9
10       1010    12     A
11       1011    13     B
12       1100    14     C
13       1101    15     D
14       1110    16     E
15       1111    17     F
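A comparison table like this can be regenerated with the usual repeated-division method, which is the reverse of positional expansion. Divide by the base, keep the remainder as the next digit (least significant first), and repeat until the quotient is zero. The to_base() helper below is a hypothetical name used only for illustration.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to a digit string by repeated division."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])  # remainders come out least significant first
    return "".join(reversed(out))

print(f"{'Decimal':>7} {'Binary':>6} {'Octal':>5} {'Hex':>3}")
for n in range(16):
    print(f"{n:>7} {to_base(n, 2):>6} {to_base(n, 8):>5} {to_base(n, 16):>3}")
```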

Endianness. Numbers can be stored as a sequence of one or more bytes, and endianness is the order in which that sequence of bytes is stored. Two different methods for storing data have appeared:
- Big-endian: the first byte is the most significant and the last byte is the least significant, similar to a number written on paper. Architectures: SPARC, Power, PowerPC, MIPS.
- Little-endian: the first byte is the least significant and the last byte is the most significant, similar to the order of arithmetic calculation. Architectures: x86, ARM.

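Big-endian vs. little-endian. The illustration shows an example using the data word "0A 0B 0C 0D", a set of four bytes written in left-to-right positional hexadecimal notation, and the four memory locations with addresses a, a+1, a+2 and a+3. A big-endian system stores 0A at a, 0B at a+1, 0C at a+2 and 0D at a+3. A little-endian system stores the bytes in the reverse order, with 0D at a and 0A at a+3.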
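The memory layouts in the illustration can be reproduced with Python's standard struct module. In a format string, ">" selects big-endian, "<" selects little-endian, and "I" is a 4-byte unsigned integer. The following is a sketch in which byte index i stands for address a+i:

```python
import struct

word = 0x0A0B0C0D

big = struct.pack(">I", word)     # '>' = big-endian, 'I' = 4-byte unsigned int
little = struct.pack("<I", word)  # '<' = little-endian

# Index i of each byte string corresponds to memory address a+i.
for label, data in (("big-endian", big), ("little-endian", little)):
    print(f"{label:14}", "  ".join(f"a+{i}: {b:02X}" for i, b in enumerate(data)))
```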

Why is this important for digital forensics? For values stored in a single byte, endianness does not matter: the value is the same in both systems. But to evaluate a number stored as a sequence of bytes correctly, we must know which system was used to store it. Otherwise, we can obtain wrong results!

Different interpretations of a 16-bit number. In the sequence below, the two highlighted bytes, 01 followed by 23, represent a 16-bit integer (2 bytes x 8 bits = 16 bits).
- Big-endian calculation: 0x0123 = 291.
- Little-endian calculation: 0x2301 = 8961.
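Python's built-in int.from_bytes() takes the byte order as an explicit argument. It therefore shows directly how the same two bytes yield two different numbers:

```python
raw = bytes([0x01, 0x23])  # the two bytes in the order they appear in the data

as_big = int.from_bytes(raw, byteorder="big")        # reads 0x0123
as_little = int.from_bytes(raw, byteorder="little")  # reads 0x2301

print(as_big, as_little)  # 291 8961
```

In practice, an analyst decodes a field with the byte order defined by the format or architecture that produced it, such as little-endian for structures written by x86 systems. Trying both orders is useful mainly when that origin is unknown.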

Another example: the first two lines of a tcpdump file differ depending on whether it was created on an Intel or a Sun computer. The table below shows these lines on both architectures for the same date (Sat, 10 May 2003 08:37:01 GMT); the different byte order of the two systems is clearly visible.
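The sketch below assumes the classic libpcap file format that tcpdump writes. In that format, a 24-byte global header begins with the magic number 0xA1B2C3D4 in the writing host's native byte order, and each packet record starts with a 4-byte seconds timestamp. The code builds the first 32 bytes (the first two 16-byte lines of a hex dump) in both byte orders. It then reads the magic number to choose the correct order for decoding the timestamp. The snapshot length (65535) and link type (1) are hypothetical values chosen for illustration, and bytes.hex(" ") needs Python 3.8 or later.

```python
import struct
from datetime import datetime, timezone

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number

def pcap_byte_order(data: bytes) -> str:
    """Return the struct prefix ('<' or '>') implied by how the magic number was written."""
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        return "<"  # written by a little-endian host (e.g. Intel x86)
    if struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        return ">"  # written by a big-endian host (e.g. Sun SPARC)
    raise ValueError("not a classic pcap file")

def first_timestamp(data: bytes) -> datetime:
    """Decode ts_sec of the first packet record, right after the 24-byte global header."""
    order = pcap_byte_order(data)
    (ts_sec,) = struct.unpack(order + "I", data[24:28])
    return datetime.fromtimestamp(ts_sec, timezone.utc)

def build_sample(order: str, when: datetime) -> bytes:
    """First 32 bytes of a hypothetical capture: global header + first record's timestamp."""
    # magic, version 2.4, thiszone, sigfigs, snaplen (hypothetical), link type (hypothetical)
    global_header = struct.pack(order + "IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    record_ts = struct.pack(order + "II", int(when.timestamp()), 0)  # ts_sec, ts_usec
    return global_header + record_ts

when = datetime(2003, 5, 10, 8, 37, 1, tzinfo=timezone.utc)
for order, host in (("<", "Intel"), (">", "Sun")):
    sample = build_sample(order, when)
    print(f"{host:5}", sample[:16].hex(" "))
    print(" " * 5, sample[16:].hex(" "), "->", first_timestamp(sample))
```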

Where does the name come from? The term endian comes from the novel Gulliver's Travels by Jonathan Swift. In this fictional world there were two island nations, Lilliput and Blefuscu. They were mortal enemies because the emperor of Lilliput had decreed that boiled eggs were to be cracked at the "little end", whereas on Blefuscu they had always cracked their eggs at the "big end". The story illustrates that something quite simple can be done in two completely different ways.

Text representations. Text values stored in a computer can be in several formats. The most common ones are ASCII and Unicode (in various encodings); by far the most common is ASCII.

ASCII encoding. ASCII (pronounced "ask-key"), the American Standard Code for Information Interchange, is the most common text representation. It was proposed by ANSI in 1963 and finalized in 1968. It assigns a numerical value to the characters of American English: e.g., the letter 'A' is 0x41, and '&' is 0x26. Some values are control codes, such as 0x07, the bell sound. The largest defined value is 0x7F (0x7E, '~', is the largest printable character), so each character is encoded in 7 bits. When the 8-bit byte became the norm, it was decided to use 7-bit ASCII characters plus 1 parity bit to detect transmission errors.
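The codes above can be checked with Python's ord(). The parity scheme is only sketched here, with two assumptions: the slide does not say which parity or bit position was used, so even parity in the top (eighth) bit is chosen for illustration, and with_even_parity() is a hypothetical helper.

```python
def with_even_parity(code: int) -> int:
    """Set bit 7 so that the 8-bit result has an even number of 1-bits (even parity)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has an odd number of 1-bits
    return code | (parity << 7)

for ch in ("A", "&", "\a"):  # '\a' is the BEL (bell) control character
    code = ord(ch)
    print(f"{ch!r}: 0x{code:02X} -> with parity bit 0x{with_even_parity(code):02X}")
```

A receiver recomputes the parity of each byte. Any byte that arrives with an odd number of 1-bits reveals a single-bit (or any odd-numbered) transmission error.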

Extended ASCII. Over time, the ASCII table showed its limitations: e.g., there was a need to accommodate European languages or mathematical symbols. This led to the Extended ASCII character set, an 8-bit character encoding scheme that includes the standard 7-bit ASCII characters as well as others representing additional special, mathematical, graphic, and foreign characters.
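For an analyst, a practical consequence is that "extended ASCII" covers several different 8-bit code pages. Bytes 0x00-0x7F decode identically in all of them, but bytes above 0x7F depend on which code page produced the data. This sketch uses three code pages that ship with Python (Latin-1, Windows-1252 and the original IBM PC code page 437):

```python
raw = bytes([0x41, 0x80, 0xE9])  # 'A' followed by two bytes above 0x7F

# The ASCII byte 0x41 is the same everywhere; 0x80 and 0xE9 are not.
for codec in ("latin-1", "cp1252", "cp437"):
    decoded = raw.decode(codec)
    print(f"{codec:8} -> {decoded!r}", [f"U+{ord(c):04X}" for c in decoded])
```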

Unicode. ASCII is nice and simple if you use American English, but it is quite limited for the rest of the world, whose native symbols cannot be represented. Unicode helps solve this problem by using more than 1 byte to store the numerical value of a symbol. Version 4.0 of the Unicode standard supports over 96,000 characters, more than the 65,536 values that 2 bytes can hold, so its fixed-width encoding (UTF-32) needs 4 bytes per character instead of the 1 byte that ASCII requires.

Tradeoffs in Unicode encoding. There are three ways of storing a Unicode character:
- UTF-32: uses a 4-byte value for each character.
- UTF-16: stores the most used characters in a 2-byte value and lesser-used ones in 4 bytes.
- UTF-8: uses 1, 2, 3, or 4 bytes, with the most frequently used characters taking 1 byte.
There is a tradeoff between the number of characters that can be represented and space and processing efficiency. UTF-8 is frequently used because it wastes the least space for text that is mostly ASCII, and because ASCII is a subset of it.
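The size tradeoff can be measured directly by encoding a few characters from different ranges. In this sketch, encoded_sizes() is a hypothetical helper. The "-le" codec variants are used so that Python does not prepend a byte-order mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Number of bytes one character occupies in each Unicode encoding form."""
    # '-le' codecs: explicit little-endian, so no byte-order mark (BOM) is added.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# ASCII is a subset of UTF-8: pure-ASCII text encodes to identical bytes.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```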

String representation of the text "Hello World":
- Binary (ASCII): 01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100
- Hex (ASCII): 48 65 6C 6C 6F 20 57 6F 72 6C 64

Text     Binary     Octal  Hex
H        01001000   110    48
e        01100101   145    65
l        01101100   154    6C
l        01101100   154    6C
o        01101111   157    6F
(space)  00100000   040    20
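Each row of this table is just one byte of the ASCII encoding, printed in a different base. The following sketch reproduces the three views and then decodes the hex back into text. bytes.hex(" ") needs Python 3.8 or later.

```python
text = "Hello World"
data = text.encode("ascii")  # ASCII: exactly one byte per character

binary_str = " ".join(f"{b:08b}" for b in data)
octal_str = " ".join(f"{b:03o}" for b in data)
hex_str = data.hex(" ").upper()

print(binary_str)
print(octal_str)
print(hex_str)  # 48 65 6C 6C 6F 20 57 6F 72 6C 64

# Interpreting the same bytes as ASCII text reverses the process.
print(bytes.fromhex(hex_str).decode("ascii"))  # Hello World
```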

Data structures. A data structure describes how data are laid out: the data is broken up into fields, and each field has a size and a name.

Pointers in data structures. Data structures may contain pointers: fields whose value locates other data, such as the relative sectors field of a partition table entry, which points to the sector where the partition starts. Example: detail of a basic disk with four partitions; the partition table entries are shown below. In the first entry, the relative sectors field is 3F 00 00 00 and the total sectors field is 4B F5 7F 00.
  000001B0:                                           80 01
  000001C0: 01 00 07 FE BF 09 3F 00-00 00 4B F5 7F 00 00 00
  000001D0: 81 0A 07 FE FF FF 8A F5-7F 00 3D 26 9C 00 00 00
  000001E0: C1 FF 05 FE FF FF C7 1B-1C 01 D6 96 92 00 00 00
  000001F0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00
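The sketch below decodes the first entry from the dump, starting at offset 0x1BE. It assumes the classic MBR partition entry layout (16 bytes: boot flag, CHS start, partition type, CHS end, then two 4-byte little-endian fields). That layout is standard background knowledge rather than something stated on the slide. The size in GiB additionally assumes 512-byte sectors.

```python
import struct

# First partition table entry: bytes 0x1BE-0x1CD of the dump above.
entry = bytes.fromhex("80 01 01 00 07 FE BF 09 3F 00 00 00 4B F5 7F 00")

# Layout: boot flag (1), CHS start (3), type (1), CHS end (3),
#         relative sectors (4), total sectors (4); '<' = little-endian.
boot_flag, chs_start, ptype, chs_end, rel_sectors, total_sectors = struct.unpack(
    "<B3sB3sII", entry
)

print(f"bootable:         {boot_flag == 0x80}")
print(f"partition type:   0x{ptype:02X}")
print(f"relative sectors: {rel_sectors}")    # pointer to the partition's first sector
print(f"total sectors:    {total_sectors}")
print(f"size:             {total_sectors * 512 / 2**30:.2f} GiB (assuming 512-byte sectors)")
```

Reading the same four bytes as big-endian would give 0x3F000000 instead of 63. A big-endian misreading of a pointer like this sends the analysis to a completely wrong place on the disk.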

Exercise: reverse engineering an IP packet. Identify the fields of the IP header in the following packet:
  0000: 45 00 00 1d 7b bd 00 00-80 11 3a e5 c0 a8 01 a6
  0010: c0 a8 01 37 23 82 23 83-00 09 33 a9 01

Exercise: reverse engineering an IP packet (solution).
  0000: 45 00 00 1d 7b bd 00 00-80 11 3a e5 c0 a8 01 a6
  0010: c0 a8 01 37 23 82 23 83-00 09 33 a9 01
- 45 00: 4 says we are using IPv4; 5 is the number of 32-bit words in the header (5 x 4 = 20 bytes, so the options field is not used); 00 shows that we are not using Differentiated Services.
- 00 1d: the length of the entire datagram (0x1d = 29 bytes), including the IP header, the UDP header and the data. The IP header is 20 bytes when the options field is not used.
- 7b bd: the identification, used to match the fragments of an original IP datagram.
- 00 00: the Flags and Fragment Offset fields.
- 80 11: 80 is the TTL (0x80 = 128); 11 shows that the datagram carries UDP (0x11 = 17 in decimal).
- 3a e5: the checksum of the IP header.
- c0 a8 01 a6: the source IP address (192.168.1.166).
- c0 a8 01 37: the destination IP address (192.168.1.55).
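The manual decoding can be automated with the standard struct and ipaddress modules. In struct format strings, "!" means network byte order, which is big-endian. The sketch uses the fixed 20-byte IPv4 header layout from RFC 791, and parse_ipv4_header() and ipv4_checksum_ok() are hypothetical helper names. The checksum test relies on the standard property of the header checksum: the one's-complement sum of all 16-bit header words, with the checksum field included, is 0xFFFF for an intact header.

```python
import ipaddress
import struct

packet = bytes.fromhex(
    "45 00 00 1d 7b bd 00 00 80 11 3a e5 c0 a8 01 a6 "
    "c0 a8 01 37 23 82 23 83 00 09 33 a9 01"
)

def ipv4_checksum_ok(header: bytes) -> bool:
    """One's-complement sum of all 16-bit words (checksum included) must be 0xFFFF."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return total == 0xFFFF

def parse_ipv4_header(data: bytes) -> dict:
    (version_ihl, dscp_ecn, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    ihl_bytes = (version_ihl & 0x0F) * 4  # header length in 32-bit words -> bytes
    return {
        "version": version_ihl >> 4,
        "header_length": ihl_bytes,
        "dscp_ecn": dscp_ecn,
        "total_length": total_len,
        "identification": hex(ident),
        "flags_fragment": flags_frag,
        "ttl": ttl,
        "protocol": proto,  # 17 = UDP
        "checksum": hex(checksum),
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "checksum_ok": ipv4_checksum_ok(data[:ihl_bytes]),
    }

for field, value in parse_ipv4_header(packet).items():
    print(f"{field:15} {value}")
```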

Exercise: reverse engineering an IP packet (continued). Identify the UDP payload fields in the same packet:
  0000: 45 00 00 1d 7b bd 00 00-80 11 3a e5 c0 a8 01 a6
  0010: c0 a8 01 37 23 82 23 83-00 09 33 a9 01

UDP payload fields (solution).
  0000: 45 00 00 1d 7b bd 00 00-80 11 3a e5 c0 a8 01 a6
  0010: c0 a8 01 37 23 82 23 83-00 09 33 a9 01
- 23 82: the source port (0x2382 = 9090).
- 23 83: the destination port (0x2383 = 9091).
- 00 09: these two bytes give the length of the UDP datagram (9 bytes: an 8-byte header plus 1 byte of data).
- 33 a9: the UDP checksum.
- 01: the data.
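The UDP header starts where the IP header ends, at an offset given by the IHL field (5 x 4 = 20 bytes here). It consists of four big-endian 16-bit fields. This is a minimal sketch of that step, kept self-contained by repeating the packet bytes:

```python
import struct

packet = bytes.fromhex(
    "45 00 00 1d 7b bd 00 00 80 11 3a e5 c0 a8 01 a6 "
    "c0 a8 01 37 23 82 23 83 00 09 33 a9 01"
)

ihl_bytes = (packet[0] & 0x0F) * 4  # IP header length: 5 words * 4 = 20 bytes
udp = packet[ihl_bytes:]

# Four 16-bit fields in network (big-endian) byte order.
src_port, dst_port, udp_length, udp_checksum = struct.unpack("!HHHH", udp[:8])
payload = udp[8:udp_length]

print(f"source port:      {src_port}")
print(f"destination port: {dst_port}")
print(f"UDP length:       {udp_length} bytes (8-byte header + {len(payload)} data)")
print(f"UDP checksum:     0x{udp_checksum:04x}")
print(f"payload:          {payload.hex()}")
```

Verifying the UDP checksum also requires a pseudo-header built from the IP addresses, protocol and length, so this sketch only reports the stored value.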

Abstraction layers. Data can be organized in different abstraction layers. Examples are data encapsulation in the TCP/IP protocol stack, and the abstraction layers of a typical storage stack (file, file system, partition, block device). When performing a forensic investigation, we can focus on each abstraction layer independently, and we can obtain information about higher layers by looking at lower layers.

Helper tools

Conclusions. A simple way to reason about sources of evidence is to model them as networked computers. To interpret digital data properly, it is fundamental to understand how computers represent data. Several aspects need to be considered when interpreting data: number system, endianness, text encoding, data structure format, and abstraction layer.


Next class: file systems.