File Systems and Volumes Section II. Basic Forensic Techniques and Tools CSF: Forensics Cyber-Security MSIDC, Spring 2015 Nuno Santos
Summary
- Data organization in storage systems
- File deletion and recovery
- Data hiding techniques
Recall from the last class
- Goal of a digital investigation: to obtain admissible evidence
[Figure: case / incident resolution process]
Now, let's get more technical
- Section II: Basic Forensic Techniques and Tools
- The big question that will drive us in upcoming classes: How can we recover and interpret digital evidence from networked computer systems?
There are many potential sources of evidence
- Main transaction records
  - These include all purchases, sales and other contractual arrangements at the heart of the business
- Main business records
  - These include all of the above, but also all documents and data that are likely to be necessary to comply with legal and regulatory requirements
- Email traffic
  - Emails potentially provide important evidence of formal and informal contacts
- Selected personal computers (PCs)
  - The organization will need to be able to seize their PCs and make a proper forensic image
- Selected mobile phones / tablets / PDAs etc.
  - These devices can hold substantial amounts of data
- Back-up media
  - Back-up archives are extremely important sources of evidence, as they can show if live files have been tampered with. They can also provide data which has been deleted from the live system
- Telephone recordings
  - Many companies routinely record conversations between their staff and customers
- Selected data media
  - Most computer users archive all or part of their activities on external storage media
- Access control logs
  - Access control systems can be configured to maintain records of when usernames and passwords were issued, when passwords were changed, and when access rights were changed and/or terminated
- Configuration, event, error and other internal files and logs
  - All computers contain files which help to define how the operating system and various individual programs are supposed to work
- Internet activity logs
  - Individual PCs maintain records of recent web access in the form of the history file and the cache held in the temporary internet files folder
- Anti-virus logs
  - These record the detection and destruction of viruses and trojans
- Intrusion detection logs
  - Larger computer systems often use intrusion detection systems as part of their security measures
Digital evidence location: networked computers
- Data is stored and processed in computers (e.g., files)
- Data is exchanged between computers via networks (e.g., messages)
- Where and what to look for will depend on the case
Example: Modeling an arms trafficking scenario
- Mr. Victor is suspected of smuggling weapons through an online marketplace
- He has a desktop computer and a mobile phone
- What kind of evidence would we expect to find?
  - In computers? (our focus today)
  - In networks?
[Figure: Mr. Victor's desktop and smartphone, connected through his home network and the cellular network to the Internet and the e-commerce Web site]
Analyzing evidence from computers
- Within computers, where do we find most evidence?
- Persistent storage media
  - Hard disks
  - Solid state drives (SSDs)
- Contain user-generated content
  - E.g., documents, images, videos, etc.
- Contain meta-data created by the operating system
  - E.g., date and time information, user access events, etc.
Preliminary procedures before analysis
- After extensive investigation, the police seized Mr. Victor's desktop and mobile device
- They made disk images of the desktop computer and the mobile device
- The disk images were given to forensic investigators for further analysis
[Figure: Mr. Victor's desktop and smartphone, his home network, the cellular network, the Internet, and the e-commerce Web site]
What exactly is a disk image?
- A disk image is a linearized bit-for-bit copy of a given hard disk
- Typically stored as a single (large) file
- A disk image can be stored elsewhere for future analysis
[Figure: Mr. Victor's desktop computer (100GB hard disk) copied into a disk image file (100GB size)]
What do we see if we open a disk image?
[Figure: "This is what we see" next to "This is what we'd expect to see"]
- Why this difference?
Data is organized on disk in layers of abstraction
- The highest level of abstraction is closest to what the user sees:
  - File is the highest level
  - Block device is the lowest
- Layers, from highest to lowest: file, file system, partition / volume, block device (disk image)
- File systems and volumes bridge the gap between layers
- Disk images capture a snapshot at the block device level
  - They contain file data and file system / volume meta-data
- Forensic analysts must interpret data bottom-up from disk images
Main challenges in forensic disk analysis
- Find visible data
  - If we collect an image of a disk, how can we make sense of it and extract useful data files?
- Find deleted data
  - If data files are deleted, is it still possible to recover them? How?
- Find hidden data
  - If a suspect intentionally hides data in the storage system of a computer, where can we look for it?
Data organization in storage systems
From a disk image, how can we recover its files?
- Challenge: the abstraction layer is too low-level
  - Disk image files provide a snapshot at the disk drive level
  - How can we interpret the disk images in order to view the files and directories in the way we are used to?
- For that, we need to understand how data is organized in storage systems
Hard disks
- Most forensic data is stored on hard disk drives
- In commercial use since 1956
Hard disk basic terminology
- Head
  - Device which reads and writes data on the disk
- Track
  - Individual circles on a disk platter where data is located
- Cylinder
  - A column of tracks on a disk drive with 2 or more platters
- Sector
  - An individual section of data on a track; the smallest amount of data that can be written to the disk, usually 512 bytes
- Disk capacity = #cylinders * #heads * #sectors per track * sector_size
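As a small illustration of the capacity formula, here is a minimal Python sketch. The function name and the example geometry (1024 cylinders, 16 heads, 63 sectors per track) are assumptions chosen only for illustration; they do not come from the slides.

```python
def chs_capacity_bytes(cylinders, heads, sectors_per_track, sector_size=512):
    """Capacity in bytes of a disk described by its CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_size


# Hypothetical geometry, used only to exercise the formula.
example_capacity = chs_capacity_bytes(cylinders=1024, heads=16, sectors_per_track=63)
print(example_capacity, "bytes")
```

The sector size defaults to 512 bytes, the usual value quoted above; pass a different value for drives that use larger sectors.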
Disk addressing scheme
- Arrange every sector of the disk into a sequential array
  - Assuming sector size = 512 bytes and disk size = 100GB (100 * 2^30 bytes), the disk has 209,715,200 sectors, with sector / block addresses 0, 1, ..., 209,715,199
- Logical Block Address (LBA)
  - Independent of the physical geometry of the disk drive
  - The first block on disk is numbered 0, the next is 1, and so on
  - Most modern drives use this scheme
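The LBA scheme maps directly onto byte offsets inside a disk image file. The sketch below shows this under the slide's assumptions (512-byte sectors, a 100GB disk taken as 100 * 2^30 bytes). The function names, including the `read_sector` helper and its `image_path` parameter, are hypothetical and not part of any tool mentioned in these slides.

```python
SECTOR_SIZE = 512  # bytes, as assumed on the slide


def sector_count(disk_size_bytes, sector_size=SECTOR_SIZE):
    """Number of whole sectors on a disk of the given size."""
    return disk_size_bytes // sector_size


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Byte offset of a logical block inside a raw disk image."""
    return lba * sector_size


def read_sector(image_path, lba, sector_size=SECTOR_SIZE):
    """Read one sector from a raw image file (image_path is hypothetical)."""
    with open(image_path, "rb") as image:
        image.seek(lba_to_offset(lba, sector_size))
        return image.read(sector_size)


total_sectors = sector_count(100 * 2**30)
print("sectors:", total_sectors, "last LBA:", total_sectors - 1)
```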
The disk is the lowest level of abstraction
- Layers, from highest to lowest: file, file system, partition / volume, block device (disk image)
- Next come partitions / volumes
Partitions
- The logical address space of a disk is usually split into collections of consecutive sectors called partitions
- Partitions are used in many scenarios, including:
  - Some file systems have a maximum size smaller than the hard disk
  - Many laptops, when put to sleep, store memory contents on a special partition
  - Separate partitions for booting multiple OSes
Partitions from the user's perspective
[Figure: screenshot of the Windows disk management tool]
Partitioning methods
- Different OSes and hardware platforms use different partitioning methods
- Typical partition systems have tables whose entries describe partitions
- A table entry has the starting sector, ending sector, and type of the partition
- Where is this table actually stored?
Partition table is meta-data to be stored on disk
- The layout of the partition table on disk depends on the partition system employed
- The most commonly encountered partition system is the DOS-style partition
- DOS partitions are used with Microsoft Windows, Linux, and IA32-based FreeBSD and OpenBSD systems
DOS partitioning scheme
- A disk organized using DOS partitions has a Master Boot Record (MBR) in the first 512-byte sector
- The MBR has a partition table with 4 entries, one per partition
[Figure: a basic DOS disk with two partitions and the MBR]
Expected layout when opening disk image

Address (hex)  Address (dec)  Description          Size (bytes)
0x000          0              Bootstrap code area  446
0x1BE          446            Partition Entry #1   16
0x1CE          462            Partition Entry #2   16
0x1DE          478            Partition Entry #3   16
0x1EE          494            Partition Entry #4   16
0x1FE          510            Magic Number         2
                              Total:               512

- Each partition entry includes the starting LBA and length of the partition
[Figure: Disk 1 laid out as MBR, Partition 1 (ext3), Partition 2 (swap), Partition 3 (NTFS), Partition 4 (FAT32)]
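The table above can be turned into a short parser. The following Python sketch reads the four 16-byte entries from the first sector of an image. The offsets come from the table; the field layout inside each entry (boot flag, CHS start, type, CHS end, starting LBA, sector count) and the 0x55 0xAA magic number follow the standard DOS/MBR format rather than these slides. The function name and the synthetic `demo_sector` are assumptions for illustration.

```python
import struct

TABLE_OFFSET = 446      # 0x1BE, first partition entry
ENTRY_SIZE = 16
SIGNATURE_OFFSET = 510  # 0x1FE, magic number


def parse_mbr(sector):
    """Return the non-empty partition entries found in a 512-byte MBR."""
    if len(sector) < 512:
        raise ValueError("an MBR is 512 bytes long")
    if sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xaa":
        raise ValueError("magic number 0x55 0xAA not found")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        # boot flag, CHS start, type, CHS end, starting LBA, sector count
        boot_flag, _chs_start, ptype, _chs_end, start_lba, num_sectors = (
            struct.unpack("<B3sB3sII", sector[start:start + ENTRY_SIZE]))
        if ptype == 0:  # type 0 marks an unused entry
            continue
        partitions.append({
            "index": index + 1,
            "bootable": boot_flag == 0x80,
            "type": ptype,
            "start_lba": start_lba,
            "num_sectors": num_sectors,
        })
    return partitions


# Synthetic MBR with a single bootable entry of type 0x0B (FAT32).
demo_sector = bytearray(512)
demo_sector[446:462] = struct.pack("<B3sB3sII", 0x80, b"\x00" * 3, 0x0B,
                                   b"\x00" * 3, 2048, 204800)
demo_sector[510:512] = b"\x55\xaa"
demo_partitions = parse_mbr(bytes(demo_sector))
print(demo_partitions)
```

On a real image, the input would be the first 512 bytes of the image file.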
We've covered the partition abstraction layer
- Layers, from highest to lowest: file, file system, partition / volume, block device (disk image)
- Next come file systems
File systems
- How are files stored within a partition?
- Problem:
  - Files are arbitrarily long sequences of bytes
  - Disks can only write / read fixed-size sectors
- How to map file contents to sectors?
- Do we require all sectors to be allocated contiguously?
- Files must have names. How to associate names with files?
- These issues are addressed by file systems
The FAT file system
- Simple file system popularized by MS-DOS
  - First introduced in 1977
  - Most devices today use the FAT32 spec from 1996
  - Variants: FAT12, FAT16, FAT32, etc.
- Still quite popular today
  - Default format for USB sticks and memory cards
  - Used for EFI boot partitions
- The name comes from the index table used to track directories and files, called the File Allocation Table (FAT)
FAT: Where file data is stored
- File content is stored in data units named clusters
- Sector
  - Minimum storage size on a hard drive
  - One pie-shaped arc of a platter
  - Common storage size of 512 bytes
  - Established during low-level formatting
  - Numbered sequentially starting at 1 within a track (CHS addressing)
- Cluster
  - Minimum storage size for a file, as determined by the file system
  - Common cluster size is 4096 bytes (4KB), i.e., 8 sectors
FAT: How file data is tracked
- The high-level idea is: for each file, keep track of:
  - Its name
  - The clusters that are allocated to it
  - The total file size
[Figure: clusters 33 to 36, with the contents of file1.dat (size: 4000 bytes) stored in two of them]
FAT: The directory and FAT data structures
- Directory entry: file1.dat, 4000 bytes, first cluster 34
- FAT: entry 34 contains 36, entry 36 contains EOF
- file1.dat therefore occupies cluster 34 and cluster 36
- The index in the FAT corresponds to a cluster number
FAT: Directory entry points to file's first cluster
- The directory entry for file1.dat (4000 bytes) stores the file's first cluster: 34
[Figure: clusters 32 to 37, the directory entry structures, and the FAT structure; file1.dat occupies cluster 34 and cluster 36]
FAT: FAT entry points to next cluster of the file
- FAT entry 34 holds 36, the next cluster of file1.dat
- FAT entry 36 holds EOF
- An EOF in the FAT means that the end of the file was reached
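The chain-following idea can be sketched in a few lines of Python. This is a simplified model, not the on-disk format: the FAT is a dictionary, the end-of-chain marker is the string "EOF", and the names `cluster_chain`, `fat` and `directory` are illustrative assumptions. The values mirror the file1.dat example (first cluster 34, then 36, then EOF).

```python
EOF = "EOF"  # stand-in for the real end-of-chain marker value


def cluster_chain(fat, first_cluster):
    """Follow FAT entries from a file's first cluster until EOF."""
    chain = []
    visited = set()
    cluster = first_cluster
    while cluster != EOF:
        if cluster in visited:
            # A corrupted FAT could loop forever; stop instead.
            raise ValueError("loop detected in FAT chain")
        visited.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # the FAT index is the cluster number
    return chain


# Simplified model of the slide's example.
fat = {34: 36, 36: EOF}
directory = {"file1.dat": {"size": 4000, "first_cluster": 34}}

file1_chain = cluster_chain(fat, directory["file1.dat"]["first_cluster"])
print(file1_chain)
```

The loop check is a defensive choice: a damaged or tampered FAT may contain cycles, and a forensic tool should not hang on them.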
FAT: Multiple files
- Directory entries: file1.dat (4000 bytes, first cluster 34) and file2.txt (100 bytes, first cluster 33)
[Figure: clusters 32 to 37, the directory entry structures, and the FAT structure; each file's chain ends with an EOF entry in the FAT]
FAT: Directory structure
- There is a specific data area for the root directory
- Subdirectories are stored in clusters, like files are
[Figure: volume layout with boot sector, FAT, root directory, and data area. The root directory entry dir1 points to cluster 90; the entry File1.txt points to cluster 200; the FAT shows 201 and EOF for the file's cluster chain. Cluster 200 holds "Cluster with the new content that was just created in the directory"; the following cluster holds "This is more data that couldn't fit into the first cluster".]
Layout of a FAT file system
- Layout of FAT16 on a volume
- There are two additional variants: FAT12 and FAT32
- Boot sector: stores basic info about the file system
  - FAT version, location of boot files
  - Total number of blocks
  - Index of the root directory in the FAT
- FAT area: region for FAT data structures
  - FAT2 for backup
  - Marks blocks free or in use
  - Linked-list structure to manage large files
- Root directory: region for the directory entries of the root folder (fixed location)
- Data area: stores file and directory data
  - Each cluster is a fixed size
  - Files may span multiple clusters
In forensics, we need to understand the boot sector
Tools to help interpret the boot sector
- TSK (The Sleuth Kit) forensic toolkit
  - Use the fsstat tool
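To give a feel for what interpreting the boot sector involves, here is a minimal Python sketch that decodes a few fields and derives where the FAT, root directory, and data area start. It assumes the standard FAT12/FAT16 boot sector field layout, which is taken from the FAT specification rather than from these slides. The function name and the synthetic `demo_boot` sector are illustrative assumptions, and this is not how fsstat itself is implemented.

```python
import struct


def parse_fat16_boot_sector(boot):
    """Decode basic layout information from a FAT12/FAT16 boot sector."""
    (_jump, oem, bytes_per_sector, sectors_per_cluster, reserved_sectors,
     num_fats, root_entries, total16, _media, fat_size,
     _sectors_per_track, _heads, _hidden, total32) = struct.unpack_from(
        "<3s8sHBHBHHBHHHII", boot, 0)
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    root_dir_start = reserved_sectors + num_fats * fat_size
    return {
        "oem_name": oem.decode("ascii", errors="replace").strip(),
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "total_sectors": total16 if total16 else total32,
        "fat_start_sector": reserved_sectors,
        "root_dir_start_sector": root_dir_start,
        "data_start_sector": root_dir_start + root_dir_sectors,
    }


# Synthetic boot sector: 512-byte sectors, 8 sectors per cluster,
# 1 reserved sector, 2 FATs of 200 sectors each, 512 root entries.
demo_boot = struct.pack("<3s8sHBHBHHBHHHII", b"\xeb\x3c\x90", b"MSDOS5.0",
                        512, 8, 1, 2, 512, 0, 0xF8, 200, 63, 16,
                        2048, 409600).ljust(512, b"\x00")
demo_layout = parse_fat16_boot_sector(demo_boot)
print(demo_layout)
```

All sector numbers are relative to the start of the partition, so they must be added to the partition's starting LBA to locate the data inside a full disk image.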
Summary: To find visible data from a disk image
- Use adequate forensic tools to:
  - Interpret the partition table
  - Interpret the boot sector layout
  - Traverse the root directory
  - Navigate the subdirectories
  - Open the files
By the way
- There's a lot more we can learn from the meta-data
  - E.g., file access times, partition names, file sizes, access permissions, etc.
- There are more (and better) file systems out there
  - NTFS (Windows), EXT2 (Linux), HFS+ (Mac OS X)
- There are important differences in storage technology
  - Especially between hard disks and SSDs
File deletion and recovery
What happens if a file is deleted?
- The FAT file system maintains meta-data that allows each file in the system to be retrieved
Meta-data cleared, but contents still there!
- If a file is deleted, the file system's data structures are updated, but the data is still in the blocks!
File carving technique
- Carving is a general term for extracting structured data out of raw data, based on format-specific characteristics present in the structured data
- E.g., recovering a deleted file from unallocated disk space
When is file carving useful?
- When the data is there, but can't be correctly interpreted due to absent or damaged meta-data
- Examples:
  1. File system corruption
  2. Device formatting
  3. Unknown proprietary formats
  4. Files removed or deleted (unintentionally or intentionally)
Key intuition behind this technique
- Identify a piece of data from a pool of raw data
- Applicable not only to deleted files, e.g.:
  - Individual packets from network traces
  - Malware code from a compromised application
The challenge
- Given a raw byte stream, how can we extract the data that belongs to a particular file?
- Can we locate and extract the content of this file?
[Figure: raw data bytes containing weapons.pdf]
File carving: General rules
- Does not rely directly on the information present in file system structures
- Normally identifies common files by means of hashes (MD5) and keywords
Key insight: Leverage files' internal structure
- Some file formats have a predefined header and footer
  - These include magic numbers (i.e., byte sequences in known positions)
- For a GIF file:
  - Header: 0x47 0x49 0x46 0x38 0x39 0x61 (GIF89a)
  - Footer: 0x3B
[Figure: hex dump of a GIF file marking where the file begins and where it ends]
Another example: JPEG
- JPEG predefines header and footer magic numbers
  - Header: \xff\xd8
  - Footer: \xff\xd9
- Image data has variable size, delimited by the footer
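Header/footer carving for JPEG can be sketched as follows. This is a deliberately naive Python illustration, not how Scalpel or Foremost are implemented: it assumes each file is stored contiguously, and it can be fooled by footer bytes that appear inside the image data (for example in embedded thumbnails). The function name, the `max_size` cap, and the demo byte string are assumptions for illustration.

```python
JPEG_HEADER = b"\xff\xd8"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw, max_size=10 * 1024 * 1024):
    """Return (offset, data) pairs for header..footer spans found in raw."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break
        # Look for the first footer after the header, within max_size bytes.
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range; try the next header
            continue
        end += len(JPEG_FOOTER)
        carved.append((start, raw[start:end]))
        pos = end
    return carved


# Synthetic "unallocated space": zero padding around one fake JPEG.
demo_raw = b"\x00" * 16 + b"\xff\xd8" + b"fake image data" + b"\xff\xd9" + b"\x00" * 16
demo_carved = carve_jpegs(demo_raw)
print(demo_carved)
```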
Some formats specify the file size
- Thus, rather than looking for a footer, look for the file size
- For example, BMP files don't have a footer
  - Signature: 0x42 0x4D (BM)
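Size-based carving for BMP can be sketched like this. The sketch assumes the standard BMP file header, where the 2-byte "BM" signature is followed by a 4-byte little-endian file size; that field layout comes from the BMP format itself, not from these slides. The function name and the synthetic data are illustrative assumptions, and the file is assumed to be stored contiguously.

```python
import struct

BMP_SIGNATURE = b"BM"
BMP_FILE_HEADER_SIZE = 14  # smallest plausible size: the file header alone


def carve_bmp(raw, start):
    """Carve a BMP starting at offset start, using its size field."""
    if raw[start:start + 2] != BMP_SIGNATURE or start + 6 > len(raw):
        return None
    (size,) = struct.unpack_from("<I", raw, start + 2)
    if size < BMP_FILE_HEADER_SIZE or start + size > len(raw):
        return None  # implausible size or file truncated in this data
    return raw[start:start + size]


# Synthetic 20-byte "BMP": signature, size field, then filler bytes.
demo_bmp = BMP_SIGNATURE + struct.pack("<I", 20) + b"\x00" * 14
demo_data = b"junk" + demo_bmp + b"trailing bytes"
demo_result = carve_bmp(demo_data, 4)
print(demo_result == demo_bmp)
```

Since "BM" is only two bytes long, false positives are likely in real data, which is why the sketch rejects implausible size values.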
Structure-based carving
- Recover files based on the internal layout of a file
  - E.g., identifier strings, header, footer, and size information
- Known headers and footers, or a maximum file size
  - JPEG: \xff\xd8 header and \xff\xd9 footer
  - GIF: \x47\x49\x46\x38\x37\x61 header (GIF87a), \x00\x3b footer
  - BMP: BM header and no footer
  - If the file format has no footer, a maximum file size is used
- Carvers based on known headers and footers:
  - Scalpel, Foremost and File finder (EnCase)
Content-based carving
- Identify files based on their internal contents
- Content structure
  - Loose structure (HTML, XML)
- Content characteristics
  - Character count
  - Text / language recognition
  - White and black listing of data
  - Statistical attributes
  - Information entropy
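As an example of one statistical attribute, information entropy, the sketch below computes the Shannon entropy of a block of bytes in bits per byte. The function name is an assumption, and the slides give no thresholds for classifying data by entropy. As a general observation, compressed or encrypted data tends toward 8 bits per byte, while plain text and zero-filled regions score much lower.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


zero_block_entropy = shannon_entropy(b"\x00" * 4096)
uniform_block_entropy = shannon_entropy(bytes(range(256)) * 16)
text_entropy = shannon_entropy(b"the quick brown fox jumps over the lazy dog")
print(zero_block_entropy, uniform_block_entropy, text_entropy)
```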
Here's an example of a poorly carved image file
[Figure: poorly carved image file]
- What happened here?
Looks easy? Not really: Fragmentation
- Normally, files are broken up and stored into clusters
- For file B (in the figure), carving clusters sequentially yields correct results
- But data clusters may be out of order
- Or be interleaved with clusters of other files
Assuming cluster continuity is not sufficient
- Fragmentation statistics show that files are generally not fragmented, but the files most likely to be fragmented are those that are forensically important:
  - 16% of JPEGs
  - 17% of Word documents
  - 22% of AVI files
  - 58% of PST (MS Outlook) files
- Fragmentation becomes more of a problem when:
  - The system is low on disk space
  - Files are appended to
  - Wear-leveling algorithms are used (e.g., SSDs)
Summary: To recover deleted files
- Use forensic tools to analyze meta-data that can point to the location of the deleted file's clusters
  - E.g., EnCase, TSK
- Leverage file carving tools to look for deleted files in the unallocated disk space
  - E.g., foremost, scalpel
Data hiding techniques
Intentionally hide data
Host protected area
- The host protected area (HPA) was added in the ATA-4 spec
- Computer vendors can store data that will not be erased when a user formats the HDD
- Can be detected by comparing the output of ATA commands
- An HPA can contain system files, hidden information, or both
Slack space
- A file must allocate a full cluster, even if it needs only part of it
- The unused bytes in the last cluster are called slack space
- If the unused bytes are not wiped, they may contain data from previous files or memory
[Figure: a cluster divided into file contents and slack space]
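The amount of slack space follows directly from the file size and the cluster size. Here is a minimal Python sketch, assuming the 4096-byte cluster size mentioned earlier as a common value; the function names are illustrative.

```python
def clusters_needed(file_size, cluster_size=4096):
    """Number of whole clusters a file of file_size bytes must allocate."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size, cluster_size=4096):
    """Unused bytes left in the last cluster allocated to the file."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


for size in (100, 4000, 4096, 5000):
    print(size, "bytes ->", clusters_needed(size), "cluster(s),",
          slack_bytes(size), "bytes of slack")
```

For example, a 4000-byte file in a 4096-byte cluster leaves 96 bytes of slack, which a forensic tool can inspect for remnants of earlier data.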
Conclusions
- To read the contents of disk images, we must understand how data is organized into several layers of abstraction
- Deleted files can often be recovered by looking into the file system's meta-data and / or using file carving
- There are regions within the disk address space that are ignored by the file system, where data can be hidden
References
- Primary bibliography
  - Brian Carrier, File System Forensic Analysis, 2005
Next class
- Operating systems forensics