A Semi-Preemptive Garbage Collector for Solid State Drives
Junghee Lee, Youngjae Kim, Galen M. Shipman, Sarp Oral, Feiyi Wang, and Jongman Kim
Presented by Junghee Lee
High-Performance Storage Systems
- Server-centric services: file, web, and media servers; transaction-processing servers
- Enterprise-scale storage systems: information technology focusing on the storage, protection, and retrieval of data in large-scale environments
- High-performance storage systems: e.g., Google's massive server farms
- Basic storage unit: the hard disk drive
Spider: A Large-Scale Storage System
- Jaguar: a petascale computing machine
  - 25,000 nodes with 250,000 cores and over 300 TB of memory
- Spider storage system
  - The largest center-wide Lustre-based file system
  - Over 10.7 PB of RAID 6 formatted capacity
  - 13,400 x 1 TB HDDs
  - 192 Lustre I/O servers
  - Over 3 TB of memory (on the Lustre I/O servers)
Emergence of NAND Flash-Based SSDs
NAND flash vs. hard disk drives
- Pros:
  - Semiconductor technology, no mechanical parts
  - Lower access latencies: microseconds for SSDs vs. milliseconds for HDDs
  - Lower power consumption
  - Higher robustness to vibration and temperature
- Cons:
  - Limited lifetime: 10K-1M erases per block
  - High cost: about 8x more expensive than current hard disks
  - Performance variability
Outline
- Introduction
- Background and Motivation
  - NAND Flash and SSD
  - Garbage Collection
  - Pathological Behavior of SSDs
- Semi-Preemptive Garbage Collection
- Evaluation
- Conclusion
NAND Flash-Based SSD
The write path through the I/O stack:
- Application: a process issues fwrite(file, data)
- File system (FAT, Ext2, NTFS, ...): turns it into a block write (LBA, size)
- Block device driver: sends the request over a block interface (SATA, SCSI, etc.)
- SSD: the FTL, running on the device CPU with on-device memory, turns it into a page write (bank, block, page) to the flash packages
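To make the layering concrete, here is a minimal sketch in C of the last translation step: a block-level write becoming a flash page address. The geometry, struct fields, and the ftl_translate function are illustrative assumptions, not the paper's FTL.

```c
#include <stdint.h>
#include <stdio.h>

/* A block-level request as the file system would issue it. */
struct block_request {
    uint64_t lba;   /* logical block address */
    uint32_t size;  /* in bytes */
};

/* The FTL maps an LBA to a (bank, block, page) flash address. */
struct flash_addr {
    uint32_t bank, block, page;
};

/* Hypothetical mapping; a real FTL consults its mapping table here. */
static struct flash_addr ftl_translate(uint64_t lba) {
    uint64_t lpn = lba;            /* assume one LBA per page for brevity */
    struct flash_addr a = {
        .bank  = lpn % 4,          /* stripe across 4 banks (assumed)     */
        .block = (lpn / 4) / 64,   /* 64 pages per block                  */
        .page  = (lpn / 4) % 64,
    };
    return a;
}

int main(void) {
    struct block_request req = { .lba = 1000, .size = 4096 };
    struct flash_addr a = ftl_translate(req.lba);
    printf("LBA %llu -> bank %u, block %u, page %u\n",
           (unsigned long long)req.lba, a.bank, a.block, a.page);
    return 0;
}
```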
NAND Flash Organization
- A package contains dies (Die 0, Die 1); each die contains planes (Plane 0-3); each plane contains blocks (Block 0-2047); each block contains pages (Page 0-63) plus a register
- Operation latencies:
  - Read: 0.025 ms
  - Write: 0.200 ms
  - Erase: 1.500 ms
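The slide's geometry and latencies can be captured as constants; a small sketch (the page size is an assumption, since the slide does not give one):

```c
#include <stdio.h>

/* Latencies from the slide, in milliseconds per operation. */
#define T_READ_MS  0.025
#define T_WRITE_MS 0.200
#define T_ERASE_MS 1.500

/* Organization from the slide: 2 dies x 4 planes x 2048 blocks x 64 pages.
 * Page size is not given on the slide; 4 KB is assumed for illustration. */
#define DIES_PER_PKG     2
#define PLANES_PER_DIE   4
#define BLOCKS_PER_PLANE 2048
#define PAGES_PER_BLOCK  64
#define PAGE_SIZE_KB     4

int main(void) {
    long pages = (long)DIES_PER_PKG * PLANES_PER_DIE *
                 BLOCKS_PER_PLANE * PAGES_PER_BLOCK;
    printf("capacity per package: %ld MB\n", pages * PAGE_SIZE_KB / 1024);
    printf("an erase costs %.0fx a page read\n", T_ERASE_MS / T_READ_MS);
    return 0;
}
```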
Out-of-Place Write
- A logical-to-physical mapping table maps logical page numbers (LPNs) to physical page numbers (PPNs), e.g., LPN0->PPN1, LPN1->PPN4, LPN2->PPN2, LPN3->PPN5
- Each physical page (P0-P7) is valid (V), invalid (I), or erased (E)
- A write to LPN2 proceeds out of place:
  1. Invalidate PPN2 (the old physical page)
  2. Write the data to an erased page, PPN3
  3. Update the table: LPN2->PPN3
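A minimal sketch of the out-of-place write in C, mirroring the slide's example; the table layout and the ftl_write helper are hypothetical:

```c
#include <stdio.h>

#define NUM_PAGES 8

enum page_state { ERASED, VALID, INVALID };

/* Single-table FTL state mirroring the slide's example. */
static int l2p[4] = { 1, 4, 2, 5 };       /* LPN -> PPN, as on the slide */
static enum page_state phys[NUM_PAGES];

/* Out-of-place write: never overwrite in place. Instead, invalidate the
 * old physical page, program an erased page, and update the table. */
static void ftl_write(int lpn, int free_ppn) {
    phys[l2p[lpn]] = INVALID;   /* step 1: invalidate PPN2 in the example */
    phys[free_ppn] = VALID;     /* step 2: program the data into PPN3     */
    l2p[lpn] = free_ppn;        /* step 3: remap LPN2 -> PPN3             */
}

int main(void) {
    for (int i = 0; i < 6; i++) phys[i] = VALID;   /* P0..P5 hold data */
    phys[3] = ERASED;                              /* P3 is free       */
    ftl_write(2, 3);                               /* "Write to LPN2"  */
    printf("LPN2 now maps to PPN%d; PPN2 is %s\n",
           l2p[2], phys[2] == INVALID ? "invalid" : "valid");
    return 0;
}
```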
Garbage Collection
- Steps: select a victim block, move its valid pages to another block, then erase the victim block
- Cost of collecting a victim with two valid pages: 2 reads + 2 writes + 1 erase = 2 x 0.025 + 2 x 0.200 + 1.5 = 1.950 ms!
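The slide's arithmetic generalizes to any victim with v valid pages; a small sketch of this cost model using the slide's latencies:

```c
#include <stdio.h>

#define T_READ_MS  0.025
#define T_WRITE_MS 0.200
#define T_ERASE_MS 1.500

/* GC cost model implied by the slide: each valid page in the victim
 * block costs one read plus one write, then the whole block is erased. */
static double gc_cost_ms(int valid_pages) {
    return valid_pages * (T_READ_MS + T_WRITE_MS) + T_ERASE_MS;
}

int main(void) {
    /* The slide's example: 2 valid pages -> 1.950 ms. */
    printf("GC with 2 valid pages: %.3f ms\n", gc_cost_ms(2));
    return 0;
}
```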
Pathological Behavior of SSDs
- Does GC have an impact on foreground operations? If so, we can observe:
  - A sudden bandwidth drop
  - A larger drop with more write requests
  - A larger drop with more bursty workloads
- Experimental setup
  - SSD devices: Intel (SLC) 64 GB SSD, SuperTalent (MLC) 120 GB SSD
  - I/O generator: used the libaio asynchronous I/O library for block-level testing
Bandwidth Drop for Write-Dominant Workloads
- Experiments: measured bandwidth for 1 MB sequential I/O while varying the read/write ratio (80/60/40/20% write)
[Figure: bandwidth (MB/s) over time (sec); panels for the Intel SLC SSD and the SuperTalent MLC SSD]
- Performance variability increases as the write percentage of the workload increases.
Performance Variability for Bursty Workloads
- Experiments: measured SSD write bandwidth at queue depths (qd) of 8 and 64; normalized I/O bandwidth with a Z distribution
[Figure: panels for the Intel SLC SSD and the SuperTalent MLC SSD]
- Performance variability increases as the arrival rate of requests increases (bursty workloads).
Lessons Learned
From the empirical study, we learned:
- Performance variability increases as the percentage of writes in the workload increases.
- Performance variability increases with the arrival rate of write requests.
This is because GC is not preemptive: any request that arrives during GC must wait until the ongoing GC ends.
Outline
- Introduction
- Background and Motivation
- Semi-Preemptive Garbage Collection
  - Semi-Preemption
  - Further Optimization
  - Level of Allowed Preemption
- Evaluation
- Conclusion
Technique #1: Semi-Preemption
- Legend: R x = read page x + data transfer; W x = write page x + metadata update; E = erase a block
- With non-preemptive GC, a request (W z) arriving during the GC sequence R x, W x, R y, W y, E must wait until the erase completes.
- With semi-preemptive GC, W z is serviced at the next preemption point between GC operations, as sketched below.
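A minimal sketch of semi-preemptive GC, not the paper's code: the GC yields at operation boundaries ("preemption points") between page copies and before the erase, instead of running the whole block move atomically. The queue and flash functions are simple stand-in stubs.

```c
#include <stdbool.h>
#include <stdio.h>

static int pending = 1;                      /* one queued request (W z) */
static bool queue_empty(void) { return pending == 0; }
static void serve_next_request(void) { puts("serve W z"); pending--; }
static void flash_read(int b, int p)  { printf("GC: read  blk %d page %d\n", b, p); }
static void flash_write(int b, int p) { printf("GC: write blk %d page %d\n", b, p); }
static void flash_erase(int b)        { printf("GC: erase blk %d\n", b); }
static bool preemption_allowed(void)  { return true; }  /* see the state
                                    machine on the "Level of Allowed
                                    Preemption" slide */

static void semi_preemptive_gc(int victim, int dest,
                               const int *valid, int nvalid) {
    for (int i = 0; i < nvalid; i++) {
        /* Preemption point: service foreground I/O between page copies. */
        while (preemption_allowed() && !queue_empty())
            serve_next_request();
        flash_read(victim, valid[i]);
        flash_write(dest, i);
    }
    while (preemption_allowed() && !queue_empty())  /* last preemption point */
        serve_next_request();
    flash_erase(victim);
}

int main(void) {
    int valid[] = { 0, 3 };           /* pages x and y of the slide */
    semi_preemptive_gc(7, 2, valid, 2);
    return 0;
}
```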
Technique #2: Merge
- If an incoming request targets the same page that GC is about to access (e.g., a read of page y arrives while GC performs R y), the two operations are merged and served with a single flash operation (see the sketch below).
- Legend: R x = read page x + data transfer; W x = write page x + metadata update; E = erase a block
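A sketch of the merge optimization with hypothetical helpers: if a pending foreground read targets the same page GC is about to read (R y), the single flash read serves both, saving one operation.

```c
#include <stdbool.h>
#include <stdio.h>

struct request { int block, page; bool is_write; bool pending; };

static struct request q = { 7, 3, false, true }; /* incoming R y on the victim */

static void flash_read(int b, int p) { printf("flash read blk %d page %d\n", b, p); }

static void gc_read_with_merge(int victim, int page) {
    flash_read(victim, page);        /* GC's own read of the valid page */
    /* Merge: the data just read also satisfies the matching foreground read. */
    if (q.pending && !q.is_write && q.block == victim && q.page == page) {
        puts("merged: foreground read served by GC's read");
        q.pending = false;
    }
}

int main(void) {
    gc_read_with_merge(7, 3);        /* R y arrives while GC reads page y */
    return 0;
}
```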
Technique #3: Pipeline
- An incoming request (e.g., R z) that arrives during GC can be pipelined with the ongoing GC operation (e.g., W x), overlapping the two instead of serializing them (see the sketch below).
- Legend: R x = read page x + data transfer; W x = write page x + metadata update; E = erase a block
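A sketch of the pipeline optimization's effect on the timeline. The split of a write into a bus-transfer phase and an array-programming phase is an assumption for illustration; only the page latencies come from the slides. While the array programs W x, the channel is free, so R z can overlap it.

```c
#include <stdio.h>

#define T_READ_MS  0.025   /* page read (from the slides) */
#define T_XFER_MS  0.050   /* assumed bus-transfer share of a 0.200 ms write */
#define T_PROG_MS  0.150   /* assumed array-programming share */

int main(void) {
    double serial  = (T_XFER_MS + T_PROG_MS) + T_READ_MS;  /* R z waits for W x */
    double overlap = T_READ_MS < T_PROG_MS ? T_READ_MS : T_PROG_MS;
    double piped   = serial - overlap;  /* R z overlaps W x's program phase */
    printf("serial: %.3f ms, pipelined: %.3f ms\n", serial, piped);
    return 0;
}
```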
Level of Allowed Preemption
- Drawback of PGC: the completion time of GC is delayed, which may incur a lack of free blocks
- Preemption therefore sometimes needs to be prohibited
- States of PGC (O = allowed, X = prohibited; see the sketch below):

  State   | Garbage collection | Read requests | Write requests
  State 0 | X                  | O             | O
  State 1 | O                  | O             | O
  State 2 | O                  | O             | X
  State 3 | O                  | X             | X
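A sketch of the preemption-level state machine. The free-block thresholds T0 > T1 > T2 are hypothetical values; the idea is that as free blocks run low, GC first starts, then stops yielding to writes, then to all requests, so preemption can never exhaust the free-block pool.

```c
#include <stdio.h>

enum pgc_state { S0_IDLE, S1_PREEMPTIVE, S2_NO_WRITES, S3_NO_REQUESTS };

static enum pgc_state pgc_state_for(int free_blocks) {
    const int T0 = 64, T1 = 16, T2 = 4;          /* assumed thresholds   */
    if (free_blocks > T0) return S0_IDLE;        /* no GC needed         */
    if (free_blocks > T1) return S1_PREEMPTIVE;  /* GC; serve R and W    */
    if (free_blocks > T2) return S2_NO_WRITES;   /* GC; serve reads only */
    return S3_NO_REQUESTS;                       /* GC runs to completion */
}

int main(void) {
    printf("state at 100 free blocks: %d\n", pgc_state_for(100)); /* 0 */
    printf("state at 10 free blocks:  %d\n", pgc_state_for(10));  /* 2 */
    return 0;
}
```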
Outline
- Introduction
- Background and Motivation
- Semi-Preemptive Garbage Collection
- Evaluation
  - Setup
  - Synthetic Workloads
  - Realistic Workloads
- Conclusion
Setup
- Simulator: MSR's SSD simulator based on DiskSim
- Workloads
  - Synthetic workloads: used the synthetic workload generator in DiskSim
  - Realistic workloads:

  Workload  | Type           | Avg. request size (KB) | Read ratio (%) | Arrival rate (IOP/s)
  Financial | Write dominant | 7.09                   | 18.92          | 47.19
  Cello     | Write dominant | 7.06                   | 19.63          | 74.24
  TPC-H     | Read dominant  | 31.62                  | 91.80          | 172.73
  OpenMail  | Read dominant  | 9.49                   | 63.30          | 846.62
Performance Improvements for Synthetic Workloads
- Varied four parameters: request size, inter-arrival time, sequentiality, and read/write ratio, one at a time while fixing the others
[Figure: average response time (ms) and standard deviation vs. request size (8, 16, 32, 64 KB), for NPGC and PGC]
Performance Improvements for Synthetic Workloads (cont'd)
[Figure: average response time and standard deviation vs. inter-arrival time (bursty workloads), probability of sequential access (random-dominant workloads), and probability of read access (write-dominant workloads), for NPGC and PGC]
Performance Improvements for Realistic Workloads
[Figure: normalized average response time and normalized standard deviation of response times for Financial, Cello, TPC-H, and OpenMail]
- Average response time improves by 6.5% and 66.6% for Financial and Cello, respectively.
- Variance of response times improves by 49.8% and 83.3% for Financial and Cello, respectively.
Conclusions
- Solid state drives offer fast access speeds but suffer performance variation due to garbage collection
- Semi-preemptive garbage collection services incoming requests during GC
- Average response time and performance variation are reduced by up to 66.6% and 83.3%, respectively
Questions?
Contact info:
- Junghee Lee, junghee.lee@gatech.edu
- Electrical and Computer Engineering, Georgia Institute of Technology
Thank you!