The Active Block I/O Scheduling System (ABISS)
Benno van den Brink, Philips Research, Eindhoven, The Netherlands
Werner Almesberger, Buenos Aires, Argentina
1
ABISS
- Extension to the storage subsystem
- Allows applications to prioritize their I/O
- Provides guaranteed hard disk I/O: makes sure the data is there when the application needs it
- Provides system-wide control
- Makes better use of the available performance

Outline
- Introduction and overview
- Implementation
- Measurements
- Future work and conclusions
2
Introduction
- HDDs in CE equipment
- Real-time streams: hiccups are not acceptable; multiple streams
- 'PC world' solution: best effort + performance overkill
  - Large buffers in the application (cost, latency)
  - Performance (bandwidth, CPU) much higher than the average need: over-dimension your system (cost, power consumption, ...)
- Just-in-time: make the system guarantee the bit rate for RT streams
- Handle BE traffic in the remaining time (and make sure there is some!)
3
Overall architecture (diagram): user space (ABISS daemon, ABISS application); kernel (VFS, page/buffer cache, elevator, block I/O, device driver); disk
4
Implementation (diagram): user-space daemon; application; POSIX API (VFS); configuration interface (ioctl); scheduler API; file system driver; allocator; scheduler cores and scheduler library; page cache / page I/O; request queue(s); elevator; block device layer; block device driver
5
Scheduler
- The application moves the playout point and reads data from the playout buffer
- The scheduler caches the data location (location map, via the file system driver's get_block and the VFS)
- The scheduler prefetches data and may upgrade requests in the elevator (block I/O)
6
Reading block positions
- Store the on-disk location of RT files in a per-file location map (file block, disk block, length)
- ABISS redirects the file system driver's get_block function:
  - If the file is mapped, use the location map
  - If the file is not mapped, use the original get_block
7
Playout buffer
- Application playout point: moves freely
- Kernel playout point: advances at the requested rate (or less)
- Page states: no longer used; accessible and up to date; being loaded (pending read request)
8
Rate control
- Credit is added at rate r, up to a credit limit
- The playout points may differ by more than the batch size
- When the timer expires: if more data should be loaded and the credit >= 1 page, reduce the credit by one page and move the buffer one page
- Otherwise, set the timer so that it expires when the credit reaches 1 page
9
Buffer size
- Application-dependent buffering: read size or work area; application 'jitter'
- Operating system and hardware dependent buffering: kernel latency; read batching (elevator + disk); I/O latency
10
User-space daemon
- Keeps track of system-wide use
- Decides whether bandwidth can be given (admission control), e.g. quotas
- Flow: the application sends a service request to the ABISS daemon; after the check, the ABISS framework and scheduler set up the data structures and generate the response
11
Elevator
- Two areas (read and write), each with a footprint
- Eight priority queues
- Per priority: a sort queue (RB tree by start sector, with front and back cursors and a current pointer), a FIFO queue, and a LIFO queue for overlaps
12
Scope of elevator and scheduler (diagram): several applications, each served by an ABISS scheduler ('test', 'foo') under the ABISS daemon, coexisting with the anticipatory elevator
13
API: request ABISS service on file

    static struct abiss_attach_msg msg;
    static struct abiss_sched_test_prm prm;

    fd = open("name", O_RDONLY);
    msg.header.type = abiss_attach;
    prm.ra_bytes = app_buffer;
    prm.fill_bps = byterate;
    msg.sched_prm = &prm;
    ...
    if (ioctl(fd, ABISS_IOCTL, &msg) < 0)
        /* handle error */;

The library wraps this as:

    FILE *abiss_fopen(const char *path, const char *mode,
                      const int byterate, const int app_buffer)
14
API: update playout point

    static struct abiss_position_msg msg;

    got = read(fd, buffer, BUFFER_SIZE);
    msg.header.type = abiss_position;
    msg.pos = 0;
    msg.whence = SEEK_CUR;
    ioctl(fd, ABISS_IOCTL, &msg);

The library wraps this as:

    abiss_fread(void *ptr, size_t size, size_t nmemb, FILE *fp)
15
Measurement H/W setup
- Transmeta Crusoe (800 MHz), SDRAM, DDRAM
- IDE (2x), PCI, Mini PCI, CardBus (PCMCIA), CF slot
- USB, IEEE 1394, Fast Ethernet
- Graphics controller, legacy I/O (floppy, printer, game, etc.), single-voltage power supply
16
Measurements, description
- Four RT streams at 1 MB/s each: 64 KB read size, 105 MB files, 564 KB playout buffer, 20-page batch size
- Measure the time between read request and data retrieval
- Concurrent BE load: loop with a BE file copy (175 MB file, 128 KB read size)
- Different elevators: ABISS RT, ABISS BE, anticipatory, deadline, CFQ, noop
17
Measurements, graphs 18
BE performance

  Elevator                 BE reader performance [Mbytes/s]
                           One RT reader   Four RT readers
  ABISS RT, 10 pg batch         7.7              0.3
  ABISS RT, 20 pg batch         8.0              2.5
  ABISS RT, 40 pg batch         8.7              4.0
  ABISS RT, 80 pg batch         9.4              5.8
  ABISS RT, 160 pg batch        9.5              6.6
  ABISS BE                      7.7              1.5
  Anticipatory                  7.8              2.7
  Deadline                      7.9              1.8
  CFQ                           7.9              1.8
  Noop                          7.9              2.0
19
Future work
- Ext3 filesystem support
- Add RT writing; investigate buffering requirements
- Allocator
- Different schedulers, e.g. power-aware scheduling for portable players
- Investigate asynchronous I/O with elevator priorities
20
Conclusions
- ABISS framework: allows different, run-time switchable services; 'test' scheduler implemented; user-space daemon
- Evaluated performance of the 'test' scheduler: guarantees bandwidth without the need for large application buffers; compares favorably to existing elevators
- http://abiss.sourceforge.net/
21
Questions? 22
ABISS (diagram): user space: application; kernel: API/VFS, scheduler & allocator, file system driver, elevator, block device layer, device driver; hardware
23
Scheduler
- Time-driven requests issued by the scheduler (via the file system driver, memory management, etc.)
- Application-driven requests pass through the page cache & block device layer to the elevator
- The scheduler assigns the priority
24
System overview, detailed (diagram): application; middleware; POSIX API (VFS); configuration interface (ioctl); allocator API, allocator cores, allocator library; scheduler API, scheduler cores, scheduler library; file system driver; MM, etc.; page cache / page I/O; request queue(s); elevator; block device layer; block device driver (components marked as new or changed)
25
Opening a file
- Open the file, then request ASFS service (the library uses ASFS_IOCTL through the configuration interface)
- The scheduler initializes the playout buffer and asks the file system driver for the file location (location map)
26
Reading a file
- read() system call, through the POSIX API (VFS)
- File-to-disk block translation (playout buffer, location map, file system driver)
- Page(s) in the buffer cache are served from the page cache; pages not in the cache go to the request queue(s)
- The elevator asks the scheduler for the priority
27
Preloading pages
- The application moves the playout point (through middleware and ioctl)
- The scheduler moves the playout window according to the rate (timer)
- The scheduler requests the new pages entering the playout buffer
- The elevator asks the scheduler for the priority
28
Playout buffer
- Playout buffer/window: buffer for movement/delay/batching
- Playout point, work area, read-ahead pages
- Page states: loaded and locked in memory; being loaded; will be loaded later; not used anymore
29
Elevator
- Priority assigned by the scheduler
- Priority 0 (highest, RT only) down to priority 7 (lowest, default)
- Real-time and best-effort requests flow from the file system driver to the device driver
- The elevator may reorder requests inside each priority
30
Credit & moving playout point
- The application moves the playout point
- Enough credit? Yes: move immediately; no: delayed movement
- Moving: drop the first page and shift the window; request a new page or upgrade an existing request
- The page arrives in the page cache
31