Bringing the Performance to the Cloud
1 Bringing the Performance to the Cloud. Dongsu Han, KAIST, Department of Electrical Engineering / Graduate School of Information Security
2 The Era of Cloud Computing. Datacenters at Amazon, Google, Facebook.
3 Customers of the Cloud. Customers "rent" either physical or virtual machines from the cloud. Public and private cloud: external or internal customer entities.
4 Scale of the Cloud. Facebook: hundreds of thousands of machines. Microsoft: 1 million servers. Google envisions 10 million servers. Google spends about $3 billion every year on data centers.* (*data-center-spend-slows)
5 Efficiency is important. What if we could increase a single machine's performance by, e.g., 10, 20, or 40%? Equipment cost savings. Energy savings: by 2012, the cost of power for the data center was expected to exceed the cost of the original capital investment [U.S. DOE]. Reduced complexity in system design as the number of machines involved decreases.
6 How do we improve the Cloud's efficiency (and performance)? We start with a single machine. What does a machine look like? What kind of workload does it handle?
7 What does a machine look like today? General-purpose hardware (x86 architecture). Multicore: 4 to 60 cores (tens of CPU cores). Multiple 10-Gigabit Ethernet ports (becoming the norm).
8 A Typical Cluster Configuration. Front-end: Web tier. Back-end: cache tier, service tier (e.g., advertising), storage tier.
9 A Typical Cluster Configuration. Front-end interaction happens over HTTP (TCP). Back-end interaction can use any protocol.
10 A Typical Cluster Configuration. Web tier: up to 3x improvement in performance. Cache tier: up to 7x improvement in performance. 1. Improving the performance of a cache server [NSDI '14], joint work with H. Lim, M. Kaminsky, and D. Andersen. 2. Improving the performance of a Web server [NSDI '14], with E. Jeong, S. Woo, M. Jamshed, H. Jeong, S. Ihm, and K. Park.
11 In-Memory Key-Value Cache. Request -> key-value cache -> database. Key-value items are stored in DRAM ("Memcache").
12 In-Memory Key-Value Cache. What is the workload of a K-V cache like? Heavily studied: [SOSP] 2007, 2009, 2011; [NSDI] 2013a, 2013b; [EuroSys] 2012; [SIGCOMM] 2012; [SOCC] 2010, 2012; [SIGMETRICS] 2012; [ATC] 2013.
13 Workload of a key-value cache. CDF of key-value sizes across 5 different memcached pools: Facebook K-V size distribution [SIGMETRICS 2012].
14 Diverse Workload [SIGMETRICS 2012]. Some pools are read-intensive; others are write-intensive.
15 Fast In-Memory Key-Value Cache. Must handle small, variable-length items efficiently. Must support both read- and write-intensive workloads.
16 Current K-V Cache Performance. Workload: YCSB-B (95% GET, 5% PUT). End-to-end throughput using UDP (M operations/sec): Memcached, RAMCloud, MemC3, and Masstree range from under 1 to 8.9 Mops, against an ideal of 77.8 Mops.
17 Read-Intensive Workload. Workload: YCSB-B (95% GET, 5% PUT). End-to-end throughput using our optimized network stack: the four systems reach 1.5 to 23.5 Mops, against an ideal of 77.8 Mops.
18 Read-Intensive Workload. Problem 1: a large gap remains between the best existing system (23.5 Mops) and the maximum packets/sec attainable using the UDP protocol (77.8 Mops).
19 Write-Intensive Workload. Workload: YCSB-A (50% GET, 50% PUT). Problem 2: performance collapses under heavy writes. Memcached, RAMCloud, and MemC3 each drop below 1 Mops; Masstree reaches 11.7 Mops, against the 77.8 Mops UDP maximum.
20 MICA Performance Preview. Workload: YCSB-A (50% GET, 50% PUT). MICA: 70.1 Mops, about 14 ns per request, close to the 77.8 Mops maximum packets/sec attainable using UDP.
21 MICA Approach. MICA redesigns a K-V cache in a holistic way: 1. parallel data access, 2. request direction (client to NIC to CPU), 3. key-value data structures (CPU to memory).
22 Parallel Data Access. Modern CPUs have many cores (8, 12, ...). We must exploit CPU parallelism efficiently.
23 Concurrent Read/Write (CRCW). Any core can read/write any part of memory. (+) Can distribute load to multiple cores: Memcached, RAMCloud [SOSP], MemC3 [NSDI], Masstree [EuroSys]. (-) Limits scalability with multiple cores: lock contention, and expensive cacheline transfers caused by concurrent writes to the same memory location.
24 MICA Scales Well with Many Cores. YCSB-A (skewed, 50% GET), throughput vs. number of CPU cores (2 to 16): MemC3, Memcached, and RAMCloud stay near 0.1 to 1 Mops.
25 MICA Scales Well with Many Cores. On the same log-scale plot, MICA and Masstree scale much higher, with MICA reaching tens of Mops at 16 cores.
26 MICA's parallel data access. Partition data using the hash of keys. Exclusive Read/Write (EREW): only one core accesses a particular partition. (+) Avoids synchronization and inter-core communication [H-Store, VoltDB]. (-) Can be slow under skewed key popularity: a popular item cannot be served by multiple cores.
27 EREW Outperforms CRCW. "Skewed" = Zipf key popularity with skewness 0.99, same as YCSB. Throughput (Mops): EREW sustains 70.1 to 77.0 across uniform and skewed workloads with 50% or 95% GET; CRCW is comparable under uniform load but drops to 35.4 and 19.6 Mops under skew.
28 Skew Does Not Hurt (Much). Per-core performance breakdown (cores 0 to 15, uniform vs. skewed): some cores can process more requests under skewed workloads, because hot partitions contain a few popular keys, making the CPU cache very effective.
29 Request Direction. EREW requires correct request direction: a request must be sent to the core/partition that handles the requested key.
30 Common Request Direction Scheme: flow-based affinity. Uses the 5-tuple to assign packets to cores. Useful for flow-based protocols (e.g., TCP). Does not work well with MICA's EREW: a single client can request keys from different partitions.
31 MICA's request direction: object-based affinity. MICA overcomes commodity NICs' limited programmability by using client assistance. The key's partition ID is put into the packet header (the UDP destination port), and the NIC is programmed to use the embedded partition ID to direct packets to cores. MICA uses the Intel Data Plane Development Kit (DPDK) for low-overhead burst packet I/O, bypassing the OS kernel.
32 NIC HW for Request Direction. Using EREW for parallel data access. Throughput (Mops): software-only direction reaches 34.8 (uniform) and 25.6 (skewed); client-assisted hardware-based direction reaches 77.0 (uniform) and 70.1 (skewed).
33 Key-Value Data Structures. Significant impact on key-value processing speed. A new design is required to support both read and write operations at high speed.
34 MICA's key-value data structures. Each partition has two data structures: a circular log store and a lossy concurrent hash index. Omitted in this talk: numerous optimizations, including garbage collection, LRU approximation, NUMA-aware memory allocation, memory mapping, memory prefetching, cache-friendly data structures, and concurrency support (for "CREW").
35 Circular Log Store. Allocates space for key-value items of any length, with simple garbage collection and free-space defragmentation. A new item is appended at the tail of a fixed-size log; if there is insufficient space for the new item, the oldest item at the head is evicted (FIFO).
36 Lossy Concurrent Hash Index. Indexes items in the circular log with a set-associative hash index: hash(key) selects a bucket (bucket 0 through bucket n-1), and each bucket entry points to a key-value item in the log. Full bucket? Evict the oldest entry from it (FIFO). This allows fast indexing of new key-value items.
37 Key-Value Data Structure Comparison. EREW, skewed workload, small items. Throughput (Mops): partitioned Masstree reaches 6.9 (50% GET) and 29.6 (95% GET); MICA reaches 70.1 and 70.2.
38 Throughput Comparison. End-to-end performance using our optimized network stack, across uniform/skewed workloads with 50% or 95% GET. Throughput (Mops): MICA achieves 70.1 to 77.0 on all four workloads, within 10% of the 77.8 Mops ideal; the best competing system (Masstree) peaks at 23.5, and the others remain below 7.
39 Throughput-Latency (on Ethernet). Average latency (microseconds) vs. throughput (Mops): Memcached over UDP with standard socket I/O saturates below 0.3 Mops, while MICA sustains around 70 Mops at comparable latency (up to about 140 microseconds): two orders of magnitude more throughput.
40 Summary. MICA takes a holistic approach to designing fast in-memory key-value caches: efficient parallel data access, hardware-based request direction, and optimized data structures for key-value caching. MICA consistently achieves high performance under diverse workloads.
41 Typical Server Cluster Configuration (revisited). Web tier: up to 3x improvement in performance. Cache tier: up to 7x improvement in performance. Next: 2. Improving the performance of a Web server [NSDI '14], with E. Jeong, S. Woo, M. Jamshed, H. Jeong, S. Ihm, and K. Park.
42 Workload for User-facing Servers. Measurement of TCP flows in a commercial cellular backbone [Woo, MobiSys '13]: CDF of flow sizes from 0 to 1 MB. Over 90% of TCP flows are smaller than 64 KB, and over 50% are smaller than 4 KB.
43 Web Server Performance. Large transfers: easy to fill up 10 Gbps. Small transactions: only 1.2 Gbps under SpecWeb. The kernel is not designed well for multicore systems: TCP connection setup performance on Linux (Intel Xeon, Intel 10 Gbps NIC) suffers a performance meltdown as the number of CPU cores grows from 1 to 8.
44 Performance Analysis of a Web Server. CPU usage on Linux 3.10, with a Web server (Lighttpd) serving a 64-byte file: 83% of CPU usage is spent inside the kernel (kernel 45%, packet I/O 34%, TCP/IP 4%), leaving only 17% for the application.
45 Inefficiencies in Kernel TCP/IP Stack. 1. Lack of connection locality: the application thread runs on one core while packet and TCP processing (ksoftirqd) runs on another, so each connection's TCP send/recv buffers bounce between cores.
46 Inefficiencies in Kernel TCP/IP Stack. 1. Lack of connection locality (continued). Each application thread (one per core) runs the same accept loop: while (1) { epoll_wait(...); fd = accept(listen_fd, NULL, NULL); read(fd, buf, 1024); write(fd, buf, 1024); }. Partial fixes: Affinity-Accept [EuroSys '12] provides connection affinity (only); the Linux SO_REUSEPORT option (v3.9.4) provides per-core listen sockets.
47 Inefficiencies in Kernel TCP/IP Stack. 2. Shared file descriptor space: fd = accept(listen_fd, NULL, NULL) passes through the socket API, the kernel socket layer, and the virtual file system layer (e.g., vfs_open()). VFS overhead: the kernel creates an inode for each socket file descriptor and must find the lowest available integer [POSIX].
48 Inefficiencies in Kernel TCP/IP Stack. 3. System call overhead (frequent and expensive): every iteration of the accept/read/write loop crosses the kernel boundary. 4. Inefficient per-packet processing: per-packet memory allocation/deallocation overhead. MegaPipe [OSDI '12] partially addresses problems 1, 2, and 3, but all prior work reuses the kernel's TCP/IP stack.
49 mTCP approach. mTCP: a high-performance user-level TCP design for multicore systems. A clean-slate approach that divorces the kernel's complexity: 1. leverage user-level packet I/O, 2. support multicore-aware flow processing, 3. provide a user-level socket API.
50 mTCP overview. Each application thread in the user process talks through the mTCP socket interface and an mTCP epoll-like event system to a per-core mTCP thread; the mTCP threads run on a user-level packet I/O library directly above the device driver, bypassing the kernel.
51 mTCP overview. (1) Core affinity: per-core file descriptors, listen sockets, event queues, and socket buffers. (2) Kernel bypass: no system calls (e.g., mtcp_accept()). (3) Batched packet processing. (4) Receive-side scaling (H/W) steers packets into per-core packet queues.
52 mTCP overview. The same per-core structure, shown step by step: core affinity, per-core file descriptors and listen sockets, kernel bypass with no system calls, and batched packet processing.
53 mTCP design. Highly scalable on multicore systems: 25x faster than the latest Linux version, 3x faster than MegaPipe. Easy to use, with little porting effort: modified 29 lines of the Apache library (out of 66,493). Evaluation: HTTP server/client (Lighttpd, Apache), a Web replayer that replays cellular backbone traffic (Korea), and an SSL proxy.
54 Multicore Scalability. One 64-byte message per connection: heavy connection load with small packet-processing overhead. In transactions/sec, scaling from 1 to 8 cores, mTCP achieves 25x Linux, 5x REUSEPORT, and 3x MegaPipe [OSDI 2012].
55 Message Benchmark. Scaling by message size, with persistent connections and 64-byte messages: mTCP saturates the 10 Gbps link at smaller message sizes (across 64 B to 8 KiB) than Linux, REUSEPORT, and MegaPipe, and transactions/sec also scale with the number of messages per connection (1 to 128).
56 Web Server (Lighttpd) Performance. SpecWeb2009 static file workload (738 B average): 3.2x faster than Linux and 1.5x faster than MegaPipe, in both throughput (Gbps) and transactions/sec.
57 Summary. mTCP: a high-performance user-level TCP stack for multicore systems. It efficiently utilizes multicore resources by eliminating system call overhead, reducing context switch cost through event batching, using per-core resource management, and using cache-aware threading. It achieves high performance scalability: 3x to 25x for small-message transactions, and 33% (SSLShader) to 320% (Lighttpd) for existing applications.
58 Conclusion. Despite many efforts from academia and industry, there is still much room for innovation in Cloud-based systems and services. Essential building blocks for Cloud services can benefit from a holistic, multicore-aware design that leverages the underlying hardware and carefully considers the workload. More research lies ahead in bringing new applications to the Cloud.
59 References.
[DPDK] Intel: "Packet Processing is Enhanced with Software from Intel DPDK" (intel.com).
[FacebookMeasurement] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload Analysis of a Large-Scale Key-Value Store. In Proc. SIGMETRICS 2012.
[Masstree] Yandong Mao, Eddie Kohler, and Robert Tappan Morris. Cache Craftiness for Fast Multicore Key-Value Storage. In Proc. EuroSys 2012.
[MemC3] Bin Fan, David G. Andersen, and Michael Kaminsky. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In Proc. NSDI 2013.
[Memcached] http://memcached.org/
[RAMCloud] Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. Fast Crash Recovery in RAMCloud. In Proc. SOSP 2011.
More informationDPDK Summit 2016 OpenContrail vrouter / DPDK Architecture. Raja Sivaramakrishnan, Distinguished Engineer Aniket Daptari, Sr.
DPDK Summit 2016 OpenContrail vrouter / DPDK Architecture Raja Sivaramakrishnan, Distinguished Engineer Aniket Daptari, Sr. Product Manager CONTRAIL (MULTI-VENDOR) ARCHITECTURE ORCHESTRATOR Interoperates
More informationmos: An open middlebox platform with programmable network stacks
mos: An open middlebox platform with programmable network stacks Shinae Woo ABSTRACT Though the growing popularity of software-based middleboxes raises new requirements for network stack functionality,
More informationTitan: Fair Packet Scheduling for Commodity Multiqueue NICs. Brent Stephens, Arjun Singhvi, Aditya Akella, and Mike Swift July 13 th, 2017
Titan: Fair Packet Scheduling for Commodity Multiqueue NICs Brent Stephens, Arjun Singhvi, Aditya Akella, and Mike Swift July 13 th, 2017 Ethernet line-rates are increasing! 2 Servers need: To drive increasing
More informationPacketShader: A GPU-Accelerated Software Router
PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,
More informationHyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University
HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM
More informationFaRM: Fast Remote Memory
FaRM: Fast Remote Memory Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, Miguel Castro Microsoft Research Abstract We describe the design and implementation of FaRM, a new main memory distributed
More informationMay 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to
A Sleep-based Our Akram Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) May 1, 2011 Our 1 2 Our 3 4 5 6 Our Efficiency in Back-end Processing Efficiency in back-end
More informationPASTE: A Networking API for Non-Volatile Main Memory
PASTE: A Networking API for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Lars Eggert (NetApp) Douglas Santry (NetApp) TSVAREA@IETF 99, Prague May 22th 2017 More details at our HotNets
More informationBringing the Power of ebpf to Open vswitch. Linux Plumber 2018 William Tu, Joe Stringer, Yifeng Sun, Yi-Hung Wei VMware Inc. and Cilium.
Bringing the Power of ebpf to Open vswitch Linux Plumber 2018 William Tu, Joe Stringer, Yifeng Sun, Yi-Hung Wei VMware Inc. and Cilium.io 1 Outline Introduction and Motivation OVS-eBPF Project OVS-AF_XDP
More informationLegUp: Accelerating Memcached on Cloud FPGAs
0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are
More informationData Path acceleration techniques in a NFV world
Data Path acceleration techniques in a NFV world Mohanraj Venkatachalam, Purnendu Ghosh Abstract NFV is a revolutionary approach offering greater flexibility and scalability in the deployment of virtual
More informationAn Intelligent NIC Design Xin Song
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational
More informationWhat s Wrong with the Operating System Interface? Collin Lee and John Ousterhout
What s Wrong with the Operating System Interface? Collin Lee and John Ousterhout Goals for the OS Interface More convenient abstractions than hardware interface Manage shared resources Provide near-hardware
More informationChapter 8: I/O functions & socket options
Chapter 8: I/O functions & socket options 8.1 Introduction I/O Models In general, there are normally two phases for an input operation: 1) Waiting for the data to arrive on the network. When the packet
More informationSlides on cross- domain call and Remote Procedure Call (RPC)
Slides on cross- domain call and Remote Procedure Call (RPC) This classic paper is a good example of a microbenchmarking study. It also explains the RPC abstraction and serves as a case study of the nuts-and-bolts
More informationCSE398: Network Systems Design
CSE398: Network Systems Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 23, 2005 Outline
More informationBen Walker Data Center Group Intel Corporation
Ben Walker Data Center Group Intel Corporation Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation.
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationProblem Set: Processes
Lecture Notes on Operating Systems Problem Set: Processes 1. Answer yes/no, and provide a brief explanation. (a) Can two processes be concurrently executing the same program executable? (b) Can two running
More informationPROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8
PROCESSES AND THREADS THREADING MODELS CS124 Operating Systems Winter 2016-2017, Lecture 8 2 Processes and Threads As previously described, processes have one sequential thread of execution Increasingly,
More informationntop Users Group Meeting
ntop Users Group Meeting PF_RING Tutorial Alfredo Cardigliano Overview Introduction Installation Configuration Tuning Use cases PF_RING Open source packet processing framework for
More informationAchieve Low Latency NFV with Openstack*
Achieve Low Latency NFV with Openstack* Yunhong Jiang Yunhong.Jiang@intel.com *Other names and brands may be claimed as the property of others. Agenda NFV and network latency Why network latency on NFV
More informationPerformance Objects and Counters for the System
APPENDIXA Performance Objects and for the System May 19, 2009 This appendix provides information on system-related objects and counters. Cisco Tomcat Connector, page 2 Cisco Tomcat JVM, page 4 Cisco Tomcat
More informationCustom UDP-Based Transport Protocol Implementation over DPDK
Custom UDPBased Transport Protocol Implementation over DPDK Dmytro Syzov, Dmitry Kachan, Kirill Karpov, Nikolai Mareev and Eduard Siemens Future Internet Lab Anhalt, Anhalt University of Applied Sciences,
More informationDisclaimer: this is conceptually on target, but detailed implementation is likely different and/or more complex.
Goal: get a better understanding about how the OS and C libraries interact for file I/O and system calls plus where/when buffering happens, what s a file, etc. Disclaimer: this is conceptually on target,
More informationURDMA: RDMA VERBS OVER DPDK
13 th ANNUAL WORKSHOP 2017 URDMA: RDMA VERBS OVER DPDK Patrick MacArthur, Ph.D. Candidate University of New Hampshire March 28, 2017 ACKNOWLEDGEMENTS urdma was initially developed during an internship
More informationNew Approach to OVS Datapath Performance. Founder of CloudNetEngine Jun Xiao
New Approach to OVS Datapath Performance Founder of CloudNetEngine Jun Xiao Agenda VM virtual network datapath evolvement Technical deep dive on a new OVS datapath Performance comparisons Q & A 2 VM virtual
More informationIBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT
IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT 215-4-14 Authors: Deep Chatterji (dchatter@us.ibm.com) Steve McDuff (mcduffs@ca.ibm.com) CONTENTS Disclaimer...3 Pushing the limits of B2B Integrator...4
More informationCS118 Discussion Week 2. Taqi
CS118 Discussion Week 2 Taqi Outline Any Questions for Course Project 1? Socket Programming: Non-blocking mode Lecture Review: Application Layer Much of the other related stuff we will only discuss during
More informationNetchannel 2: Optimizing Network Performance
Netchannel 2: Optimizing Network Performance J. Renato Santos +, G. (John) Janakiraman + Yoshio Turner +, Ian Pratt * + HP Labs - * XenSource/Citrix Xen Summit Nov 14-16, 2007 2003 Hewlett-Packard Development
More informationSolid State Storage Technologies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Solid State Storage Technologies Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu NVMe (1) The industry standard interface for high-performance NVM
More information<Insert Picture Here> Boost Linux Performance with Enhancements from Oracle
Boost Linux Performance with Enhancements from Oracle Chris Mason Director of Linux Kernel Engineering Linux Performance on Large Systems Exadata Hardware How large systems are different
More informationEbbRT: A Framework for Building Per-Application Library Operating Systems
EbbRT: A Framework for Building Per-Application Library Operating Systems Overview Motivation Objectives System design Implementation Evaluation Conclusion Motivation Emphasis on CPU performance and software
More informationNetwork device drivers in Linux
Network device drivers in Linux Aapo Kalliola Aalto University School of Science Otakaari 1 Espoo, Finland aapo.kalliola@aalto.fi ABSTRACT In this paper we analyze the interfaces, functionality and implementation
More informationInfiniswap. Efficient Memory Disaggregation. Mosharaf Chowdhury. with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin
Infiniswap Efficient Memory Disaggregation Mosharaf Chowdhury with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow
More informationAn Experimental review on Intel DPDK L2 Forwarding
An Experimental review on Intel DPDK L2 Forwarding Dharmanshu Johar R.V. College of Engineering, Mysore Road,Bengaluru-560059, Karnataka, India. Orcid Id: 0000-0001- 5733-7219 Dr. Minal Moharir R.V. College
More informationA Look at Intel s Dataplane Development Kit
A Look at Intel s Dataplane Development Kit Dominik Scholz Chair for Network Architectures and Services Department for Computer Science Technische Universität München June 13, 2014 Dominik Scholz: A Look
More informationNetworking at the Speed of Light
Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More information