Adding Advanced Storage Controller Functionality via Low-Overhead Virtualization

Muli Ben-Yehuda, Michael Factor, Eran Rom, and Avishay Traeger
IBM Research Haifa
Eran Borovik and Ben-Ami Yassour

Abstract

Historically, storage controllers have been extended by integrating new code, e.g., file serving, database processing, or deduplication, into an existing base. This integration leads to complexity, co-dependency, and instability of both the original and new functions. Hypervisors are a known mechanism to isolate different functions. However, to extend a storage controller by providing new functions in a virtual machine (VM), the virtualization overhead must be negligible, which is not the case in a straightforward implementation. This paper demonstrates a set of mechanisms and techniques that achieve near-zero runtime performance overhead for using virtualization in the context of a storage system.

1 Introduction

Additional functions, such as file serving or database processing, are often added to existing storage systems to meet new requirements. Historically, this has been done via code integration, or by running the new function on a gateway or virtual storage appliance (VSA [37]). Code integration generally performs best. However, the new function must run on the same OS version, the controller's main functionality is vulnerable to bugs due to lack of isolation, resource management is complicated for software which assumes a dedicated system, and development complexity increases, in particular when the new function already exists as independent software. The gateway approach offers isolation but adds both latency and hardware costs. A hypervisor can isolate the new function, allow for differing OS versions, and simplify development. However, until now the high performance overhead of virtualization (in particular virtualized I/O) has made this approach impractical.
In this paper, we show how to use server-based virtualization to integrate new functions into a storage system with near-zero performance cost. Our approach is in line with the VSA approach, but we run the VM directly on the storage system. While our work was done using KVM [14], our insights are not KVM-specific. We do take advantage of the fact that KVM uses an asymmetric model in which some of the code is virtualized (the new features) while other code (the original storage system) runs on bare metal, unaware of the existence of the hypervisor.

There are three sources of performance overhead. Base overheads include aspects such as virtual memory management or process switching. External communication with storage clients is important when the new function is a filter on top of the original storage system, e.g., a file server. Finally, internal communication overheads are incurred to tie the new function to the original controller.

To reduce base overhead, we use two main techniques. First, we statically allocate CPU cores to the guest to ensure that the function has sufficient resources. Second, we statically allocate memory for the VM, backing that area with larger pages to reduce translation overheads.

The straightforward implementation of external communication is expensive because the hypervisor intervenes when physical events occur (e.g., interrupts or device accesses). Each such intervention entails an expensive exit from the guest code to the hypervisor. The highest-performing approach for reducing this overhead is device assignment, which eliminates exits for device access. Thus, to reduce these costs, we assign the network device directly to the guest using an SR-IOV-enabled adapter [23], which allows the guest to send requests directly to the device. To eliminate exits for interrupts, we use polling instead of interrupts, a well-known technique in storage systems.
To reduce the cost of internal communication, we modified KVM's para-virtual block driver to poll as well, eliminating exits due to PIOs and interrupt injections. This provides for a fast, exit-less, zero-copy transport. By using these techniques, we show no measurable difference in network latency between bare metal and virtualized I/O and under 5% difference in throughput. For internal communication, micro-benchmarks show 6.6µs latency overhead, read throughput of 357K IOPS, and write throughput of 284K IOPS; roughly seven times better than a base KVM implementation. In addition, an I/O-intensive filer workload running in KVM incurs less than 0.4% runtime performance overhead compared to bare metal integration. Our main contributions are: a detailed, benchmark-driven analysis of virtualization overheads in a storage system context, a set of approaches to removing overheads, and a demonstration of how these approaches enable running new storage features in a VM with essentially zero runtime performance overhead.
The rest of the paper is organized as follows. Section 2 provides background on KVM and VM I/O. We take an incremental approach to show our performance improvements; Section 3 describes the experimental environment. Sections 4 and 5 present a performance analysis and describe optimizations related to the external and internal communication interfaces, respectively. Base overheads are shown together with macro-benchmark results in Section 6. Section 7 describes related work and we conclude in Section 8.

2 x86 I/O Virtualization Primer

We now provide some background information on KVM (the hypervisor used in this paper) and virtual machine I/O. There are two main options for where a hypervisor resides. Type 1 hypervisors run directly on the hardware, whereas type 2 hypervisors are hosted by an OS. KVM takes a hybrid approach that combines the benefits of both. It is a Linux kernel module that leverages Intel VT-x or AMD-V CPU features for running unmodified virtual machines, thereby creating a single host kernel/hypervisor that runs both processes and virtual machines. Such a hybrid architecture allows the storage controller software to run unmodified on bare metal while also running additional functionality in virtual machines.

There are three main methods for accessing I/O devices in VMs. In the first, emulation, the hypervisor emulates a specific device in software [35]. The OS running in the VM (guest OS) uses its regular device drivers to access the emulated device. This method requires no changes to the guest, but suffers from poor performance. In the second method, para-virtualization [4], the guest OS runs specialized code to cooperate with the hypervisor to reduce overheads. For example, KVM's para-virtualized drivers use virtio [26], which presents a ring buffer transport (vring) and device configuration as a PCI device. Drivers such as network, block, and video are implemented using virtio.
In general, the guest OS driver places pointers to buffers on the vring and initiates I/O via a Programmed I/O (PIO) command. The hypervisor directly accesses the buffers from the guest OS's memory (zero-copy). Para-virtualized devices perform better than emulated devices, but require installing hypervisor-specific drivers in the guest OS. The third method, device assignment [6, 17, 39], gives the VM a physical device that it can submit I/Os to without the hypervisor's involvement. An I/O Memory Management Unit (IOMMU) provides address translation and memory protection [6, 7, 38]. Interrupts, however, are routed to the guest OS via the hypervisor. Assigning a device to the VM means that no other OS can access it (including the hypervisor or other guests). However, technologies such as Single Root I/O Virtualization (SR-IOV) [23] allow devices to be assigned to multiple OSs.

3 Experimental Setup

We take an incremental approach to showing how to eliminate the virtualization overheads. For our experiments we used two servers, each with two quad-core 2.93GHz EPT-enabled Intel Xeon 5500-series processors, 16GB of RAM, and an Emulex OneConnect 10Gb Ethernet adapter. The servers were connected with a 10Gb cable. One server acted as a load generator and the other was our (emulated) storage controller platform. We used RHEL 5.4 with the Red Hat el5 kernel for both the load server and the guest. The controller server used the Red Hat kernel for bare metal runs and Ubuntu 9.10 with a vanilla kernel for KVM runs. The newer kernel was necessary for running KVM. The controller server was run with four cores enabled, unless otherwise specified. For VM-based experiments, two cores and 2GB of memory were assigned to the guest; all four cores were used by the host in the bare metal cases. We used an 8GB ramdisk for the storage back-end in the experiments described in Sections 5 and 6. This allowed us to measure I/O performance without physical disks becoming the bottleneck.
We accessed the ramdisk via a loopback device, which allowed us to assign disk I/O handling to specific cores, similar to the way a storage controller functions. All results shown are the averages of at least 5 runs, with standard deviations below 5%.

4 Network Communication Performance

Enabling the guest to interact with the outside world requires I/O access. As discussed in Section 2, each of the three common approaches to I/O virtualization has benefits and drawbacks. We identified device assignment (the best performing option) as the most suitable approach for adding new functionality to storage controllers. KVM's initial device assignment implementation, however, did not provide the necessary performance. In the remainder of this section, we analyze device assignment and discuss a set of optimizations which allowed us to achieve near bare-metal performance.

Virtualization overhead is mainly due to events that are trapped by the hypervisor, causing costly exits [1, 5, 16]. The overhead is a factor of the frequency of exits and the time it takes the hypervisor to handle the exit and resume running the guest. To examine the performance impact of virtualization for our intended use and ways to reduce it, we focused on networking micro-benchmarks. Our goal is to minimize the amount of time that the hypervisor needs to run, by minimizing the number of exits.

The first technique that we used to improve the guest's performance is related to the handling of the hlt (halt) and mwait x86 instructions. When the OS does not have any work to do, it can call these instructions to enter a
power saving mode. Most hypervisors will trap these instructions and will run other tasks on the core. In our case, however, the new function should always run. We therefore instructed the guest OS to enter an idle loop when there is no work to be done by enabling a kernel boot parameter (idle=poll). This improves performance, as the guest is always running.

The second technique that we used is related to interrupt handling. Most of the guest exits related to device assignment are caused by interrupts [2, 5, 18]. Every external interrupt causes at least two guest exits: first, when the interrupt arrives (causing the hypervisor to gain control and to inject the interrupt to the guest), and second, when the guest signals completion of the interrupt handling (causing the host to gain control and to emulate the completion for the guest). The guest can configure the adapter to use two different interrupt-delivery modes: MSI, which is the newer message-based interrupt protocol, or the legacy INTX protocol. The KVM implementation we used incurred additional overhead when using MSI interrupts, due to additional exits for masking and unmasking adapter interrupts.

Since most of the virtualization overhead comes from interrupts, our approach is to run the adapter in polling rather than interrupt-driven mode. In Linux today, most network adapters use NAPI [30, 31], a hybrid approach to reducing interrupt overhead which switches between polling and interrupt-driven operation depending on the network traffic. However, even with NAPI, we have seen interrupt rates of 70K interrupts per second. Since such a high interrupt rate can incur prohibitive overhead and interrupts are not necessary for our intended use case, we decided to forgo interrupts and use polling. Our driver creates a new thread for the polling functions. The adapter we use has three types of events: packet received, packet sent, and command completion.
Since there is no way to know when a packet will be received, our driver continuously polls for received packets; packet-sent and command-completion indications are handled by the same thread every so often. Using a constantly polling thread means that we dedicate most of a core to this functionality. While this might seem expensive from the resources perspective, it proved critical to achieving the desired performance. A single core could also be used to poll multiple devices by integrating their polling threads into a single thread, or by scheduling different polling threads on the same core. We did not experiment with this configuration.

Next we evaluate the performance of the polling driver using network micro-benchmarks. Table 1 depicts the average duration of a ping flood command going from a client machine to the system under test. The system under test replies to pings using our driver either in polling mode or in INTX mode. The driver runs either in the host (bare metal), in the guest with halt disabled, or in the guest with halt enabled (guest halt).

[Table 1: Ping average latency (µs)]

[Figure 1: netperf request-response throughput]

[Figure 2: netperf TCP send throughput]

Figure 1 shows the results for several netperf request-response configurations, measuring round-trip time using 1-byte packets. "guest msi" and "guest intx" stand for the guest using MSI and INTX interrupt delivery, respectively. "host msi" stands for the host using interrupts in MSI mode; "guest poll" and "host poll" stand for the guest and host using polling mode, respectively. As expected, polling mode achieves better performance than interrupt mode in the host (i.e., on bare metal). Since in guest mode the cost of interrupts is much higher, the gain from using polling is more significant than in the bare-metal case.
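The polling loop just described (continuous receive polling, with send and command completions reaped every so often) might look roughly like the following. The device model here is a hypothetical stand-in; the real adapter exposes event queues, which this sketch reduces to two atomic counters, and the reap interval is an arbitrary illustrative choice.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* How often (in loop iterations) to reap send/command completions. */
enum { POLL_TX_INTERVAL = 64 };

struct nic {
    atomic_int rx_pending;      /* packets received, not yet handled */
    atomic_int tx_done;         /* send/command completions to reap  */
    atomic_bool stop;           /* shutdown flag (for the sketch)    */
    int rx_handled, tx_reaped;  /* poller-private counters           */
};

/* Busy-poll loop: receive events are checked on every iteration,
 * completions only periodically, as described in the text. */
static void *poll_loop(void *arg)
{
    struct nic *n = arg;
    unsigned iter = 0;
    while (!atomic_load(&n->stop)) {
        /* Hot path: always check for received packets. */
        n->rx_handled += atomic_exchange(&n->rx_pending, 0);

        /* Cold path: reap completions "every so often". */
        if (++iter % POLL_TX_INTERVAL == 0)
            n->tx_reaped += atomic_exchange(&n->tx_done, 0);
    }
    /* Final reap so nothing is lost at shutdown. */
    n->rx_handled += atomic_exchange(&n->rx_pending, 0);
    n->tx_reaped += atomic_exchange(&n->tx_done, 0);
    return NULL;
}
```

In the driver, this function would be launched once with pthread_create on a dedicated core and left spinning; the shutdown flag exists only so the sketch can terminate.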
Using MSI interrupts in guest mode has a significant impact on performance with this KVM version, since there are frequent exits due to interrupt masking calls by the guest. Figure 2 shows the results of a single-threaded netperf TCP send throughput test (the system under test is sending) in the same configurations as the previous figure: host using
polling, INTX, and MSI interrupts, and guest using polling, INTX, and MSI interrupts. Here the contribution of polling is less noticeable, since the TCP stack batches network processing. On bare metal, polling provides better performance than interrupt mode. In guest mode, the advantage of polling is much more significant.

[Figure 3: netperf TCP receive throughput]

Figure 3 shows the results of a single-threaded netperf TCP receive throughput test (the system under test is receiving). Here it is surprising to see that for the bare-metal case the performance of polling (host poll) is less than that of interrupts (host msi). The reason is that in the netperf throughput test the sender is the bottleneck. When the receiver is working in polling mode, it sends many more acknowledgment packets to the sender. For example, in the case of 1K messages, the receiver sends approximately 10 times as many ACKs. Since the sender is already the bottleneck, sending more ACKs generates more load on the sender, which reduces sender throughput. While this issue is noticeable for this micro-benchmark, in practice the handling time of a packet by the receiver is much larger, hence in most cases the sender is not the bottleneck.
Polling achieves the same performance in the guest poll and host poll cases, which indicates that the virtualization-induced runtime overhead is negligible. To verify that the reduced performance for the receive test is an artifact of TCP, we ran the same test using UDP. With UDP, all setups, guest or bare metal, interrupts or polling, achieve the same performance. Because the sender is the bottleneck, once the TCP ACK effect is removed, performance is not affected by the receiver's mode of operation.

5 Internal Communication Performance

Of the three methods for accessing I/O devices described in Section 2, we use para-virtualization for internal communication. Para-virtualization performs better than emulated devices, and because we supply the VM image that runs in the controller, we can easily use custom drivers. Further, our goal is to transmit I/O requests to a controller process running on the host, so device assignment is less practical. For example, we cannot assign the drives because the storage controller must own them, and not the guest OS. One may also consider using external communication to access the controller via iSCSI or Fibre Channel, but this adds unnecessary communication overheads.

[Figure 4: Unmodified KVM para-virtualized block I/O path.]

[Figure 5: Para-virtualized block I/O path with polling.]

We use a ramdisk as the backing store for our analysis to prevent the disks from dominating latencies or becoming a bottleneck. In addition, we use direct I/O to prevent caching effects that mask virtualization overheads. Latencies presented are the average over 10 minutes. Section 5.1 describes the vanilla KVM para-virtualized block I/O, and Section 5.2 describes our optimizations.
5.1 KVM Para-virtualized Block I/O

Figure 4 depicts the unmodified para-virtualized block I/O path in KVM, along with associated latencies for major code blocks when executing a 4KB direct I/O write request. The guest application initiates an I/O, which is handled by the guest kernel as usual. The direct I/O waiting time (DIO Wait, 16.6% of the total) consists of world
switches and context switches between threads inside the guest. Though we have drawn it as one block, it is interleaved with other code running on the same core. The para-virtualized block driver (virtio-block front-end) iterates over the requests in the elevator queue and places each request's I/O descriptors on the vring, a queue residing in guest memory that is accessible by the host. The driver then issues a programmed I/O (PIO) command which causes a world switch from guest to host. Control is transferred to the KVM kernel module to handle the exit. The post- and pre-execution times (24% and 18.4%, respectively) account for the work done by both KVM and QEMU to change contexts between the guest and QEMU process (including the exit/entry). KVM identifies the cause of the exit and, in this case, passes control to the QEMU virtio-block back-end (BE). It extracts the I/O descriptors from the vring without copies and passes the requests to the block driver layer (QEMU BDRV), which initiates asynchronous I/Os to the block device. The guest may now resume execution. An event-driven dedicated QEMU thread receives I/O completions and forwards them to the virtio-block BE. The BE updates the vring with completion information and calls upon KVM to inject an interrupt into the guest, for which KVM must initiate a world switch. When the guest resumes, its kernel handles the interrupt as normal, and then accesses its APIC to signal the end of interrupt, causing yet another exit. Locks to synchronize the two QEMU threads incur additional overhead.

5.2 Para-virtualized Block Optimizations

To reduce virtualization overhead, we added a polling thread to QEMU, as depicted in Figure 5. The thread polls the vring (1) for I/O requests coming from the guest and (2) for I/O completions coming from the host kernel. The thread invokes the virtio-block BE code on incoming I/Os and completions. This thread does not necessarily need to reside in QEMU; if the storage controller is polling-based, its polling thread may be used.
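A minimal single-producer/single-consumer ring in the spirit of the vring illustrates why this transport needs neither exits nor locks: each side writes only its own index, so the guest can publish descriptors and the host polling thread can consume them with plain loads and stores. The layout and names below are ours, not virtio's actual structures.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 256  /* power of two, illustrative */

struct ring {
    _Atomic unsigned head;   /* written only by the producer (guest) */
    _Atomic unsigned tail;   /* written only by the consumer (host)  */
    void *desc[RING_SIZE];   /* descriptors: pointers into guest memory */
};

/* Guest side: publish a request. No PIO kick, no exit; just a store
 * and an index bump that the host's polling thread will observe. */
static int ring_put(struct ring *r, void *req)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (h - atomic_load_explicit(&r->tail, memory_order_acquire) == RING_SIZE)
        return -1;                               /* ring full */
    r->desc[h % RING_SIZE] = req;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 0;
}

/* Host side: called in a loop by the polling thread instead of being
 * woken by a PIO exit. Returns NULL when nothing is posted. */
static void *ring_get(struct ring *r)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (t == atomic_load_explicit(&r->head, memory_order_acquire))
        return NULL;                             /* empty */
    void *req = r->desc[t % RING_SIZE];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return req;
}
```

In the real path, the consumer side would run in the QEMU polling thread, and an analogous ring would carry completions back to the guest's polling thread; the descriptors point into guest memory, so the transfer stays zero-copy.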
As discussed in Section 4, we added a thread to the guest which polls the networking device. We utilize this same thread to poll the vring for I/O completions. When it detects an I/O completion event, it invokes the guest I/O completion stack, which would normally be called by the interrupt handler. By using polling on both sides of the vring, we avoid all I/O-related exits, and thus also avoid all of the pre- and post-guest execution code. We also avoid locking the queue, since now only the polling thread accesses it. For the 4KB direct I/O write, this improves the latency from 50µs to 15.9µs. Comparing Figures 4 and 5, we see that polling better utilizes the CPU for I/O-related work. Additionally, components that we didn't directly optimize (such as the VFS layer, for example) are more efficient thanks to better cache utilization and less cache pollution due to fewer context switches.

We performed two additional code optimizations in QEMU to reduce latencies, whose impact is already included in the above discussion. When accessing a guest's memory, QEMU must first translate the address using a page-table-like data structure. This handles cases where the guest's memory can be remapped (for example, when dealing with PCI devices). In our case, the memory layout is static, rendering the translation unnecessary. Removing unnecessary lookups improved performance by 4.6% for 4KB reads and 4.2% for 4KB writes. The second optimization is to use a memory pool for internal QEMU request structures. This saved 3% for 4KB reads and 2.5% for 4KB writes.

5.3 Overall Performance Calculation

A storage controller running a new function in a VM that uses interrupts for its internal communication would have a rather significant performance penalty. Looking at Figure 4, the corresponding storage controller implementation would look similar, except that the AIO calls would be replaced by asynchronous calls to the controller code.
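The request-structure memory pool mentioned above can be sketched as a simple free list: request objects are carved from a slab once and recycled, so the I/O path never calls malloc or free. The types here are hypothetical stand-ins for QEMU's internal request structures.

```c
#include <stddef.h>

/* Fixed-size request object; the free-list link reuses the first
 * field, so no extra memory is needed for pool bookkeeping. */
struct io_req {
    struct io_req *next;   /* free-list link while the object is idle */
    /* ... request fields (sector, length, buffer pointers) ... */
};

struct req_pool {
    struct io_req *free_list;
};

/* Thread every slot of a caller-provided slab onto the free list. */
static void pool_init(struct req_pool *p, struct io_req *slab, size_t n)
{
    p->free_list = NULL;
    for (size_t i = 0; i < n; i++) {
        slab[i].next = p->free_list;
        p->free_list = &slab[i];
    }
}

/* Pop a recycled request; NULL if the pool is exhausted. */
static struct io_req *req_alloc(struct req_pool *p)
{
    struct io_req *r = p->free_list;
    if (r)
        p->free_list = r->next;
    return r;
}

/* Return a request to the pool for reuse. */
static void req_free(struct req_pool *p, struct io_req *r)
{
    r->next = p->free_list;
    p->free_list = r;
}
```

Because the polling design leaves a single thread touching these structures, the pool needs no locking, which is part of why such a small change is worth a few percent on the 4KB path.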
We consider any work done from the time the application submits the I/O until it reaches the controller to be virtualization overhead (work that would not be done if running directly on the host). In the unmodified case, the overhead is 49µs (we subtract only the latency of the application layer from the total). If our techniques were integrated into a controller, we would calculate the latency overhead as follows, based on Figure 5. We begin with the total, 15.9µs, and subtract the application layer, as we did in the previous case. Further, we subtract the QEMU BDRV layer, and the AIO system call and completion, because these would be replaced by the controller code, and are therefore not considered virtualization overhead. The final overhead is therefore 7.7µs before the two QEMU optimizations, and 6.6µs after.

To put the overheads in context, we estimate our performance impact on the fastest latencies published using the SPC-1 benchmark since 2009 [34]. The fastest result was 130µs, and our virtualization technique would add approximately 5% overhead to this case (the baseline case would add approximately 38%). The average of the 27 controllers' fastest latencies is 482µs, and in this average case, our virtualization techniques would add only 1.4% (the baseline would add over 10%).

Our improvements affect throughput in addition to latency. To measure these effects, we ran micro-benchmarks consisting of multi-threaded 4KB direct I/Os. For multi-threaded 4KB direct I/Os, we improved read IOPS by a factor of 7.3x (from 48.8K to 357.5K), and write IOPS by 6.5x (from 43.8K to 284.1K).
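As a sanity check on the SPC-1 comparison, the percentages above can be recomputed directly from the quoted numbers (6.6µs optimized and 49µs baseline overhead, against the 130µs fastest and 482µs average published latencies); the figures here are the ones stated in the text, not new measurements.

```c
#include <stdio.h>

/* Overhead expressed as a percentage of a published SPC-1 latency. */
static double overhead_pct(double overhead_us, double latency_us)
{
    return 100.0 * overhead_us / latency_us;
}

static void print_spc1_impact(void)
{
    printf("fastest (130us): optimized %.1f%%, baseline %.1f%%\n",
           overhead_pct(6.6, 130.0), overhead_pct(49.0, 130.0));
    printf("average (482us): optimized %.1f%%, baseline %.1f%%\n",
           overhead_pct(6.6, 482.0), overhead_pct(49.0, 482.0));
}
```

This reproduces the text's figures: roughly 5% and 38% against the fastest controller, and 1.4% and just over 10% against the average.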
6 File Server Workload

[Figure 6: File server workload with 6 cores. Bars: (1) BM base, (2) BM polling, (3) base, (4) 3+Huge Pages, (5) 4+Idle polling, (6) 5+Driver polling]

We next tested the end-to-end performance of running a server in a VM on a storage controller. We ran a file server in our VM, and used dbench [36] v4.0 to generate 4KB NFS read requests which all arrived at the 10Gb NIC, went through the local file system and block layers, through the para-virtualized block interface, and were satisfied by the ramdisk on the host side. We always allocated two cores to the controller function, and either two or four to the file server (as specified). In the virtualized cases, all file server cores were given to the VM. All cores were fully utilized in all cases.

Figure 6 shows the results when running with six cores: two for the controller function and four for the file server. Bars 1 and 2 show the bare metal case without and with polling, respectively. Roughly the same performance is attained in both cases. The third bar shows the baseline measurement for the guest, which is a significant degradation as compared to the bare metal cases. We identified three main causes for this performance drop. First, we noticed a large number of page faults on the host caused by the running VM. We mitigated this using the Linux kernel's HugePages mechanism, which backs a given process with 2MB pages instead of 4KB pages. This allows the OS to store fewer TLB page entries, resulting in fewer TLB faults and fewer EPT table lookups. HugePages improved performance by 1.5%, as shown in the fourth bar of Figure 6. A feature in a recent Linux kernel release makes the use of HugePages automatic [10]. The second issue affecting performance was halt exits, described in Section 4. We avoid these exits by setting the guest scheduler to poll. This further improved performance by 7.3% (fifth bar in Figure 6). The final performance improvement was to add driver polling, for both the network and block interfaces (described in Sections 4 and 5.2).
This further improved performance by 19.7%, and brings the guest's performance to be statistically indistinguishable from bare metal.

Next, we ran the same workload, but this time allocated only two cores to the file server (four cores total). This may be a more common deployment when running multiple server VMs on a single physical host, for example, because there are fewer cores available for each VM.

[Figure 7: File server workload with 4 cores: (a) bare metal, (b) guest]

The bare metal results are depicted in Figure 7(a). The first bar shows the bare metal baseline performance. We see in the second bar that performance drops when using polling. This is because the host now has only two cores, and the polling thread utilizes a disproportionate amount of CPU resources as compared to the file server. We remedied this by reducing the CPU scheduling priority of the polling thread (bar 3), and by setting the CPU affinities of the polling thread and some of the file server processes so that they share the same core (bar 4). These two changes bring the performance back to baseline performance.

In the guest case, depicted in Figure 7(b), the baseline (bar 1) is approximately 36% lower than the bare metal case. Bar 2 includes the HugePages and idle polling optimizations previously described, and bar 3 adds driver polling. Similar to the bare metal case, we adjusted the polling thread scheduling priority and the affinities of the relevant processes (bars 4 and 5). This brings us to results that are statistically indistinguishable from bare metal. In all cases, tuning was not difficult, and a wide range of values provided the achieved performance.

7 Related Work

Several works explored the idea of running VMs on storage controllers.
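The priority and affinity tuning described for the four-core runs can be sketched as below. The function name, core number, and nice value are illustrative choices, not the paper's settings; as noted in the text, a wide range of values works.

```c
#define _GNU_SOURCE          /* for sched_setaffinity / CPU_SET */
#include <sched.h>
#include <sys/resource.h>

/* Deprioritize and pin a busy-polling thread so it shares a core
 * fairly with co-scheduled file server processes (sketch). */
static int tune_polling_thread(int core, int niceness)
{
    /* A busy-polling loop at normal priority starves co-scheduled
     * work, so lower its priority with a positive nice value. */
    if (setpriority(PRIO_PROCESS, 0, niceness) != 0)
        return -1;
#ifdef CPU_ZERO
    /* Pin the thread to one core so it stays alongside the file
     * server processes placed on that same core. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
#else
    (void)core;
#endif
    return 0;
}
```

The same effect can be had without code changes via nice(1) and taskset(1), which is closer to the "tuning was not difficult" observation above.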
The IBM DS8300 storage controller uses logical partitions (LPARs) to enable the creation of two fault-isolated and performance-isolated virtual storage systems on one physical controller [12]. Pivot3 [24] and ParaScale [22] are integrated virtualization and scale-out SAN storage platforms that are geared to data centers. Fido [8] investigated using shared memory to implement zero-copy inter-VM communication in Xen in the context of enterprise-class server appliances. Our focus is different in that we investigate external communication, zero-copy communication with the controller software, and various techniques and methods to reduce overheads caused by I/O virtualization. Block Mason [21] used building blocks implemented in VMs to extend block storage functionality. VMware
VSA [37] pools the internal storage resources of several servers in a shared storage pool, using dedicated virtual machines running on each server.

Several works explored off-loading I/O to dedicated cores [3, 15, 16, 19]. The closest to ours is VPE [19], which adds host-side polling to KVM's virtio network stack. The VPE thread polls the network device for incoming packets and polls the guest device driver for new requests. However, the guest incurs exit overheads for interrupts and I/O completions since its driver does not poll. Dedicating cores for improving I/O performance has also been explored in TCP onloading [25, 32, 33].

There have been several works that investigated reducing interrupt overhead. The Linux kernel uses NAPI to disable interrupts of incoming packets as long as there are packets to be processed [30, 31]. A hybrid approach is to use interrupts under low load, and polling when more throughput is needed [11]. With interrupt coalescing, a single interrupt is generated for a given number of events or in a pre-defined time period [2, 27]. A series of works compared these techniques qualitatively and quantitatively [28, 29]. Rather than polling for fixed intervals or according to arrival rates, QAPolling uses the system state as determined by applications' receive queues [9]. The Polling Watchdog uses a hardware extension to trigger interrupts only when polling fails to handle a message in a timely manner [20]. ELI (ExitLess Interrupts [13]) is a recently published software-only approach for handling interrupts within guest virtual machines directly and securely. ELI removes the host from the interrupt handling paths, thereby allowing guests to reach 97%-100% of bare-metal performance for I/O-intensive workloads.

8 Conclusions and Future Work

We have shown how to use a hypervisor to host and isolate new storage system functions with negligible runtime performance overhead.
The techniques we demonstrated, such as polling, dedicated cores, avoiding page lookups, etc., while not general purpose, are a good fit for our usage scenario and have a significant payback. There are several possible extensions. First, ELI [13] is a promising new approach for exitless interrupts which would remove the need to poll in the guest. We are investigating incorporating it into our system. Second, if we stay with polling, we can explore ways to better utilize the cores, e.g., to on-board the TCP stack to a core. Third, we can also benchmark these techniques when running multiple VMs. Finally, we can examine how to leverage the fact that we have virtualized the new storage function's implementation to take advantage of features such as VM migration to improve performance and availability.

Acknowledgments

Thank you to our shepherd Arkady Kanevsky and the reviewers for their helpful comments. Thanks to Zorik Machulsky and Michael Vasiliev for assisting with the hardware. The research leading to the results presented in this paper is partially supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number (IOLanes).

References

[1] K. Adams and O. Agesen. A comparison of software and hardware techniques for x86 virtualization. In Architectural Support for Programming Languages & Operating Systems (ASPLOS), 2006.

[2] I. Ahmad, A. Gulati, and A. Mashtizadeh. vIC: Interrupt coalescing for virtual machine storage device IO. In USENIX Annual Technical Conf., 2011.

[3] N. Amit, M. Ben-Yehuda, D. Tsafrir, and A. Schuster. vIOMMU: efficient IOMMU emulation. In USENIX Annual Technical Conf., 2011.

[4] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In ACM Symposium on Operating Systems Principles (SOSP), October 2003.

[5] M. Ben-Yehuda, M. D. Day, Z. Dubitzky, M. Factor, N. Har'El, A. Gordon, A. Liguori, O. Wasserman, and B. Yassour.
The Turtles project: Design and implementation of nested virtualization. In USENIX Symposium on Operating Systems Design & Implementation (OSDI), 2010.

[6] M. Ben-Yehuda, J. Mason, J. Xenidis, O. Krieger, L. van Doorn, J. Nakajima, A. Mallick, and E. Wahlig. Utilizing IOMMUs for virtualization in Linux and Xen. In Ottawa Linux Symposium (OLS), July 2006.

[7] M. Ben-Yehuda, J. Xenidis, M. Ostrowski, K. Rister, A. Bruemmer, and L. van Doorn. The price of safety: Evaluating IOMMU performance. In Ottawa Linux Symposium (OLS), June 2007.

[8] A. Burtsev, K. Srinivasan, P. Radhakrishnan, L. N. Bairavasundaram, K. Voruganti, and G. R. Goodson. Fido: Fast inter-virtual-machine communication for enterprise appliances. In USENIX Annual Technical Conf., June 2009.

[9] X. Chang, J. Muppala, W. Kong, P. Zou, X. Li, and Z. Zheng. A queue-based adaptive polling scheme to improve system performance in gigabit ethernet networks. In IEEE International Performance Computing and Communications Conf., 2007.

[10] J. Corbet. Transparent hugepages in 2.6.38. http://lwn.net/Articles/423584/, January 2011.
[11] C. Dovrolis, B. Thayer, and P. Ramanathan. HIP: Hybrid interrupt-polling for the network interface. ACM SIGOPS Operating Systems Review, 35(4):50-60, October 2001.

[12] B. Dufrasne, A. Baer, P. Klee, and D. Paulin. IBM System Storage DS8000: Architecture and implementation. Technical Report SG , IBM, October 2009.

[13] A. Gordon, N. Amit, N. Har'El, M. Ben-Yehuda, A. Landau, D. Tsafrir, and A. Schuster. ELI: Bare-metal performance for I/O virtualization. In Architectural Support for Programming Languages & Operating Systems (ASPLOS), 2012.

[14] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori. kvm: the Linux virtual machine monitor. In Ottawa Linux Symposium, volume 1, June 2007.

[15] S. Kumar, H. Raj, K. Schwan, and I. Ganev. Re-architecting VMMs for multicore systems: The sidecore approach. In Workshop on Interaction between Operating Systems & Computer Architecture (WIOSCA), 2007.

[16] A. Landau, M. Ben-Yehuda, and A. Gordon. SplitX: Split guest/hypervisor execution on multicore. In USENIX Workshop on I/O Virtualization (WIOV), 2011.

[17] J. LeVasseur, V. Uhlig, J. Stoess, and S. Götz. Unmodified device driver reuse and improved system dependability via virtual machines. In USENIX Symposium on Operating Systems Design & Implementation (OSDI), 2004.

[18] J. Liu. Evaluating standard-based self-virtualizing devices: A performance study on 10GbE NICs with SR-IOV support. In IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2010.

[19] J. Liu and B. Abali. Virtualization polling engine (VPE): Using dedicated CPU cores to accelerate I/O virtualization. In the 23rd International Conference on Supercomputing (ICS), June 2009.

[20] O. Maquelin, G. R. Gao, H. H. J. Hum, K. B. Theobald, and X. Tian. Polling watchdog: Combining polling and interrupts for efficient message handling. ACM SIGARCH Computer Architecture News, 24(2), May 1996.

[21] D. T. Meyer, B. Cully, J. Wires, N. C. Hutchinson, and A. Warfield. Block mason. In USENIX Workshop on I/O Virtualization (WIOV), 2008.

[22] ParaScale.
Cloud computing and the ParaScale platform: An inside view to cloud storage adoption. White Paper, 2010.

[23] PCI-SIG. Single root I/O virtualization 1.1 specification, January 2010. specifications/iov/single_root/.

[24] Pivot3. Pivot3 serverless computing technology overview. White Paper, April 2010.

[25] G. Regnier, S. Makineni, I. Illikkal, R. Iyer, D. Minturn, R. Huggahalli, D. Newell, L. Cline, and A. Foong. TCP onloading for data center servers. IEEE Computer, 37(11):48-58, 2004.

[26] R. Russell. virtio: Towards a de-facto standard for virtual I/O devices. ACM SIGOPS Operating Systems Review, 42(5):95-103, July 2008.

[27] K. Salah. To coalesce or not to coalesce. International Journal of Electronics and Communications, 61(4), 2007.

[28] K. Salah, K. El-Badawi, and F. Haidari. Performance analysis and comparison of interrupt-handling schemes in gigabit networks. Journal of Computer Communications, 30(17), 2007.

[29] K. Salah and A. Qahtan. Implementation and experimental performance evaluation of a hybrid interrupt-handling scheme. Journal of Computer Communications, 32(1), 2009.

[30] J. Salim. When NAPI comes to town. In Linux 2005 Conf., August 2005.

[31] J. H. Salim, R. Olsson, and A. Kuznetsov. Beyond Softnet. In Annual Linux Showcase & Conf., 2001.

[32] L. Shalev, V. Makhervaks, Z. Machulsky, G. Biran, J. Satran, M. Ben-Yehuda, and I. Shimony. Loosely coupled TCP acceleration architecture. In the 14th IEEE Symposium on High-Performance Interconnects, August 2006.

[33] L. Shalev, J. Satran, E. Borovik, and M. Ben-Yehuda. IsoStack: Highly efficient network processing on dedicated cores. In USENIX Annual Technical Conf., June 2010.

[34] SPC. Storage Performance Council, 2007. www.storageperformance.org.

[35] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In USENIX Annual Technical Conf., 2001.

[36] A. Tridgell and R. Sahlberg. DBENCH. dbench.samba.org/, 2011.

[37] VMware.
Virtual storage appliance. vmware.com/products/datacenter-virtualization/vsphere/vsphere-storage-appliance/overview.html.

[38] P. Willmann, S. Rixner, and A. L. Cox. Protection strategies for direct access to virtualized I/O devices. In USENIX Annual Technical Conf., June 2008.

[39] B. Yassour, M. Ben-Yehuda, and O. Wasserman. Direct device assignment for untrusted fully-virtualized virtual machines. Technical Report H-0263, IBM Research, 2008.
I/O virtualization Jiang, Yunhong Yang, Xiaowei 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More information