Filtering Metadata Lookups in Instruction-Grain Application Monitoring
EDIC RESEARCH PROPOSAL

Yusuf Onur Kocberber
Parallel Systems Architecture Lab (PARSA)
Ecole Polytechnique Fédérale de Lausanne
Lausanne, Switzerland

Abstract — Dynamic Information Flow Tracking (DIFT) is a promising technique to perform instruction-grain application monitoring. DIFT detects software bugs by checking and analyzing every individual instruction at runtime. In software-only implementations of DIFT, performance degrades significantly (10-100x) because processor resources are shared between the application and the DIFT tool. Hardware-only implementations of DIFT eliminate the overhead, but focus on a specific monitoring tool or require invasive changes in the processor core. Log-Based Architectures (LBA) are flexible hardware frameworks that accelerate a wide range of instruction-grain DIFT tools. LBA leverages general-purpose multi-core chips with modest modifications to hardware but still incurs a 3-5x slowdown. In this paper, we introduce a custom metadata cache to eliminate the slowdown in LBA. By profiling metadata, we observed that the vast majority of metadata lookups are redundant. Our mechanism performs fast metadata lookups and avoids invoking the monitoring functionality by filtering events.

I. INTRODUCTION

Software debugging and verification are becoming challenging as computing systems become faster and more complex. Misbehaving systems negate all the design efforts that have been made to increase performance or reduce energy consumption. Bugs in complex software, often introduced by humans, are not only hard to catch but also hard to recreate.

This research plan has been approved: Proposal submitted to committee: September 30th, 2010; Candidacy exam date: October 7th, 2010; Candidacy exam committee: Giovanni De Micheli, Babak Falsafi, Paolo Ienne.
Date:
Doctoral candidate: (name and signature)
Thesis director: (name and signature)
Thesis co-director: (if applicable) (name and signature)
Doct. prog. director: (R. Urbanke) (signature)

There have been tremendous efforts to remove or detect software bugs. These efforts include static tools that analyze software before execution, post-mortem tools that analyze software after it crashes, and dynamic tools that monitor an application as it executes. Lifeguards are dynamic tools that perform instruction-grain application monitoring to catch problems as the program executes. Instruction-grain monitoring collects very detailed information, such as the memory references of an instruction or branch address computations. The collected information is critical for diagnosing software problems such as memory access violations, data races, and security exploits. Moreover, dynamic monitoring prevents bugs and enables just-in-time notifications and on-the-fly fixes.

Dynamic Information Flow Tracking (DIFT) is a promising and widely known technique to detect software bugs at runtime [1]. The main idea of DIFT is to keep track of data status as the application executes. For example, when DIFT is used for security, it marks spurious data and tracks their propagation through the system. Lifeguards [2] are DIFT tools that monitor an application's execution at the instruction level. Metadata are associated with every byte of memory and every register. As the execution of the application goes on, metadata are updated and checked by lifeguards. Metadata checks ensure that an operation performed is safe or correct (e.g., not a memory access violation). In this study, we focus on three lifeguards: (i) TaintCheck [3], which detects security exploits, (ii) AddrCheck [4], which detects accesses to unallocated data, and (iii) MemCheck [5], which detects accesses to both unallocated and uninitialized data.
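To make the metadata mechanics described above concrete, the following sketch models TaintCheck-style shadow metadata: one taint bit per application byte, propagated on copies and checked when a value is used in a critical way. The memory size, the byte-granularity shadow array and the function names are illustrative assumptions, not the lifeguards' actual implementation.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: one taint bit per byte of a small "application memory". */
#define MEM_SIZE 256
static uint8_t app_mem[MEM_SIZE];
static uint8_t taint[MEM_SIZE];   /* 1 = tainted (spurious), 0 = clean */

/* Mark data arriving from an untrusted channel as tainted. */
static void taint_input(int addr, int len) {
    for (int i = 0; i < len; i++) taint[addr + i] = 1;
}

/* A copy instruction propagates both the data and its metadata. */
static void copy_bytes(int dst, int src, int len) {
    memcpy(&app_mem[dst], &app_mem[src], len);
    memcpy(&taint[dst], &taint[src], len);
}

/* Check performed when a value is used in a critical way (e.g., as a
   jump target): returns 1 if a security exception must be raised. */
static int check_critical_use(int addr, int len) {
    for (int i = 0; i < len; i++)
        if (taint[addr + i]) return 1;
    return 0;
}
```

Checking only at critical uses, while propagating on every copy, is what lets the track of spurious data survive intermediate moves without raising false alarms.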
Unfortunately, instruction-grain lifeguards are slow because, for every instruction executed in the application, the lifeguard must take an action according to its functionality. There are both software and hardware implementations of instruction-grain program monitoring. Software-only implementations of lifeguards, based on Dynamic Binary Instrumentation (DBI), do not require any modification to the existing system or recompilation. However, hardware resource sharing between the application and the monitoring tool typically results in a 10-100x slowdown.
Figure 1 Three design alternatives for hardware-accelerated DIFT: (a) in-core DIFT, (b) off-core DIFT, (c) offloading DIFT.

Hardware implementations remove resource sharing. Figure 1 shows three alternatives for hardware-accelerated DIFT: (i) integrated in-core DIFT, (ii) off-core coprocessor DIFT, and (iii) multi-core based offloading DIFT. Integrated in-core DIFT performs checks in parallel with the processor pipeline. The slowdown is eliminated, but significant changes in the core are required. Off-core coprocessor DIFT uses a small (compared to the main core) specialized core without modifying the existing core, but the dedicated hardware can only perform statically defined security checks. In contrast, multi-core DIFT offers a flexible, general-purpose framework.

Log-Based Architectures (LBA), which are multi-core DIFT frameworks, use hardware as a substrate to perform instruction-grain monitoring. LBA is built on Chip Multiprocessors (CMP), where the application runs on one core and the application-monitoring lifeguard runs on another core. The application core and the monitoring core communicate through a log buffer, so the two cores are decoupled from each other. Because application monitoring is performed on a general-purpose core (enhanced with log capturing), any lifeguard can be used without changing the existing hardware framework. LBA has a performance overhead of 3-5x. Our aim is to further reduce these performance overheads. We propose a custom metadata cache, a lightweight hardware mechanism, to filter out metadata lookups in instruction-grain application monitoring.

The remainder of the paper is organized as follows. Section II discusses three design alternatives for hardware DIFT.
Section III presents the idea of filtering metadata lookups and the design of the custom metadata cache. Finally, Section IV offers our conclusions.

II. RELATED WORK

Lifeguards monitor the application to check for possible misbehaviours. This work focuses on the TaintCheck, AddrCheck and MemCheck lifeguards.

TaintCheck [3] detects overwrite-related security exploits. TaintCheck monitors all unverified input data (e.g., data coming from the network) and marks memory locations as suspected, or tainted. For every register and application byte, it keeps a single metadata bit that shows whether the location is tainted or not. Data coming from unsecure channels are tainted, and the tainted status is propagated to other locations when the tainted data are used during execution. A security exception is raised when tainted data are used in critical ways (e.g., as the program counter, as an instruction, or as sensitive function or system call arguments).

AddrCheck [4] detects accesses to unallocated data. Memory allocation is tracked by intercepting malloc- and free-related system calls. AddrCheck maintains one accessibility bit per application byte, and an exception is raised if there is an access to unallocated data. The accessibility property is not propagated during the application's execution.

MemCheck [6] extends AddrCheck to detect the use of uninitialized data. MemCheck maintains accessibility metadata, like AddrCheck, and extends the metadata with one initialization bit per application byte. Accessibility metadata are updated and checked as described above. A memory location is considered initialized if a constant value is assigned to it. Initialization metadata are propagated on every instruction, and the destination becomes uninitialized if at least one of the source operands of the instruction is uninitialized. Initialization metadata are cleared after free system calls.

A.
Integrated In-core Implementation of Lifeguards

Integrated in-core implementation of lifeguards (hardware-only lifeguards) performs metadata propagation and checks in parallel with the processor pipeline. Dedicated logic and storage are added in order to perform parallel checks. Figure 3 depicts the design of the hardware-only lifeguard implementation. The integrated approach eliminates two main sources of slowdown of the software-only approach: (i) metadata checks and updates, and (ii) recreating the state of the application. Metadata checks and updates add minimal performance overhead because metadata are maintained by dedicated logic. Recreating the state of the application is not needed because the lifeguard can access the state of the
metadata at any point of the execution. Moreover, inter-core communication overhead is avoided because the in-core DIFT approach does not need a separate core. However, in-core DIFT needs significant modifications to the existing core. For example, all pipeline stages must be modified to buffer the metadata associated with pending instructions.

Figure 3 Additional storage components (dark) required to support hardware-only DIFT.

Suh et al. [1] propose DIFT, an integrated in-core implementation of the TaintCheck lifeguard. The operating system monitors input channels for spurious data, and the hardware tracks these data through the system by propagating and updating the metadata. As execution continues, metadata are checked transparently within the modified pipeline. If a suspected value is copied from one location to another, the suspected value's metadata are also copied; otherwise, the track of the spurious data would be lost. Although data are marked as unsafe, a security assertion does not necessarily occur for every event that uses unsafe metadata. A security assertion is only raised if the suspected data are used as an instruction or a jump target address. Moreover, the DIFT technique also enables controlling the level of propagation according to different security policies and system resources. Suh et al. group instructions into four categories: (i) copy, (ii) computation, (iii) load, and (iv) store. Depending on the memory space and performance overheads, one can choose to track only a single group of instructions or all of them at the same time. Suh et al. also target the metadata storage overhead. A naïve implementation of metadata management uses 12.5% of the storage and bandwidth resources of the memory.
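The category-based propagation policy described above can be sketched as a small decision function. The bitmask encoding of the policy, and the simplified rule that a destination is tainted exactly when the instruction's category is tracked and any source operand is tainted, are assumptions for illustration; the real hardware tracks taint at byte granularity rather than per operand.

```c
/* Hypothetical encoding of Suh et al.'s four instruction categories,
   and a security policy as a bitmask of tracked categories. */
enum { COPY = 1, COMPUTE = 2, LOAD = 4, STORE = 8 };

/* Destination taint after an instruction: tainted iff the category is
   tracked by the active policy and any source operand is tainted. */
static int propagate(int policy, int category, int src1_taint, int src2_taint) {
    if (!(policy & category)) return 0;   /* untracked category: destination considered clean */
    return src1_taint | src2_taint;
}
```

A policy of `COPY | LOAD | STORE`, for instance, corresponds to the paper's cheaper policy that does not track computation instructions.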
However, as programs execute, a large part of the memory remains unchanged. This observation leads the authors to propose an efficient metadata management system that changes the granularity of the metadata: every page is extended with two bits that indicate the granularity of the metadata in that page. For the performance evaluation, 17 benchmarks of the SPEC CPU2000 suite are simulated. The security policy that tracks copy, load and store instructions incurs 0.2% memory overhead, because 95% of pages are maintained at page-level granularity. There is no performance overhead for all but one benchmark, which has 0.3% overhead. Another security policy, which tracks all four instruction categories, has a performance overhead of 0.8% on average and 6% in the worst case. DIFT has no performance overhead, but the modifications to the core are significant. These modifications not only can have a negative impact on design and verification time, but can also affect the clock frequency of the processor. Moreover, supporting only a limited number of security policies is hard to justify given the amount of modification required.

B. Off-core Coprocessor Implementation of Lifeguards

Figure 2 Pipeline of the coprocessor DIFT.

Specialized processors or coprocessors are amenable to the straightforward operations of DIFT. However, DIFT must synchronize with the main processor periodically to prevent damage from bugs or security attacks. Fine-grain (instruction-grain) synchronization between the application and the monitoring tool is not practical, because the monitoring latency would directly affect overall system performance. Coarse-grain synchronization, on the other hand, decouples the monitoring functionality from the application until a synchronization point (e.g., a system call) is reached in the application. Decoupling enables detaching the DIFT functionality from the main core. In general, DIFT does not need computationally complex operations.
Metadata occupy only a small number of bytes, and the necessary actions are simple logic operations most of the time. For this reason, when decoupling is possible, specialized hardware is the best solution in terms of performance and physical area. This approach minimizes not only the amount of storage and bandwidth needed, but also the number of components (e.g., ALUs) in the processor. Kannan et al. [7] propose an off-core DIFT coprocessor design that decouples DIFT operations from the application. The DIFT coprocessor is specialized hardware that runs the TaintCheck lifeguard. Figure 2 shows the corresponding design. The application and the lifeguard are synchronized only at system calls, so that the whole DIFT state and logic can be moved to the coprocessor. A small FIFO queue between the main core and the coprocessor enables decoupled execution. The main processor inserts an instruction tuple into the
decoupling queue. An instruction tuple is decoded by the coprocessor and contains the PC, the instruction (opcode, operands, etc.), and the memory addresses used. When the decoupling queue is full or a system call is encountered, the main processor must stall. If the decoupling queue is full because the coprocessor is suffering cache misses, application performance will degrade. However, coprocessor misses should roughly coincide with application misses, because the metadata mirror the application data and we expect at least the same locality. The resulting application slowdown partially hides the coprocessor cache misses, but the misses still cause memory contention between the coprocessor and the main processor. The dedicated coprocessor approach runs on an FPGA-based full-system prototype and executes SPECint2000 applications with less than 1% performance overhead. The DIFT coprocessor is small and does not require any modification to the design, pipeline or layout of the general-purpose core, or to the cache hierarchy. The implementation showed that the coprocessor uses only 7% more resources than a RISC core. The main source of slowdown is the memory contention described earlier. An application that behaves badly (e.g., has poor locality) can increase the slowdown up to 10%. On the other hand, the DIFT coprocessor approach is not flexible: it supports only a single lifeguard. There are various lifeguards that are as important as TaintCheck, and even for the same lifeguard, if the metadata semantics change, the coprocessor must be redesigned.

C. General Hardware-Accelerated Solutions for Lifeguards

Previous hardware-assisted approaches either lack support for a variety of lifeguards or require significant changes in the existing processor. Leveraging multiple cores for DIFT, known as Log-Based Architectures (LBA) [2], provides a flexible hardware framework for instruction-grain application monitoring. Figure 4 shows the LBA system.
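As a rough software model of the decoupling described above, the sketch below represents the decoupling queue as a bounded FIFO of instruction tuples: the main core stalls when a push fails, and the coprocessor drains tuples at its own pace. The queue depth and the tuple fields are assumptions for illustration, not the prototype's actual parameters.

```c
#include <stdint.h>

/* Toy model of the decoupling queue between main core and coprocessor. */
#define QCAP 8
typedef struct { uint32_t pc, opcode, mem_addr; } tuple_t;
typedef struct { tuple_t buf[QCAP]; int head, tail, count; } queue_t;

/* Main-core side: returns 0 (main core must stall) when the queue is full. */
static int q_push(queue_t *q, tuple_t t) {
    if (q->count == QCAP) return 0;
    q->buf[q->tail] = t;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    return 1;
}

/* Coprocessor side: returns 0 when the queue is empty, else pops into *t. */
static int q_pop(queue_t *q, tuple_t *t) {
    if (q->count == 0) return 0;
    *t = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 1;
}
```

The backpressure at `q_push` is exactly the mechanism by which coprocessor cache misses translate into application slowdown.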
Figure 4 Dual-core Log-Based Architecture system.

When an instruction commits on the application side, an event record corresponding to that instruction is captured and compressed on core 1. The record is then delivered to core 2 through the log buffer, which is stored in the L2 cache. The application-monitoring lifeguard fetches the event records from the log buffer one by one. When an event record is fetched, it is first decompressed and then dispatched on core 2. Because the lifeguard is just another application running on a core, the hardware does not need to know about DIFT metadata or policies explicitly. Hence, a large variety of lifeguards can be supported simply by changing the application running on core 2. In addition, the overhead of resource sharing and the need for recreating the hardware state are eliminated. The former does not exist because the application-monitoring lifeguard does not share resources with the application; the latter is removed because the event record carries all the necessary information when the log is captured on the application side. However, the instruction-grain nature of the monitoring still results in slowdowns, because nearly every instruction running on core 1 (the application core) needs an action on core 2 (the lifeguard core), which means several lifeguard instructions per application instruction. As a result, although the LBA framework reduces the performance overhead significantly, there is still a 3-5x slowdown.

Chen et al. [8] propose three hardware mechanisms to reduce the overhead of the LBA framework: Unary Inheritance Tracking, Idempotent Filters, and Metadata-TLB. Unary Inheritance Tracking (IT) aims at reducing the cost of metadata propagation events. Propagation tracking is one of the key sources of overhead for the DIFT technique.
For lifeguards that need to track the flow of data through registers extensively (e.g., TaintCheck, MemCheck), metadata update and propagation events account for a large part of the execution time. IT tracks the inheritance of metadata for unary operations and delivers update events to the lifeguard only when necessary. The destination operand of a binary operation is set clean for all lifeguards to which IT is applicable. However, the implementation of IT may differ among lifeguards depending on the semantics of their metadata. For MemCheck, a non-unary operation's destination operand is set clean, and the source operands are sent to the lifeguard to check their metadata. If the source operands are uninitialized, an error is issued at this point. Hence, cascading errors that would be caused by the same source are eliminated before propagating the metadata. For TaintCheck, checking only the unary operations is, for practical purposes, enough to detect security exploits. Inheritance Tracking can reduce update events by 24-74%.

The Idempotent Filter (IF) targets lifeguards that perform metadata checks very frequently (e.g., AddrCheck, MemCheck). Many checks can be filtered because they are idempotent (redundant). For example, after allocating a memory location, subsequent loads and/or stores to the same address do not need to be checked until the next free event. The Idempotent Filter is designed as a cache, and even with a
small size (32 entries), it can filter nearly 50% of the check events of AddrCheck.

Metadata-TLB (M-TLB) attacks the cost of metadata mapping. As described earlier, metadata accesses are very common during the execution of the lifeguard unless they are filtered. If an event needs to access metadata, it must first translate the application address to the metadata address. To achieve fast and efficient translation, the M-TLB, a TLB-like hardware structure, caches the latest address translations. The M-TLB is accessed by a new instruction called LMA (Load Metadata Address). Hence, the dynamic instruction count of lifeguards is reduced by 16-49%, because the lifeguard uses a single instruction to translate an address instead of several.

The LBA framework has a performance overhead of 3x for AddrCheck, 3.5x for TaintCheck, and 8x for MemCheck. MemCheck is a heavyweight lifeguard because it monitors both the allocation and initialization status of the application. However, LBA is the only platform (among those described) that supports a variety of lifeguards, because LBA uses general-purpose cores without being explicitly aware of the DIFT metadata or policies. Moreover, the modifications to the core are not significant.

III. FILTERING METADATA LOOKUPS

LBA, as a general-purpose framework, is an effective technique to perform instruction-grain application monitoring. Our aim is to further improve the framework and remove the performance overheads due to the instruction-grain nature of the lifeguards.

Instruction-grain monitoring is slow. When an application is executed under LBA, almost all instructions committed in the application's pipeline result in the execution of a corresponding event handler on the lifeguard side. An event handler is a set of instructions that take the necessary actions according to the lifeguard's functionality.
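The Idempotent Filter behaviour described above can be sketched as a tiny cache of recently checked addresses: a hit means the same check was already performed since the last allocation change and can be dropped. The direct-mapped organization, the full-address tags, and flushing the whole filter on malloc/free events are simplifying assumptions; the actual design may differ.

```c
#include <stdint.h>

/* Sketch of an idempotent filter: a small cache of already-verified
   addresses (32 entries, matching the size quoted in the text). */
#define IF_ENTRIES 32
static uint32_t if_tag[IF_ENTRIES];
static uint8_t  if_valid[IF_ENTRIES];

/* Returns 1 if the check event can be filtered (address seen before),
   0 if the handler must run; a miss installs the address so that the
   next identical check is filtered. */
static int if_filter_check(uint32_t addr) {
    uint32_t idx = addr % IF_ENTRIES;
    if (if_valid[idx] && if_tag[idx] == addr) return 1;
    if_valid[idx] = 1;
    if_tag[idx] = addr;
    return 0;
}

/* malloc/free events change accessibility, so previously verified
   decisions become stale and the filter must be flushed. */
static void if_flush(void) {
    for (int i = 0; i < IF_ENTRIES; i++) if_valid[i] = 0;
}
```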
Therefore, the number of instructions executed on the lifeguard side is roughly the number of instructions executed on the application side multiplied by the average handler size. There are two ways to reduce the slowdown of LBA: (i) reduce the average handler size, and (ii) reduce the number of events processed by the lifeguard. The average handler size is directly related to the lifeguard functionality, and reducing the instruction count is possible only with a well-accepted property that applies to all lifeguards. For example, the M-TLB reduces the number of instructions in each handler by removing the metadata mapping operation from all event handlers across different lifeguards. Although nearly all handlers include address translation, the reduced instruction count is still not enough to overcome the performance overheads. The second option is to reduce the number of event handlers dispatched. Metadata checks are frequent, but updates are not. An event handler can be filtered when the result of the handler execution will not change the metadata state, or when the check operation will not assert any error.

Figure 5 Filtering effectiveness of the metadata cache for AddrCheck, MemCheck, and TaintCheck.

We propose a general, efficient and practical way of filtering out metadata lookups. A general technique should be compatible with different lifeguards' metadata semantics. An efficient technique should have a high filtering rate (i.e., filtering should be the common case throughout the lifeguard's execution). A practical technique should be implementable with a modest amount of hardware (i.e., less area than an L1 cache in modern processors).

A significant percentage of metadata lookups are redundant. Every lifeguard checks the state of metadata in order to monitor the application. The metadata semantics imply that there is a clean value. For example, for the AddrCheck lifeguard, if the memory location accessed is allocated, then the metadata value is clean.
For TaintCheck, a value is clean if it is not spurious, and for MemCheck, accessed data are clean when they are both allocated and initialized. Fortunately, metadata lookups return a clean value for most of the checks. Although applications have bugs, the number of bugs is negligible compared to the number of checks performed in an instruction-grain monitoring environment. This observation implies that most metadata lookups are redundant: if the metadata have the desired property (a clean value), then there is no need to dispatch a handler to verify it. Profiling experiments with Valgrind [6] show that the majority of accessed metadata are clean. Figure 5 depicts the filtering effectiveness of the metadata cache for three diverse lifeguards, where each bar corresponds to the fraction of clean accesses in the SPECint2000 benchmarks with the ref input. As a result, our profiling results show that there is indeed a large filtering opportunity to be exploited.

Fast hardware lookups with a custom metadata cache. In order to filter out the redundant log entries, we need to know the status of the data corresponding to the log entry being dispatched at any point of execution. Therefore, the status of registers and memory operands must be known when handlers are dispatched. For this purpose, we want to be able to obtain the metadata state just by performing a fast lookup in a small hardware structure.
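As a software model of the intended dispatch-time behaviour (not the actual hardware), the lookup could be sketched as follows: a hit on clean metadata filters the event, a hit on non-clean metadata dispatches the lifeguard handler, and a miss invokes the mapping handler and then fills the cache in hardware. The direct-mapped organization, the entry count, and the two-bit metadata encoding with 0 meaning clean are assumptions for illustration.

```c
#include <stdint.h>

/* Illustrative model of the proposed custom metadata cache,
   indexed directly by application addresses. */
#define MCC_ENTRIES 64
typedef struct { uint32_t tag; uint8_t meta; uint8_t valid; } mcc_line_t;
static mcc_line_t mcc[MCC_ENTRIES];

enum { FILTERED, DISPATCH_HANDLER, MISS_FILL };

/* Dispatch-time decision: clean hit filters the event, non-clean hit
   dispatches the handler, miss triggers metadata-address calculation
   and a hardware fill. */
static int mcc_lookup(uint32_t app_addr) {
    uint32_t idx = app_addr % MCC_ENTRIES;
    if (!mcc[idx].valid || mcc[idx].tag != app_addr) return MISS_FILL;
    return mcc[idx].meta == 0 ? FILTERED : DISPATCH_HANDLER;
}

/* Fill performed in hardware after the mapping handler has computed
   the metadata address and fetched its value. */
static void mcc_fill(uint32_t app_addr, uint8_t meta) {
    uint32_t idx = app_addr % MCC_ENTRIES;
    mcc[idx].tag = app_addr;
    mcc[idx].meta = meta;
    mcc[idx].valid = 1;
}
```

Because clean lookups dominate, the common case is `FILTERED`, so most events never reach a handler at all.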
Figure 6 Custom metadata cache (MCC) for filtering.

Figure 6 shows the custom metadata cache (MCC) we propose. The cache is characterized as custom because application addresses are used to access the metadata cache directly. When an event record is fetched, a lookup is performed in the custom metadata cache to check whether the event can be filtered. If the metadata value is clean, the corresponding event is filtered and no handler is dispatched. Conversely, if the metadata value is not clean, the event cannot be filtered and the corresponding handler is dispatched. MCC misses are also handled by event handlers, but the event handler performs only the address calculations for the lifeguard's metadata mapping; data operations such as evicting and filling are done in hardware. The lower level of the hierarchy for the custom metadata cache is the L1 cache, which serves fill and evict requests. The filtering opportunity of metadata lookups can only be exploited with high hit rates. Our experiments show that even with a memory-bound benchmark (e.g., mcf), the custom metadata cache hit rate is 95%. We expect to see even higher hit rates with other benchmarks. The only limitation is the miss penalty of the cache. The event handler dispatched on a cache miss should not increase the average handler size of the lifeguard. Using a conventional L1 cache is also an option; however, the required address translation from application address to physical address adds extra delay to the critical path, while our technique is built into the dispatch logic and every lookup adds just one extra cycle. Furthermore, the custom metadata cache is smaller than conventional caches because, for the three diverse lifeguards we studied, one byte of application data can be represented by two bits of metadata. As a result, the custom metadata cache can be smaller and faster than a conventional L1 cache, while maintaining a miss rate as low as that of the application core's L1 cache.

IV. CONCLUSIONS

In this work, we discussed three hardware design alternatives for instruction-grain application monitoring. Hardware mechanisms that are integrated in the core are not feasible because of their negative impact on design and verification time. The hardware coprocessor approach requires minimal modifications to the system but lacks flexibility. Finally, the multi-core DIFT (LBA) framework is the desired solution, supporting diverse lifeguards with minimal overheads in the system. To eliminate the performance overheads of LBA completely, we propose a custom metadata cache to filter metadata lookups in hardware. Profiling results show that the vast majority of accesses to metadata are clean. Hence, the custom metadata cache will eliminate the slowdown by filtering redundant metadata checks for all lifeguards.

V. REFERENCES

[1] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, "Secure program execution via dynamic information flow tracking," in Architectural Support for Programming Languages and Operating Systems, Boston, 2004.
[2] S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry, R. Teodorescu, A. Ailamaki, L. Fix, G. R. Ganger, B. Lin, and S. W. Schlosser, "Log-based architectures for general-purpose monitoring of deployed code," in Architectural and System Support for Improving Software Dependability, New York, 2006.
[3] J. Newsome and D. Song, "Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software," in Network and Distributed System Security, San Diego, 2005.
[4] N. Nethercote, "Dynamic binary analysis and instrumentation," PhD Thesis, University of Cambridge, 2004.
[5] J. Seward and N.
Nethercote, "Using Valgrind to detect undefined value errors with bit-precision," in USENIX Annual Technical Conference, Berkeley, 2005.
[6] N. Nethercote and J. Seward, "Valgrind: a framework for heavyweight dynamic binary instrumentation," in Programming Language Design and Implementation, San Diego, 2007.
[7] H. Kannan, M. Dalton, and C. Kozyrakis, "Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor," in Dependable Systems & Networks, Estoril, 2009.
[8] S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons, T. C. Mowry, V. Ramachandran, O. Ruwase, M. Ryan, and E. Vlachos, "Flexible Hardware Acceleration for Instruction-Grain Program Monitoring," in International Symposium on Computer Architecture, Washington, 2008.
Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Keerthi Bhushan Rajesh K Chaurasia Hewlett-Packard India Software Operations 29, Cunningham Road Bangalore 560 052 India +91-80-2251554
More informationImproving Data Cache Performance via Address Correlation: An Upper Bound Study
Improving Data Cache Performance via Address Correlation: An Upper Bound Study Peng-fei Chuang 1, Resit Sendag 2, and David J. Lilja 1 1 Department of Electrical and Computer Engineering Minnesota Supercomputing
More informationIntroduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras
Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 04 Lecture 17 CPU Context Switching Hello. In this video
More informationPipelined processors and Hazards
Pipelined processors and Hazards Two options Processor HLL Compiler ALU LU Output Program Control unit 1. Either the control unit can be smart, i,e. it can delay instruction phases to avoid hazards. Processor
More informationProfiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency
Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu
More informationFADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
In Proceedings of the th International Symposium On High Performance Computer Architecture (HPCA 4) FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring Sotiria Fytraki, Evangelos
More informationMultithreaded Processors. Department of Electrical Engineering Stanford University
Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread
More informationA Streaming Multi-Threaded Model
A Streaming Multi-Threaded Model Extended Abstract Eylon Caspi, André DeHon, John Wawrzynek September 30, 2001 Summary. We present SCORE, a multi-threaded model that relies on streams to expose thread
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationBanshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation!
Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation! Xiangyao Yu 1, Christopher Hughes 2, Nadathur Satish 2, Onur Mutlu 3, Srinivas Devadas 1 1 MIT 2 Intel Labs 3 ETH Zürich 1 High-Bandwidth
More informationChapter 9 Pipelining. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan
Chapter 9 Pipelining Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Basic Concepts Data Hazards Instruction Hazards Advanced Reliable Systems (ARES) Lab.
More informationSimultaneous Multithreading Architecture
Simultaneous Multithreading Architecture Virendra Singh Indian Institute of Science Bangalore Lecture-32 SE-273: Processor Design For most apps, most execution units lie idle For an 8-way superscalar.
More information1. Memory technology & Hierarchy
1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories
More informationBeyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji
Beyond ILP Hemanth M Bharathan Balaji Multiscalar Processors Gurindar S Sohi Scott E Breach T N Vijaykumar Control Flow Graph (CFG) Each node is a basic block in graph CFG divided into a collection of
More informationChapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neuman Architecture
An Introduction to Parallel Programming Peter Pacheco Chapter 2 Parallel Hardware and Parallel Software 1 The Von Neuman Architecture Control unit: responsible for deciding which instruction in a program
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationHardware Support for Software Debugging
Hardware Support for Software Debugging Mohammad Amin Alipour Benjamin Depew Department of Computer Science Michigan Technological University Report Documentation Page Form Approved OMB No. 0704-0188 Public
More informationDarek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June
Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June 21 2008 Agenda Introduction Gemulator Bochs Proposed ISA Extensions Conclusions and Future Work Q & A Jun-21-2008 AMAS-BT 2008 2 Introduction
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationLimiting the Number of Dirty Cache Lines
Limiting the Number of Dirty Cache Lines Pepijn de Langen and Ben Juurlink Computer Engineering Laboratory Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More information250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019
250P: Computer Systems Architecture Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019 The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Branch prediction and instr
More informationRaksha: A Flexible Information Flow Architecture for Software Security
Raksha: A Flexible Information Flow Architecture for Software Security Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University {mwdalton, hkannan, kozyraki}@stanford.edu
More informationDynamic Scheduling. CSE471 Susan Eggers 1
Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip
More informationHardware-based Speculation
Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationTradeoff between coverage of a Markov prefetcher and memory bandwidth usage
Tradeoff between coverage of a Markov prefetcher and memory bandwidth usage Elec525 Spring 2005 Raj Bandyopadhyay, Mandy Liu, Nico Peña Hypothesis Some modern processors use a prefetching unit at the front-end
More informationEmbedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi
Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed
More informationSpectre and Meltdown. Clifford Wolf q/talk
Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationSPECULATIVE MULTITHREADED ARCHITECTURES
2 SPECULATIVE MULTITHREADED ARCHITECTURES In this Chapter, the execution model of the speculative multithreading paradigm is presented. This execution model is based on the identification of pairs of instructions
More informationOutline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??
Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross
More informationCAD for VLSI 2 Pro ject - Superscalar Processor Implementation
CAD for VLSI 2 Pro ject - Superscalar Processor Implementation 1 Superscalar Processor Ob jective: The main objective is to implement a superscalar pipelined processor using Verilog HDL. This project may
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationThe Impact of Instruction Compression on I-cache Performance
Technical Report CSE-TR--97, University of Michigan The Impact of Instruction Compression on I-cache Performance I-Cheng K. Chen Peter L. Bird Trevor Mudge EECS Department University of Michigan {icheng,pbird,tnm}@eecs.umich.edu
More informationKey Point. What are Cache lines
Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible
More informationMeltdown or "Holy Crap: How did we do this to ourselves" Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory
Meltdown or "Holy Crap: How did we do this to ourselves" Abstract Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory locations Breaks all security assumptions given
More informationEECS 570 Final Exam - SOLUTIONS Winter 2015
EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationWednesday, September 13, Chapter 4
Wednesday, September 13, 2017 Topics for today Introduction to Computer Systems Static overview Operation Cycle Introduction to Pep/9 Features of the system Operational cycle Program trace Categories of
More informationQuestion 13 1: (Solution, p 4) Describe the inputs and outputs of a (1-way) demultiplexer, and how they relate.
Questions 1 Question 13 1: (Solution, p ) Describe the inputs and outputs of a (1-way) demultiplexer, and how they relate. Question 13 : (Solution, p ) In implementing HYMN s control unit, the fetch cycle
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationLecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)
Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling
More informationMCD: A Multiple Clock Domain Microarchitecture
MCD: A Multiple Clock Domain Microarchitecture Dave Albonesi in collaboration with Greg Semeraro Grigoris Magklis Rajeev Balasubramonian Steve Dropsho Sandhya Dwarkadas Michael Scott Caveats We started
More informationSudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Active thread Idle thread
Intra-Warp Compaction Techniques Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Goal Active thread Idle thread Compaction Compact threads in a warp to coalesce (and eliminate)
More informationThe Design Complexity of Program Undo Support in a General-Purpose Processor
The Design Complexity of Program Undo Support in a General-Purpose Processor Radu Teodorescu and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationParalleX. A Cure for Scaling Impaired Parallel Applications. Hartmut Kaiser
ParalleX A Cure for Scaling Impaired Parallel Applications Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 Tianhe-1A 2.566 Petaflops Rmax Heterogeneous Architecture: 14,336 Intel Xeon CPUs 7,168 Nvidia Tesla M2050
More informationThe ARM10 Family of Advanced Microprocessor Cores
The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10
More informationChapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST
Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism
More informationDesign of Out-Of-Order Superscalar Processor with Speculative Thread Level Parallelism
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference
More informationChecker Processors. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India
Advanced Department of Computer Science Indian Institute of Technology New Delhi, India Outline Introduction Advanced 1 Introduction 2 Checker Pipeline Checking Mechanism 3 Advanced Core Checker L1 Failure
More informationChapter 8 & Chapter 9 Main Memory & Virtual Memory
Chapter 8 & Chapter 9 Main Memory & Virtual Memory 1. Various ways of organizing memory hardware. 2. Memory-management techniques: 1. Paging 2. Segmentation. Introduction Memory consists of a large array
More informationChapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction
Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationSoftware Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationJackson Marusarz Intel Corporation
Jackson Marusarz Intel Corporation Intel VTune Amplifier Quick Introduction Get the Data You Need Hotspot (Statistical call tree), Call counts (Statistical) Thread Profiling Concurrency and Lock & Waits
More informationSUPERSCALAR AND VLIW PROCESSORS
Datorarkitektur I Fö 10-1 Datorarkitektur I Fö 10-2 What is a Superscalar Architecture? SUPERSCALAR AND VLIW PROCESSORS A superscalar architecture is one in which several instructions can be initiated
More informationMemory. Objectives. Introduction. 6.2 Types of Memory
Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts
More informationPipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010
Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationA Cool Scheduler for Multi-Core Systems Exploiting Program Phases
IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 5, MAY 2014 1061 A Cool Scheduler for Multi-Core Systems Exploiting Program Phases Zhiming Zhang and J. Morris Chang, Senior Member, IEEE Abstract Rapid growth
More information1. Creates the illusion of an address space much larger than the physical memory
Virtual memory Main Memory Disk I P D L1 L2 M Goals Physical address space Virtual address space 1. Creates the illusion of an address space much larger than the physical memory 2. Make provisions for
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationWednesday, February 4, Chapter 4
Wednesday, February 4, 2015 Topics for today Introduction to Computer Systems Static overview Operation Cycle Introduction to Pep/8 Features of the system Operational cycle Program trace Categories of
More informationHW1 Solutions. Type Old Mix New Mix Cost CPI
HW1 Solutions Problem 1 TABLE 1 1. Given the parameters of Problem 6 (note that int =35% and shift=5% to fix typo in book problem), consider a strength-reducing optimization that converts multiplies by
More informationReview on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala
Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors By: Anvesh Polepalli Raj Muchhala Introduction Integrating CPU and GPU into a single chip for performance
More informationA Comparison of Capacity Management Schemes for Shared CMP Caches
A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip
More informationEE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University
EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted
More informationChapter 8. Pipelining
Chapter 8. Pipelining Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization requires sophisticated compilation techniques.
More informationLeaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX
Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX W. Wang, G. Chen, X, Pan, Y. Zhang, XF. Wang, V. Bindschaedler, H. Tang, C. Gunter. September 19, 2017 Motivation Intel
More informationECE 341 Final Exam Solution
ECE 341 Final Exam Solution Time allowed: 110 minutes Total Points: 100 Points Scored: Name: Problem No. 1 (10 points) For each of the following statements, indicate whether the statement is TRUE or FALSE.
More informationChapter 7 The Potential of Special-Purpose Hardware
Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture
More informationA Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps
A Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps Nandita Vijaykumar Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarangnirun,
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationMain Points of the Computer Organization and System Software Module
Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a
More informationComputer Architecture V Fall Practice Exam Questions
Computer Architecture V22.0436 Fall 2002 Practice Exam Questions These are practice exam questions for the material covered since the mid-term exam. Please note that the final exam is cumulative. See the
More informationCS 136: Advanced Architecture. Review of Caches
1 / 30 CS 136: Advanced Architecture Review of Caches 2 / 30 Why Caches? Introduction Basic goal: Size of cheapest memory... At speed of most expensive Locality makes it work Temporal locality: If you
More informationModern Buffer Overflow Prevention Techniques: How they work and why they don t
Modern Buffer Overflow Prevention Techniques: How they work and why they don t Russ Osborn CS182 JT 4/13/2006 1 In the past 10 years, computer viruses have been a growing problem. In 1995, there were approximately
More informationCS Computer Architecture
CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010 Computer Systems Organization The CPU (Central Processing Unit) is the brain of the computer. Fetches instructions from main memory.
More informationEfficient Sequential Consistency Using Conditional Fences
Efficient Sequential Consistency Using Conditional Fences Changhui Lin CSE Department University of California, Riverside CA 92521 linc@cs.ucr.edu Vijay Nagarajan School of Informatics University of Edinburgh
More information