Embedded Systems: Projects
November 2016
Davide Zoni, PhD
email: davide.zoni@polimi.it
webpage: home.dei.polimi.it/zoni
Contacts & Places
Prof. William Fornaciari (Professor in charge)
email: william.fornaciari@polimi.it
webpage: home.dei.polimi.it/fornacia
Davide Zoni, PhD
email: davide.zoni@polimi.it
webpage: home.dei.polimi.it/zoni
Research Activities
RTL Design and Verification
  Embedded CPUs
  Cache coherence design
  Interconnect design
  Complex multi-core analysis
  Security-aware SoC design
Multi-core Design and Simulation
  Cache hierarchy in multi-cores
  NoC-cache design space exploration
  CPU-GPU architectures
  NoC optimization
Types of projects
Bibliographic research (3 points max)
  State of the art on a specific topic
  Material organization and presentation
  Comparison of different approaches
Development project (9 points max)
  In-depth understanding of the tools you are working with
  Basic theoretical background for the problem
  SW coding/HW design
Projects 1 (Area: HW Design)
Title: RTL Router Design in SystemVerilog
Computer architecture, SystemVerilog
The on-chip router is the key component of a Network-on-Chip (NoC). The project requires designing and implementing a simple NoC router that supports virtual channels. The four-stage architecture is the baseline solution, while the speculative VA-SA implementation is a critical add-on for the project. The final design requires a complete testbench for regression testing.
1. SystemVerilog
2. Designing Network On-Chip Architectures in the Nanoscale Era, J. Flich and D. Bertozzi, 2010. Free download: http://www.crcnetbase.com/isbn/9781439837115
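A switch-allocation (SA) stage commonly relies on a round-robin arbiter to share an output port fairly among requesters. The behavioural sketch below is written in Python rather than SystemVerilog and only illustrates the arbitration policy. The function name `round_robin_grant` and the four-requester example are assumptions made for illustration; they are not part of the project specification.

```python
def round_robin_grant(requests, last_grant):
    """Return the index of the next requester to serve, or None.

    requests   -- list of booleans, one per input port / virtual channel
    last_grant -- index granted in the previous cycle (hypothetical state)
    The search starts just after last_grant, so the most recently served
    requester gets the lowest priority.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None  # nobody is requesting


# Example: requesters 0, 2 and 3 keep requesting; requester 1 stays idle.
requests = [True, False, True, True]
grants = []
last = 0
for _ in range(3):
    last = round_robin_grant(requests, last)
    grants.append(last)
print(grants)
```

An RTL version would keep `last_grant` in a register and evaluate the search combinationally; the policy itself stays the same.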
Projects 2 (Area: HW Design) [discontinued]
Title: HANDSHAKE Resynchronizer in SystemVerilog
Computer architecture
DVFS is a key hardware mechanism to optimize power and performance in a chip. However, the use of different Voltage and Frequency Islands (VFIs) requires signals to be resynchronized at each VFI boundary. Two families of resynchronization schemes can be used for this purpose: handshake or FIFO. The project requires implementing a simple handshake resynchronizer starting from the DFS support provided by Xilinx FPGAs.
1. Metastability - ( http://www.asic-world.com/tidbits/metastablity.html )
2. Additional material provided by the teaching assistant
Projects 3 (Area: HW Design) [discontinued]
Title: FIFO Resynchronizer in SystemVerilog
Computer architecture
DVFS is a key hardware mechanism to optimize power and performance in a chip. However, the use of different Voltage and Frequency Islands (VFIs) requires signals to be resynchronized at each VFI boundary. Two families of resynchronization schemes can be used for this purpose: handshake or FIFO. The project requires implementing a simple FIFO resynchronizer starting from the DFS support provided by Xilinx FPGAs.
1. Metastability - ( http://www.asic-world.com/tidbits/metastablity.html )
2. Additional material provided by the teaching assistant
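FIFO resynchronizers often pass their read and write pointers across the clock boundary in Gray code, so that only one bit changes per increment and a metastable sample can be off by at most one position. The slide does not prescribe this technique; the Python sketch below assumes it purely for illustration, and the helper names and the 4-bit pointer width are hypothetical.

```python
def bin_to_gray(value):
    """Convert a binary pointer value to its Gray-code encoding."""
    return value ^ (value >> 1)


def gray_to_bin(gray):
    """Convert a Gray-code value back to plain binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


ADDR_BITS = 4  # assumed pointer width, for illustration only
codes = [bin_to_gray(i) for i in range(1 << ADDR_BITS)]

# Every increment, including the wrap-around, flips exactly one bit.
single_bit_steps = all(
    hamming_distance(codes[i], codes[(i + 1) % len(codes)]) == 1
    for i in range(len(codes))
)
print(single_bit_steps)
```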
Projects 4 (Area: HW Design)
Title: Superscalar Embedded CPU Design
Computer architecture, SystemVerilog
The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split L1 caches. The project requires enhancing the provided architecture with a dual-issue implementation. A complete validation of the final solution is also required.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
Projects 5 (Area: HW Design)
Title: Write-back Cache Implementation for an Embedded CPU
Computer architecture, SystemVerilog
The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The student is required to modify the cache implementation to support the more aggressive write-back policy.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
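The difference between the two write policies can be seen in a small behavioural model that counts how many writes reach main memory. This is a sketch under simplifying assumptions and does not describe the mor1kx cache: the cache is direct-mapped, addresses are given at cache-line granularity, both policies allocate on a write miss, and the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks write traffic to memory."""

    def __init__(self, num_lines, write_back):
        self.num_lines = num_lines
        self.write_back = write_back
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.memory_writes = 0

    def _evict(self, index):
        # A dirty line must be written to memory before it is replaced.
        if self.write_back and self.dirty[index]:
            self.memory_writes += 1
        self.dirty[index] = False

    def write(self, line_addr):
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] != tag:
            self._evict(index)
            self.tags[index] = tag  # write-allocate (assumed for both policies)
        if self.write_back:
            self.dirty[index] = True  # defer the memory update
        else:
            self.memory_writes += 1  # write-through: update memory now

    def flush(self):
        for index in range(self.num_lines):
            self._evict(index)


trace = [0, 0, 0, 4, 0]  # line addresses; 0 and 4 conflict in a 4-line cache
wt = DirectMappedCache(num_lines=4, write_back=False)
wb = DirectMappedCache(num_lines=4, write_back=True)
for addr in trace:
    wt.write(addr)
    wb.write(addr)
wb.flush()
print(wt.memory_writes, wb.memory_writes)
```

Repeated writes to the same line are absorbed by the write-back cache and reach memory only on eviction or flush, which is where the policy saves bus traffic.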
Projects 6 (Area: HW Design)
Title: Performance Counter Support for an Embedded CPU
Computer architecture, SystemVerilog
The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. Performance counters are a critical resource to analyze the architecture at run-time. The project requires developing the minimal performance counter hardware support, as well as the software counterpart to read the counters, for the following metrics: CPU idle, L1 misses, L1 accesses, per-pipeline-stage stalls, branch mispredictions.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
Projects 7 (Area: HW Design)
Title: Branch Prediction Schemes for an Embedded CPU
Computer architecture, SystemVerilog
In embedded CPUs, the branch prediction scheme strongly influences the overall system performance, since the CPU is usually a single-issue, in-order architecture. The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The project requires exploring the branch prediction algorithms already implemented and implementing a few more to improve the CPU performance. Validation and a design space exploration analysis complete the project.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
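As a reference point for the exploration, the classic 2-bit saturating-counter predictor can be modelled in a few lines. The sketch below is a generic textbook scheme, not a description of the predictors shipped with mor1kx; the table size, the indexing by the low PC bits, the initial counter state and the loop-branch trace are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the low PC bits."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        # Counter values: 0,1 predict not-taken; 2,3 predict taken.
        self.counters = [1] * (1 << table_bits)  # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, trace):
    """Replay (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch at pc 0x40: taken 9 times, then falls through.
# The loop is executed twice.
loop = [(0x40, True)] * 9 + [(0x40, False)]
trace = loop * 2
mispredictions = count_mispredictions(TwoBitPredictor(), trace)
print(mispredictions, "mispredictions out of", len(trace))
```

The second bit of hysteresis is what keeps the single loop exit from flipping the prediction for the next loop entry; a 1-bit scheme would mispredict twice per loop execution in steady state.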
Projects 8 (Area: HW Design)
Title: WISHBONE-compliant Bus Encryption for an Embedded CPU
Computer architecture, SystemVerilog
In embedded SoCs, bus encryption is a valuable feature to prevent information leakage, thus securing the architecture against side-channel attacks. The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The project requires implementing a flexible bus encryption scheme for the considered SoC. A trade-off analysis comparing the additional resources required (area and power) against the performance and security metrics completes the project.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
Projects 9 (Area: HW Design)
Title: High Level Synthesis for Security
Computer architecture, SystemVerilog
High Level Synthesis (HLS) transforms an algorithm written in software into a hardware description language specification, with the final goal of speeding up portions of a complex algorithm in hardware through ad-hoc accelerators. However, the automated code transformation process can result in a suboptimal design from the performance, power, area or security viewpoints. The project aims to compare different cryptographic algorithms, implemented in both hardware and software, against the output of the HLS tool integrated in the Xilinx Vivado software suite.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
Projects 10 (Area: Architecture Simulation)
Title: System Cache and Cache Partitioning in big.LITTLE architectures
Computer architecture, C++, Python
Embedded multi-core solutions are found in smartphones, tablets and smart devices, with a clear impact on our daily life. However, the design of such architectures is strongly constrained by power consumption and limited by traditional bus-based on-chip interconnects and cache hierarchies. The project focuses on cache partitioning schemes for the last-level cache (LLC), as a contribution towards the next embedded multi-core reference architecture. A full-system, Linux-based, clustered multi-core will be explored considering different cache hierarchies and partitioning schemes, using the PARSEC benchmark suite on the application side.
1. GEM5 - http://gem5.org/main_page
2. LLC Partitioning Schemes - RECAP: Region-Aware Cache Partitioning
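The effect of LLC partitioning can be illustrated on a single cache set before moving to gem5. The Python sketch below models static way partitioning with LRU replacement inside each partition. It is not RECAP and not gem5 code; the function `simulate_set`, the 4-way set, and the synthetic trace (core 0 reuses two lines while core 1 streams through new ones) are assumptions for illustration.

```python
from collections import OrderedDict


def simulate_set(partition_ways, core_to_partition, trace):
    """Return per-core hit counts for one cache set.

    partition_ways    -- number of ways owned by each partition
    core_to_partition -- maps a core id to the partition it allocates in
    trace             -- sequence of (core, tag) accesses
    Replacement is LRU inside each partition.
    """
    partitions = [OrderedDict() for _ in partition_ways]
    hits = {}
    for core, tag in trace:
        hits.setdefault(core, 0)
        p = core_to_partition[core]
        lines = partitions[p]
        key = (core, tag)  # assumption: cores do not share data
        if key in lines:
            lines.move_to_end(key)  # mark as most recently used
            hits[core] += 1
        else:
            if len(lines) >= partition_ways[p]:
                lines.popitem(last=False)  # evict the LRU line
            lines[key] = True
    return hits


# Core 0 alternates between two lines; core 1 streams without reuse.
trace = []
for i in range(8):
    trace.append((0, i % 2))
    trace.append((1, 100 + 2 * i))
    trace.append((1, 100 + 2 * i + 1))

shared = simulate_set([4], {0: 0, 1: 0}, trace)          # one 4-way pool
partitioned = simulate_set([2, 2], {0: 0, 1: 1}, trace)  # 2 ways per core
print(shared, partitioned)
```

In the shared configuration the streaming core evicts core 0's lines before they are reused; with two dedicated ways, core 0 misses only on its first two accesses.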
Projects 11 (Area: Architecture Simulation)
Title: The impact of the prefetcher in big.LITTLE architectures
Computer architecture, C++, Python
The prefetcher acts as a greedy master for cache lines, and thus contributes greatly to the final performance of the overall system. Prefetching too late cannot hide the memory access time, while prefetching too early wastes cache lines. The scenario is further complicated by the running applications, which compete for the same shared cache resources. The project aims to implement a simple cache partitioning scheme to evaluate and, where needed, constrain the prefetcher greediness. Different prefetchers coupled with the partitioning scheme will be evaluated. A full-system, Linux-based, clustered multi-core will be explored considering different cache hierarchies and partitioning schemes, using the PARSEC benchmark suite on the application side.
1. GEM5 - http://gem5.org/main_page
2. LLC Partitioning Schemes - RECAP: Region-Aware Cache Partitioning
Projects 12 (Area: Architecture Simulation)
Title: CPU-GPU multi-core simulators
Computer architecture, C++, Python
Multi-cores are ubiquitous and the user expects the same performance regardless of the device at hand, i.e. smartphone, tablet, notebook, desktop. In this scenario the multimedia experience is becoming of paramount importance to deliver a successful architecture, so chip manufacturers are providing multi-cores endowed with powerful GPUs. Simulation still represents a critical design stage for early architecture evaluation, and the possibility to simulate CPU and GPU at the same time can be a great advantage for design architects. The project requires a complete exploration of the gem5-gpu simulation toolchain, which allows executing CUDA kernels in a full-system, cycle-accurate simulator.
1. GEM5 - http://gem5.org/main_page
2. GEM5-GPU: gem5-gpu: A Heterogeneous CPU-GPU Simulator
3. No Mali: the ARM solution to mimic the GPU in gem5
Projects 13 (Area: HW Design)
Title: OpenSparc T2 onto the Xilinx XUPV5-LX110T FPGA
Students: <= 3
Computer architecture, Verilog, Xilinx Software
The OpenSPARC project aims to deliver a high-performance multi-core platform to the academic community. OpenSPARC T2 is derived from the UltraSPARC T2 processor, a 64-bit, eight-core, multi-threaded microprocessor. The students are required to boot the OpenSPARC system on the compatible XUPV5 FPGA using the ISE toolchain from Xilinx. A set of experiments with single- and multi-threaded applications completes the project assignment.
1. OpenSparc T2: http://www.oracle.com/technetwork/systems/opensparc/opensparc-t2-page-1446157.html#t2-to-use
2. OpenPiton: http://parallel.princeton.edu/openpiton/
Projects 14 (Area: HW Design)
Title: Consistency Memory Models on a real multicore
Verilog, C/C++
The memory consistency model describes the behavior of the shared memory system, for programmers and implementors, in terms of correctness. OpenPiton uses the OpenSPARC T1 architecture as its base building block and is publicly available. The project requires changing the CPU-to-memory interface to explore the benefits of the most prominent consistency models: Sequential Consistency (SC), Total Store Order (TSO), Weak Consistency (WC).
1. A Primer on Memory Consistency and Cache Coherence, Sorin, Hill, Wood, 2011
2. OpenPiton: http://parallel.princeton.edu/openpiton/
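The observable difference between SC and TSO shows up in the classic store-buffering litmus test: thread 0 executes `x = 1; r0 = y` while thread 1 executes `y = 1; r1 = x`. The Python sketch below enumerates all legal event orders under a simplified model and collects the possible `(r0, r1)` outcomes. It is an abstract illustration, not OpenPiton code: the per-thread store buffer with an explicit drain event is an assumed, minimal TSO model, and Weak Consistency is not covered.

```python
from itertools import permutations

# Each thread stores 1 to its first variable, then loads its second one.
THREADS = [("x", "y"), ("y", "x")]


def outcomes(store_buffers):
    """Return the set of (r0, r1) results over all legal event orders.

    store_buffers=False: stores hit memory at once (SC-like).
    store_buffers=True:  stores wait in a per-thread buffer until a
                         separate 'drain' event (simplified TSO).
    """
    ids = range(len(THREADS))
    events = []
    for tid in ids:
        events += [(tid, "store"), (tid, "load")]
        if store_buffers:
            events.append((tid, "drain"))
    results = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        # Program order: the store is issued before the load.
        if any(pos[(t, "store")] > pos[(t, "load")] for t in ids):
            continue
        # A buffered store can only drain after it has been issued.
        if store_buffers and any(
            pos[(t, "store")] > pos[(t, "drain")] for t in ids
        ):
            continue
        memory = {"x": 0, "y": 0}
        buffers = [{} for _ in THREADS]
        loaded = [None] * len(THREADS)
        for tid, kind in order:
            store_var, load_var = THREADS[tid]
            if kind == "store":
                target = buffers[tid] if store_buffers else memory
                target[store_var] = 1
            elif kind == "drain":
                memory.update(buffers[tid])
                buffers[tid].clear()
            else:  # load: forward from own buffer, else read memory
                loaded[tid] = buffers[tid].get(load_var, memory[load_var])
        results.add(tuple(loaded))
    return results


sc = outcomes(store_buffers=False)
tso = outcomes(store_buffers=True)
print(sorted(sc), sorted(tso))
```

Under this model the outcome `(0, 0)` appears only when store buffers are enabled, which is the relaxation that distinguishes TSO from SC.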
Projects 15 (Area: Architecture Simulation)
Title: ElasticTrace (ARM)
Computer architecture, C++, Python
Cycle-accurate simulation is a viable means to support Design Space Exploration at early design stages. However, the complexity of multi-core systems makes this evaluation technique extremely time consuming, so only a small subset of the design space can be explored. The Elastic Trace methodology has been developed at ARM (SAMOS 2016) to reduce the burden of simulating complex out-of-order CPU models. Simulation traces are extracted once and can be replayed on a different architecture to evaluate the differences in performance and power consumption between the two solutions, thus aggressively trimming down the simulation time. The student is required to evaluate the ARM solution considering different multi-core architectures to validate the simulation speed-up.
1. http://gem5.org/tracecpu
2. GEM5
Projects 16 (Area: Architecture Simulation)
Title: SynchroTrace
Computer architecture, C++, Python
SynchroTrace has been developed at the VLSI Lab of Drexel University (Philadelphia) to support fast architectural exploration of multi-cores. The methodology should provide the same benefits as the ARM ElasticTrace solution, while delivering a few additional DSE features. The student is required to evaluate the SynchroTrace solution considering different multi-core architectures to validate the simulation speed-up.
1. Synchrotrace tutorial http://ece.drexel.edu/faculty/taskin/wiki/vlsilab/index.php/tutorials:synchrotrace_sigil_iiswc_2016
2. GEM5
Projects 17 (Area: HW Design)
Title: Rowhammer analysis on FPGAs
Verilog, C/C++
Rowhammer is an attack methodology that exploits an unintended side effect in DRAM, whereby memory cells leak their charge and may alter the content of nearby memory rows not involved in the memory access. Many memory vendors are updating their devices to face this threat, while several devices will not be updated due to the high cost of the transition to the new model. FPGAs fall in this category, since updating to a new device version is expensive. The project requires exploring the possibility of attacking an SDRAM-equipped FPGA using the rowhammer methodology.
1. Google Project Zero: https://googleprojectzero.blogspot.it/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
2. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms, van der Veen et al., CCS 2016
Projects 18 (Area: HW Design)
Title: OpenRisc Mor1kx - porting to FPGA
Computer architecture, SystemVerilog
The OpenRISC architecture is the de facto open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. The project requires porting the design to one of the FPGAs available in the laboratory. A complete regression test is part of the project, while porting the Linux OS is considered a plus.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)