Embedded Systems: Projects

November 2016
Embedded Systems: Projects
Davide Zoni, PhD
email: davide.zoni@polimi.it
webpage: home.dei.polimi.it/zoni

Contacts & Places
Prof. William Fornaciari (Professor in charge)
email: william.fornaciari@polimi.it
webpage: home.dei.polimi.it/fornacia
Davide Zoni, PhD
email: davide.zoni@polimi.it
webpage: home.dei.polimi.it/zoni

Research Activities
RTL Design and Verification: embedded CPUs, cache coherence design, interconnect design, complex multi-core analysis, security-aware SoC design.
Multi-core Design and Simulation: cache hierarchy in multi-cores, NoC-cache design space exploration, CPU-GPU architectures, NoC optimization.

Types of projects
Bibliographic research (3 points max): state of the art on a specific topic; material organization and presentation; comparing different approaches.
Development project (9 points max): in-depth understanding of the tools you are working with; basic theoretical background for the problem; SW coding/HW design.

Project 1 (Area: HW Design)
Title: RTL Router Design in SystemVerilog
Computer architecture, SystemVerilog
The on-chip router is the key component of a Network-on-Chip (NoC). The project requires the design and implementation of a simple NoC router that supports virtual channels. The four-stage architecture is the baseline solution, while the speculative VA-SA (virtual-channel allocation and switch allocation) implementation is an important add-on for the project. The final design requires a complete testbench for regressions.
1. SystemVerilog
2. Designing Network On-Chip Architectures in the Nanoscale Era, J. Flich and D. Bertozzi, 2010. Free download: http://www.crcnetbase.com/isbn/9781439837115
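The router described above must arbitrate among virtual channels that compete for the same output port. The sketch below is a behavioral Python model of a round-robin arbiter, one common choice for the allocation stages. The project text does not prescribe an arbitration policy, so both the policy and the class name `RoundRobinArbiter` are assumptions made for illustration. It is not RTL.

```python
class RoundRobinArbiter:
    """Grants one requester per cycle and rotates priority after each grant."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.next_priority = 0  # requester with the highest priority this cycle

    def grant(self, requests):
        """requests: list of bools, one per requester. Returns the winner or None."""
        for offset in range(self.n):
            idx = (self.next_priority + offset) % self.n
            if requests[idx]:
                # The winner gets the lowest priority in the next cycle.
                self.next_priority = (idx + 1) % self.n
                return idx
        return None  # nobody requested the port


# Requesters 0, 2 and 3 keep asking for the same output port.
arbiter = RoundRobinArbiter(4)
grants = [arbiter.grant([True, False, True, True]) for _ in range(4)]
print(grants)
```

A model of this kind can serve as a golden reference in the regression testbench: the same request vectors are applied to the RTL arbiter and the grants are compared cycle by cycle.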

Project 2 (Area: HW Design) [discontinued]
Title: Handshake Resynchronizer in SystemVerilog
Computer architecture
Dynamic Voltage and Frequency Scaling (DVFS) is a key hardware mechanism to optimize power and performance in a chip. However, the use of different Voltage and Frequency Islands (VFIs) requires signals to be resynchronized at each VFI boundary. Two families of resynchronization schemes can be used: handshake or FIFO. The project requires the implementation of a simple handshake resynchronizer starting from the DFS support provided by Xilinx FPGAs.
1. Metastability (http://www.asic-world.com/tidbits/metastablity.html)
2. Additional material provided by the teaching assistant

Project 3 (Area: HW Design) [discontinued]
Title: FIFO Resynchronizer in SystemVerilog
Computer architecture
Dynamic Voltage and Frequency Scaling (DVFS) is a key hardware mechanism to optimize power and performance in a chip. However, the use of different Voltage and Frequency Islands (VFIs) requires signals to be resynchronized at each VFI boundary. Two families of resynchronization schemes can be used: handshake or FIFO. The project requires the implementation of a simple FIFO resynchronizer starting from the DFS support provided by Xilinx FPGAs.
1. Metastability (http://www.asic-world.com/tidbits/metastablity.html)
2. Additional material provided by the teaching assistant
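FIFO resynchronizers commonly pass their read and write pointers across the clock boundary in Gray code, so that only one bit changes per increment and a metastable sample can be off by at most one position. The project text does not mandate this encoding. The sketch below assumes it, and the helper names and the 4-bit pointer width are illustrative.

```python
def bin_to_gray(value):
    """Binary to Gray: each bit is XORed with its more significant neighbor."""
    return value ^ (value >> 1)


def gray_to_bin(gray):
    """Gray to binary: XOR together all right shifts of the Gray word."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


POINTER_BITS = 4  # assumed pointer width for the example
codes = [bin_to_gray(i) for i in range(1 << POINTER_BITS)]

# Every increment, including the wrap-around, flips exactly one bit.
single_bit_steps = all(
    bits_changed(codes[i], codes[(i + 1) % len(codes)]) == 1
    for i in range(len(codes))
)
print(single_bit_steps)
```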

Project 4 (Area: HW Design)
Title: Superscalar Embedded CPU Design
Computer architecture, SystemVerilog
OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split L1 caches. The project requires enhancing the provided architecture with a dual-issue implementation. A complete validation of the final solution is also required.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)

Project 5 (Area: HW Design)
Title: Write-back Cache Implementation for an Embedded CPU
Computer architecture, SystemVerilog
OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The student is required to modify the cache implementation to support the more aggressive write-back policy.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
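To see why write-back is the more aggressive policy, the following behavioral sketch counts memory writes for both policies on the same store stream. It assumes a direct-mapped cache with write-allocate on a miss. This is a simplification for illustration and does not describe the actual mor1kx cache, and the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Counts memory writes under write-through or write-back (stores only)."""

    def __init__(self, num_lines, write_back):
        self.num_lines = num_lines
        self.write_back = write_back
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.memory_writes = 0

    def write(self, line_addr):
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] != tag:
            # Miss: replace the resident line (write-allocate is assumed).
            if self.write_back and self.dirty[index]:
                self.memory_writes += 1  # flush the dirty victim first
            self.tags[index] = tag
            self.dirty[index] = False
        if self.write_back:
            self.dirty[index] = True  # defer the memory update
        else:
            self.memory_writes += 1  # write-through: update memory now

    def flush(self):
        """Writes every dirty line back to memory."""
        for index in range(self.num_lines):
            if self.dirty[index]:
                self.memory_writes += 1
                self.dirty[index] = False


stores = [0, 0, 0, 4, 0, 0]  # line addresses; 0 and 4 map to the same line
write_through = DirectMappedCache(4, write_back=False)
write_back = DirectMappedCache(4, write_back=True)
for addr in stores:
    write_through.write(addr)
    write_back.write(addr)
write_back.flush()
print(write_through.memory_writes, write_back.memory_writes)
```

The write-back variant pays for the saved traffic with a dirty bit per line and a victim flush on replacement, which is the extra state the RTL modification has to introduce.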

Project 6 (Area: HW Design)
Title: Performance Counter Support for an Embedded CPU
Computer architecture, SystemVerilog
OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. Performance counters are a critical resource for analyzing the architecture at run time. The project requires developing the minimal performance counter hardware support, as well as the software-side counterpart to read the counters, for the following metrics: CPU idle, L1 misses, L1 accesses, per-pipeline-stage stalls, branch mispredictions.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
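On the software side, raw counter values are usually sampled twice and turned into deltas and ratios. The sketch below shows that post-processing step only. The 32-bit counter width and the function names are assumptions, and reading the actual registers is left out because the register interface is part of the project.

```python
COUNTER_BITS = 32  # assumed width of each hardware counter


def counter_delta(start, end, bits=COUNTER_BITS):
    """Events between two samples, tolerating a single counter wrap-around."""
    return (end - start) % (1 << bits)


def miss_rate(misses, accesses):
    """L1 miss rate; defined as 0.0 when no access was recorded."""
    return misses / accesses if accesses else 0.0


# Hypothetical samples taken before and after a region of interest.
accesses = counter_delta(0xFFFFFFF0, 0x00000010)  # counter wrapped in between
misses = counter_delta(100, 108)
print(accesses, misses, miss_rate(misses, accesses))
```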

Project 7 (Area: HW Design)
Title: Branch Prediction Schemes for an Embedded CPU
Computer architecture, SystemVerilog
In embedded CPUs, the branch prediction scheme strongly influences the overall system performance, since the CPU is usually a single-issue, in-order architecture. OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The project requires exploring the branch prediction algorithms already implemented and implementing a few more to improve the CPU performance. Validation and a design space exploration analysis complete the project.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)
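As one example of a scheme that could be explored, the sketch below models a table of 2-bit saturating counters indexed by the branch address. It is a textbook predictor and not necessarily one of those already present in mor1kx. The table size, the initial counter state, and the class name are assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters; values 2 and 3 predict taken."""

    def __init__(self, table_size=16):
        self.table_size = table_size
        self.counters = [1] * table_size  # start in "weakly not taken"

    def _index(self, pc):
        return (pc >> 2) % self.table_size  # drop the word-alignment bits

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Replays the outcomes of one branch and counts wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch: taken nine times, then not taken, repeated three times.
loop_outcomes = ([True] * 9 + [False]) * 3
mispredictions = count_mispredictions(TwoBitPredictor(), 0x1000, loop_outcomes)
print(mispredictions, "mispredictions out of", len(loop_outcomes))
```

The hysteresis of the second bit is what keeps the single loop exit from flipping the prediction for the next loop entry, which a 1-bit scheme would get wrong.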

Project 8 (Area: HW Design)
Title: WISHBONE-compliant Bus Encryption for an Embedded CPU
Computer architecture, SystemVerilog
In embedded SoCs, bus encryption is a valuable feature to prevent information leakage, thus securing the architecture against side-channel attacks. OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. It implements a 6-stage CPU pipeline with split, write-through L1 caches. The project requires the implementation of a flexible bus encryption scheme for the considered SoC. A trade-off analysis comparing the additional resources required (area and power) against the performance and security metrics complements the project outcome.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)

Project 9 (Area: HW Design)
Title: High Level Synthesis for Security
Computer architecture, SystemVerilog
High Level Synthesis (HLS) transforms an algorithm written in software into a hardware description language specification, with the final goal of speeding up portions of a complex algorithm in hardware through ad-hoc accelerators. However, the automated code transformation process can result in a suboptimal design from the performance, power, area or security viewpoints. The project aims to compare different cryptographic algorithms implemented in both hardware and software against the output of the HLS tool integrated in the Xilinx Vivado Software Suite.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)

Project 10 (Area: Architecture Simulation)
Title: System Cache and Cache Partitioning in big.LITTLE Architectures
Computer architecture, C++, Python
Embedded multi-core solutions are found in smartphones, tablets and smart devices, with a net impact on our daily life. However, the design of such architectures is strongly constrained by power consumption and limited by traditional bus-based on-chip interconnects and cache hierarchies. The project focuses on cache partitioning schemes for the last-level cache (LLC), to contribute to the delivery of the next embedded multi-core reference architecture. A full-system, Linux-based, clustered multi-core will be explored considering different cache hierarchies and partitioning schemes, using the PARSEC benchmark suite on the application side.
1. GEM5 - http://gem5.org/main_page
2. LLC Partitioning Schemes - RECAP: Region-Aware Cache Partitioning
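The basic idea of LLC partitioning can be sketched with static way partitioning, where each core may occupy only a fixed number of ways per set. This is a generic illustration and not the RECAP scheme from the reference list. The quotas, the per-core LRU policy, the class name `PartitionedSet`, and the assumption that cores do not share lines are all simplifications.

```python
from collections import OrderedDict


class PartitionedSet:
    """One cache set whose ways are split among cores by a fixed quota."""

    def __init__(self, quotas):
        # quotas: dict mapping core id -> number of ways that core may occupy
        self.quotas = quotas
        # Per-core LRU order, least recently used entry first.
        self.lines = {core: OrderedDict() for core in quotas}

    def access(self, core, tag):
        """Returns True on a hit; on a miss the line is installed."""
        own = self.lines[core]
        if tag in own:
            own.move_to_end(tag)  # mark as most recently used
            return True
        if len(own) >= self.quotas[core]:
            own.popitem(last=False)  # evict this core's own LRU line
        own[tag] = True
        return False


# A 4-way set: core 0 gets three ways, the streaming core 1 gets one.
cache_set = PartitionedSet({0: 3, 1: 1})
for tag in ("A", "B", "C"):
    cache_set.access(0, tag)          # core 0 warms up its working set
for tag in ("X", "Y", "Z", "W"):
    cache_set.access(1, tag)          # core 1 streams through new lines
hits = [cache_set.access(0, tag) for tag in ("A", "B", "C")]
print(hits)
```

Without the quota, the four streaming accesses of core 1 would have evicted the working set of core 0 from an unpartitioned 4-way LRU set.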

Project 11 (Area: Architecture Simulation)
Title: The Impact of the Prefetcher in big.LITTLE Architectures
Computer architecture, C++, Python
The prefetcher acts as a greedy master for cache lines, and thus greatly affects the final performance of the overall system. Prefetching too late cannot hide the memory access time, while prefetching too early wastes cache lines. The scenario is further complicated by running applications that compete for the same shared cache resources. The project aims to implement a simple cache partitioning scheme to evaluate and eventually constrain the prefetcher's greediness. Different prefetchers coupled with the partitioning scheme will be evaluated. A full-system, Linux-based, clustered multi-core will be explored considering different cache hierarchies and partitioning schemes, using the PARSEC benchmark suite on the application side.
1. GEM5 - http://gem5.org/main_page
2. LLC Partitioning Schemes - RECAP: Region-Aware Cache Partitioning

Project 12 (Area: Architecture Simulation)
Title: CPU-GPU Multi-core Simulators
Computer architecture, C++, Python
Multi-cores are ubiquitous, and users expect the same performance regardless of the device at hand, e.g. smartphone, tablet, notebook or desktop. In this scenario the multimedia experience is becoming of paramount importance for a successful architecture, so chip manufacturers are providing multi-cores endowed with powerful GPUs. Simulation remains a critical design stage for early architecture evaluation, and the ability to simulate CPU and GPU at the same time can be a great advantage for architects. The project requires a complete exploration of the gem5-gpu simulation toolchain, which executes CUDA kernels in a full-system, cycle-accurate simulator.
1. GEM5 - http://gem5.org/main_page
2. gem5-gpu: A Heterogeneous CPU-GPU Simulator
3. NoMali: the ARM solution to mimic the GPU in gem5

Project 13 (Area: HW Design)
Title: OpenSPARC T2 onto the Xilinx XUPV5-LX110T FPGA
Students: <= 3
Computer architecture, Verilog, Xilinx Software
The OpenSPARC project aims to deliver a high-performance multi-core platform to the academic community. OpenSPARC T2 is derived from the UltraSPARC T2 processor, a 64-bit, eight-core, multi-threaded microprocessor. The students are required to boot the OpenSPARC system on the compatible XUPV5 FPGA using the ISE toolchain from Xilinx. A set of experiments with single- and multi-threaded applications complements the project assignment.
1. OpenSPARC T2: http://www.oracle.com/technetwork/systems/opensparc/opensparc-t2-page-1446157.html#t2-to-use
2. OpenPiton: http://parallel.princeton.edu/openpiton/

Project 14 (Area: HW Design)
Title: Memory Consistency Models on a Real Multicore
Verilog, C/C++
The memory consistency model describes the correct behavior of the shared memory system for programmers and implementors. OpenPiton uses the OpenSPARC T1 architecture as its base building block and is publicly available. The project requires changing the CPU-to-memory interface to explore the benefits of the most prominent consistency models: Sequential Consistency (SC), Total Store Order (TSO) and Weak Consistency (WC).
1. A Primer on Memory Consistency and Cache Coherence, Sorin, Hill, Wood, 2011
2. OpenPiton: http://parallel.princeton.edu/openpiton/
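The difference between SC and a model with store buffering shows up in the classic store-buffering litmus test: thread 0 stores to x and then loads y, while thread 1 stores to y and then loads x. The sketch below enumerates all global orders of the four operations. The relaxed variant simply lets a load complete before the older store of the same thread. For this particular test, where each load targets a different address than the buffered store, that approximates the effect of a TSO store buffer. It is not a general TSO model, and all names are illustrative.

```python
from itertools import permutations

# Each operation is (kind, thread, address); all stores write the value 1.
T0 = [("st", 0, "x"), ("ld", 0, "y")]
T1 = [("st", 1, "y"), ("ld", 1, "x")]


def run(order):
    """Executes the operations in the given global order on shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, thread, addr in order:
        if kind == "st":
            mem[addr] = 1
        else:
            regs[thread] = mem[addr]
    return regs[0], regs[1]  # (value loaded by thread 0, by thread 1)


def outcomes(allow_store_load_reorder):
    """Collects every result reachable under the chosen ordering rule."""
    results = set()
    for order in permutations(T0 + T1):
        legal = True
        for store, load in (T0, T1):
            if order.index(load) < order.index(store):
                # The load overtook the older store of its own thread.
                legal = legal and allow_store_load_reorder
        if legal:
            results.add(run(order))
    return results


sc = outcomes(allow_store_load_reorder=False)
relaxed = outcomes(allow_store_load_reorder=True)
print(sorted(sc))
print(sorted(relaxed - sc))
```

Under SC the result in which both loads return 0 never appears, while the relaxed variant adds exactly that result. A test program of this shape is one way to observe on the real hardware which model the modified memory interface implements.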

Project 15 (Area: Architecture Simulation)
Title: ElasticTrace (ARM)
Computer architecture, C++, Python
Cycle-accurate simulation is a viable means to support Design Space Exploration (DSE) at early design stages. However, the complexity of multi-cores makes this evaluation technique extremely time consuming, so that only a small subset of the design space can be explored. The Elastic Trace methodology was developed at ARM (SAMOS 2016) to relieve the burden of simulating complex out-of-order CPU models. Simulation traces are extracted once and can be replayed on a different architecture to evaluate the differences in performance and power consumption between the two solutions, thus aggressively trimming down the simulation time. The student is required to evaluate the ARM solution on different multi-core architectures to validate the simulation speed-up.
1. http://gem5.org/tracecpu
2. GEM5

Project 16 (Area: Architecture Simulation)
Title: SynchroTrace
Computer architecture, C++, Python
SynchroTrace was developed at the Drexel University VLSI Lab (Philadelphia) to support fast architectural exploration of multi-cores. The methodology should provide the same benefits as the ARM ElasticTrace solution, while delivering a few additional DSE features. The student is required to evaluate the SynchroTrace solution on different multi-core architectures to validate the simulation speed-up.
1. SynchroTrace tutorial: http://ece.drexel.edu/faculty/taskin/wiki/vlsilab/index.php/tutorials:synchrotrace_sigil_iiswc_2016
2. GEM5

Project 17 (Area: HW Design)
Title: Rowhammer Analysis on FPGAs
Verilog, C/C++
Rowhammer is an attack methodology that exploits an unintended side effect in DRAM: repeatedly accessing a memory row makes the cells of nearby rows leak charge, possibly altering the content of rows not involved in the access. Many memory vendors are updating their devices to face this threat, while several devices will not be updated due to the high cost of the transition to a new model. FPGA boards fall into this category, since updating to a new device version is expensive. The project requires exploring the possibility of attacking an SDRAM-equipped FPGA using the rowhammer methodology.
1. Google Project Zero: https://googleprojectzero.blogspot.it/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
2. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms, van der Veen et al., CCS 2016

Project 18 (Area: HW Design)
Title: OpenRISC mor1kx - Porting to FPGA
Computer architecture, SystemVerilog
OpenRISC is a de facto standard open-hardware architecture and ISA. The mor1kx is an open-source Verilog implementation that is fully OpenRISC compliant. The project requires porting the design to one of the FPGAs available in the laboratory. A complete regression test is part of the project, while porting the Linux OS is considered a plus.
1. SystemVerilog
2. Computer Architecture: A Quantitative Approach (3rd edition or later)