Getting to Work with OpenPiton. Princeton University. OpenPit

Similar documents
Getting to Work with OpenPiton

Piton: A 25-core Academic Manycore Research Processor

OpenPiton: An Open Source Manycore Research Framework

OpenPiton in Action. Princeton University. OpenPit

BERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

Getting to Work with OpenPiton

Part 1 of 3 -Understand the hardware components of computer systems

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Embedded Systems: Projects

GreenDroid: An Architecture for the Dark Silicon Age

Spiral 3-1. Hardware/Software Interfacing

Fast architecture prototyping on FPGAs: frameworks, tools, and challenges

Simplifying FPGA Design for SDR with a Network on Chip Architecture

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

The HammerBlade: An ML-Optimized Supercomputer for ML and Graphs

Power and Energy Characterization of an Open Source 25-core Manycore Processor

Yet Another Implementation of CoRAM Memory

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

Spiral 2-8. Cell Layout

IWES st Italian Workshop on Embedded Systems Pisa September 2016

ProtoFlex: FPGA Accelerated Full System MP Simulation

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Adaptable Intelligence The Next Computing Era

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Projects on the Intel Single-chip Cloud Computer (SCC)

SCORPIO: 36-Core Shared Memory Processor

Outline Marquette University

Fast Scalable FPGA-Based Network-on-Chip Simulation Models

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

An Overview of Standard Cell Based Digital VLSI Design

SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center

edram to the Rescue Why edram 1/3 Area 1/5 Power SER 2-3 Fit/Mbit vs 2k-5k for SRAM Smaller is faster What s Next?

FABRICATION TECHNOLOGIES

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

Creating a Scalable Microprocessor:

A New Era of Silicon Prototyping in Computer Architecture Research

ATLAS: A Chip-Multiprocessor. with Transactional Memory Support

Industry Collaboration and Innovation

45-year CPU Evolution: 1 Law -2 Equations

A Retrospective on Par Lab Architecture Research

Breaking the Boundaries in Heterogeneous-ISA Datacenters

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SpiNNaker - a million core ARM-powered neural HPC

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

HW Trends and Architectures

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017

A Hardware Accelerator for Computing an Exact Dot Product. Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović

Simply RISC S1 Core Specification. - version 0.1 -

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

SoC Platforms and CPU Cores

100M Gate Designs in FPGAs

Power Technology For a Smarter Future

POWER7: IBM's Next Generation Server Processor

RISECREEK: From RISC-V Spec to 22FFL Silicon

Low-Power Interconnection Networks

Third Genera+on USRP Devices and the RF Network- On- Chip. Leif Johansson Market Development RF, Comm and SDR

The Nios II Family of Configurable Soft-core Processors

Factors which influence in many core processors

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands

Pactron FPGA Accelerated Computing Solutions

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

How to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)

UPCRC Overview. Universal Computing Research Centers launched at UC Berkeley and UIUC. Andrew A. Chien. Vice President of Research Intel Corporation

OpenSPARC Program. David Weaver Principal Engineer, UltraSPARC Architecture Principal OpenSPARC Evangelist Sun Microsystems, Inc.

RAMP-White / FAST-MP

ProtoFlex Tutorial: Full-System MP Simulations Using FPGAs

Formal for Everyone Challenges in Achievable Multicore Design and Verification. FMCAD 25 Oct 2012 Daryl Stewart

RECONFIGURABLE SPI DRIVER FOR MIPS SOFT-CORE PROCESSOR USING FPGA

Midterm Exam. Solutions

Copyright 2016 Xilinx

Software Defined Modem A commercial platform for wireless handsets

AMD Opteron 4200 Series Processor

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Part IV. Review of hardware-trends for real-time ray tracing

Parallelism and Concurrency. COS 326 David Walker Princeton University

POWER7: IBM's Next Generation Server Processor

KiloCore: A 32 nm 1000-Processor Array

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

Philippe Thierry Sr Staff Engineer Intel Corp.

The Growing Designer Productivity Gap

CprE 488 Embedded Systems Design. Lecture 2 Embedded Platforms

FPGA: What? Why? Marco D. Santambrogio

Analyzing the Disruptive Impact of a Silicon Compiler

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

VISVESVARAYA TECHNOLOGICAL UNIVERSITY S.D.M COLLEGE OF ENGINEERING AND TECHNOLOGY. A seminar report on CRUSOE PROCESSOR.

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Architecture Design Space Exploration Using RISC-V

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

Does FPGA-based prototyping really have to be this difficult?

TRIPS: A Distributed Explicit Data Graph Execution (EDGE) Microprocessor

Computer Architecture. What is it?

Lecture 7: Introduction to Co-synthesis Algorithms

GreenDroid: A Mobile Application Processor for a Future of Dark Silicon

Transcription:

Getting to Work with OpenPiton Princeton University http://openpiton.org OpenPit

Princeton Parallel Research Group Redesigning the Data Center of the Future Chip Architecture Operating Systems and Runtimes Power and Cooling Optimization Biodegradable Computing (Materials) 11 PhD Students 3 Undergraduates Winter 2016 Ski Trip 2

Support This work was partially supported by the NSF under Grants No. CCF-1217553, CCF-1453112, and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors. 3

The world s first open source, general purpose, multithreaded manycore processor Open source (GPL core, BSD uncore) manycore Written in Verilog RTL Scales to ½ billion cores Configurable core, uncore Includes synthesis and back-end flow ASIC & FPGA verified Runs full stack multi-user Debian Linux Great for Architecture, Programming Language, Compilers, Operating Systems, Security, EDA research 4

OpenPiton System Overview Tile 5

OpenPiton System Overview Tile Tile 6

OpenPiton System Overview 7

OpenPiton System Overview To Other Chips Interchip Router 8

OpenPiton System Overview To Other Chips DRAM Interchip Router 9

OpenPiton System Overview To Other Chips DRAM Interchip Router Wishbone Master AXI4-Lite Master AXI Master 10

Tile Overview To Other Tiles 11

Chipset Overview To Chip (Tiles) 12

Piton Chip Block Diagram Tile Network On-Chip Links Off-Chip Memory and I/O Interface 25-core 2 Threads per core 64-bit Architecture Modified OpenSPARC T1 Core 3 NoCs 64-bit, 2D Mesh Extend off-chip enabling multichip systems Directory-Based Cache System 64KB L2 Cache per core (Shared) 8KB L1.5 Data Cache 8KB L1 Data Cache 16KB L1 Instruction Cache IBM 32nm SOI Process 6mm x 6mm 460 Million Transistors Target: 1GHz Clock @ 900mV 208 Pin CQFP Package 13

Piton Chip Among largest chips ever built in academia Received silicon and has been tested working in lab 14

Unlidded Piton Chip Piton Chip 15 Piton Chip Piton Processor packaged in 208-pin Ceramic Quad Flat Pack (CQFP) Received ~100 die from IBM

Piton Test Setup DRAM + I/O Bridge FPGA Spartan 6 Bulk Decoupling Misc. Configuration Chipset FPGA Kintex 7 [McKeown et al, HotChips 2016] Piton + Heat Sink Power Supply 16

Tetris on Debian Linux on Piton 17

FPGA Targets Board Frequency (1 Core) Cores Academic Price Xilinx VC707 67MHz 4 $3495 Xilinx ML605 18MHz 2 $1995 Digilent Genesys 2 60MHz 2 $600 Digilent Nexys Video 30MHz 1 $250 Digilent Nexys 4 DDR 29MHz 1 $160 18

OpenPiton Philosophy Focus/Value is in the Uncore Not religious about ISA Provide whole working system We are practical Use Verilog Industry standard tools Use the best tool for job (including commercial CAD tools) Primarily for research, but welcome industry also Licensing All our code is BSD-like Linux, Hypervisor, T1 core (GPL or LGPL) Scalability (Million Core) 19

OpenPiton Community Building a community Welcome community contributions Over 1500 Downloads Google Group Visit http://openpiton.org openpiton@princeton.edu 20

Doing Research with OpenPiton Software Install on Debian, test scalability Operating System Recompile kernel, rebuild SW, run Hardware/Software Co-design Add new instructions, change compiler/hv/os/sw Architecture Change parameters, rebuild HW, run Apps Compiler/ Runtime HV/OS ISA HW 21

Enabled Research Coherence Domain Restriction Fu et al. MICRO 2015 Execution Drafting McKeown et al. MICRO 2014 Memory Inter-arrival Time Traffic Shaper Zhou et al. ISCA 2016 Oblivious RAM Fletcher et al. ASPLOS 2015 DVFS modelling Program A Instruction Fetch Stage Thread Select Stage Successfully Drafted Instructions Lead Instructions Time App 1 App 3 Uniform Traffic More Bursty Traffic Program B Instruction App 2 Decode Stage Execute Stage Memory Stage Writeback Stage Frequency t 2t Request Inter-arr time 3t A Distribution of Traffic 2t 22