On-chip Networks Enable the Dark Silicon Advantage
Drew Wingard, CTO & Co-founder, Sonics, Inc.
Agenda
- Sonics history and corporate summary
- Power challenges in advanced SoCs
- General power management techniques
- On-chip network features and benefits
- Optimizing dark silicon with on-chip networks
- Future work
Sonics: Leader in System IP for SoCs
- Sonics enables designers to integrate any IP from anywhere, anytime: easy IP re-use; connecting third-party IP and subsystems
- Total system approach: intelligent memory scheduling; optimal power-aware designs; data flow services (QoS, security firewalls)
- World-class engineering team: the largest team of on-chip network engineers; strong local presence in Japan
- Commanding presence in digital entertainment, mobile and wireless: 8 of the top 10 semiconductor SoC companies
- Results: 2 billion units shipped; over 200 design completions
ARM and Sonics
- ARM and Sonics have been working together to mutually support SoC customers for more than 10 years: multiple generations of ARM's flagship CPUs for application processors, and multiple generations of AMBA
- Sonics fully supports ARM SoC initiatives: AMBA, TrustZone, etc.
- Recently announced an expanded partnership focused on enhanced interoperability and power management, plus a patent licensing arrangement
How Is Your Current SoC Project Going?
- Are you hitting your performance targets?
- Did you achieve the frequency you hoped for?
- Are you staying within your power budgets?
- Did you see your throughput decrease as frequency increased?
- Did timing issues at layout force you to re-work your architecture?
Common Architecture for Over 16 Years
A common on-chip network architecture:
- Structure: IP core sockets, isolated from the network fabric by intelligent agents
- Sockets: AMBA ACE, AXI3/4, AHB, APB; OCP 1/2/3
- Protocols: completely non-blocking, multi-threaded fabrics
- Features: end-to-end QoS, security, error and power management, etc.
- Software: consistent register-level views
- Development environment: unified SonicsStudio tools
enables a family of micro-architectures:
- SonicsGN: highly scalable, multi-domain, router-based fabric at up to 2 GHz
- SonicsSX: low-latency cascaded crossbar fabric
- Sonics3220: efficient sharing of many peripherals spread across the SoC
and supporting System IP:
- MemMax scheduler: delivering the highest DRAM throughput and QoS
Example: Tablet Application Processor
[Figure: Cortex-A15 x4 and Cortex-A7 x4 clusters with L2 caches and a Mali-T658 quad-core GPU, each in its own power domain, behind a CoreLink CCI-400 coherency fabric; a SonicsGN on-chip network connecting two Sonics MemMax memory schedulers with DRAM controllers, a Sonics3220 peripheral network, and IP such as ROM, secure ROM, SRAM, security, DMA, LCD controller, HDMI, video codec, two cameras, Ethernet, PCIe, SATA, audio, USB and APB peripherals; clock domains range from 133 MHz to 1333 MHz.]
Market Survey: Increasing SoC Complexity
- Design complexity is increasing; power/performance/area remain the key challenges
- Complexity drivers:
  - Frequency: a broad range of implementation points; 51% need > 1 GHz
  - Multiple power domains: better battery life; coping with Dark Silicon; domains are often tied to key subsystems
Source: survey conducted by Sonics in October 2012, with 318 responses
Power Consumption Is a Major Concern
- Battery-powered devices: battery life is a key selling feature; battery size impacts weight, pocketability, hand fit, etc.
- Line-powered devices need to be concerned with power, too: power consumption impacts the cost of packaging; the power supply may be limited (e.g. PoE, Energy Star, EU Energy Label); cooling issues
No new SoC development can afford to ignore power consumption.
The Dark Silicon Challenge
- Moore's Law enables integration of massive functionality on an SoC: more than 1 billion transistors at 28 nm
- But leakage current limits how many transistors can be powered; multiple threshold voltages and dynamic voltage control help
- The result is Dark Silicon: the imperative to dynamically manage which parts of the SoC are powered
Many people believe that Dark Silicon is a problem. Sonics believes it is an opportunity to rethink how we partition SoCs to better exploit performance while minimizing power and energy.
Power Management Techniques
General techniques (in increasing order of difficulty):
- Clock gating
- Stop/start subsystem clocks
- Dynamic clock frequency
- On/off voltage domains
- Dynamic voltage/frequency domains (DVFS)
IP-specific techniques:
- ARM big.LITTLE (use the optimum IP for the load)
Power managers implement the techniques:
- Software: flexible, but slow
- Hardware: very responsive, but less flexible
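Whether it runs in software or hardware, a power manager implementing DVFS has to pick an operating point for each domain. A minimal sketch of one common policy, assuming a hypothetical operating-point table (the frequencies echo the example SoC's clocks, but the voltages are illustrative only): choose the slowest point that still meets the required frequency, and saturate at the fastest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int
    volt: float


# Hypothetical DVFS table for one domain; values are illustrative only.
OPP_TABLE = [
    OperatingPoint(267, 0.80),
    OperatingPoint(533, 0.90),
    OperatingPoint(1066, 1.00),
    OperatingPoint(1333, 1.10),
]


def choose_opp(required_mhz: float, table=OPP_TABLE) -> OperatingPoint:
    """Return the slowest point that meets the demand.

    In this table voltage rises with frequency, so the slowest sufficient
    point is also the lowest-voltage (lowest-power) one.
    """
    for opp in sorted(table, key=lambda o: o.freq_mhz):
        if opp.freq_mhz >= required_mhz:
            return opp
    # Demand exceeds every point: run as fast as the domain allows.
    return max(table, key=lambda o: o.freq_mhz)
```

A software manager can use a richer policy than this but reacts slowly; a hardware manager reacts quickly but is usually limited to a fixed rule of this kind, which is the trade-off listed above.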
Measured Benefits of SGN Clock Gating vs. Conventional
[Figure: relative power (0 to 1) vs. relative throughput (0% to 100%) for three cases: Sonics-provided fine gating and idle detection; synthesis gating plus Sonics idle detection; synthesis gating only. Callout: automatic idle detection.]
Reducing Clock Power
- Reduce the clock frequency when possible
- Stop the clock when there is nothing useful to be done
To get the best result, this needs to be architected into the IP. Prefer a hierarchical approach:
- Fine-grain clock gating: at a register or state-machine level, when there is nothing useful to do, stop the clock. This toggles just 1 clock gate instead of n loads, where n = number of local flops.
- Coarse-grain clock gating: at a component level, when all internal clock gates block the clock, gate the clock to the component. This toggles just 1 load instead of m loads, where m = number of fine-grain clock gates.
This approach allows extremely effective clock gating. Typical Sonics designs achieve > 99.5% clock gating, many > 99.9%. For example: 16 free-running flops in a network with > 40K flops (99.96%).
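The load counts and the 99.96% figure above can be checked with a small model. A sketch under simplifying assumptions (one component with m fine-grain gates, each driving n flops; only clock pins are counted, wire capacitance is ignored; the function names are hypothetical):

```python
def clock_loads_toggled(m_gates: int, n_flops: int, active_gates: int, scheme: str) -> int:
    """Clock pins that toggle per cycle in one component.

    scheme: "none" (no gating), "fine" (fine-grain gates only),
    or "hierarchical" (a coarse gate in front of the fine gates).
    """
    if scheme == "none":
        return m_gates * n_flops                  # every flop sees every edge
    if scheme == "fine":
        return m_gates + active_gates * n_flops   # all gates, plus flops behind open gates
    if scheme == "hierarchical":
        if active_gates == 0:
            return 1                              # coarse gate closed: only its input toggles
        return 1 + m_gates + active_gates * n_flops
    raise ValueError(f"unknown scheme: {scheme}")


def gating_efficiency(free_running_flops: int, total_flops: int) -> float:
    """Fraction of flops whose clock is gated rather than free-running."""
    return 1 - free_running_flops / total_flops


print(f"{gating_efficiency(16, 40_000):.2%}")
```

With 8 gates of 32 flops each and the component idle, the clock toggles 256 pins ungated, 8 with fine-grain gating, and just 1 with the hierarchical scheme.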
Reducing Voltage-related Power
- Reduce or remove the voltage when possible: partition the design into multiple power domains
- Reduced voltage can save significant dynamic power: $P = C V^2 f$
- Switching off the voltage saves even more: leakage = 0
- Especially effective when large parts of the SoC can be switched off; ARM big.LITTLE is a good example!
[Figure: SoC partitioned into voltage domains V1, V2 and V3, with the Cortex-A15 and Cortex-A7 cluster domains each shown switched OFF.]
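To make $P = C V^2 f$ concrete, the sketch below evaluates it for one domain with a hypothetical effective switched capacitance of 1 nF and a hypothetical 0.1 W leakage (illustrative numbers, not measurements). It also models the point that a switched-off domain draws neither dynamic nor leakage power.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """Switching power P = C * V^2 * f (farads, volts, hertz -> watts)."""
    return c_eff * vdd ** 2 * freq


def domain_power(c_eff: float, vdd: float, freq: float, leakage_w: float,
                 powered: bool = True) -> float:
    """Total power of one domain; a switched-off domain draws nothing."""
    if not powered:
        return 0.0  # no supply: no dynamic power and no leakage
    return dynamic_power(c_eff, vdd, freq) + leakage_w


C_EFF = 1e-9  # hypothetical effective switched capacitance, 1 nF

full = dynamic_power(C_EFF, 1.0, 1e9)      # 1.0 V at 1 GHz
scaled = dynamic_power(C_EFF, 0.8, 0.8e9)  # 0.8 V at 800 MHz
```

Because the voltage term is squared, dropping from 1.0 V to 0.8 V alone cuts dynamic power by 36%; scaling frequency as well leaves about 51% of the original, and only switching the domain off removes leakage.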
Challenge: Enabling Power Domains for the SoC
With standard fabrics, the natural choice is to create power-domain boundaries at the bus interface:
- The bus must be powered if any of the attached cores are powered. This either forces the bus into an always-on portion of the SoC, or requires partitioning the fabric at power domain boundaries, complicating the design
- It requires some kind of domain crossing at the bus interface, which may have MANY wires
[Figure: initiators (I) and targets (T) attached to a shared bus.]
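The cost of "the bus must be powered if any attached core is powered" grows with the number of cores. A small sketch using hypothetical per-slot power schedules (1 = core on) shows that a shared bus can be on far more often than any single core it serves:

```python
def fabric_on_fraction(core_schedules: list[list[int]]) -> float:
    """Fraction of time slots in which a shared bus must be powered.

    core_schedules holds one equal-length 0/1 list per attached core.
    The bus is needed in every slot where at least one core is on.
    """
    slots = list(zip(*core_schedules))
    return sum(1 for slot in slots if any(slot)) / len(slots)


def core_on_fraction(schedule: list[int]) -> float:
    return sum(schedule) / len(schedule)


# Hypothetical schedules for three cores over eight time slots.
SCHEDULES = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
```

Here no core is on for more than 25% of the time, yet the bus must be on for 62.5% of it.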
Efficient IP Integration
- Universal connectivity: AMBA (AXI, AXI4/ACE, AHB, APB), OCP, PIF and proprietary cores
- Serialized router-based network: wire count reduced to as little as 1/16
[Figure: tablet SoC (Cortex-A15, Cortex-A7, CoreLink CCI-400, Mali GPU, LCD, HDMI, video, video encode, camera, audio, two DRAM controllers, SRAM, ROM, PCIe, Ethernet, SATA, USB) on the SonicsGN on-chip network; callout: a 64-pin interface serialized onto 16 pins.]
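Serialization trades wires for beats: a payload that would need a wide parallel interface is sent over a narrower link in several consecutive transfers. A minimal, generic sketch (not the SonicsGN link format, which the slide does not describe) that splits a 64-bit word into 16-bit beats, matching the 64-pin to 16-pin callout, and reassembles it:

```python
def serialize(word: int, word_bits: int, link_bits: int) -> list[int]:
    """Split a word_bits-wide word into link_bits-wide beats, least significant first."""
    if word_bits % link_bits:
        raise ValueError("word width must be a multiple of the link width")
    mask = (1 << link_bits) - 1
    return [(word >> (i * link_bits)) & mask for i in range(word_bits // link_bits)]


def deserialize(beats: list[int], link_bits: int) -> int:
    """Reassemble beats produced by serialize()."""
    word = 0
    for i, beat in enumerate(beats):
        word |= beat << (i * link_bits)
    return word


beats = serialize(0x0123456789ABCDEF, 64, 16)  # four 16-bit beats on 16 wires
```

This example uses 1/4 of the wires; a 256-bit interface on the same 16-wire link would reach the 1/16 ratio at the cost of 16 beats. Extra beats are why such a link is typically clocked faster than the IP it serves, which the SonicsGN fabric speeds of up to 2 GHz allow.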
High Performance
- High speed: 2 GHz fabric
[Figure: the tablet SoC on the SonicsGN on-chip network; callout: 2 GHz fabric speed.]
Highest Bandwidth
- Virtual channels for efficient link sharing: one shared link with fewer wires carries up to 16 channels
[Figure: the tablet SoC on the SonicsGN on-chip network.]
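Virtual channels let several independent traffic streams share one physical link by interleaving their flits. A minimal sketch, assuming simple round-robin arbitration among channels that have a flit waiting (the slide does not say which arbitration SonicsGN uses; the per-channel buffering and credit-based flow control a real network needs are omitted):

```python
from collections import deque


def round_robin_link(queues: list[deque], cycles: int) -> list:
    """Each cycle, grant the shared link to the next virtual channel (VC) with a flit waiting.

    Returns one entry per cycle: (vc, flit), or None for an idle cycle.
    """
    n = len(queues)
    last = n - 1  # start the search at VC 0 on the first cycle
    sent = []
    for _ in range(cycles):
        for offset in range(1, n + 1):
            vc = (last + offset) % n
            if queues[vc]:
                sent.append((vc, queues[vc].popleft()))
                last = vc
                break
        else:
            sent.append(None)  # no VC had anything to send
    return sent


QUEUES = [deque(["a0", "a1", "a2"]), deque(["b0"]), deque(), deque(["d0", "d1"])]
```

Even though VC 0 has the most traffic, it cannot starve VC 1 or VC 3: the link alternates among whichever channels have data.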
Security
- Firewalls: flexible security domains, TrustZone capable
[Figure: the tablet SoC on the SonicsGN on-chip network; callout: firewall at any target.]
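A target firewall checks every incoming request against a table of address regions and access rules before the request reaches the target. A minimal sketch, assuming a hypothetical Region table with a secure-only flag and a write-permission flag (in AMBA AXI, the TrustZone non-secure attribute travels on AxPROT[1]); requests that match no region are rejected:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    secure_only: bool
    writable: bool = True


def firewall_allows(regions, addr: int, is_write: bool, non_secure: bool) -> bool:
    """Return True if the access may pass to the target; default deny."""
    for r in regions:
        if r.base <= addr < r.base + r.size:
            if r.secure_only and non_secure:
                return False  # non-secure master touching a secure-only region
            if is_write and not r.writable:
                return False  # write to a read-only region
            return True
    return False  # no matching region: reject the access


# Hypothetical map: a secure-only 4 KB region, then a read-only shared region.
REGIONS = [
    Region(0x0000, 0x1000, secure_only=True),
    Region(0x1000, 0x1000, secure_only=False, writable=False),
]
```

A rejected request would normally be answered with an error response rather than dropped, so the initiator is not left waiting.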
Using the Network to Enable Power Domains
- Could use a bus-style approach: place power boundaries at the IP sockets
- This approach leaves power on the table
[Figure: initiators (I) and targets (T) with power boundaries at their sockets.]
Using the Network to Enable Power Domains (continued)
- No need to power the agent (network interface) when the IP core is off
- Network components can be partitioned inside power domains!
[Figure: initiator (I) and target (T) agents placed inside their cores' power domains; callout: always on or off together?]
Safe Operation with Powered-Down Domains
- The initiator agent clears the path to the target to enable safe shutdown of a power domain
- The initiator agent returns errors on accesses to powered-off domains
- The initiator agent knows the power state of each domain along its routing paths
[Figure: an initiator agent and its routing paths to targets (T).]
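The three initiator-agent responsibilities above can be modeled in a few lines. A toy sketch with hypothetical names (InitiatorAgent, routes, powered): the agent keeps a per-domain count of transactions in flight, answers with an error instead of issuing into a powered-off domain, and reports when a domain's path is clear so the power manager can shut it down safely.

```python
from collections import Counter


class InitiatorAgent:
    """Toy initiator agent; names and structure are hypothetical."""

    def __init__(self, routes: dict, powered: dict):
        self.routes = routes        # target -> list of power domains on the path
        self.powered = powered      # domain -> bool (the agent's view of power state)
        self.in_flight = Counter()  # domain -> outstanding transactions crossing it

    def issue(self, target: str) -> str:
        path = self.routes[target]
        if not all(self.powered[d] for d in path):
            return "ERROR"          # respond locally rather than hang on a dead path
        for d in path:
            self.in_flight[d] += 1
        return "ISSUED"

    def complete(self, target: str) -> None:
        for d in self.routes[target]:
            self.in_flight[d] -= 1

    def path_clear(self, domain: str) -> bool:
        """True when no transaction from this agent is still crossing the domain."""
        return self.in_flight[domain] == 0


agent = InitiatorAgent(
    routes={"dram": ["noc_core", "mem"], "usb": ["noc_core", "periph"]},
    powered={"noc_core": True, "mem": True, "periph": False},
)
```

Returning an error locally is what keeps software from hanging on an access to a dark domain.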
Network Can Automatically Wake Up Components
The initiator agent knows which components need to wake up:
1. Hold traffic
2. Send a request to the system power manager
3. Receive the response
4. Release traffic
[Figure: initiators (I), targets (T), and the system power manager.]
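The four-step wake-up sequence can be sketched as follows. This is a synchronous toy model with hypothetical PowerManager and AutoWakeInitiator classes. A real power manager responds only after the domain's supply has settled, which is why traffic must be held until the response arrives.

```python
class PowerManager:
    """Hypothetical system power manager: powers a domain up, then responds."""

    def __init__(self, powered_on):
        self.on = set(powered_on)
        self.wake_requests = []

    def request_wake(self, domain):
        self.wake_requests.append(domain)
        self.on.add(domain)  # stands in for ramping the supply and releasing reset
        return "ACK"


class AutoWakeInitiator:
    """Toy initiator agent that follows the four steps on the slide."""

    def __init__(self, pm, routes):
        self.pm = pm
        self.routes = routes  # target -> power domains on the routing path
        self.held = []
        self.delivered = []

    def send(self, target, payload):
        self.held.append((target, payload))              # 1. hold traffic
        for domain in self.routes[target]:
            if domain not in self.pm.on:
                response = self.pm.request_wake(domain)  # 2. send request
                if response != "ACK":                    # 3. receive response
                    raise RuntimeError(f"{domain} did not wake")
        self.delivered.extend(self.held)                 # 4. release traffic
        self.held.clear()


pm = PowerManager(powered_on={"noc_core"})
initiator = AutoWakeInitiator(pm, {"video": ["noc_core", "video_dom"]})
initiator.send("video", "frame0")
```

Only the first transfer to a sleeping domain triggers a wake request; later transfers find the domain already on and pass straight through.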
Tablet SoC Design Example: Power-Aware On-Chip Network
- Domain partitioning
- Clock gating
- Domain on/off control
[Figure: tablet SoC partitioned into Domains 1 to 7 and Subdomains 1 to 3, covering Cortex-A15, Cortex-A7, Mali GPU, LCD, HDMI, video, video encode, camera, audio, CoreLink CCI-400, the SonicsGN on-chip network, two DRAM controllers, SRAM, ROM, PCIe, Ethernet, SATA and USB, plus a temperature sensor and PMIC interface.]
Network Power Management
- Unlimited number of domains: power, voltage, frequency
- Domains can cross anywhere in the network
- Synchronous, asynchronous, and mesochronous crossings
- Power bundle at all domains: fast wake and shutdown; auto wake
[Figure: a domain power manager connected to each domain of the tablet SoC through the power bundle signals Power Down Req, Power Down Ack, Auto Wake Enable and Auto Wake Req.]
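The power bundle's request/acknowledge handshake can be sketched as a small state model. The power-down and auto-wake signals follow the slide; everything else is an assumption for illustration, not the SonicsGN interface. Once the power-down request is asserted, new traffic is fenced (held if auto wake is enabled, otherwise rejected), and the acknowledge is raised only after in-flight transactions drain.

```python
class PowerBundle:
    """Toy per-domain power interface; the sequencing is assumed, not specified."""

    def __init__(self, auto_wake_enable: bool = True):
        self.power_down_req = False  # power manager -> network: shut this domain down
        self.power_down_ack = False  # network -> power manager: drained, safe to cut power
        self.auto_wake_enable = auto_wake_enable
        self.auto_wake_req = False   # network -> power manager: traffic is waiting
        self.outstanding = 0         # transactions in flight into this domain

    def arrive(self) -> str:
        """A new transaction targets this domain."""
        if self.power_down_req:
            if self.auto_wake_enable:
                self.auto_wake_req = True
                return "HELD"        # fenced; replay after power-up is not modeled
            return "ERROR"           # fenced and auto wake disabled
        self.outstanding += 1
        return "ACCEPTED"

    def finish(self) -> None:
        self.outstanding -= 1

    def step(self) -> None:
        """Acknowledge a pending power-down only once in-flight traffic has drained."""
        if self.power_down_req and self.outstanding == 0:
            self.power_down_ack = True

    def power_up(self) -> None:
        """Power manager restores the domain and clears the handshake."""
        self.power_down_req = False
        self.power_down_ack = False
        self.auto_wake_req = False
```

Because the network itself drains and fences traffic before acknowledging, the power manager never has to ask software whether a domain is idle.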
Concept: Power Manager IP to Leverage the Network
Highly integrated, power-aware on-chip network + on-chip power manager + integrated tool chain = automated, fine-grained, highly responsive Dark Silicon solutions
[Figure: tablet SoC with Domains 1 to 7 and Subdomains 1 to 3 (CPU1, CPU2, GPU, LCD, HDMI, video, video encode, camera, audio, coherency fabric, SonicsGN on-chip network, two DRAM controllers, SRAM, ROM, PCIe, Ethernet, SATA, USB), a temperature sensor, a PMIC interface, and a future power manager built around a microcontroller.]
March 2013. © 2013, Sonics, Inc. Proprietary; NDA required.
Integrated Power Management Benefits
- Complete power management solution: advanced on-chip network; system power manager (hardware and software); advanced tooling environment
- Enables much finer-grained power control
- Fast and safe transitions to lower power states
- Power on just in time (auto wake-up)
- Much less CPU overhead: keep the CPU powered off more; avoid lots of context switches
[Figure: power vs. time. Conventional: the CPU must wake up to switch power state. Future power manager: hardware-controlled switching and earlier completion yield larger power savings.]
Future Power Management Benefits
Sonics is uniquely positioned to provide advanced SoC power management.
Capabilities:
- On-chip network that spans arbitrary collections of power domains
- Power/voltage/clock domain-aware on-chip network with a power management interface
- Auto-wake algorithm
- Integrated network capture and performance analysis tools
- Automated support for domain partitioning
- Automated, correct-by-construction approach
Benefits:
- Easily implement many domains
- Supports late/iterative partitioning choices
- Safe and fast hardware-controlled shutdown
- Auto-wakeup signals to the power manager
- Ensures minimum ON time
- Minimizes leakage and idle power
- Reduced time and effort
- Supports many more domains without TTM and verification risks
Can save HALF of total SoC power consumption!
THANK YOU
Managing Power with SonicsGN
- Flexible power domain support: asynchronous/mesochronous crossings; isolation/level shifters; HW-controlled safe shutdown; automatic wakeup
- Benefits: more domains; quicker shutdown; faster wakeup; keep more dark, more of the time
[Figure: tablet SoC built around the SonicsGN request network, showing initiator (I) and target (T) agents with socket widths from 32 to 128 bits, clocks from 133 MHz to 1333 MHz, two DDR3-2133 DRAM channels, on-die SRAM and ROM, Cortex-A15, Cortex-A7 and Mali-T658 clusters behind a CCI-400, display, HDMI, video, camera, audio, USB, PCIe, Ethernet, security, DMA and storage interfaces (SATA, UFS, SD/MMC, among others), and the power domain boundaries.]
50% SoC power reduction!
Reducing Power Consumption
Engineers have developed many power-saving techniques:
- Reduce the clock frequency when possible
- Stop the clock if there is nothing useful to be done
- Reduce the voltage when possible ($P = C V^2 f$)
- Remove (switch off) the voltage in many cases
- Develop islands of (frequency, voltage, switched power): part of the SoC may need to run at full speed while other portions are slowed, stopped, or switched off
How do these techniques affect the creation and use of IP cores? How do they affect the SoC infrastructure?