Transcede Solving 4G Challenges for Pico, Micro and Macrocell Platforms Jim Johnston CTO Communications Convergence Processing Mindspeed Technologies, Inc. August 24, 2010
The Next Internet Wave Mobility and Content Innovative New Platforms New Networked Applications + = Explosive demand for Mobile Broadband 2
Impact on Service Providers Subscribers (M) 1,000 900 800 700 600 500 400 300 200 100 0 CY04 CY05 CY06 CY07 CY08 CY09 CY10 CY11 CY12 CY13 TB Per Month 3,6000,000 1,800,000 0.09 EB per mo 0.2 EB per mo 109% CAGR 2009-2014 0.6 EB per mo 1.2 EB per mo 2.2 EB per mo 3.6 EB per mo 2009 2010 2011 2012 2013 2014 Wireline broadband Cellular mobile broadband Source: Infonetics Q4 2009 Source: 2010 Cisco VNI Mobile Overloaded Networks 3
2G to 4G - System Architecture Evolution - to the Evolved Packet Core RF PA D/A A/D Equ. TDMA MODEM BTS A-bis: TDM Speech codec FEC TRAU BSC Time Slots-DSx A: TDM MSC 2G GSM 1990s PSTN RF PA Equ. D/A A/D CDMA MODEM Node B FEC Security Mobility Mgt RNC MAC SGSN Serving GPRS Support Node 3G W-CDMA 2000s IP Packets Over ATM Cells/TimeSlots QoS GGSN Gateway GPRS Support Node Video processing Speech Codec Voice band data Media GW IP Packet RF PA Equ. D/A A/D MIMO OFDMA MODEM enode B FEC Security QoS MAC Native IP Packets IP Packet Mobility Management agw 4G LTE -2010 IP Packet Mobile Operator Core Network Packet Switching 4
Mobile Broadband Architecture Going Distributed Macro Micro Pico Business Enterprise Femto Femto Metro Urban High-Density Subs Business Metro Residential 1200 400 200 50 10 Supported Subscribers Serving a broad range of basestations - from Macro to Femto 5
Mindspeed SoC Competencies Multi-Processor Communications Technologies Multi-Core Software Convergence Applications Power density (db) Frequency Algorithms Standards Certification Software Verification Software Architecture Packet Processing Voice, Video, and Modems 6
Silicon Architecture Evolution 7
Silicon & System Architecture Innovation Old BTS Solutions enodeb Protocol Stack Transcede enodebs SoCs T1/E1-ATM Abis Network Processors Ethernet IP LTE RRC LTE PDCP LTE RLC Application Specific Architecture Solution Ethernet IP Cipher RoHC Cortex A9s DSPs FPGAs DSPs FPGAs IQ - Proprietary Application Agnostic Component Architecture Solution LTE MAC LTE PHY IQ - CPRI System Architecture Evolution Cortex A9s FEC CevaX FFT DSP RAKE IQ - CPRI Silicon Architecture Evolution 8
Transcede Dramatically Reduces System BOM NPUs DSP Farm App Hardware Accelerator FPGAs App I/O > $3000 < $300 while significantly accelerating Time-to-Market 9
enb Transcede - Multi-Core SoC High Performance DSP Farm High Performance SERDES IO T4000: 600MHz/300MHz T4020: 750MHz/366MHz High Performance SMP RISC NoC: AXI Packet Network on Chip General Purpose Peripherals 10
Processors: Instruction and Data Level Parallelism 600MHz to 750MHz Operation 4 A9 Multi-Processing Cores SMP & AMP support Parallel Cores with HW Snoop Logic 32bit Instructions Load/Store Register File 13 stage Pipeline Small L1 Memories 32kB Program Cache 32kB Data Cache Large L2 Memories High Efficiency Instruction Level Parallelism 512kB Unified Cache Embedded Trace Module Complex Instruction Diagnostics Dual Data/Program DMA/Cache Extensive Tool Chain C-Compliers/Debugger/etc. High Efficiency Instruction & Data Level Parallelism 600MHz to 750MHz Operation 256bit-8 Way VLIW DSP 8-32bit Parallel Instructions Most common size 16-16bit Parallel Instruction 9 stage Pipeline 128bit 4/8 MAC SIMD 4-16bit x 16bit MACs 8-8bit x 8bit MACs Large L1 Memories 96kB Program RAM Direct Mapped Cache Option 128kB Data RAM Banked Simultaneous Access On-chip Emulation Module Complex Breakpoints/Trace/etc. Dual Data/Program DMA/Cache Extensive Tool Chain C-Compliers/Debugger/etc. 600MHz to 750MHz Operation 160bit-VLIW DSP Fine Grain VLIW 160bit SIMD 24bit x 16bit MACs Built in FFT Radix Support Large L1 Memories 10kB Program RAM 80kB Data RAM Banked Simultaneous Access External Sequencer Control High Performance DMA High Efficiency Data Level Parallelism Side-band Multi-Core Sequencing Control Channel Processor to Processor Pipelined Data Flow 11
Processors: Instruction and Data Level Parallelism Symmetric Multi-Core Instruction Parallelism High Core Level Instruction Parallelism Control/Branch Code Focus Cortex-A9 CPU 12
600MHz to 750MHz CevaX1641 8-way VLIW DSP 600MHz to 750MHz Operation 256bit-8 Way VLIW DSP 8-32bit Parallel Instructions Most common size 16-16bit Parallel Instructions 128bit 4/8 MAC SIMD 4-16bit x 16bit MACs 8-8bit x 8bit MACs Large L1 Memories 96kB Program RAM Direct Mapped Cache Option 128kB Data RAM Banked Simultaneous Access On-chip Emulation Module Complex Breakpoints/Trace/etc. Dual Data/Program DMA/Cache Extensive Tool Chain C-Compliers/Debugger/etc. High Core Level Data Parallelism Parallel Math Focus High Core Level Instruction Parallelism Launch Rate Focus 13
Mindspeed Application Processor Programmable Application Specific Signal Processing General Purpose Signal Processor 62.5% RAM, 37.5% Core Application Specific Signal Processor 45% RAM, 55% Core L1 RAM CevaX Core L1RAM L1 RAM Simplified Instruction Pipeline Limited Control Code Focus Small PRAM, L1 sized to limited function focus L1 RAM Core L1 RAM L1 RAM Fixed Function Signal Processing 35-40% RAM, 60-65% Core L1 RAM FFT /DFT Core L1 RAM No Instructions State Machine Control RAM sized to one function Limited Savings For Loss In Application Flexibility 14
Mindspeed Application Processor () 600MHz to 750MHz Operation 160bit-1 Way VLIW DSP 160bit 4 MAC SIMD - 4-24bit x 16bit MACs Built in FFT Radix Support - Bit Reverse Addressing Circular Buffering Built in Byte Data RAM read Very Wide/Large L1 Memories - 10kB Program RAM, - 80kB Data RAM - Banked Simultaneous Access External Sequencer Control High Performance DMA for Data 4G/3G Application Library 15
CoreSight SoC HW SW Debug Support Tightly coupled MAC/PHY Streaming Real-Time Debug Capability! 16
Transcede 4000 SoC DDR3 DDR3 TSMC 40G Process 0.9V 12W typical 31mm x 31mm 26 Processors 9.1MBytes RAM >300M transistors ARM A9 L2 Cache MSPD MSPD MSPD ARM A9 Ceva X1641 Ceva X1641 MSPD MSPD MSPD L2 DSP RAM ARM A9 MP Core +L1 ARM A9 L2 Cache Ceva X1641 Ceva X1641 DMAs MSPD ARM A9 Ceva X1641 L2 DSP RAM ARM A9 L2 MP Core Cache +L1 MSPD MSPD MSPD Ceva X1641 Multi-Layer Bus Structure Ceva X1641 Ceva X1641 Ceva X1641 Ceva X1641 L2 DSP RAM Multi-Layer Bus Structure Multi-Layer Bus Structure FEC L3 SRAM L3 SRAM srio PCIe CPRI 10-High Speed Serdes GigE GigE MAC Nasdaq: MAC MSPD 17
Multi-SoC Chaining in enb Cells Each T4k is Chainable to other T4ks Shared Inter-T4k Memory Maps srio based HW Bridges T4k(s) Share DDR,SRAM,IO,etc. srio Mailbox System For Control srio AXI DMA for Data Transfers T4k(s) can chain between enodebs Create a enb optical X2 interface srio or VPN GigE T4k Distributed Antenna Systems CPRI5.0 up to 20km Between BBU and RRH @10GT/s RRH RRH Chainable Processing Tiles (Optical-20km or Electrical-20cm) Seamless Scalability (Switchless) Resulting In Lowest: Power, Cost, Area RRH= Remote Radio Head Radio Radio Radio Optical PMA Optical PMA Optical PMA Optical PMA Optical PMA CPRI CPRI CPRI CPRI srio T4K DSP+NP srio srio T4K DSP+NP srio srio T4K DSP+NP srio Optical PMA X 2 Optical PMA srio T4K DSP+NP srio GigE GigE GigE X2 Interface GigE Multi-Port GigE Copper/Optical PHY Large enb S1 Interface Small enb S1 Interface 18
Software Architecture Evolution 19
Mobile Data Link Standards Evolution 2008 2009 2010 2011 2012 2013 2G 3G 4G Digital Voice Optimized Circuit Switched Data on Voice Structure Digital Voice Optimized Packet on Circuit Switched Enhanced Data on Voice Digital Data Optimized Packet Switched Voice on Data Structure 3G Multi-Standard Drives enb to Software Defined Radio(SDR) Architecture For HW Reuse 4G Notes: Throughput rates are peak theoretical network rates. Radio channel bandwidths indicated. Dates refer to expected initial commercial network deployment except 2008, which shows available technologies that year. 20
Innovation in Multi-Core Programming Task Dependencies Sys HW params Deadline (latency) Rqrd. DSP Resources List of Tasks Task MIPS Task Composer Tool Task List Pass/Fail Indication However, C is a sequential language, so how do application developers map their C code into multi-core SoCs? New modeling approach allows application partitioning and profiling early on in the design phase 21
Software Multi-Core HW Mapping: LTE PHY & L2 NAS RRC Radio to Router Relay RRC S1-AP NAS S1-AP PDCP PDCP SCTP SCTP RLC RLC IP IP MAC MAC L2 L2 L1 L1 L1 L1 LTE-Uu S1-MME UE enodeb Nasdaq: MME MSPD 22
Software Multi-Core HW Mapping: LTE PHY PHY TASKS FFT0 FFT1 RACH0 RACH1 UL SIC idft0 idft1 DL FEC 0 UL FEC 0 IO Move DL FEC 0 UL SIC IO Move idft1 RACH0 idft0 FFT0 UL FEC 0 RACH1 Optimized Match of Instruction and Data Parallelism 23
Key Differentiators of the Transcede Family Software configurable for all flavors of LTE, W-CDMA and WiMAX (SDR) --- Supports China standards, including TD-SCDMA and TD-LTE Significantly reduces system bill of materials Integrated L2 and L1 on a single SoC provides lowest possible latency Simplified programming model allows easy adaptation Roadmap of scalable SoCs delivers a range of performance/cost points across the range of base stations THANK YOU! 24