ISA-Aging. (SHRINK: Reducing the ISA Complexity Via Instruction Recycling) Accepted for ISCA 2015


1 ISA-Aging (SHRINK: Reducing the ISA Complexity Via Instruction Recycling) Accepted for ISCA 2015 Bruno Cardoso Lopes, Rafael Auler, Edson Borin, Luiz Ramos, Rodolfo Azevedo, University of Campinas, Brasil - Institute of Computing

2 Motivation: ISA Aging. x86 code is bigger than RISC (ARM) code.

3 What about other architectures? [Figure: total number of instructions over time for ARM (V4, V4T, V5TE, V6, VFP2, DB, V6T2, V7, HWDIV, FP16, MP, NEON/VFP3, VFP4) and for POWER/PowerPC (from RIOS in 1990 onward); surviving data labels range from 299 to 487.]

4 The x86 instruction set. Intel 8086 family, variable-length format. Operation code: opcode plus other bits that uniquely identify an instruction.

5 Average instruction opcode size by x86 features. [Figure: average instruction opcode size per x86 feature; bar values not fully recoverable.] The variable-length format no longer benefits the most used instructions.

6 AVX & SSE (vs x87), SPEC2006FP. Modern compilers use AVX or SSE as the default ISA for floating-point calculations.

7 Solutions?

8 Radical Approaches: Breaking Backward Compatibility. (1) Reduce all opcodes to 2 bytes. (2) Reduce all opcodes to 1 or 2 bytes. (3) Convert to a RISC-like ISA encoding.
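A minimal sketch of the idea behind approach (2), assuming opcode lengths are reassigned by usage frequency so that the most used instructions get 1-byte opcodes and everything else gets 2 bytes. The instruction mix, the byte counts, and the function names are hypothetical and are not taken from the paper.

```python
def average_opcode_size(mix):
    """Frequency-weighted mean opcode size in bytes.

    mix maps mnemonic -> (relative frequency, opcode size in bytes).
    """
    total = sum(freq for freq, _ in mix.values())
    return sum(freq * size for freq, size in mix.values()) / total


def reencode_one_or_two_bytes(mix, one_byte_slots):
    """Give the most frequent instructions 1-byte opcodes, all others 2 bytes."""
    ranked = sorted(mix, key=lambda name: mix[name][0], reverse=True)
    return {
        name: (mix[name][0], 1 if rank < one_byte_slots else 2)
        for rank, name in enumerate(ranked)
    }


# Hypothetical instruction mix: (relative frequency, current opcode bytes).
HYPOTHETICAL_MIX = {
    "legacy_op": (5, 1),   # rarely used, but still holds a short opcode
    "int_add": (40, 1),
    "sse_add": (35, 3),    # heavily used, but added late with a long opcode
    "avx_add": (20, 4),
}

before = average_opcode_size(HYPOTHETICAL_MIX)
after = average_opcode_size(
    reencode_one_or_two_bytes(HYPOTHETICAL_MIX, one_byte_slots=2)
)
print(f"before: {before:.2f} bytes, after: {after:.2f} bytes")
```

With this made-up mix, the rarely used legacy instruction gives up its short opcode to the frequently used vector instruction, which is the effect the slide attributes to a fresh variable-length encoding.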

9 Evaluation. [Figure: code size (%) under Approaches 1, 2 and 3 for bwaves, cactusADM, calculix, dealII, gamess, GemsFDTD, gromacs, lbm, leslie3d, milc, namd, povray, soplex, sphinx3, tonto, wrf, zeusmp and the geometric mean; one surviving data label reads 35.5%.] x86 code is bigger than RISC (ARM) code for most programs. The encoding from solution (2) shows that a variable-length encoding is better than both RISC and x86.

10 However... Breaking x86 backward compatibility is not an option: software base, market. What now?

11 Recycling Mechanism

12 Recycling Mechanism. Remove outdated and unused instructions. Reuse the opcode space to encode new instructions while maintaining backward compatibility. Benefits: opens room for encoding new instructions with fewer bits, improving program size and cache behavior. x86 complexity can be reduced, opening markets for specific domains, e.g. low-end embedded devices.

13 Two examples

14 Outdating and Recycling. [Diagram: ISA release vs. revisions. A timeline of ISA releases starting in 2010 (later release labels not recoverable), with CPU Revision A, the point where industry warns software vendors, and CPU Revision B; below it, SW Revision A and SW Revision B. Opcode 1h 4h encodes AAA in the earlier release and VADD in the later one.]

15 Outdating and Recycling. [Same diagram, with SW Revision A now marked "???".]

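16 Revision Mismatch. [Diagram: CPU Revision B runs SW Revision A. Execution hits opcode 1h 4h. A trap mask selector picks the trap mask vector for Revision A, a table of "Opcode" against "Trap?" in which one opcode is marked N and opcode 1h 4h is marked Y. Caption: "Trap Mask Vectors for revisions A against Z".]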
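The revision-mismatch check described above can be sketched as a table lookup. This is only an illustration under stated assumptions: the opcode tables, the representation of "1h 4h" as a pair of fields, the extra ADD entry, and all function names are hypothetical, and the real mechanism is implemented in the decoder hardware rather than in software.

```python
# Hypothetical opcode tables: what an encoding means in each ISA revision.
ISA_REVISIONS = {
    "A": {(0x0, 0x1): "ADD", (0x1, 0x4): "AAA"},
    "B": {(0x0, 0x1): "ADD", (0x1, 0x4): "VADD"},  # 1h 4h recycled in revision B
}


def build_trap_mask(sw_revision, cpu_revision):
    """Mark every opcode whose meaning differs between the two revisions."""
    old = ISA_REVISIONS[sw_revision]
    new = ISA_REVISIONS[cpu_revision]
    return {
        opcode: old.get(opcode) != new.get(opcode)
        for opcode in set(old) | set(new)
    }


def decode(opcode, sw_revision, cpu_revision):
    """Return ('trap', old mnemonic) on a mismatch, else ('execute', mnemonic)."""
    mask = build_trap_mask(sw_revision, cpu_revision)
    if mask.get(opcode, False):
        # The software expects the old meaning, so hand over to emulation.
        return ("trap", ISA_REVISIONS[sw_revision].get(opcode))
    return ("execute", ISA_REVISIONS[cpu_revision].get(opcode))


print(decode((0x1, 0x4), "A", "B"))  # old software on the new CPU
print(decode((0x1, 0x4), "B", "B"))  # new software on the new CPU
```

Selecting the mask by software revision is what lets the same encoding run natively as VADD for new binaries while still trapping for binaries that expect AAA.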

17 Emulation. An old software revision executing on a new processor revision leads to backward compatibility issues. Solution: a software emulation mechanism driven by CPU-generated traps. It also allows non-sequential ISA evolution: in disputes over new extensions (XOP, FMA4, ...), vendors could emulate each other's instructions using the trap mechanism.

18 Emulation. Emulation must avoid using outdated instructions. Emulation routines: operating system, firmware. Linker. Operating system loader. Executable header annotated with software
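A minimal sketch of what an emulation routine behind the trap could look like, using AAA (the recycled instruction from the earlier example) as the case. The register model, the routine table, and the handler name are hypothetical; the AAA behavior follows the commonly documented semantics (adjust AX by 106h when the low nibble of AL exceeds 9 or AF is set, then keep only the low nibble of AL).

```python
from dataclasses import dataclass


@dataclass
class Registers:
    ax: int = 0        # 16-bit accumulator: AH is the high byte, AL the low byte
    af: bool = False   # auxiliary carry flag
    cf: bool = False   # carry flag


def emulate_aaa(regs):
    """Software version of AAA (ASCII adjust after addition)."""
    if (regs.ax & 0x0F) > 9 or regs.af:
        regs.ax = (regs.ax + 0x106) & 0xFFFF
        regs.af = regs.cf = True
    else:
        regs.af = regs.cf = False
    regs.ax &= 0xFF0F  # keep AH, clear the high nibble of AL
    return regs


# Hypothetical table of emulation routines, keyed by the outdated mnemonic.
EMULATION_ROUTINES = {"AAA": emulate_aaa}


def handle_revision_trap(old_mnemonic, regs):
    """Dispatch a revision-mismatch trap to the matching emulation routine."""
    routine = EMULATION_ROUTINES.get(old_mnemonic)
    if routine is None:
        raise NotImplementedError(f"no emulation routine for {old_mnemonic}")
    return routine(regs)


regs = handle_revision_trap("AAA", Registers(ax=0x000B))
print(hex(regs.ax), regs.cf)
```

The routine uses only ordinary integer operations, in line with the requirement that emulation code must not itself depend on outdated instructions.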

19 Evaluation Static and Dynamic instruction analysis of Linux and Windows from

20 Static Analysis. [Figure: used instructions by year, for Linux and Windows.]

21 Dynamic Analysis. [Figure: fraction of the dynamic trace (axis from 95% to 100%) by instruction group (MMX, P6, SSE, SSE2, x87, 16-bit) for Windows 95, Windows 98, Windows XP, Windows Vista, Windows 7, Slackware 3, Ubuntu 4, Ubuntu 8 and Ubuntu 12.]

22 Emulation Overhead. Experiment: Linux kernel trap implementation. Tolerating a 5% overhead, we can re-encode 40% of the x86 ISA.
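One way to reason about an overhead budget like the 5% above is a simple cost model: each emulated instruction costs a fixed trap penalty, and instructions are recycled from least to most executed until the budget is used up. The sketch below is an assumption-laden illustration, not the paper's methodology; the dynamic shares, the trap cost, and the function name are all hypothetical.

```python
def pick_instructions_to_recycle(dynamic_share, trap_cost, budget):
    """Greedily recycle the least-executed instructions within an overhead budget.

    dynamic_share: mnemonic -> fraction of executed instructions
    trap_cost:     extra cost of one emulated instruction, measured in
                   units of an average native instruction
    budget:        tolerated slowdown, e.g. 0.05 for 5%
    """
    chosen, overhead = [], 0.0
    for name in sorted(dynamic_share, key=dynamic_share.get):
        cost = dynamic_share[name] * trap_cost
        if overhead + cost > budget:
            break  # every remaining instruction is at least as frequent
        chosen.append(name)
        overhead += cost
    return chosen, overhead


# Hypothetical dynamic profile (fractions of all executed instructions).
HYPOTHETICAL_PROFILE = {
    "aaa": 0.0,
    "fsin": 0.00001,
    "pmaddwd": 0.0001,
    "movaps": 0.02,
    "add": 0.3,
}

chosen, overhead = pick_instructions_to_recycle(
    HYPOTHETICAL_PROFILE, trap_cost=300, budget=0.05
)
print(chosen, f"{overhead:.1%}")
```

Under this model the share of the ISA that can be re-encoded depends heavily on how skewed the dynamic profile is: many instructions that almost never execute can be recycled for very little runtime cost.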

23 How Many Instructions to Emulate? [Figure: % of instructions emulated at runtime, with series labeled =4, =5, =6, =8, =10 and =12 (parameter name lost), across scenarios pairing ISA extensions (SSE, SSE3, SSSE3, SSE4.1, SSE4.2+AES+CLMUL, AVX) with operating system combinations from Windows 95 + Slackware to Windows 7 + Ubuntu 12.10.]

24 Runtime Overhead. [Figure: runtime overhead (%), with series labeled =4, =5, =6, =8, =10 and =12 (parameter name lost), across scenarios pairing ISA extensions (SSE, SSE3, SSSE3, SSE4.1, SSE4.2+AES+CLMUL, AVX) with operating system combinations from Windows 95 + Slackware to Windows 7 + Ubuntu 12.10.]

25 Instruction Decoder. Decoder + ucode ROM: from 2% to 17% of processor area. Removed instructions still need to be decoded (to generate traps). Reuse instruction encodings. Recent x86 implementations have more than one decoder (up to 3 fast decoders and 1 slow decoder).

26 Decoder Critical Path Improvements. [Figure: critical path improvement (%), Naive vs. SHRINK, with labels running from 2001-SSE to 2012-AVX; intermediate labels garbled.]

27 Decoder Area Gains. [Figure: area reduction (%), Naive vs. SHRINK, with labels running from 2001-SSE to 2012-AVX; intermediate labels garbled.]

28 Decoder Power Gains. [Figure: power reduction (%), Naive vs. SHRINK, with labels running from 2001-SSE to 2012-AVX; intermediate labels garbled.]

29 Conclusion. Static and dynamic analysis shows that a great number of x86 instructions are obsolete. Recycling mechanism: re-encoding instructions without breaking backward compatibility. We could emulate 40% of x86 instructions with less than 5% overhead. Decoder critical path improved by up to 50%. Decoder area reduced by up to 73%. ucode ROM reduced by up to 43%. Power consumption reduced by up to 70%.

30 Questions?
