Utilization of a SIMD unit in the OS Kernel


Shogo Saito (1) and Shuichi Oikawa (2)
(1), (2) University of Tsukuba

Nowadays, it is very common for a processor to include a SIMD (Single Instruction Multiple Data) unit to accelerate application processing. Although a SIMD unit is part of the processor, it evolves more rapidly than the processor's integer unit. Because the kernel generally avoids using the FPU (Floating Point Unit) and the SIMD unit, there may be places inside the kernel where a SIMD unit could process large amounts of data effectively. This paper describes our preliminary work exploring the possibility of using a SIMD unit in the kernel. We performed preliminary experiments using UML (User Mode Linux) and show that data copying can be improved.

1. [Section text not recovered; surviving terms: SIMD (Single Instruction Multiple Data), OS, FPU (Floating Point Unit), UML (User Mode Linux).]

2. SIMD

SIMD instruction sets named in this section: SSE, SSE2, SSE3, and AVX on Intel x86, and NEON on ARM.

© 2012 Information Processing Society of Japan


[Figures 1 and 2: images and captions not recovered; surviving labels: SIMD, 4.]

[Remainder of section 2 not recovered; surviving terms: SIMD, OS, AES.]

3. UML (User Mode Linux)

[Section text not recovered; surviving terms: SIMD, OS, UML (User Mode Linux), Linux, Figure 3.]

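Source listing (figure number not recoverable):

    int a[256], b[256], c[256];

    void foo(void)
    {
        int i;

        /* Element-wise addition; each iteration is independent of the others. */
        for (i = 0; i < 256; i++) {
            a[i] = b[i] + c[i];
        }
    }

[Text not recovered; surviving terms: SIMD, UML, Linux, OS.]

4. GNU Compiler Collection (GCC)

[Section text not recovered; surviving terms: GCC, C, SIMD, -ftree-vectorize.]

Source listing (figure number not recoverable):

    int a[256], b[256], c[256];

    a[5] = b[5] + c[5];
    a[6] = b[6] + c[6];

5. [Section heading and text not recovered; surviving terms: SIMD, OS, Linux.]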
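The element-wise addition loop listed above (a[i] = b[i] + c[i] over 256 ints) is the kind of loop a vectorizing compiler may turn into SIMD code. As a rough illustration only, the following C sketch writes that loop by hand with SSE2 intrinsics, adding four 32-bit integers per iteration, next to the scalar form. The function names add_scalar, add_simd, and add_versions_agree are hypothetical and do not come from the paper, and the code GCC actually generates with -ftree-vectorize may differ. Where SSE2 is unavailable, the sketch falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

#define ARRAY_LEN 256

/* Scalar form: one addition per iteration, as in the listing above. */
void add_scalar(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hand-vectorized form (hypothetical name): four additions per iteration. */
void add_simd(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && c != NULL);
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        /* Unaligned 128-bit loads of four ints from each input. */
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        /* Four 32-bit additions, then one 128-bit store. */
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(vb, vc));
    }
#endif
    /* Remaining elements (all of them when SSE2 is not available). */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Returns 1 when both forms produce identical results on 256 elements. */
int add_versions_agree(void)
{
    int b[ARRAY_LEN], c[ARRAY_LEN], s[ARRAY_LEN], v[ARRAY_LEN];

    for (int i = 0; i < ARRAY_LEN; i++) {
        b[i] = i;
        c[i] = 2 * i;
    }
    add_scalar(s, b, c, ARRAY_LEN);
    add_simd(v, b, c, ARRAY_LEN);
    for (int i = 0; i < ARRAY_LEN; i++)
        if (s[i] != v[i])
            return 0;
    return 1;
}
```

Calling add_versions_agree() from a test program is one way to check that the two forms agree. The scalar tail loop keeps the sketch correct for lengths that are not a multiple of four.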

[Text not recovered; surviving term: gprof.]

5.1 [Heading not recovered; surviving terms: SIMD, UML, Linux, GCC.]

Experimental environment:

    Machine   IBM ThinkPad X1
    CPU       Intel Core-i5 2520M 2.50GHz
    RAM       4GB

SIMD instruction sets named for the Intel Core-i5 2520M: SSE3 and AVX.

5.2 [Heading and text not recovered; surviving terms: UML, GCC, -ftree-vectorize, -ftree-vectorizer-verbose, 23, OS.]

Table 1: Linux source files and the number of vectorized loops in each

    Linux source file                       Vectorized loops
    /arch/um/drivers/drivers/slip_user.c    1
    /mm/vmstat.c                            1
    /fs/ext2/inode.c                        2
    /fs/ext3/inode.c                        2
    /fs/ext3/hash.c                         2
    /fs/isofs/util.c                        1
    /fs/reiserfs/fix_node.c                 1
    /drivers/base/map.c                     1
    /net/ipv4/inet_hashtables.c             1
    /net/ipv4/tcp_input.c                   1
    /net/ipv4/devinet.c                     1
    /lib/sort.c                             1
    /lib/bitmap.c                           5
    /lib/cmdline.c                          1
    /net/core/dev.c                         1
    /crypto/algapi.c                        (value not recovered)

[Text not recovered; surviving terms: UML, SIMD, OS, GNU Profiler (gprof), UnixBench 3), Table 2, memcpy.]

Table 2: Linux functions by consumption time (numeric values not recovered)

    Function         Consumption time (sec)   Share (%)
    memcpy
    os_arch_prctl
    hard_handler
    userspace
    strncpy

[Text not recovered; surviving terms: SIMD memcpy, memcpy, Intel SSE, Intel x86.]

The SIMD memcpy uses the x86 SIMD instruction movdqu, which moves 128 bits at a time between memory and a SIMD register.

[Text not recovered; surviving terms: memcpy, 5,000,000, Figure 5, SIMD memcpy, 512, x86.]
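A copy routine built on movdqu, as described above, can be sketched in user-space C roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the names simd_memcpy, simd_memcpy_matches, and SIMD_COPY_THRESHOLD are hypothetical, and the use of 512 bytes as a cutoff below which the ordinary memcpy is called is an assumption made here (the source mentions the value 512 but its exact role did not survive extraction). The intrinsics _mm_loadu_si128 and _mm_storeu_si128 are unaligned 128-bit moves that compilers typically emit as movdqu. The sketch does not address saving and restoring SIMD register state, which a real kernel would also have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Assumed cutoff: smaller copies go straight to the ordinary memcpy. */
#define SIMD_COPY_THRESHOLD 512

/* Hypothetical SIMD copy: 16 bytes (128 bits) per iteration. */
void *simd_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    if (n < SIMD_COPY_THRESHOLD)
        return memcpy(dst, src, n);
#if defined(__SSE2__)
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s); /* unaligned load */
        _mm_storeu_si128((__m128i *)d, v);               /* unaligned store */
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    /* Copy the remaining tail (everything when SSE2 is unavailable). */
    memcpy(d, s, n);
    return dst;
}

/* Returns 1 when simd_memcpy and memcpy give identical results for n bytes. */
int simd_memcpy_matches(size_t n)
{
    static unsigned char src[4096], expect[4096], got[4096];

    if (n > sizeof src)
        return 0;
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(expect, 0, sizeof expect);
    memset(got, 0, sizeof got);
    memcpy(expect, src, n);
    simd_memcpy(got, src, n);
    /* Comparing the whole buffers also catches writes past n bytes. */
    return memcmp(expect, got, sizeof expect) == 0;
}
```

Like memcpy, the sketch assumes the source and destination do not overlap. Checking sizes on both sides of the cutoff, and sizes that are not a multiple of 16, exercises the fallback path, the 128-bit loop, and the tail copy.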

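Table 4: Normal UML compared with UML using the SIMD memcpy (total times not recovered)

                       Total time   memcpy time   memcpy rate (%)
    Normal UML                      12.96 s       8.72%
    SIMD memcpy UML                 8.75 s        5.68%

The time spent in memcpy fell by 33%, and its share of the total time fell from 8.72% to 5.68%.

[Text not recovered; surviving terms: UnixBench, OS, sec, 5%, Table 4, UML, SIMD.]

8. [Heading and text not recovered; surviving terms: Figure 6, SIMD memcpy, memcpy.]

Table 3: Functions by consumption time in UML with the SIMD memcpy (numeric values not recovered)

    Function         Consumption time (sec)   Rate (%)
    os_arch_prctl
    memcpy
    hard_handler
    userspace
    strncpy

In the UnixBench run on UML profiled with gprof, the SIMD memcpy reduced the time spent in memcpy from 12.96 sec to 8.75 sec.

8.1 [Heading and text not recovered; surviving terms: OS, SIMD, SIMD memcpy, 128, memcpy, 512.]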

8.2 [Heading and text not recovered; surviving terms: SIMD memcpy, memcpy, SIMD, 128.]

8.3 [Heading and text not recovered; surviving terms: SIMD, UML, OS, ARM, NEON.]

9. [Heading and text not recovered; surviving terms: SIMD, OS, UML (User Mode Linux), GCC, Intel, AVX, 256.]

1) Takashi Nakamura, Satoshi Miki, Shuichi Oikawa: Automatic Vectorization by Runtime Binary Translation, In Proceedings of 2011 Second International Conference on Networking and Computing, pp. 87-94.
2) The User-mode Linux Kernel Home Page.
3) byte-unixbench: Unix benchmark suite.
4) Intel 64 and IA-32 Architectures Software Developer's Manuals.
5) Intel: Applications Tuning for Streaming SIMD Extensions. PDF/apps simd.pdf
