COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE

1 COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE. VANYA YANEVA, Ajitha Rajan, Christophe Dubach. ISSTA, July 2017, Santa Barbara, CA

2 EMBEDDED SOFTWARE IS EVERYWHERE. ITS SAFETY AND CORRECTNESS ARE CRUCIAL. FUNCTIONAL TESTING IS CRITICAL.

5 FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING
Test suite (test case 1, test case 2, test case 3, ..., test case n) -> Application -> expected result 1, expected result 2, expected result 3, ..., expected result n
TESTING IS AN IDEAL CANDIDATE FOR PARALLELISATION

7 CPU SERVERS: Expensive; Do not scale easily as test suites grow; Can be extremely underutilised
GPUS: Cheap and widely available; Large-scale parallelism, thousands of threads; SIMD architecture suited to functional testing

9 EXECUTE TESTS IN PARALLEL ON THE GPU THREADS
Test suite: test case 1, test case 2, test case 3, ..., test case n, with expected result 1, expected result 2, expected result 3, ..., expected result n
1. Read test cases: INPUT[] = {test case 1, ..., test case n}
2. Transfer INPUT[] to GPU memory
3. Build and launch tested program on the GPU threads (th_id = 0 ... n-1): OUTPUT[th_id] = program( INPUT[th_id] )
4. Transfer OUTPUT[] to CPU memory
CHALLENGES: Usability, Scope, Performance?
A. Rajan, S. Sharma, P. Schrammel, D. Kroening. Accelerated test execution using GPUs. In proceedings of ASE 2014, pages , Sweden, Nov 2014.

10 INTRODUCING PARTECL
Inputs: Test cases (CSV format); Unmodified source files; Config file
Pipeline: ParTeCL CodeGen -> OpenCL -> ParTeCL Runtime -> Execution on the GPU

11 INPUTS
Example:
    #include <stdio.h>
    #include <stdlib.h>

    int c;

    int addc(int a, int b){
        return a + b + c;
    }

    int main(int argc, char* argv[]){
        int a = atoi(argv[1]);
        int b = atoi(argv[2]);
        c = 3;
        int sum = addc(a, b);
        printf("%d + %d + %d = %d\n", a, b, c, sum);
    }
Configuration:
    input: int a 1
    input: int b 2
    result: int sum variable: sum
Test cases: [not preserved in the extracted slide text]
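The configuration lines above have a regular shape, so reading them can be sketched in a few lines of C. This is only an illustration, not ParTeCL's actual parser: the struct names, the function names, and the reading of the trailing number as the argv position (suggested by argv[1] and argv[2] in the example) are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one "input:" line of the config file. */
struct config_input {
    char type[32];   /* C type of the input, e.g. "int" */
    char name[32];   /* variable name in the tested program, e.g. "a" */
    int  argv_index; /* assumed meaning of the trailing number: position in argv */
};

/* Hypothetical record for one "result:" line of the config file. */
struct config_result {
    char type[32];     /* C type of the result, e.g. "int" */
    char name[32];     /* name of the result field, e.g. "sum" */
    char variable[32]; /* program variable that holds the result */
};

/* Parses a line shaped like "input: int a 1".
 * Returns 1 on success and 0 if the line does not have that shape. */
int parse_input_line(const char *line, struct config_input *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "input: %31s %31s %d",
                  out->type, out->name, &out->argv_index) == 3;
}

/* Parses a line shaped like "result: int sum variable: sum".
 * Returns 1 on success and 0 if the line does not have that shape. */
int parse_result_line(const char *line, struct config_result *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "result: %31s %31s variable: %31s",
                  out->type, out->name, out->variable) == 3;
}
```

A caller would try parse_input_line first and fall back to parse_result_line when it returns 0.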

12 PARTECL CODEGEN
Example:
    #include <stdio.h>
    #include <stdlib.h>

    int c;

    int addc(int a, int b){
        return a + b + c;
    }

    int main(int argc, char* argv[]){
        int a = atoi(argv[1]);
        int b = atoi(argv[2]);
        c = 3;
        int sum = addc(a, b);
        printf("%d + %d + %d = %d\n", a, b, c, sum);
    }
OpenCL:
    #include "structs.h"
    //#include <stdio.h>
    //#include <stdlib.h>

    /*int c;*/

    int addc(int a, int b, int *c){
        return a + b + (*c);
    }

    kernel void main_kernel(
        global struct test_input* inputs,
        global struct test_result* results){
        int idx = get_global_id(0);
        struct test_input input_gen = inputs[idx];
        global struct test_result *result_gen = &results[idx];
        int argc = input_gen.argc;
        result_gen->test_case_num = input_gen.test_case_num;
        int c;
        int a = input_gen.a;
        int b = input_gen.b;
        c = 3;
        int sum = addc(a, b, &c);
        /*printf("%d + %d + %d = %d\n", a, b, c, sum);*/
        result_gen->sum = sum;
    }
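To see what the generated kernel computes without a GPU, its body can be emulated in plain C, with a loop index standing in for get_global_id(0). This is a sketch under assumptions: the slide does not show structs.h, so the two struct layouts below are guessed from the fields the kernel accesses, and main_kernel_body and run_all_tests are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the generated "structs.h"; the slide does not show it. */
struct test_input  { int test_case_num; int argc; int a; int b; };
struct test_result { int test_case_num; int sum; };

/* The former global `c` is passed explicitly, as in the generated code. */
static int addc(int a, int b, int *c)
{
    return a + b + (*c);
}

/* Body of main_kernel for one work-item; idx stands in for get_global_id(0). */
static void main_kernel_body(size_t idx,
                             const struct test_input *inputs,
                             struct test_result *results)
{
    struct test_input input_gen = inputs[idx];
    struct test_result *result_gen = &results[idx];
    result_gen->test_case_num = input_gen.test_case_num;
    int c;                 /* global variable turned into a local */
    int a = input_gen.a;   /* replaces atoi(argv[1]) */
    int b = input_gen.b;   /* replaces atoi(argv[2]) */
    c = 3;
    result_gen->sum = addc(a, b, &c);  /* replaces the printf of the result */
}

/* Sequential stand-in for launching n work-items on the GPU. */
void run_all_tests(const struct test_input *inputs,
                   struct test_result *results, size_t n)
{
    assert(inputs != NULL && results != NULL);
    for (size_t idx = 0; idx < n; ++idx)
        main_kernel_body(idx, inputs, results);
}
```

Because each work-item reads only inputs[idx] and writes only results[idx], the iterations are independent, which is what allows one GPU thread per test case.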

13 CODE TRANSFORMATIONS: global scope variables; command line arguments; standard in/out; standard library (partial support): clclibc
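Partial standard library support matters because kernel code cannot call into the host's libc. As a rough illustration of the kind of routine such a library has to supply, here is a libc-free integer parser in plain C. It is a hypothetical stand-in written for this note, not code taken from clclibc, and unlike a production implementation it does not guard against overflow.

```c
#include <assert.h>

/* Hypothetical libc-free replacement for atoi: skips leading blanks,
 * accepts an optional sign, then reads decimal digits until the first
 * non-digit. Overflow is not handled in this sketch. */
int sketch_atoi(const char *s)
{
    assert(s != 0);
    while (*s == ' ' || *s == '\t' || *s == '\n')
        ++s;
    int sign = 1;
    if (*s == '+' || *s == '-') {
        if (*s == '-')
            sign = -1;
        ++s;
    }
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');
        ++s;
    }
    return sign * value;
}
```

The function uses only arithmetic and pointer reads, so nothing in it depends on headers or system calls that are unavailable on the device.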

14 PARTECL RUNTIME
1. Read test cases: INPUT[] = {test case 1, ..., test case n}
2. Transfer INPUT[] to GPU memory
3. Build and launch tested program (automatically generated OpenCL) on the GPU threads (th_id = 0 ... n-1): OUTPUT[th_id] = program( INPUT[th_id] )
4. Transfer OUTPUT[] to CPU memory
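The runtime steps can be mimicked on the CPU to make the data flow concrete. The sketch below reads a test case from a CSV line, fills OUTPUT[th_id] = program(INPUT[th_id]) in a loop that stands in for the GPU threads, and counts mismatches against expected results. The CSV column order (test number, a, b), the struct, and all function names are assumptions made for illustration; the real runtime performs the transfers and the launch through OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory form of one test case for the addc example. */
struct test_case { int num; int a; int b; };

/* Parses one CSV line; the column order "num,a,b" is an assumption.
 * Returns 1 on success, 0 otherwise. */
int parse_test_case(const char *line, struct test_case *tc)
{
    assert(line != NULL && tc != NULL);
    return sscanf(line, "%d,%d,%d", &tc->num, &tc->a, &tc->b) == 3;
}

/* Stand-in for the tested program: the addc example with c = 3. */
static int program(const struct test_case *tc)
{
    int c = 3;
    return tc->a + tc->b + c;
}

/* Runs every test (one loop iteration per would-be GPU thread), stores
 * the outputs, and returns how many differ from the expected results. */
size_t run_and_check(const struct test_case *input, const int *expected,
                     int *output, size_t n)
{
    assert(input != NULL && expected != NULL && output != NULL);
    size_t failures = 0;
    for (size_t th_id = 0; th_id < n; ++th_id) {
        output[th_id] = program(&input[th_id]);
        if (output[th_id] != expected[th_id])
            ++failures;
    }
    return failures;
}
```

Comparing against expected results happens after the outputs are back in CPU memory, so it adds no work to the GPU threads.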

15 CHALLENGES: Usability, Scope, Performance?
Inputs: Test cases (CSV format); Unmodified source files; Config file
Pipeline: ParTeCL CodeGen -> OpenCL -> ParTeCL Runtime -> Execution on the GPU

16 EVALUATION 1. Speedup against CPU 2. Data transfer overhead 3. Comparison to a multi-core CPU 4. Correctness

17 EXPERIMENT
Subjects: EEMBC - Industry-standard benchmark suite for embedded software
Hardware: GPU - NVIDIA Tesla K40m; CPU - Intel Xeon, 8 cores
Test suite size: 130K

18 SPEEDUP AGAINST CPU

19 DATA TRANSFER OVERHEAD [Figure: one panel per benchmark (viterb00, fbital00, a2time01, autcor00, tblook01, conven00, fft00, puwmod01, rspeed01). x-axis: number of tests (log base 2 scale). y-axis: execution time [ms]. Series: input transfer, output transfer, kernel execution.]
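The three series in these plots (input transfer, output transfer, kernel execution) add up to the total GPU-side time, so the transfer overhead can be expressed as a share of that total. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical and no measured values from the talk are used.

```c
#include <assert.h>

/* Share of total time spent moving data rather than executing the kernel:
 * (input + output) / (input + kernel + output). All times in milliseconds. */
double transfer_share(double input_ms, double kernel_ms, double output_ms)
{
    double total = input_ms + kernel_ms + output_ms;
    assert(total > 0.0);
    return (input_ms + output_ms) / total;
}
```

A share close to 0 means kernel execution dominates; a share close to 1 means the run is bound by data transfer.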

20 DATA TRANSFER OVERHEAD

21 COMPARISON TO A MULTI-CORE CPU

22 CHALLENGES Usability Scope Performance

23 CORRECTNESS For all 9 benchmarks, testing results from the GPU are an exact match to the testing results from the CPU.

25 SUMMARY: Automatic GPU code generation; Automatic test execution on the GPU threads; Speedup of up to 53x (avg 16x) on EEMBC benchmarks; Correct testing results
FUTURE WORK: Extend evaluation & scope; Analyse & improve performance

26 THANKS
ParTeCL CodeGen: github.com/wyaneva/partecl-codegen
ParTeCL Runtime: github.com/wyaneva/partecl-runtime
clclibc: github.com/wyaneva/clclibc

28 C FEATURES
Out of the box: pure functions, function calls, double precision (for OpenCL 1.2)
With transformations: standard in/out, global scope variables, standard library calls (partial support)
Unsupported (yet): dynamic memory allocation, file I/O, recursion

VANYA YANEVA, Ajitha Rajan, Christophe Dubach. In preparation for: ISSTA 2017, 10 July 2017, Santa Barbara, CA