Just-In-Time Software Pipelining

Size: px
Start display at page:

Download "Just-In-Time Software Pipelining"

Transcription

1 Just-In-Time Software Pipelining Hongbo Rong Hyunchul Park Youfeng Wu Cheng Wang Programming Systems Lab Intel Labs, Santa Clara

2 What is software pipelining? A loop optimization exposing instruction-level parallelism (ILP) for (i = 0; i<n; i++){ } a: x = y + 1 b: y = A[i] + x c: B[i+2] = B[i]*x : A[i+2] = B[i+2] 2

3 What is software pipelining? A loop optimization exposing instruction-level parallelism (ILP) for (i = 0; i<n; i++){ a <1> a: x = y + 1 b: y = A[i] + x <1> c b c: B[i+2] = B[i]*x : A[i+2] = B[i+2] <2> } Local epenence Loop-carrie epenence 2

4 <1> c a <1> b <2> 3

5 Iteration <1> c a <1> b <2> 3

6 a b c Iteration a <1> <1> c b <2> 3

7 a Iteration b c a b c a <1> <1> c b <2> 3

8 a b c a b c Iteration Initiation interval(ii)=2 a <1> <1> c b <2> 3

9 a b c a b c Iteration Initiation interval(ii)=2 a <1> a b c <1> c b <2> 3

10 a b c a b c Iteration Initiation interval(ii)=2 a <1> a b c a <1> c b <2> b c 3

11 a b c a b c Iteration Initiation interval(ii)=2 a <1> a b c a <1> c b <2> b c 3

12 a b c a b c a b c a b c Iteration Initiation interval(ii)=2 Kernel 2 stages Different iterations work on ifferent stages in parallel 3

13 a b c a b c a b c a b c Iteration Initiation interval(ii)=2 Kernel 2 stages Different iterations work on ifferent stages in parallel 3

14 a b c a b c a b c a b c Iteration Initiation interval(ii)=2 Kernel 2 stages Different iterations work on ifferent stages in parallel 3

15 a b c a b c a b c a b c Iteration Initiation interval(ii)=2 Kernel 2 stages Different iterations work on ifferent stages in parallel II is the performance inicator 3

16 Software pipelining has been static Extensively stuie in 3 ecaes, an efficient for wie-issue architectures VLIW [Lam 1988] superscalar [Ruttenberg et al. 1996] 4

17 Software pipelining has been static Extensively stuie in 3 ecaes, an efficient for wie-issue architectures VLIW [Lam 1988] superscalar [Ruttenberg et al. 1996] It is seen only in static compilers 4

18 Software pipelining has been static Extensively stuie in 3 ecaes, an efficient for wie-issue architectures VLIW [Lam 1988] superscalar [Ruttenberg et al. 1996] It is seen only in static compilers Most works aim to minimize II but not compile overhea 4

19 It is time now to exten software pipelining to ynamic compilers! 5

20 It is time now to exten software pipelining to ynamic compilers! Dynamic languages are increasingly popular JavaScript an PhP 88.9% an 81.5% in client an server websites (W3Techs) 5

21 It is time now to exten software pipelining to ynamic compilers! Dynamic languages are increasingly popular JavaScript an PhP 88.9% an 81.5% in client an server websites (W3Techs) Huge amount of legacy coe Small optimization scope: a loop iteration Software pipelining enlarges the scope to many iterations 5

22 It is time now to exten software pipelining to ynamic compilers! Dynamic languages are increasingly popular JavaScript an PhP 88.9% an 81.5% in client an server websites (W3Techs) Huge amount of legacy coe Small optimization scope: a loop iteration Software pipelining enlarges the scope to many iterations Minimizing compile overhea must be the 1 st objective Only simple/fast algorithms can be use 5

23 It is time now to exten software pipelining to ynamic compilers! Dynamic languages are increasingly popular JavaScript an PhP 88.9% an 81.5% in client an server websites (W3Techs) Huge amount of legacy coe Small optimization scope: a loop iteration Software pipelining enlarges the scope to many iterations Minimizing compile overhea must be the 1 st objective Only simple/fast algorithms can be use linear-time algorithms are preferre 5

24 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] 6

25 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] for (i = 0; i<n; i++){ a b c } 6

26 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] for (i = 0; i<n; i++){ Original a optimizat b ion scope c } 6

27 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] for (i = 0; i<n; i++){ Original a optimizat b ion scope c } 6 for (j = 0; j<n; j+=m){ for (i = j; i<j+m; i++){ a b c } }

28 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] for (i = 0; i<n; i++){ Original a optimizat b ion scope c } Atomic region 6 for (j = 0; j<n; j+=m){ for (i = j; i<j+m; i++){ a } } b c

29 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] Costly rollback Software: Light-weight checkpointing for (i = 0; i<n; i++){ Original a optimizat b ion scope c } Atomic region 6 for (j = 0; j<n; j+=m){ for (i = j; i<j+m; i++){ a } } b c

30 Challenges Memory aliases kill parallelism Harware: Atomic region + rotating alias registers [MICRO-46] Costly rollback Software: Light-weight checkpointing Scheuling is expensive for (i = 0; i<n; i++){ Original a optimizat b ion scope c } Atomic region 6 for (j = 0; j<n; j+=m){ for (i = j; i<j+m; i++){ a } } b c

31 7 Framework (on Transmeta CMS)

32 x86 binary Framework (on Transmeta CMS) 7

33 x86 binary Framework (on Transmeta CMS) Interpreter & profiler 7

34 x86 binary Interpreter & profiler Framework (on Transmeta CMS) Hot region optimizations 7

35 x86 binary Interpreter & profiler Framework (on Transmeta CMS) Hot region optimizations Acyclic scheuler 7

36 x86 binary Interpreter & profiler Framework (on Transmeta CMS) Hot region optimizations Acyclic scheuler Assembler 7

37 x86 binary Interpreter & profiler Framework (on Transmeta CMS) Hot region optimizations Acyclic scheuler Coe cache Assembler 7

38 x86 binary Interpreter & profiler Framework (on Transmeta CMS) Hot region optimizations Acyclic scheuler Coe cache Assembler 8

39 Framework (on Transmeta CMS) Hot region optimizations Acyclic scheuler Assembler 8

40 Framework (on Transmeta CMS) Hot region optimizations Loop selection Acyclic scheuler Acyclic scheuler Assembler 8

41 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Acyclic scheuler Acyclic scheuler Assembler 8

42 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Scheuling Acyclic scheuler Acyclic scheuler Assembler 8

43 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Acyclic scheuler Acyclic scheuler Scheuling Rotating alias reg alloc Assembler 8

44 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Acyclic scheuler Acyclic scheuler Scheuling Rotating alias reg alloc Assembler Coe generation 8

45 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Acyclic scheuler Acyclic scheuler Scheuling Rotating alias reg alloc Assembler Coe generation 8

46 Framework (on Transmeta CMS) Hot region optimizations Initialization Loop selection Acyclic scheuler Acyclic scheuler Scheuling Rotating alias reg alloc Assembler Atomic region Coe generation Rotating alias register file 8

47 Scheuling is expensive NP-complete problem to fin an optimal scheule [Collan et al. 1996] 9

48 Scheuling is expensive NP-complete problem to fin an optimal scheule [Collan et al. 1996] O(V 3 ) at least, exponential at worst [Rau et al. 1992] V: number of operations 9

49 Scheuling is expensive NP-complete problem to fin an optimal scheule [Collan et al. 1996] O(V 3 ) at least, exponential at worst [Rau et al. 1992] V: number of operations Can we linearize software pipelining? 9

50 Keep scheuling time uner control 10

51 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity 10

52 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity For iterative sub-algorithms, have a threshol: the maximum #iterations allowe 10

53 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity For iterative sub-algorithms, have a threshol: the maximum #iterations allowe The smaller the threshol, the less the compile overhea Once exceee, abort software pipelining 10

54 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity For iterative sub-algorithms, have a threshol: the maximum #iterations allowe The smaller the threshol, the less the compile overhea Once exceee, abort software pipelining Key question: How small can the threshol be? 10

55 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity For iterative sub-algorithms, have a threshol: the maximum #iterations allowe The smaller the threshol, the less the compile overhea Once exceee, abort software pipelining Key question: How small can the threshol be? Fin a goo enough scheule No backtracking Priority function: approximate an never upate Separate epenence an resource constraints Separate local an loop-carrie epenences 10

56 Keep scheuling time uner control Scheule in linear time Use either simple or fast sub-algorithms Avoi cubic or exponential complexity For iterative sub-algorithms, have a threshol: the maximum #iterations allowe The smaller the threshol, the less the compile overhea Once exceee, abort software pipelining Key question: How small can the threshol be? Fin a goo enough scheule No backtracking Priority function: approximate an never upate Separate epenence an resource constraints Separate local an loop-carrie epenences Iteratively improve a scheule 10

57 Just-In-Time Software Pipelining 11

58 Just-In-Time Software Pipelining Prepartition H, B Quickly creates an initial scheule to start with 11

59 Just-In-Time Software Pipelining Prepartition H, B Local scheuling Quickly creates an initial scheule to start with Hanles local epenences an resources 11

60 Just-In-Time Software Pipelining Prepartition H, B Local scheuling Kernel expansion Quickly creates an initial scheule to start with Hanles local epenences an resources Ajusts the scheule for loop-carrie epenences 11

61 Just-In-Time Software Pipelining Prepartition Local scheuling Kernel expansion Exit H, B Quickly creates an initial scheule to start with Hanles local epenences an resources Ajusts the scheule for loop-carrie epenences 11

62 Just-In-Time Software Pipelining Prepartition Local scheuling Kernel expansion Exit H, B Rotation 3 Quickly creates an initial scheule to start with Hanles local epenences an resources Ajusts the scheule for loop-carrie epenences Iteratively improves the scheule 11

63 Just-In-Time Software Pipelining Prepartition Local scheuling Kernel expansion Exit H, B Rotation 3 Time complexity: O(V+E) V: #operations E: #epenences 11 Quickly creates an initial scheule to start with Hanles local epenences an resources Ajusts the scheule for loop-carrie epenences Iteratively improves the scheule

64 Prepartition Iteration Local scheuling Kernel expansion Exit Rotation Local epenences Resources Loop-carrie epenences Illustration 12

65 Prepartition Local scheuling Kernel expansion abc RecMII Iteration Exit Rotation Local epenences Resources Loop-carrie epenences Illustration 12

66 Prepartition Local scheuling abc RecMII Iteration Kernel expansion abc Kernel Exit Rotation abc Local epenences Resources Loop-carrie epenences abc Illustration 12

67 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho 13

68 Conventional Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho O(V 3 ) 13

69 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho Conventional O(V 3 ) Turn the problem into a Markov ecision process 1 st time Howar algorithm is applie to pipelining 13

70 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho Conventional O(V 3 ) Howar policy iteration algo. O(exponential*E) Turn the problem into a Markov ecision process 1 st time Howar algorithm is applie to pipelining 13

71 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho Conventional Howar policy iteration algo. Linearize Howar O(V 3 ) O(exponential*E) O( H* E) Turn the problem into a Markov ecision process 1 st time Howar algorithm is applie to pipelining Linearize Howar with a small constant H 13

72 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho Conventional O(V 3 ) Howar policy iteration algo. O(exponential*E) Linearize Howar O( E) Turn the problem into a Markov ecision process 1 st time Howar algorithm is applie to pipelining Linearize Howar with a small constant H 13

73 Calculating RecMII Recurrence Minimum II II etermine by the biggest epenence cycles Neee by almost every software pipelining metho Conventional O(V 3 ) Howar policy iteration algo. O(exponential*E) Linearize Howar O( E) Turn the problem into a Markov ecision process 1 st time Howar algorithm is applie to pipelining Linearize Howar with a small constant H A by-prouct: critical 13 operations

74 Divie operations into stages Bellman-For O(V*E) 14

75 Divie operations into stages Bellman-For O(V*E) Also an iterative sub-algorithm 14

76 Divie operations into stages Bellman-For Linearize Bellman-For O(V*E) O( E) B* Also an iterative sub-algorithm Linearize it with a constant B 14

77 Divie operations into stages Bellman-For O(V*E) Linearize Bellman-For O( E) B* Also an iterative sub-algorithm Linearize it with a constant B Specific orer in visiting eges Scan noes in sequential orer, an visit their incoming eges Values are propagate along local eges in the 1 st itr. Values are propagate along loop-carrie eges in the 2 n itr. 14

78 Divie operations into stages Bellman-For O(V*E) Linearize Bellman-For O( E) Also an iterative sub-algorithm Linearize it with a constant B Specific orer in visiting eges Scan noes in sequential orer, an visit their incoming eges Values are propagate along local eges in the 1 st itr. Values are propagate along loop-carrie eges in the 2 n itr. 14

79 Prepartition Local scheuling abc RecMII Iteration Kernel expansion abc Kernel Exit Rotation abc Local epenences Resources Loop-carrie epenences abc Illustration 15

80 Prepartition Local scheuling abc RecMII Iteration Kernel expansion abc Kernel Exit Rotation abc Local epenences Resources Loop-carrie epenences abc Illustration 15

81 Prepartition Local scheuling abc II Iteration Kernel expansion abc Kernel Exit Rotation Local epenences Resources Loop-carrie epenences Illustration (Cont.) 16 abc abc

82 X X Prepartition Local scheuling Kernel expansion Exit Rotation Local epenences Resources a Illustration (Cont.) b c a Loop-carrie epenences 16 b c II a b c Kernel a b c Iteration

83 Local scheuling Any local scheuling algorithm can be use E.g. list scheuling 17

84 Local scheuling Any local scheuling algorithm can be use E.g. list scheuling Weakness: Loop-carrie epenences may be violate 17

85 Local scheuling Any local scheuling algorithm can be use E.g. list scheuling Weakness: Loop-carrie epenences may be violate To reuce the chance of violation: Before scheuling, priority function consiers loopcarrie epenences in avance Prioritize critical operations 17

86 Prepartition Local scheuling Kernel expansion Exit Rotation Iteration a b c a b c II a Kernel X X Local epenences Resources Loop-carrie epenences Illustration (Cont.) 18 b c a b c

87 Prepartition Local scheuling Kernel expansion Exit Rotation Iteration a b c a b c II a Kernel X X Local epenences Resources Loop-carrie epenences Illustration (Cont.) 18 b c a b c

88 x x Prepartition Local scheuling Kernel expansion Exit Rotation Local epenences Resources a b c Illustration (Cont.) 19 Loop-carrie epenences a b c II a b c a b c Iteration

89 Prepartition Local scheuling Kernel expansion x x x Exit Rotation Local epenences Resources Iteration a b c Loop-carrie epenences Illustration (Cont.) 19 II a b c Kernel a b c a b c

90 Prepartition Local scheuling Kernel expansion x x x Exit Rotation Local epenences Resources Loop-carrie epenences Illustration (Cont.) Iteration a b c a b c a b c 20 a b c

91 Prepartition Local scheuling Kernel expansion Exit Rotation Local epenences Resources a b c Loop-carrie epenences Illustration (Cont.) 20 Iteration a b c a Kernel b c a b c

92 Experiments Transmeta CMS on SPEC2k traces 21

93 Experiments Transmeta CMS on SPEC2k traces Functional simulator to comprehensively Explore threshols H an B Evaluate compile overhea an scheules quality 21

94 Experiments Transmeta CMS on SPEC2k traces Functional simulator to comprehensively Explore threshols H an B Evaluate compile overhea an scheules quality Cycle-accurate simulator Simulates cache misses, latencies, Initial performance stuy 21

95 Linearize Howar vs. exponential backoff + binary search [Rau et al. 1992] 22

96 1000 Linearize Howar vs. exponential backoff + binary search [Rau et al. 1992] #epenences

97 1000 Linearize Howar vs. exponential backoff + binary search [Rau et al. 1992] #epenences

98 Linearize Howar vs. exponential backoff + binary 1000 search [Rau et al. 1992] Average 9X faster Peak 812X faster H 3 3 for 96% of 11,992 loops #epenences

99 1000 Linearize Howar vs. exponential backoff + binary search [Rau et al. 1992] Average 9X faster Peak 812X faster H 3 3 for 96% of 11,992 loops H 14 for all loops 22 #epenences

100 Linearize Bellman-For 23

101 Linearize Bellman-For 100% % Total loops 50% 0% B 23

102 Linearize Bellman-For 100% % Total loops 50% 0% B B 3 for 98.8% of 11,992 loops 23

103 Linearize Bellman-For 100% % Total loops 50% 0% B B 3 for 98.8% of 11,992 loops 23

104 Linearize Bellman-For 100% % Total loops 50% 0% B B 3 for 98.8% of 11,992 loops B 5 for all the loops 23

105 Linearize Bellman-For 100% % Total loops 50% 0% B B 3 for 98.8% of 11,992 loops B 5 for all the loops From now on, we set H=10, B=5 11,910 loops scheule 23

106 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 95% 1X RS2 DESP JITSP RS2 DESP JITSP 24

107 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 95% 1X RS2 DESP JITSP RS2 DESP JITSP JITSP achieves optimal scheules for 95% loops 24

108 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 12X 95% 1X RS2 DESP JITSP RS2 DESP JITSP JITSP achieves optimal scheules for 95% loops 24

109 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 12X 13% 95% 1X RS2 DESP JITSP RS2 DESP JITSP JITSP achieves optimal scheules for 95% loops 24

110 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 12X 13% 95% 25% 1X RS2 DESP JITSP RS2 DESP JITSP JITSP achieves optimal scheules for 95% loops 24

111 Scheuling overhea & scheules quality Overhea % loops with optimal scheules 12X 13% 23% 95% 25% 1X RS2 DESP JITSP RS2 DESP JITSP JITSP achieves optimal scheules for 95% loops 24

112 Compile overhea istribution Acyclic scheuler, 27% Software pipelining, 33% Assembler, 6% Hot region optimizations, 34% Note: acyclic scheuler hanles acyclic coe, or loops NOT selecte for software pipelining 25

113 Preliminary performance 40 hot loops 26

114 Preliminary performance too many unrolls 1% Successfully generate coe 25% too many registers 55% filtere 19% 40 hot loops 26

115 Preliminary performance too many unrolls 1% Successfully generate coe 25% too many registers 55% filtere 19% 40 hot loops 26

116 Preliminary performance too many unrolls 1% Successfully generate coe 25% too many registers 55% filtere 19% 40 hot loops The architecture has bottleneck in registers 2/7/24/32 preicate/static alias/integer/floating point available for pipelining 26

117 Preliminary performance too many unrolls 1% Successfully generate coe 25% too many registers 55% filtere 19% 40 hot loops The architecture has bottleneck in registers 2/7/24/32 preicate/static alias/integer/floating point available for pipelining 26

118 Preliminary performance too many unrolls 1% Successfully generate coe 25% too many registers 55% filtere 19% 40 hot loops The architecture has bottleneck in registers 2/7/24/32 preicate/static alias/integer/floating point available for pipelining 26

119 Preliminary performance (Cont.) 2.00 II/MII (The lower, the better) 1.50 Optimal RS2 DESP JITSP 27

120 Preliminary performance (Cont.) 2.00 II/MII (The lower, the better) 1.50 Optimal RS2 DESP JITSP JITSP achieves optimal scheules for all but 1 loops 27

121 Preliminary performance (Cont.) 40% 20% 0% -20% Speeup RS2 DESP JITSP -40% JITSP 10~36% speeup. Better than the others 28

122 Preliminary performance (Cont.) 40% 20% 0% -20% Speeup RS2 DESP JITSP -40% JITSP 10~36% speeup. Better than the others Exception: loop 2. Optimal scheule but slowown ue to memory aliases retranslation neee 28

123 Preliminary performance (Cont.) 40% Speeup 20% 0% -20% RS2 DESP JITSP -40% JITSP 10~36% speeup. Better than the others Exception: loop 2. Optimal scheule but slowown ue to memory aliases retranslation neee Speeup swim(5.3%), ammp(4.4%), 28 mcf(3.6%)

124 Conclusion 29

125 Conclusion 1st linear software pipelining algorithm implemente for ynamic compilers 29

126 Conclusion 1st linear software pipelining algorithm implemente for ynamic compilers Turns a traitionally-expensive optimization into linear time O(V+E) 29

127 Conclusion 1st linear software pipelining algorithm implemente for ynamic compilers Turns a traitionally-expensive optimization into linear time O(V+E) Taking avantages of results from various omains: harware circuit esign (Retiming) stochastic control (Howar algorithm) Graph (Bellman-For) software pipelining (Rotation scheuling an DESP) 29

128 Conclusion 1st linear software pipelining algorithm implemente for ynamic compilers Turns a traitionally-expensive optimization into linear time O(V+E) Taking avantages of results from various omains: harware circuit esign (Retiming) stochastic control (Howar algorithm) Graph (Bellman-For) software pipelining (Rotation scheuling an DESP) Generates optimal or near-optimal scheules with reasonable compile overhea 29

129 Future work Register availability A more architecture registers Algorithm: Register pressure-aware Implementation Loop selection Re-translation Evaluation Benchmarks 30

130 Backup slies 31

131 Howar algorithm Each operation is reware to reach a cycle via 1 policy ege The bigger the cycle, the more the rewar 32

132 Howar algorithm Each operation is reware to reach a cycle via 1 policy ege The bigger the cycle, the more the rewar a c b 32

133 Howar algorithm Each operation is reware to reach a cycle via 1 policy ege The bigger the cycle, the more the rewar a c b 32

134 Howar algorithm Each operation is reware to reach a cycle via 1 policy ege The bigger the cycle, the more the rewar a c b 32

135 Distribution statistics of JITSP 33

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introuction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threas 6. CPU Scheuling 7. Process Synchronization 8. Dealocks 9. Memory Management 10.Virtual Memory

More information

Compiler Optimisation

Compiler Optimisation Compiler Optimisation Michael O Boyle mob@inf.e.ac.uk Room 1.06 January, 2014 1 Two recommene books for the course Recommene texts Engineering a Compiler Engineering a Compiler by K. D. Cooper an L. Torczon.

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

CMSC 430 Introduction to Compilers. Spring Register Allocation

CMSC 430 Introduction to Compilers. Spring Register Allocation CMSC 430 Introuction to Compilers Spring 2016 Register Allocation Introuction Change coe that uses an unoune set of virtual registers to coe that uses a finite set of actual regs For ytecoe targets, can

More information

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions. Overview Operating Systems I Management Provie Services processes files Manage Devices processor memory isk Simple Management One process in memory, using it all each program nees I/O rivers until 96 I/O

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Power-Performance Trade-offs for Energy-Efficient Architectures: A Quantitative Study

Power-Performance Trade-offs for Energy-Efficient Architectures: A Quantitative Study In the Proc. of the Intl. Conf. on Computer Design (ICCD-02), Frieburg, March 2002 Power-Performance Trae-offs for Energy-Efficient Architectures: A Quantitative Stuy Hongbo Yang R. Govinarajan Guang R.

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

An Adaptive Routing Algorithm for Communication Networks using Back Pressure Technique

An Adaptive Routing Algorithm for Communication Networks using Back Pressure Technique International OPEN ACCESS Journal Of Moern Engineering Research (IJMER) An Aaptive Routing Algorithm for Communication Networks using Back Pressure Technique Khasimpeera Mohamme 1, K. Kalpana 2 1 M. Tech

More information

Algorithm for Intermodal Optimal Multidestination Tour with Dynamic Travel Times

Algorithm for Intermodal Optimal Multidestination Tour with Dynamic Travel Times Algorithm for Intermoal Optimal Multiestination Tour with Dynamic Travel Times Neema Nassir, Alireza Khani, Mark Hickman, an Hyunsoo Noh This paper presents an efficient algorithm that fins the intermoal

More information

Frequency Domain Parameter Estimation of a Synchronous Generator Using Bi-objective Genetic Algorithms

Frequency Domain Parameter Estimation of a Synchronous Generator Using Bi-objective Genetic Algorithms Proceeings of the 7th WSEAS International Conference on Simulation, Moelling an Optimization, Beijing, China, September 15-17, 2007 429 Frequenc Domain Parameter Estimation of a Snchronous Generator Using

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing AnyTraffic Labele Routing Dimitri Papaimitriou 1, Pero Peroso 2, Davie Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica e Catalunya,

More information

Baring it all to Software: The Raw Machine

Baring it all to Software: The Raw Machine Baring it all to Software: The Raw Machine Elliot Waingol, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, Matthew Frank, Peter Finch, Srikrishna Devabhaktuni, Rajeev Barua, Jonathan Babb,

More information

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots Aaptive Loa Balancing base on IP Fast Reroute to Avoi Congestion Hot-spots Masaki Hara an Takuya Yoshihiro Faculty of Systems Engineering, Wakayama University 930 Sakaeani, Wakayama, 640-8510, Japan Email:

More information

Chapter 3 (Cont III): Exploiting ILP with Software Approaches. Copyright Josep Torrellas 1999, 2001, 2002,

Chapter 3 (Cont III): Exploiting ILP with Software Approaches. Copyright Josep Torrellas 1999, 2001, 2002, Chapter 3 (Cont III): Exploiting ILP with Software Approaches Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Exposing ILP (3.2) Want to find sequences of unrelated instructions that can be overlapped

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm NASA/CR-1998-208733 ICASE Report No. 98-45 Parallel Directionally Split Solver Base on Reformulation of Pipeline Thomas Algorithm A. Povitsky ICASE, Hampton, Virginia Institute for Computer Applications

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links Overview 15-441 15-441: Computer Networking 15-641 Lecture 24: Wireless Eric Anerson Fall 2014 www.cs.cmu.eu/~prs/15-441-f14 Internet mobility TCP over noisy links Link layer challenges an WiFi Cellular

More information

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11:

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11: airwise alignment using shortest path algorithms, Gunnar Klau, November 9,, : 3 3 airwise alignment using shortest path algorithms e will iscuss: it graph Dijkstra s algorithm algorithm (GDU) 3. References

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks Provisioning Virtualize Clou Services in IP/MPLS-over-EON Networks Pan Yi an Byrav Ramamurthy Department of Computer Science an Engineering, University of Nebraska-Lincoln Lincoln, Nebraska 68588 USA Email:

More information

6.823 Computer System Architecture. Problem Set #3 Spring 2002

6.823 Computer System Architecture. Problem Set #3 Spring 2002 6.823 Computer System Architecture Problem Set #3 Spring 2002 Stuents are strongly encourage to collaborate in groups of up to three people. A group shoul han in only one copy of the solution to the problem

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance International Journal of Moern Communication Technologies & Research (IJMCTR) Impact of FTP Application file size an TCP Variants on MANET Protocols Performance Abelmuti Ahme Abbasher Ali, Dr.Amin Babkir

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are 9. Code Scheduling for ILP-Processors Typical layout of compiler: traditional, optimizing, pre-pass parallel, post-pass parallel {Software! compilers optimizing code for ILP-processors, including VLIW}

More information

ACE: And/Or-parallel Copying-based Execution of Logic Programs

ACE: And/Or-parallel Copying-based Execution of Logic Programs ACE: An/Or-parallel Copying-base Execution of Logic Programs Gopal GuptaJ Manuel Hermenegilo* Enrico PontelliJ an Vítor Santos Costa' Abstract In this paper we present a novel execution moel for parallel

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

You Can Do That. Unit 16. Motivation. Computer Organization. Computer Organization Design of a Simple Processor. Now that you have some understanding

You Can Do That. Unit 16. Motivation. Computer Organization. Computer Organization Design of a Simple Processor. Now that you have some understanding .. ou Can Do That Unit Computer Organization Design of a imple Clou & Distribute Computing (CyberPhysical, bases, Mining,etc.) Applications (AI, Robotics, Graphics, Mobile) ystems & Networking (Embee ystems,

More information

Uninformed search methods

Uninformed search methods CS 1571 Introuction to AI Lecture 4 Uninforme search methos Milos Hauskrecht milos@cs.pitt.eu 539 Sennott Square Announcements Homework assignment 1 is out Due on Thursay, September 11, 014 before the

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Workspace as a Service: an Online Working Environment for Private Cloud

Workspace as a Service: an Online Working Environment for Private Cloud 2017 IEEE Symposium on Service-Oriente System Engineering Workspace as a Service: an Online Working Environment for Private Clou Bo An 1, Xuong Shan 1, Zhicheng Cui 1, Chun Cao 2, Donggang Cao 1 1 Institute

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

Nearest Neighbor Search using Additive Binary Tree

Nearest Neighbor Search using Additive Binary Tree Nearest Neighbor Search using Aitive Binary Tree Sung-Hyuk Cha an Sargur N. Srihari Center of Excellence for Document Analysis an Recognition State University of New York at Buffalo, U. S. A. E-mail: fscha,sriharig@cear.buffalo.eu

More information

Distributed Planning in Hierarchical Factored MDPs

Distributed Planning in Hierarchical Factored MDPs Distribute Planning in Hierarchical Factore MDPs Carlos Guestrin Computer Science Dept Stanfor University guestrin@cs.stanfor.eu Geoffrey Goron Computer Science Dept Carnegie Mellon University ggoron@cs.cmu.eu

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients Probabilistic Meium Access Control for 1 Full-Duplex Networks with Half-Duplex Clients arxiv:1608.08729v1 [cs.ni] 31 Aug 2016 Shih-Ying Chen, Ting-Feng Huang, Kate Ching-Ju Lin, Member, IEEE, Y.-W. Peter

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

A Plane Tracker for AEC-automation Applications

A Plane Tracker for AEC-automation Applications A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

A Metric for Routing in Delay-Sensitive Wireless Sensor Networks

A Metric for Routing in Delay-Sensitive Wireless Sensor Networks A Metric for Routing in Delay-Sensitive Wireless Sensor Networks Zhen Jiang Jie Wu Risa Ito Dept. of Computer Sci. Dept. of Computer & Info. Sciences Dept. of Computer Sci. West Chester University Temple

More information

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Aitional Divie an Conquer Algorithms Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Divie an Conquer Closest Pair Let s revisit the closest pair problem. Last

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

Dense Disparity Estimation in Ego-motion Reduced Search Space

Dense Disparity Estimation in Ego-motion Reduced Search Space Dense Disparity Estimation in Ego-motion Reuce Search Space Luka Fućek, Ivan Marković, Igor Cvišić, Ivan Petrović University of Zagreb, Faculty of Electrical Engineering an Computing, Croatia (e-mail:

More information

Handling missing values in kernel methods with application to microbiology data

Handling missing values in kernel methods with application to microbiology data an Machine Learning. Bruges (Belgium), 24-26 April 2013, i6oc.com publ., ISBN 978-2-87419-081-0. Available from http://www.i6oc.com/en/livre/?gcoi=28001100131010. Hanling missing values in kernel methos

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stolyar Abstract Backpressure-base aaptive routing

More information

Slicing Long Running Queries

Slicing Long Running Queries licing Long unning Queries Nicolas Bruno Microsoft esearch nicolasb@microsoft.com Vivek Narasayya Microsoft esearch viveknar@microsoft.com avi amamurthy Microsoft esearch ravirama@microsoft.com ABTACT

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 01 01 01 01 01 00 01 01 Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Optimizing the quality of scalable video streams on P2P Networks

Optimizing the quality of scalable video streams on P2P Networks Optimizing the quality of scalable vieo streams on PP Networks Paper #7 ASTRACT The volume of multimeia ata, incluing vieo, serve through Peer-to-Peer (PP) networks is growing rapily Unfortunately, high

More information

Natural Language Processing (CSE 517): Dependency Syntax and Parsing

Natural Language Processing (CSE 517): Dependency Syntax and Parsing Natural Language Processing (CSE 517): Depenency Syntax an Parsing Noah A. Smith Swabha Swayamipta c 2018 University of Washington {nasmith,swabha}@cs.washington.eu May 11, 2018 1 / 93 Recap: Phrase Structure

More information

PART 5. Process Coordination And Synchronization

PART 5. Process Coordination And Synchronization PART 5 Process Coorination An Synchronization CS 503 - PART 5 1 2010 Location Of Process Coorination In The Xinu Hierarchy CS 503 - PART 5 2 2010 Coorination Of Processes Necessary in a concurrent system

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach

Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach Jian Zhao, Chuan Wu The University of Hong Kong {jzhao,cwu}@cs.hku.hk Abstract Peer-to-peer (P2P) technology is popularly

More information

Instruction Scheduling

Instruction Scheduling Instruction Scheduling Michael O Boyle February, 2014 1 Course Structure Introduction and Recap Course Work Scalar optimisation and dataflow L5 Code generation L6 Instruction scheduling Next register allocation

More information

DeltaPath: Precise and Scalable Calling Context Encoding

DeltaPath: Precise and Scalable Calling Context Encoding DeltaPath: Precise an Scalable Calling Context Encoing Qiang Zeng, Junghwan Rhee, Hui Zhang, Nipun rora, Guofei Jiang, Peng Liu Penn State University, NEC Laboratories merica BSTRCT Calling context provies

More information

CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines

CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stoylar arxiv:15.4984v1 [cs.ni] 27 May 21 Abstract

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

Distributing Frank-Wolfe via Map-Reduce

Distributing Frank-Wolfe via Map-Reduce Distributing Frank-Wolfe via Map-Reuce Armin Moharrer an Stratis Ioanniis ortheastern University amoharrer@ece.neu.eu, ioanniis@ece.neu.eu Abstract We ientify structural properties uner which a convex

More information

Patterns for! Parallel Programming II!

Patterns for! Parallel Programming II! Lecture 4! Patterns for! Parallel Programming II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Task Decomposition Also known as functional

More information

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises verview Frequent Pattern Mining comprises Frequent Pattern Mining hristian Borgelt School of omputer Science University of Konstanz Universitätsstraße, Konstanz, Germany christian.borgelt@uni-konstanz.e

More information

Learning Polynomial Functions. by Feature Construction

Learning Polynomial Functions. by Feature Construction I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate

More information

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor Crusoe Reference Thinking Outside the Box The Transmeta Crusoe Processor 55:132/22C:160 High Performance Computer Architecture The Technology Behind Crusoe Processors--Low-power -Compatible Processors

More information

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance Receive: November 20, 2017 121 A Novel Density Base Clustering Algorithm by Incorporating Mahalanobis Distance Margaret Sangeetha 1 * Velumani Paikkaramu 2 Rajakumar Thankappan Chellan 3 1 Department of

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Demystifying Automata Processing: GPUs, FPGAs or Micron s AP?

Demystifying Automata Processing: GPUs, FPGAs or Micron s AP? Demystifying Automata Processing: GPUs, FPGAs or Micron s AP? Marziyeh Nourian 1,3, Xiang Wang 1, Xiaoong Yu 2, Wu-chun Feng 2, Michela Becchi 1,3 1,3 Department of Electrical an Computer Engineering,

More information

Performance Modelling of Necklace Hypercubes

Performance Modelling of Necklace Hypercubes erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract

More information

SMARQ: Software-Managed Alias Register Queue for Dynamic Optimizations

SMARQ: Software-Managed Alias Register Queue for Dynamic Optimizations SMARQ: SoftwareManaged Alias Register Queue for Dynamic Optimizations heng Wang Youfeng Wu Hongbo Rong Hyunchul ark rogramming Systems Lab Microprocessor and rogramming Research Intel Labs {cheng.c.wang,

More information

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees Disjoint Multipath Routing in Dual Homing Networks using Colore Trees Preetha Thulasiraman, Srinivasan Ramasubramanian, an Marwan Krunz Department of Electrical an Computer Engineering University of Arizona,

More information

Scalable Deterministic Scheduling for WDM Slot Switching Xhaul with Zero-Jitter

Scalable Deterministic Scheduling for WDM Slot Switching Xhaul with Zero-Jitter FDL sel. VOA SOA 100 Regular papers ONDM 2018 Scalable Deterministic Scheuling for WDM Slot Switching Xhaul with Zero-Jitter Bogan Uscumlic 1, Dominique Chiaroni 1, Brice Leclerc 1, Thierry Zami 2, Annie

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño

More information

Optimising for the p690 memory system

Optimising for the p690 memory system Optimising for the p690 memory Introduction As with all performance optimisation it is important to understand what is limiting the performance of a code. The Power4 is a very powerful micro-processor

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections ) Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Top-down Connectivity Policy Framework for Mobile Peer-to-Peer Applications

Top-down Connectivity Policy Framework for Mobile Peer-to-Peer Applications Top-own Connectivity Policy Framework for Mobile Peer-to-Peer Applications Otso Kassinen Mika Ylianttila Junzhao Sun Jussi Ala-Kurikka MeiaTeam Department of Electrical an Information Engineering University

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journal of Systems an Software 83 (010) 1864 187 Contents lists available at ScienceDirect The Journal of Systems an Software journal homepage: www.elsevier.com/locate/jss Embeing capacity raising

More information

Enabling Rollback Support in IT Change Management Systems

Enabling Rollback Support in IT Change Management Systems Enabling Rollback Support in IT Change Management Systems Guilherme Sperb Machao, Fábio Fabian Daitx, Weverton Luis a Costa Coreiro, Cristiano Bonato Both, Luciano Paschoal Gaspary, Lisanro Zambeneetti

More information

Data Mining: Clustering

Data Mining: Clustering Bi-Clustering COMP 790-90 Seminar Spring 011 Data Mining: Clustering k t 1 K-means clustering minimizes Where ist ( x, c i t i c t ) ist ( x m j 1 ( x ij i, c c t ) tj ) Clustering by Pattern Similarity

More information

MOBILE data is continuing to grow exponentially, and

MOBILE data is continuing to grow exponentially, and IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 25, NO. 3, JUNE 2017 1431 Enhancing Mobile Networks With Software Define Networking an Clou Computing Zizhong Cao, Stuent Member, IEEE, Shivenra S. Panwar, Fellow,

More information