Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations


2012 Third International Conference on Networking and Computing

Ryohei Kobayashi, Tokyo Institute of Technology, Japan, kobayashi@arch.cs.titech.ac.jp
Shinya Takamaeda-Yamazaki, Tokyo Institute of Technology, Japan; JSPS Research Fellow, Japan, takamaeda@arch.cs.titech.ac.jp
Kenji Kise, Tokyo Institute of Technology, Japan, kise@cs.titech.ac.jp

Abstract

We have proposed an effective stencil computation method and an architecture that employ multiple small FPGAs connected in a 2D-mesh topology. In this paper, we show that the proposed architecture works correctly on a real 2D-mesh-connected FPGA array. We developed a software simulator in C++ that emulates the proposed architecture, and implemented two prototype systems in Verilog HDL. One prototype, with communication modules, is for logic verification; the other, without communication modules, is for estimating power consumption. We ran the former prototype for 2M cycles and checked its behavior against the software simulator. Our architecture is developed towards a low-power accelerator of many FPGAs. The evaluation with the second prototype shows that a single FPGA node with eight floating-point adders and eight floating-point multipliers achieves 2.24 GFlop/s at an operating frequency of 0.16 GHz with 2.37 W power consumption. This performance per watt is about six times better than that of an NVidia GTX280 GPU card.

Index Terms: FPGA accelerator, Stencil computation, Low-power

I. INTRODUCTION

Stencil computation is one of the typical scientific computing kernels [1]. Various accelerators that solve stencil computations at high speed have been designed using multiple high-end FPGAs [2][3]. We have proposed a stencil computing method optimized for a 2D-mesh-connected FPGA array [4]. This paper describes the implementation results of the proposed method. It also shows that the designed architecture works correctly on a real 2D-mesh-connected FPGA array.
This system is developed towards a low-power accelerator of many FPGAs. We have already developed a 2D-mesh-connected FPGA array, the ScalableCore system, which is a high-speed simulation environment for many-core processor research [5]. The ScalableCore system uses multiple small-capacity FPGAs connected in a 2D mesh. In this paper, we use the hardware components of the ScalableCore system as an infrastructure for HPC hardware accelerators.

To achieve high performance, the pipelines of the execution units should be kept operating effectively throughout the computation. In stencil computation, the whole data set is divided into multiple blocks and each block is assigned to one FPGA. The boundary data of each block is shared by the adjacent FPGAs. In our system, the computation order is customized in each FPGA in order to increase the acceptable latency of the data sharing among the FPGAs.

Fig. 1. 2D stencil computation.
Fig. 2. Pseudo code of 2D stencil computation.

II. PARALLEL STENCIL COMPUTATION BY USING MULTI-FPGAS

Fig. 1 shows a typical pattern of 2D stencil computation. In the figure, each circle represents the value of a grid-point, and the value of each grid-point at the next time-step is computed from the values of its four adjacent grid-points at the current time-step.

Fig. 2 shows the pseudo code of the 2D stencil computation shown in Fig. 1. In the figure, k represents the time-step and (i, j) represents the coordinate of a grid-point. Two buffers, V0 and V1, are used for the computation. The value of grid-point (i, j) is represented as Vn[i][j], where n is the buffer number (0 or 1). As shown in the fourth line of Fig. 2, Vn[i][j] is updated with the sum of four values. Each value is obtained by multiplying one of the adjacent grid-points (Vn[i-1][j], Vn[i][j-1], Vn[i][j+1], Vn[i+1][j]) by a weighting factor. As shown in the seventh and eighth lines of Fig. 2, every grid-point is updated for the next time-step.
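The update rule described for Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pseudo code: the function names, the weight ordering (up, left, right, down), and the choice to leave the outermost ring of grid-points unchanged are assumptions made here for the example.

```python
def stencil_step(src, dst, weights):
    """Update every interior grid-point of dst from its four neighbours in src.

    weights is (up, left, right, down); this ordering is an assumption.
    The outermost ring of points is treated as a fixed boundary (assumption).
    """
    rows, cols = len(src), len(src[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dst[i][j] = (weights[0] * src[i - 1][j]
                         + weights[1] * src[i][j - 1]
                         + weights[2] * src[i][j + 1]
                         + weights[3] * src[i + 1][j])


def run_stencil(grid, weights, steps):
    """Run `steps` time-steps using two buffers, V0 and V1, swapped each step."""
    v = [[row[:] for row in grid], [row[:] for row in grid]]
    for k in range(steps):
        stencil_step(v[k % 2], v[(k + 1) % 2], weights)
    return v[steps % 2]
```

Each interior update costs four multiplications and three additions, which is the operation count the evaluation section relies on.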

Fig. 3. Block division and assignment to each FPGA.

The whole data set is divided into multiple blocks according to the number of FPGAs in the vertical and horizontal directions of the array, and each block is assigned to one FPGA. The boundary data of each block is shared among multiple FPGAs via their communication interfaces. This data sharing incurs some data-transfer overhead. To eliminate this overhead, we customize the computation order for each FPGA.

As shown in Fig. 3, the data set of the stencil computation is divided into several blocks according to the number of FPGAs in the vertical and horizontal directions. Each data block is assigned to one FPGA. The computation on each FPGA uses its assigned data and the shared boundary data of the neighboring blocks, so the necessary boundary data must be sent to the adjacent FPGAs. In Fig. 3, a circle represents a grid-point, a group of 4 × 4 grid-points is assigned to one FPGA, and an arrow represents communication to a neighboring FPGA. Gray regions represent the data subsets communicated to other FPGAs.

Fig. 4 shows two cases of computation order. Fig. 4 (a) shows the case in which FPGA (A) and FPGA (B) compute in the same order. A dotted square shows the data subset assigned to one FPGA. In fact, the computations also use extra boundary data that is not shared, but this extra data is omitted from the figure for simplicity. We define the sequence of processing that computes all the grid-points for one time-step as an Iteration. A circle represents one grid-point. The letter in a circle is the ID of the FPGA, and the number in a circle is the computing order within the FPGA; the computations of each FPGA therefore proceed in the order of the arrow. In this example, each FPGA updates its sixteen assigned grid-points (from 0 to 15) during every Iteration. For simplicity, we assume that updating the value of one grid-point takes exactly one cycle and that several FIFOs are used to avoid illegal modification of the data. In FPGA (A), the value of A0 is computed at the 0th cycle and the value of A1 is computed at the 1st cycle.
Similarly, in FPGA (B) the value of B0 is computed at the 0th cycle and the value of B1 is computed at the 1st cycle. All the computations are processed in this order. We assume that each FPGA can use data obtained from another FPGA in a single cycle. After the computations of an Iteration are complete, the process proceeds to the next time-step. In this case, an Iteration takes sixteen cycles. The first Iteration begins at the 0th cycle and the second Iteration begins at the 16th cycle. Therefore, the third Iteration begins at the 32nd cycle.

Fig. 4. The computing order of grid-points on FPGA. (b) is the proposed method [4].
Fig. 5. Computing order with the proposed method applied.

In Fig. 4 (a), the computation of grid-point B1 uses the values of its vertical and horizontal neighbors A13, B5, B0 and B2. The value of grid-point A13 needs to be communicated between FPGA (A) and FPGA (B) because it is shared by these FPGAs. The others do not need to be communicated between FPGAs. In this computation order, the value of grid-point A13 is computed at the 13th cycle and the value of grid-point B1 is computed at the 17th cycle. The computation of B1 uses the computed value of A13. In order not to stall the computation of B1, the value of A13 must be communicated within three cycles (14, 15, 16) after it is computed. The values of grid-points A12, A14 and A15 must also be communicated within three cycles in order not to stall the computations. By the same argument, if N × M grid-points are assigned to a single FPGA, every shared value must be communicated within N - 1 cycles.

Fig. 4 (b) models the case in which FPGA (C) and FPGA (D) compute in reverse order to each other. The computation order of FPGA (C) is the inverse of that of FPGA (A) in Fig. 4 (a). FPGA (B) and FPGA (D) use the same computation order. In this case, in order not to stall the computation of D1 in Iteration 2 (17th cycle), the margin for sending the value of C1 (computed at the 1st cycle) is 15 cycles (cycles 2 to 16).
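The two margins quoted above (three cycles for the same order, fifteen cycles for the reversed order) can be checked with a small cycle model. The sketch below is only an illustration under the text's simplifying assumptions (one grid-point per cycle, row-major traversal, back-to-back Iterations); the function names and the `flipped` flag are hypothetical, and "flipped" is taken to mean that the upper FPGA visits its rows from bottom to top, as FPGA (C) does in Fig. 4 (b).

```python
def compute_cycle(row, col, rows, cols, iteration=0, flipped=False):
    """Cycle at which a grid-point is updated (one point per cycle, row-major).

    flipped=True visits the rows bottom-to-top, modelling FPGA (C).
    """
    visit_row = rows - 1 - row if flipped else row
    return iteration * rows * cols + visit_row * cols + col


def vertical_slack(rows, cols, col, upper_flipped):
    """Cycles available to send a bottom-row value of the upper FPGA to the
    lower FPGA before the lower FPGA needs it in the next Iteration."""
    produced = compute_cycle(rows - 1, col, rows, cols, 0, upper_flipped)
    consumed = compute_cycle(0, col, rows, cols, 1, False)
    return consumed - produced - 1


# 4 x 4 block, column 1: A13 -> B1 (same order) and C1 -> D1 (reversed order)
print(vertical_slack(4, 4, 1, upper_flipped=False))  # 3
print(vertical_slack(4, 4, 1, upper_flipped=True))   # 15
```

In this model the slack is independent of the column: it equals the row length minus one for the same order, and the block size minus one for the reversed order.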
By the same argument, if N × M grid-points are assigned to a single FPGA, the communication latency between the upper and lower FPGAs must be within N × M - 1 cycles. In this way, changing the computation order increases the acceptable communication latency.

Fig. 5 shows the computation order of the proposed method in each FPGA. A square represents an FPGA and an arrow represents the computation order. The FPGAs in the 1st and 3rd rows compute in the same order as FPGA (C) in Fig. 4 (b). The FPGAs in the 2nd and 4th rows compute in the same order as FPGA (D) in Fig. 4 (b). As discussed for Fig. 4, the proposed method secures an acceptable communication latency of about one Iteration between these FPGAs. The same holds for pairs of FPGAs whose arrows face each other: when FPGA (C) and FPGA (D) in Fig. 4 (b) are placed upside down, the adjacent sides are still computed in the same cycles.

Now consider the communication between the left and right sides of an FPGA. When two copies of FPGA (C) in Fig. 4 (b) are placed side by side, C3 of the left FPGA and C0 of the right FPGA are adjacent. In this case, the acceptable communication latency for the right FPGA is 12 cycles. This number is the number of cycles of one Iteration minus the number of cycles needed for the grid-points of one side. In this way, the proposed method increases the acceptable communication latency by computing the upper and lower FPGAs in reverse order; in other words, it ensures a margin of about one Iteration.

So far we have assumed that the computation of one grid-point takes one cycle. If the computation of one grid-point takes k cycles, the acceptable communication latency between the left and right FPGAs is (N × M - M) × k cycles.

III. ARCHITECTURE AND IMPLEMENTATION

The node architecture is the computation part implemented in an FPGA. It assumes a single FPGA, so communication between FPGAs is not considered. We call the node architecture extended with communication modules the system architecture. The data type used is single-precision floating point.

Fig. 6. Architecture with eight MADDs.
Fig. 7. Relationship between the grid-points and BlockRAM.

A. Node Architecture

Fig. 6 shows the node architecture with eight multiply-adder units. A square in the figure represents a BlockRAM (see footnote 1). MADD represents a multiply-adder unit. A square inside a MADD represents a register.
Both the multiplier and the adder are single-precision floating-point units that conform to IEEE 754. The multiplier and the adder we use each have seven pipeline stages. Since two registers are included in the MADD, the data path of the MADD has sixteen pipeline stages in total. The data path can therefore be regarded as an eight-stage adder connected to an eight-stage multiplier. The pipeline scheduling is valid only when the width of the computed grid is equal to the number of pipeline stages of the multiplier and the adder, so we decided that the multiplier and the adder should have eight stages. We explain the reason later in this paper.

Fig. 7 shows the relationship between the BlockRAMs in Fig. 6 and the grid-points. The numbers written in the BlockRAMs in Fig. 6 correspond to the numbers in Fig. 7. In Fig. 7, the data set assigned to each FPGA is split in the vertical direction and stored in the BlockRAMs (0 to 7); these are surrounded by the dashed line. If a data set of 64 × 128 is assigned to one FPGA, a split data set of 8 × 128 is stored in each BlockRAM (0 to 7). Furthermore, the data of the communication region is stored in one or more other BlockRAMs (not BlockRAMs 0 to 7 surrounded by the dashed line). The communication region is the set of data that is transferred to the adjacent nodes. However, the computation in a single FPGA always uses the same data for this region and does not update it, because there are no adjacent FPGA nodes to communicate with. Therefore, the BlockRAM that stores the data of the communication region has no input ports.

Fig. 8 shows the MADD pipeline operation. A circle in the figure represents the value of a grid-point, and a square represents the result of multiplying the value of a grid-point by a weighting factor. Both the multiplier and the adder have eight pipeline stages. Fig. 8 (a) shows the numbering of the grid-points. We explain the computation of one group of grid-points. First, grid-points 1 to 8 are loaded from BlockRAM and input to the multiplier in cycles 0 to 7.
Next, the multiplier starts to output these products while, at the same time, the next group of grid-points is input to the multiplier. Then a further group of grid-points is input to the multiplier while, at the same time, the weighted values of grid-points 1 to 8 are summed with those of another group in the adder. Finally, the results, in which the values of the upper, lower, left and right grid-points have each been multiplied by a weighting factor and summed, are output.

The data of a grid-point that is still going to be used must not be overwritten by writing a computation result to BlockRAM. A general approach therefore stores the results in a temporary buffer, such as a FIFO, before writing them to BlockRAM.

Footnote 1: BlockRAM is low-latency SRAM built into each FPGA.

Fig. 8. MADD pipeline operation.

The proposed architecture, however, needs no additional temporary buffer because the MADD pipeline provides the same functionality as a temporary buffer. In the case of Fig. 8, the grid-points that are updated have already been input to the multiplier, in cycles 32 to 40, and their old values are not used afterwards. Therefore, when the computation runs in a single FPGA, the update order of the data is protected without using a FIFO.

As explained previously, this scheduling is valid only when the width of the computed grid is equal to the number of pipeline stages of the multiplier and the adder. The width of the grid that one MADD processes is eight because the multiplier and the adder each have eight pipeline stages. With this architecture the pipelines are almost always filled. The fill rate of the pipelines is ((N - 8) / N) × 100%, where N is the number of cycles taken by the computation. In addition, this architecture does not use an additional temporary buffer to update the data. Therefore, this architecture can achieve high computation performance with a small circuit area.

Fig. 9. System architecture.

B. System Architecture

Fig. 9 shows the system architecture. We describe the differences between Fig. 9 and Fig. 6. DES in the figure is a deserializer, which receives data from an adjacent FPGA, and SER is a serializer, which sends data to an adjacent FPGA. The data received by the deserializer is stored in a FIFO to maintain the update order. The data leaving this FIFO is stored only in the BlockRAM. A FIFO is also provided at the input of the serializer. This FIFO receives the computation results of the MADDs, but only the data of the communication region. GATE then adds a 1-bit valid bit to the computation results of the MADDs and inputs this data to the serializer.
This valid bit serves as the read-enable signal of the FIFO provided at the output of the deserializer that receives the data from the adjacent FPGAs. The valid bit therefore ensures that the data of the communication region used for the computation is stored in the FIFO.

C. Development Flow

We implemented a prototype system composed of many FPGAs for logic verification of the proposed method. This implementation uses the boards of ScalableCore. There are several reasons for connecting multiple FPGAs. Logic verification on a small FPGA is easier than on an implementation in a single large FPGA. Even if an FPGA breaks down, the system can operate normally again once that FPGA is replaced. For ease of debugging, the data type used in this prototype is integer.

We coded the software simulator in C++; it emulates the stencil computation on multiple FPGA nodes with cycle-level accuracy. The execution results of the software simulator were verified by comparing them with the results of a stencil computation program with function-level accuracy coded in C. Then we implemented the circuits in Verilog HDL with reference to the cycle-level software simulator and verified them using iverilog and GTKWave. We used an integer MADD. The Ser/Des implementation uses data recovery and NRZI coding.

D. Initialization Mechanism

As described in Section II, the computation order differs from FPGA to FPGA in order to increase the acceptable communication latency. To determine its computation order, every FPGA uses its own position coordinate in the system. We implemented a mechanism to provide the position coordinates.

Fig. 10. Providing coordinates.
Fig. 11. Sending the start signal of computation.
Fig. 12. Configuration diagram of the mesh-connected FPGA array.

Fig. 10 shows how the position coordinates are provided to all FPGAs. A square in Fig. 10 represents an FPGA node, and we define the node in the upper left as the Master node. The Master node provides positions to its adjacent nodes. A horizontal arrow in Fig. 10 represents the delivery of an x-coordinate, obtained by adding 1 to the sender's own x-coordinate. A vertical arrow in Fig. 10 represents the delivery of a y-coordinate, obtained by adding 1 to the sender's own y-coordinate. Eventually, all FPGA nodes know their own position coordinates.

The start of computation in the first Iteration must be precisely synchronized across this array system, because if there is a skew the array cannot obtain the communication-region data to be used for the next Iteration. We therefore designed a prototype circuit that generates the start signal of computation. Fig. 11 shows the communication pattern of the start signal. A square in Fig. 11 represents an FPGA node, and an arrow represents the start signal of computation in the first Iteration. The FPGA node in the upper left first sends the start signal to the FPGA nodes to its right and below it. An FPGA node that receives the start signal from the node above it or to its left immediately sends the start signal to the FPGA nodes to its right and below it. In this way, all FPGA nodes receive the start signal.

IV. EVALUATION

A. Environment

Fig. 12 shows the hardware configuration of the FPGA array (see footnote 2). The array system can be scaled freely according to the grid size of the stencil computation by connecting FPGAs in a mesh. Each node in the FPGA array is equipped with an FPGA (Xilinx Spartan-6 XC6SLX16), and the BlockRAM capacity of each FPGA is 72 KB. The MADDs are implemented in the FPGA using IP cores generated by the core generator provided by Xilinx.
Implementing a single MADD consumes four of the 32 DSP blocks of a Spartan-6 FPGA. Therefore, up to eight MADDs can be implemented in a single FPGA. We coded these circuits in Verilog HDL and used Xilinx ISE 13.3 to generate the circuit information.

We use stencil computation programs coded in C to verify the implemented circuits and to compare execution speed. The program for verification uses the SoftFloat library, whose computation precision is the same as that of the floating-point arithmetic on the FPGA. The program for the comparison of execution speed does not use the library, because this version needs to run fast.

Section IV-B evaluates the single FPGA in which eight MADDs are implemented. The data set (grid size, see footnote 3) is 64 × 128. The computation result is output to a PC connected by USB, and we compared it with the execution result of the program coded in C; the data of all grid-points matched.

Footnote 2: The SRAM in Fig. 12 is not used.
Footnote 3: The total number of grid-points that can be computed is 72 KB (BlockRAM capacity) / 4 B (data size of a grid-point in single-precision floating point) = 18K. However, the width of the grid is 64 because of the number of MADDs and the scheduling condition.

B. Hardware Resource Consumption

The LUT utilization with a single MADD and with eight MADDs implemented in the FPGA is 9% and 50% respectively (see footnote 4). Table I shows the hardware resource consumption of a single FPGA; this table does not include the modules for communicating with adjacent FPGAs. The utilization of DSP blocks is 100% because implementing eight MADDs consumes all of the DSP blocks.

Footnote 4: The 9% includes the communication module for output to the PC, and optimizations are enabled when multiple MADDs are implemented. Therefore, the LUT utilization with eight MADDs implemented in the FPGA is less than 8 × 9%.

TABLE I
HARDWARE RESOURCE CONSUMPTION (Device Utilization Summary)

Slice Logic Utilization    Used / Available    Utilization
LUTs                       4,560 / 9,112       50%
Slices                     1,527 / 2,278       67%
BlockRAM                   24 / 32             75%
DSP48A1                    32 / 32             100%

TABLE II
DESIGN PARAMETERS

Symbol      Description                        Unit
F           operation frequency                GHz
N_FPGA      the number of FPGAs
N_MADD      the number of MADDs
P_peak      hardware peak performance          GFlop/s
OP          number of computations
GRID        the total number of grid-points
ITER        number of Iterations

Fig. 13. Peak and effective performance of stencil computation in a single FPGA.
Fig. 14. Power consumption.

C. Performance of a Single FPGA Node

Table II shows the design parameters used to analyze the performance of the FPGA array. The operation frequency is F GHz, the number of MADDs implemented in each FPGA is N_MADD, and the number of FPGAs is N_FPGA. Each MADD can perform one addition and one multiplication at the same time on every cycle. For this reason, the hardware peak performance of a single MADD is 2F GFlop/s, and the hardware peak performance of a single FPGA is 2 F N_MADD GFlop/s. Therefore, the hardware peak performance of an FPGA array in which N_FPGA FPGAs are connected is

\[ P_{peak} = 2 \times F \times N_{FPGA} \times N_{MADD} \quad (1) \]

When the operation frequency is 0.16 GHz, the hardware peak performance P_peak is 2.56 GFlop/s because N_MADD is 8 and N_FPGA is 1. However, as shown in Fig. 2, the computation of a single grid-point takes seven floating-point operations (see footnote 5), so the average utilization of a MADD unit is 100 × (4 + 3) / 8 = 87.5%. Therefore, the peak performance at an operation frequency of 0.16 GHz is 2.56 × 0.875 = 2.24 GFlop/s.

Fig. 13 shows the peak performance and the effective performance of stencil computation on a single FPGA as a function of operation frequency. The effective performance is the total number of floating-point operations divided by the execution time. The total number of floating-point operations is given by the following expression, using OP, GRID and ITER from Table II (see footnote 6). We measured the execution time with a stopwatch.

\[ OP \times GRID \times ITER \]

Footnote 5: The four multiplications and the three additions.
Footnote 6: OP is the total number of computations required to update the data of one grid-point from time-step k to k+1. In this case, OP is seven because of the four multiplications and the three additions.

As shown in Fig. 13, the peak and effective performance of the stencil computation are almost the same, which indicates that the overhead of the proposed computation method is small. For comparison, we also compiled the stencil computation program coded in C with the -O3 option. Its effective performance is 8.64 GFlop/s when running on a single thread on an Intel Core i7 with an operation frequency of 3.4 GHz. This is adequate performance compared to the 2.8 GFlop/s reported in [6].

The effective performance measured with the prototype system for power estimation, which has no communication modules, shows that a single FPGA implementing eight floating-point adders and eight floating-point multipliers achieves 2.24 GFlop/s at 0.16 GHz with 2.37 W power consumption. Since the effective performance of the Intel Core i7 at 3.4 GHz is 8.64 GFlop/s, a single FPGA achieves 26% of the performance of the Intel Core i7.

D. Power Consumption in a Single FPGA Node

We connected multiple FPGA nodes running at an operation frequency of 0.16 GHz and measured the power consumption with a Watt Checker. The power consumption of an FPGA system with 10 connected FPGA nodes is 25 W. Fig. 14 shows the power consumption as a function of the number of FPGA nodes. Taking a linear approximation of the plotted points, the power consumption of a single FPGA node is about 2.37 W. The constant term (1.1404) of the linear approximation in Fig. 14 is considered to be the power consumption of the power board that supplies power to the FPGA nodes.
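The arithmetic behind the single-node figures above can be reproduced with a short script. It is a sketch of Equation (1) and the 87.5% utilization argument only: the function names are hypothetical, and reading the denominator 8 as "four cycles with two operation slots each" is an interpretation of the (4 + 3) / 8 expression, not something the paper states in these words.

```python
def peak_gflops(freq_ghz, n_fpga, n_madd):
    """Equation (1): each MADD retires one multiply and one add per cycle."""
    return 2.0 * freq_ghz * n_fpga * n_madd


def effective_limit_gflops(freq_ghz, n_fpga, n_madd, mults=4, adds=3):
    """Upper bound on effective performance without communication overhead.

    Assumption: a grid-point update occupies a MADD for `mults` cycles,
    i.e. 2 * mults operation slots, of which mults + adds do useful work.
    """
    utilization = (mults + adds) / (2 * mults)
    return peak_gflops(freq_ghz, n_fpga, n_madd) * utilization


single_peak = peak_gflops(0.16, 1, 8)            # about 2.56 GFlop/s
single_eff = effective_limit_gflops(0.16, 1, 8)  # about 2.24 GFlop/s
per_watt = single_eff / 2.37                     # GFlop/s per watt, one node
vs_cpu = single_eff / 8.64                       # fraction of the Core i7 result

print(f"{single_peak:.2f} {single_eff:.2f} {per_watt:.3f} {vs_cpu:.0%}")
```

Keeping the frequency and node count as parameters makes it easy to re-evaluate Equation (1) for other array sizes.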

Fig. 15. Estimation of effective performance improvement rate.

E. Operation Check in Real System

At present, we have confirmed that an array system composed of four FPGA nodes (2 × 2) runs without stalling when the operation frequency is 40 MHz and the communication frequency is 100 MHz. We verified the array system by comparing the sum of the grid-point data assigned to an FPGA with the execution result of the stencil computation program coded in C.

F. Estimation of Effective Performance in 256 FPGA Nodes

In this section, we estimate the effective performance when F is 0.16 GHz, N_MADD is 8 and N_FPGA is 256. Fig. 15 shows the estimated improvement rate of the effective performance as a function of the number of FPGAs. From Equation (1), P_peak is 655 GFlop/s. However, since the utilization of the MADDs is 87.5%, the upper limit of the effective performance without communication overhead is 655 GFlop/s × 0.875 = 573 GFlop/s.

We also estimate the effective performance per watt. From the approximate expression in Fig. 14, the power consumption of an FPGA array composed of 256 FPGAs is estimated at 607 W. Therefore, the estimated effective performance per watt is 0.944 GFlop/s/W.

V. RELATED WORK

Many works that optimize stencil computation for multi-core processors and GPUs have been reported. Augustin et al. [6] report stencil computation on an Intel Xeon E5220 quad-core processor running at 2.26 GHz. A single core of the processor achieves 2.8 GFlop/s, just 31% of the peak performance. Moreover, two E5220 processors achieve 15.9 GFlop/s with 8 cores, 21.8% of the peak.

Phillips et al. [7] report stencil computation on an NVIDIA TESLA C1060 GPU. A single GPU achieves 51.2 GFlop/s, 65.6% of the peak performance in double-precision arithmetic. The computation performance is reduced further on a GPU cluster, where it is 42.2% of the peak performance.
Several studies on designing hardware for stencil computation using FPGAs have been reported [2][8]. Sano et al. [2] propose hardware for stencil computation that is composed of a systolic array of programmable processing elements, and implement a prototype using multiple FPGAs (ALTERA Stratix family). They achieve performance scalability with a constant memory bandwidth by implementing an architecture that applies a pipeline scheduling method proposed for Cellular Automata. However, that work differs from ours in the implemented architecture and in the type of FPGA. Sato et al. [8] implement circuits that solve Poisson's equation using an FPGA array.

VI. CONCLUSION

This paper described a high-performance stencil computing method optimized for a 2D-mesh-connected FPGA array, together with the implementation results of the proposed method. We showed that the proposed architecture works correctly on a real 2D-mesh-connected FPGA array. We developed a prototype system, without communication modules, for estimating power consumption. This prototype system of a single FPGA with eight floating-point adders and eight floating-point multipliers achieves 2.24 GFlop/s at 0.16 GHz with 2.37 W power consumption.

ACKNOWLEDGMENT

This work is supported in part by Core Research for Evolutional Science and Technology (CREST), JST.

REFERENCES

[1] Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pp. 4:1-4:12, Piscataway, NJ, USA, IEEE Press.
[2] K. Sano, Y. Hatsuda, and S. Yamamoto. Scalable streaming-array of simple soft-processors for stencil computations with constant memory-bandwidth. In Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual International Symposium on, May.
[3] M. Shafiq, M. Pericas, R. de la Cruz, M. Araya-Polo, N. Navarro, and E. Ayguade. Exploiting memory customization in FPGA for 3D stencil computations. In Field-Programmable Technology, FPT International Conference on, Dec.
[4] Kobayashi Ryohei, Sano Shintaro, Takamaeda-Yamazaki Shinya, and Kise Kenji. High performance stencil computation on mesh connected FPGA arrays. In Transactions on Symposium on Advanced Computing Systems and Infrastructures, Vol. 2012, May.
[5] Shinya Takamaeda-Yamazaki, Shintaro Sano, Yoshito Sakaguchi, Naoki Fujieda, and Kenji Kise. In International Symposium on Applied Reconfigurable Computing (ARC 2012), March.
[6] Werner Augustin, Vincent Heuveline, and Jan-Philipp Weiss. Optimized stencil computation using in-place calculation on modern multicore systems. In Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, Berlin, Heidelberg, Springer-Verlag.
[7] E.H. Phillips and M. Fatica. Implementing the Himeno benchmark with CUDA on GPU clusters. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pp. 1-10, April.
[8] SATO Kazuki, JIANG Li, TAKAHASHI Kenichi, TAMUKOH Hakaru, KOBAYASHI Yuichi, and SEKINE Masatoshi. Performance evaluation of Poisson equation and CIP method implemented on FPGA array. IEICE technical report. Circuits and systems, Vol. 109, No. 396.


More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information
