The Tofu Interconnect D


2018 IEEE International Conference on Cluster Computing

Yuichiro Ajima, Takahiro Kawashima, Takayuki Okamoto, Naoyuki Shida, Kouichi Hirai, Toshiyuki Shimizu
Next Generation Technical Computing Unit, Fujitsu Limited, Kawasaki, Japan
{aji, t-kawashima, tokamoto, shidax, k-hirai, t.shimizu}@jp.fujitsu.com

Shinya Hiramoto, Yoshiro Ikeda, Takahide Yoshikawa, Kenji Uchida, Tomohiro Inoue
AI Platform Business Unit, Fujitsu Limited, Kawasaki, Japan
{hiramoto.shinya, ikeda.yoshir-02, yoshikawa.takah, k_uchida, inoue.tomohiro}@jp.fujitsu.com

Abstract: In this paper, we introduce a new highly scalable interconnect called Tofu interconnect D (TofuD), which will be used in the post-K machine. The letter D represents high density node and dynamic packet slicing for dual-rail transfer. Herein we describe the design and the evaluation results of TofuD. Because of the high-density packaging, the optical link ratio of TofuD has decreased to 25% from the 66% optical link ratio of Tofu2. TofuD applies a new technique called dynamic packet slicing to reduce latency and to improve fault resilience. The evaluation results show that the one-way 8-byte Put latency is 0.49 μs, which is 31% lower than the latency of Tofu2. The injection rate per node is 38.1 GB/s, which is approximately 83% of the injection rate of Tofu2. The link efficiency is as high as approximately 93%.

Keywords: high-performance computing, interconnect, high-density packaging, fault resilience

I. INTRODUCTION

The Tofu interconnect family is a group of system interconnects for highly scalable HPC systems developed by Fujitsu. The Tofu Interconnect D (TofuD) is a new member of this family and is designed for use in the post-K machine [1]. Tofu stands for torus fusion, which represents the designed combination of dimensions with an independent configuration and a routing algorithm. The letter D represents high density node and dynamic packet slicing for dual-rail transfer.
In this paper, we describe the design overview, specification, and evaluation results of TofuD. The design overview includes three elements. The first is the new node configuration, which incorporates high-density memory packaging technology. The second is the set of optimizations for the increasing number of non-uniform memory access (NUMA) domains. The third is a new packet transfer technique that reduces latency and improves resilience.

Section II explains the background of this work. Section III presents related work. Section IV introduces the design of TofuD, and Section V presents the results of the performance evaluation. Section VI concludes this paper.

II. BACKGROUND

A. Tofu Interconnect

The Tofu interconnect [2][3] was developed for the K computer [4]. The 6D mesh/torus network of Tofu achieved high scalability of 82,944 compute nodes. Its virtual 3D torus rank mapping scheme provided both high availability and topology-aware programmability. Tofu was also used in the PRIMEHPC FX10 system, which doubled the number of processor cores per node from the eight of the K computer to sixteen.

A node address in the physical 6D network is represented by six-dimensional coordinates X, Y, Z, A, B, and C. The A and C coordinates can be 0 or 1, and the B coordinate can be 0, 1, or 2. The range of the X, Y, and Z coordinates depends on the system size. Two nodes are adjacent, and connected to each other, when their coordinates differ by 1 in one axis and are identical in the other five axes. When an axis is configured as a torus, the node with coordinate 0 in that axis and the node with the maximum coordinate value are also connected to each other. The A- and C-axes are fixed to the mesh configuration, and the B-axis is fixed to the torus configuration.

Each node has 10 ports for the 6D mesh/torus network. Each of the X-, Y-, Z-, and B-axes uses two ports, and each of the A- and C-axes uses one port. Each link provided 5.0 GB/s peak throughput. Each link had 8 lanes of high-speed differential I/O signals at a 6.25 Gbps data rate.
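The adjacency rule above can be expressed as a short Python sketch. It is illustrative only. The function name `neighbors` and its `xyz`/`xyz_torus` parameters are hypothetical. Whether X, Y, and Z wrap around is left as a per-system input, because the text says only that their configuration depends on the system. For a node away from mesh edges, the sketch returns 10 distinct neighbors, one per port.

```python
from itertools import product

# Axis order is (X, Y, Z, A, B, C). A and C are meshes of size 2, and B is a
# torus of size 3. The X, Y, and Z sizes depend on the system (argument `xyz`).
FIXED_SIZES = (2, 3, 2)


def neighbors(coord, xyz, xyz_torus=(True, True, True)):
    """Return the set of nodes adjacent to `coord` in the 6D mesh/torus."""
    sizes = tuple(xyz) + FIXED_SIZES
    wraps = tuple(xyz_torus) + (False, True, False)  # A mesh, B torus, C mesh
    result = set()
    for axis, (size, wrap) in enumerate(zip(sizes, wraps)):
        for step in (1, -1):
            pos = coord[axis] + step
            if wrap:
                pos %= size          # torus: 0 and size-1 are also linked
            elif not 0 <= pos < size:
                continue             # mesh edge: no link in this direction
            if pos != coord[axis]:   # ignore self-links on size-1 tori
                nb = list(coord)
                nb[axis] = pos
                result.add(tuple(nb))
    return result


if __name__ == "__main__":
    origin = (0, 0, 0, 0, 0, 0)
    print(len(neighbors(origin, (4, 4, 4))))                        # 10
    print(len(neighbors(origin, (4, 4, 4), (False, True, True))))   # 9
    # Adjacency is symmetric in a small 3x3x3 torus system.
    nodes = product(range(3), range(3), range(3), range(2), range(3), range(2))
    print(all(c in neighbors(n, (3, 3, 3))
              for c in nodes for n in neighbors(c, (3, 3, 3))))     # True
```

At a mesh edge, the missing direction simply drops out. On the B torus, coordinate 0 also reaches coordinate 2.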
Tofu was implemented as an interconnect controller (ICC) chip with 80 lanes of signals for the network. All links were electrical; the original Tofu interconnect had no optical links. Each node had four Tofu network interfaces (TNIs). This allowed a node to transmit four data transfers simultaneously in four independent directions and to receive four from four independent directions. The injection bandwidth per node was 20 GB/s. The total injection bandwidth of the K computer was 1.66 PB/s; this yields the theoretical peak performance of nearest-neighbor data exchange. The bisection bandwidth of the K computer was 46.1 TB/s for the physical mesh/torus network, or 34.6 TB/s for the virtual torus network; this yields the theoretical peak performance of global data exchange. In a large torus network, performance differs by one to two orders of magnitude depending on the communication pattern. Therefore, topology-aware tuning of applications is important.

The TNI provided the communication functions of remote direct memory access (RDMA) Put/Get, system packets, and the Tofu barrier. System packets were used for system control and IP communication. The Tofu barrier handles the multiple communication stages of barrier synchronization in hardware, which is unaffected by OS jitter. OS jitter severely degrades latency when software handles the communication.

A barrier gate (BG) is a hard-wired module that synchronously communicates with other BGs. Specifically, each BG waits for signals from up to two preset BGs and then transmits signals to up to two other preset BGs. There are two types of BG: start-and-end points and relay points. Each start-and-end point BG is permanently associated with an interface called a barrier channel (BCH). The MPI library allocates these communication resources when each communicator is created. The reduce-broadcast tree algorithm consumes one BCH and five BGs, whereas the recursive-doubling algorithm consumes one BCH and log2(n) BGs. A BG can perform a reduction operation, so the Tofu barrier can perform all-reduce collective communication limited to one element.

In Tofu, the Tofu barrier was available only on TNI number 0. There were 8 BCHs and 64 BGs: 8 BGs for start-and-end points and 56 BGs for relay points. Therefore, up to eight communicators per node could use the Tofu barrier simultaneously. When a node ran multiple processes, software synchronized the intra-node processes, and a representative process used a BCH for inter-node synchronization.

B. Tofu Interconnect 2

The next version, Tofu interconnect 2 (Tofu2) [5][6], was designed for the PRIMEHPC FX100 system. Each node of FX100 had eight packages of hybrid memory cube (HMC), each containing a stack of memory dies.
In contrast, each node of the K computer and FX10 had eight inline memory modules, a form factor that had been in use for over 30 years. This transition from wide memory modules to small memory packages reduced the node footprint of FX100. To reduce the node footprint further, Tofu2 also moved from the independent ICC chip of Tofu to integration on the processor chip. The processor chip also carries 128 signal lanes for memory. To balance against these, Tofu2 halved the number of network signal lanes from the 80 of Tofu to 40. To compensate, Tofu2 introduced optical links and significantly raised the signal data rate above the 6.25 Gbps of Tofu. The link bandwidth and the injection bandwidth per node increased to 12.5 GB/s and 50 GB/s, respectively.

Tofu2 extended the following communication functions: RDMA atomic read-modify-write, triggered communication (called session mode, for nonblocking collective communication), and RDMA for system use.

In FX100, the number of compute cores increased to 32. The recommended number of user processes per node also increased from 1 to 2, because each chip introduced two NUMA domains called core-memory groups (CMGs). Therefore, the number of RDMA communication resources, called control queues (CQs), had to increase so that each user process could have a dedicated CQ. In Tofu, each TNI had three CQs, one of which was fixed for system use. With one or two user processes per node, each process was assigned one dedicated CQ per TNI, and the MPI communication library used four CQs simultaneously. With more than two processes per node, each process was assigned fewer CQs. With more than eight processes per node, multiple processes shared CQs. Tofu2 increased the number of CQs per TNI from 3 to 12 so that CQs need not be shared even with 32 processes per node.

C. The Post-K Computer

The post-K computer is a system developed to replace the K computer. It is designed to take full advantage of the assets of the K computer, such as applications, users, tools, system operational knowledge, and the facility. The post-K computer must expand application domains and also significantly improve application performance, by up to 100 times or more relative to the K computer. Fujitsu is working with RIKEN, the asset holder, and is developing leading-edge technologies from FX100 to construct the post-K machine.

III. RELATED WORK

This section describes the system interconnects used in recent world-class systems, excluding the Tofu interconnect family. All of these systems have a similar level of bisection bandwidth, which represents the theoretical peak performance of global data exchange. However, the total injection bandwidth varies significantly with the network topology. In some systems, the total injection bandwidth is close or equal to the bisection bandwidth; in others, it is much higher.

A. InfiniBand™

InfiniBand™ (IB) [7] is a standard interconnect specification defined by the InfiniBand Trade Association. IB products have been widely used to build HPC clusters. The network interface is called a host channel adapter (HCA). An ordinary HCA is implemented as a discrete chip mounted on an adapter card, and an ordinary IB network is built from switch boxes. Building an interconnection network from independent components such as adapter cards and switch boxes is disadvantageous in terms of packaging density and power consumption. However, it offers flexible configuration. For example, a node configuration with more HCAs increases injection bandwidth and accelerates communication-intensive applications.
In another example, a full-bisection bandwidth fat-tree has a bisection bandwidth equal to its total injection bandwidth. This configuration suppresses variation in the execution time of applications that are not optimized for the network topology.

Mellanox's dual-rail EDR IB HCA will be used in the Summit system [8]. The injection bandwidth per node is 25 GB/s. The total injection or bisection bandwidth will be approximately 115 TB/s. The TaihuLight system, which started operation in 2016, also used Mellanox's IB HCAs and switch chips [9]. The Sunway network of TaihuLight was built as a four-stage tapered fat-tree. Its total injection bandwidth was 512 TB/s, and its bisection bandwidth was approximately 70 TB/s. Oracle's Sonoma processor [10] was a rare example of IB HCA integration. It was designed for high-density scale-out servers and had two built-in HCAs on the chip. The injection bandwidth per node was 13.6 GB/s.

B. Omni-Path

Omni-Path [11] is Intel's HPC interconnect family. In the first generation, the host fabric interface (HFI) is implemented as a discrete chip, which is either mounted on an adapter card or integrated into a CPU package. Omni-Path is considered likely to be used in the future Aurora system [12]. The first-generation Omni-Path was used in the Oakforest-PACS system. The injection bandwidth per node was 12.5 GB/s, and the total injection bandwidth was equal to the bisection bandwidth.

C. Aries Interconnect

The Aries interconnect [13], developed by Cray, is a highly scalable system interconnect that employs a Dragonfly-based topology. The network interfaces and the router were implemented together in a discrete chip. Each Aries chip had four network interfaces and connected four nodes. Each network interface had two ports connected to the internal router. Each router port had a link throughput of 4.7 GB/s for global links and 5.25 GB/s within a group of 384 nodes. Therefore, the injection bandwidth per node was 10.5 GB/s. The upgraded Piz Daint system, which started operation in 2016, used Aries.
The total injection bandwidth and the bisection bandwidth were 71 TB/s and 36 TB/s, respectively.

D. Blue Gene/Q Five-Dimensional Torus

IBM Blue Gene/Q (BG/Q) was a highly scalable supercomputer with a five-dimensional torus network [14][15]. Each node had 10 links for the torus network, and each link provided 2.0 GB/s peak throughput. The injection bandwidth per node was 20 GB/s. The Sequoia system, a BG/Q system with 98,304 nodes, started classified operations in 2013. Its total injection bandwidth was 1.97 PB/s, and its bisection bandwidth was 49.2 TB/s. The characteristics and performance of the BG/Q five-dimensional torus network were similar to those of the 6D mesh/torus network of the Tofu interconnect.

IV. DESIGN OF TOFUD

This section describes the design of TofuD, focusing on the differences from Tofu2.

A. Node Configuration

Figure 1 shows a block diagram of the post-K computer node. The number of CMGs increased from two in Tofu2 to four, and the number of TNIs increased from four to six. The CMGs and the TNIs are connected by the network on chip (NOC). With more CMGs, their distances from the TNIs differ: two CMGs are far from the TNIs, and the other two are near them.

Figure 2 shows a prototype CPU memory unit (CMU). Two processor packages and three cable cages are water-cooled. Each compute node consists of one package that integrates one processor chip and four stacks of high bandwidth memory (HBM). The trade-off for this high-density memory packaging is that each node has half as many memory stacks as FX100, which used eight HMC packages. To balance the halved number of memory stacks, TofuD again halved the number of signal lanes, from 40 in Tofu2 to 20. To reduce hardware cost, TofuD uses mainstream quad-lane active optical cables. Half of the CMUs in a shelf connect two optical cables, for the X- and Y-axes. The other half connect three optical cables, for the X-, Y-, and Z-axes.
Each active optical cable is shared by two links running in the same direction from the two compute nodes on the same CMU. Each active optical cable carries one-third as many signals as the board-mount optical assembly used in Tofu2. Even so, the number of optical modules per board falls from 8 in FX100 to 2.5. This is because the optical link ratio, the number of high-speed signals per node, and the number of nodes per board are all lower.

Fig. 1. Block diagram of the post-K computer node (four CMGs with memory, connected through the NOC to a PCIe controller and TNI0 to TNI5, which feed the Tofu network router with ports X+, X-, Y+, Y-, Z+, Z-, A, B+, B-, and C)

Fig. 2. Prototype CPU memory unit

B. Package Structure and Link Configuration

Each rack of the post-K computer is divided into an upper half and a lower half, and each half houses 192 nodes with the geometry (X, Y, Z, A, B, C) = (2, 2, 4, 2, 3, 2). Each half rack holds four building blocks called shelves, two on the front side and two on the rear side. The geometry of a shelf is (X, Y, Z, A, B, C) = (1, 1, 4, 2, 3, 2). Figure 3 shows a prototype rack of the post-K computer. Each side of the rack holds four shelves stacked vertically. Each shelf houses 24 CPU memory units (CMUs), each holding two nodes connected along the C-axis.

All connections within a half rack use electrical links, and connections leaving a half rack use optical links. Therefore, half of the connections in the X- and Y-axes and one-fourth of the connections in the Z-axis use optical links. The high-density packaging and the large half-rack structure bring the optical link ratio of TofuD down to 25%. This is a substantial decrease from 66% in Tofu2, which used optical links for connections leaving a 2U chassis with the geometry (X, Y, Z, A, B, C) = (1, 1, 3, 2, 1, 2).

Fig. 3. Prototype rack of the post-K computer

C. Injection Rate per Node

Table I compares the node and link configurations within the Tofu family. TofuD uses high-speed signals at a 28-Gbps data rate, approximately 9% faster than those of Tofu2. However, because it has fewer signals, TofuD has a lower link bandwidth of 6.8 GB/s, approximately 54% of that of Tofu2. To compensate, TofuD increases the number of simultaneous communications from 4 in Tofu2 to 6. As a result, the injection rate of TofuD reaches approximately 80% of that of Tofu2. Each node has six adjacent nodes in the virtual 3D torus, so topology-aware algorithms can use six simultaneous communications effectively. The logic circuits of TofuD operate at a 425-MHz clock frequency, about 9% faster than that of Tofu2.
The datapath width decreases from 256 to 128 bits, following the reduction in the number of signal lanes.

TABLE I. DATA RATES OF SIGNAL AND INJECTION RATES
                                       Tofu   Tofu2   TofuD
Number of signal lanes per node          80      40      20
Data rate (Gbps)                       6.25       –      28
Link bandwidth (GB/s)                   5.0    12.5     6.8
Number of TNIs per node                   4       4       6
Injection bandwidth per node (GB/s)      20      50    40.8
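A short Python sketch can cross-check the bandwidth and optical-link figures in subsections B and C.

Several inputs are assumptions not stated in this paper:
- the line coding: 8b/10b for Tofu, 64b/66b for Tofu2 and TofuD;
- the Tofu2 lane rate of 25.78125 Gbps;
- the TofuD lane rate, taken as the 28 Gbps quoted above.

The optical-ratio model is a simplification. It counts a port as optical exactly when its neighbor lies outside the package (the half rack for TofuD, the 2U chassis for Tofu2). It also assumes that the X-, Y-, and Z-axes always extend beyond the package. Finally, it assumes the B-axis torus closes inside a package only when the package spans all three B positions.

```python
from fractions import Fraction
from itertools import product

PORTS_PER_NODE = 10  # X+, X-, Y+, Y-, Z+, Z-, A, B+, B-, C

# name: (signal lanes per node, lane rate [Gbps], line-coding efficiency, TNIs)
# ASSUMPTIONS: 8b/10b coding for Tofu, 64b/66b for Tofu2/TofuD, and a
# 25.78125 Gbps Tofu2 lane rate; none of these is stated in the text.
GENERATIONS = {
    "Tofu": (80, 6.25, 8 / 10, 4),
    "Tofu2": (40, 25.78125, 64 / 66, 4),
    "TofuD": (20, 28.0, 64 / 66, 6),
}


def link_bandwidth(name):
    """Peak payload bandwidth of one link in GB/s, rounded to 0.1."""
    lanes, gbps, coding, _ = GENERATIONS[name]
    lanes_per_link = lanes // PORTS_PER_NODE
    return round(lanes_per_link * gbps * coding / 8, 1)


def injection_bandwidth(name):
    """Each TNI drives one link at a time, so injection = TNIs x link rate."""
    return GENERATIONS[name][3] * link_bandwidth(name)


# (axis, direction) for the 10 ports; A and C have a single port each.
PORTS = [(a, d) for a in "XYZB" for d in (1, -1)] + [("A", 0), ("C", 0)]


def optical_ratio(package):
    """Fraction of node ports whose link leaves `package`, an (X, Y, Z, A, B, C)
    geometry. ASSUMPTIONS: X, Y, Z always extend beyond the package; A and C
    (size-2 meshes) stay inside when the package spans both positions; the
    B torus (size 3) closes inside only when the package spans all of B."""
    dims = dict(zip("XYZABC", package))
    external = total = 0
    for coord in product(*(range(n) for n in package)):
        pos = dict(zip("XYZABC", coord))
        for axis, step in PORTS:
            total += 1
            if axis in "AC":
                inside = dims[axis] == 2
            elif axis == "B" and dims["B"] == 3:
                inside = True
            else:
                inside = 0 <= pos[axis] + step < dims[axis]
            if not inside:
                external += 1
    return Fraction(external, total)


if __name__ == "__main__":
    for name in GENERATIONS:
        print(f"{name}: link {link_bandwidth(name):.1f} GB/s, "
              f"injection {injection_bandwidth(name):.1f} GB/s")
    print("TofuD half rack:", optical_ratio((2, 2, 4, 2, 3, 2)))   # 1/4
    print("Tofu2 2U chassis:", optical_ratio((1, 1, 3, 2, 1, 2)))  # 2/3
```

Under these assumptions, the sketch reproduces the 5.0, 12.5, and 6.8 GB/s link rates. It also reproduces optical ratios of 1/4 for TofuD and 2/3 (about 66%) for Tofu2. In TofuD, each node averages 1 optical port in X, 1 in Y, and 0.5 in Z, out of 10 ports.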

D. Communication Resources

Table II compares the communication resources within the Tofu family. Compared with Tofu2, both the number of compute cores and the number of TNIs per node increased by 1.5 times, and the number of CQs per TNI remained at 12. Tofu2 left the Tofu barrier unchanged. In TofuD, the Tofu barrier resources have increased along with the number of CMGs. To allocate a BCH from a different TNI to each CMG, TofuD makes the Tofu barrier available on all TNIs, and the number of BCHs and BGs per node increased significantly. The BCH-to-BG ratio changed from 1:8 to 1:3. This is because TofuD assumes the reduce-broadcast tree algorithm for the intra-node part of synchronization, which reduces the number of BGs used. Each BG also has a larger buffer, so the Tofu barrier can perform an all-reduce of eight integer or three floating-point elements in one synchronization.

TABLE II. NUMA DOMAIN AND COMMUNICATION RESOURCES
(columns: Tofu, Tofu2, TofuD; rows: number of compute cores per node, given as 8, 16 for Tofu; number of CMGs per node; number of TNIs per node; number of CQs per node; number of BCHs per node; number of BGs per node)

E. Dynamic Packet Slicing for Dual-Rail Transfer

The physical coding sublayer (PCS) of Tofu2 was developed from 100 Gb Ethernet technology. Its complex transmission technology includes encoding, symbol detection, multi-lane distribution, and lane-to-lane deskew. As a result, the packet transfer latency rose from approximately 0.1 μs in Tofu to approximately 0.3 μs in Tofu2. Tofu2 also had a fault-tolerance issue. It introduced a link degradation feature that reduced the number of active lanes without losing a packet. However, once a link degraded, its lanes never recovered, so there was no fault resilience. To address these issues, TofuD applies a new technique called dynamic packet slicing for dual-rail transfer.
To address the latency issue, TofuD implements an independent PCS for each signal lane and splits packets in the data link layer. To address the fault-resilience issue, TofuD duplicates a packet and transfers it redundantly on both lanes instead of reducing the number of active lanes. The data link layer marks each packet to indicate whether it has been split or duplicated. The data link layer also monitors how often the receiver-side PCS detects CRC and other transmission errors, and adds this transmission quality status to the packet. Based on the received transmission quality status, the data link layer determines the split mode of the packet.

Figure 4 shows the frame format, which includes a routing header, a transport layer packet (TLP), and padding space for a data link layer packet (DLLP). First, the data link layer stores a DLLP in the frame. Next, it generates two slices from the frame simultaneously. The routing header is duplicated into both slices, the TLP and DLLP are split or duplicated, and the padding is removed. Finally, the two slices are distributed to two PCSs. Each PCS adds a preamble, a CRC code called FCS, and an inter-frame gap to its slice. Figure 5 shows the undivided slice format, which contains a routing header, the full TLP, the full DLLP, and control codes that envelop the payload. Figure 6 shows the divided slice formats, which contain a routing header, a split TLP, a split DLLP, and control codes that envelop the payload. The PAT field in a slice indicates the packet splitting pattern, and the STAT field indicates the observed transmission quality.
The PAT field is defined as a 3-bit field to allow future expansion to four lanes.

Fig. 4. Frame format (routing header with fields LEN, DABC1, DX, DY, DZ, DABC2, DI, B, S, and VC; transport layer TLP bytes +0 to +(32LEN+31); data link layer padding)
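The slicing step can be sketched in Python. This is an illustrative model, not Fujitsu's implementation.

Its assumptions are as follows:
- The 8-byte TLP and 4-byte DLLP interleaving units are read off the divided slice format in Fig. 6.
- Control codes, preamble, FCS, and SEQ handling are omitted.
- A boolean stands in for the real 3-bit PAT encoding.
- `use_split_mode` and its threshold are a hypothetical placeholder for the policy driven by the STAT field.

```python
from dataclasses import dataclass

# Interleaving units read off the divided slice format (Fig. 6); an assumption
# about the general rule, not a confirmed specification.
TLP_UNIT = 8   # bytes
DLLP_UNIT = 4  # bytes


@dataclass
class Slice:
    header: bytes  # routing header, duplicated into both slices
    split: bool    # stand-in for the 3-bit PAT field (real encoding not given)
    tlp: bytes
    dllp: bytes


def _take(data, lane, unit):
    """Every other `unit`-byte chunk of `data`, starting with chunk `lane`."""
    return b"".join(data[i:i + unit] for i in range(lane * unit, len(data), 2 * unit))


def _merge(first, second, unit):
    """Inverse of _take: interleave the chunks of the two slices again."""
    out = bytearray()
    for i in range(0, len(first), unit):
        out += first[i:i + unit] + second[i:i + unit]
    return bytes(out)


def make_slices(header, tlp, dllp, split):
    """Build the slices for the two lanes (padding already removed).
    Preamble, FCS, control codes, and SEQ are omitted from this model."""
    if not split:  # duplicate mode: both lanes carry the whole packet
        return [Slice(header, False, tlp, dllp) for _ in range(2)]
    return [Slice(header, True, _take(tlp, lane, TLP_UNIT), _take(dllp, lane, DLLP_UNIT))
            for lane in (0, 1)]


def reassemble(slices, lane_ok=(True, True)):
    """Recover (tlp, dllp); in duplicate mode one good lane is enough."""
    s0, s1 = slices
    if not s0.split:
        good = s0 if lane_ok[0] else s1
        return good.tlp, good.dllp
    if not all(lane_ok):
        raise ValueError("split mode needs both slices")
    return _merge(s0.tlp, s1.tlp, TLP_UNIT), _merge(s0.dllp, s1.dllp, DLLP_UNIT)


def use_split_mode(errors_reported, threshold=1):
    """HYPOTHETICAL policy: split for low latency while the peer reports a
    clean link via STAT, and fall back to duplication once errors appear."""
    return errors_reported < threshold


if __name__ == "__main__":
    tlp, dllp = bytes(range(48)), bytes(range(100, 116))
    for split in (use_split_mode(0), use_split_mode(3)):
        slices = make_slices(b"RH", tlp, dllp, split)
        print(split, len(slices[0].tlp), reassemble(slices) == (tlp, dllp))
```

The model shows the trade-off described in the text. In split mode, each lane carries about half of the TLP, which shortens serialization, but both slices are required. In duplicate mode, either lane alone can deliver the packet.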

Fig. 5. Undivided slice format for the duplicate-mode (preamble; routing header with PAT, STAT, and SEQ fields; full TLP; DLLP bytes +0 to +15 interleaved with other control codes; FCS; inter-frame gap)

Fig. 6. Divided slice format for the split-mode (each slice carries the routing header with PAT, STAT, and SEQ fields; one slice carries TLP bytes +0 to +7, +16 to +23, and so on, plus DLLP bytes +0 to +3 and +8 to +11; the other carries TLP bytes +8 to +15, +24 to +31, and so on, plus DLLP bytes +4 to +7 and +12 to +15)

V. PERFORMANCE EVALUATION

This section gives early evaluation results for the fundamental performance of TofuD.

A. Evaluation Environment

The communication performance of TofuD was evaluated by system-level logic simulations. The simulation models were built from the production Verilog RTL code and included multiple nodes. The simulations ran on Cadence's hardware emulators. The simulated processor cores executed test programs that used the TofuD hardware directly. Latencies were measured directly from the simulation waveforms, so one-way latencies were obtained without halving averaged round-trip latencies. Throughputs were derived from the measured latency values. For Tofu and Tofu2, the latency breakdowns were also obtained from simulation waveforms, as for TofuD. The other Tofu and Tofu2 results were measured on actual machines using the low-level communication library. In these preliminary evaluations, the test programs used no communication software stack such as an MPI library. Therefore, the results include no software overhead, and all test programs performed nearest-neighbor communication.

B. Latency

Table III shows the evaluated latencies of Tofu, Tofu2, and TofuD. In each evaluation, a Put transfer is executed between nearest-neighbor nodes on the same board. The measured time runs from when the initiator process started the Put transfer to when the target process read the data.

In Tofu, the direct descriptor feature reduced the latency by more than 0.2 μs. In Tofu2, the cache injection feature reduced the latency by nearly 0.2 μs. In both cases, a newly introduced network interface feature reduced latency by bypassing main memory. TofuD reduces the latency by approximately 0.2 μs again. Overall, the latency is 46% lower than in Tofu and 31% lower than in Tofu2. This reduction comes mainly from overhauling the transmission technology, such as the compensation for signal skew, and from redesigning the datapath pipelines. An additional penalty of approximately 0.05 μs applies when both the initiator process and the target process run on far CMGs in their respective nodes. Although this difference is small in TofuD, increasing on-chip density and locality effects may affect communication latency in future systems.

Figure 7 presents the latency breakdowns of a one-way, one-hop Put transfer. The latency of each component was obtained from the simulation waveforms. In Tofu2, the packet transfer latency through one link and two switches was approximately 0.2 μs higher than in Tofu because of the complex PCS derived from 100 Gb Ethernet. Owing to the new dynamic packet slicing technique, TofuD achieves nearly the same packet transfer latency as Tofu. The rest of the one-way Put latency in TofuD is almost the same as in Tofu2. In total, TofuD reduces the one-way Put latency by approximately 0.2 μs compared with Tofu2.

C. Injection Rate

Table IV lists the evaluated injection rates and efficiencies of Tofu, Tofu2, and TofuD. In Tofu and Tofu2, four Put transfers in different directions were executed simultaneously, and the total throughput was evaluated. In TofuD, six Put transfers in different directions were executed.
The injection rate of TofuD is more than twice that of Tofu and 17% lower than that of Tofu2. The efficiencies of Tofu are lower than that of a single Put transfer. Tofu was not integrated into the processor chip, so the bus between the processor chip and the interconnect controller chip became a bottleneck. The efficiencies are relatively low mainly because this bus's packets carry only one cache line of data. Tofu2 and TofuD are integrated into the processor chips. Their injection-rate efficiencies are almost the same as that of the single Put transfer presented in the next subsection.

Fig. 7. Comparison of latency breakdowns of one-way Put transfer (latency in ns for Tofu, Tofu2, and TofuD; components: Tx CPU, Tx host bus, Tx TNI, packet transfer, Rx TNI, Rx host bus, Rx CPU)

TABLE IV. INJECTION RATES AND EFFICIENCIES OF SIMULTANEOUS PUT TRANSFERS OF TOFU FAMILY
(columns: injection rate [GB/s], efficiency [%]; rows: Tofu (K), Tofu (FX10), Tofu2, TofuD)

D. Throughput

Table V shows the evaluated Put throughputs and efficiencies of Tofu, Tofu2, and TofuD. The throughput of TofuD is 33% higher than that of Tofu and 45% lower than that of Tofu2. The efficiencies exceed 90% for all versions. These high efficiencies are a distinctive characteristic of the Tofu interconnect family and result from packet sizes that are rather large for an HPC interconnect. A larger packet size is costly in design, but it also reduces the software overhead of system-wide communication protocols such as IP over Tofu.

TABLE III. ONE-WAY 8-BYTE PUT LATENCIES BETWEEN NEAREST NEIGHBOR NODES OF TOFU FAMILY
Interconnect   Communication settings       Latency [μs]
Tofu           Descriptor on main memory    1.15
Tofu           Direct descriptor            0.91
Tofu2          Cache injection OFF          0.87
Tofu2          Cache injection ON           0.71
TofuD          To/From far CMGs             0.54
TofuD          To/From near CMGs            0.49

TABLE V. THROUGHPUTS OF PUT TRANSFER AND EFFICIENCIES OF THE TOFU FAMILY
(columns: throughput [GB/s], efficiency [%]; rows: Tofu, Tofu2, TofuD)
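Table III can be used to recheck the headline latency and injection figures. The Python calculation below is only arithmetic on the published numbers. It assumes, as the quoted 46% and 31% imply, that each older generation is represented by its fastest setting: direct descriptor for Tofu and cache injection ON for Tofu2. It also takes the theoretical TofuD injection bandwidth to be 6 TNIs × 6.8 GB/s. Against the 1.15 μs Tofu setting instead, the reduction would be about 57%.

```python
# One-way 8-byte Put latencies from Table III, in microseconds.
LATENCY_US = {
    ("Tofu", "descriptor on main memory"): 1.15,
    ("Tofu", "direct descriptor"): 0.91,
    ("Tofu2", "cache injection OFF"): 0.87,
    ("Tofu2", "cache injection ON"): 0.71,
    ("TofuD", "far CMGs"): 0.54,
    ("TofuD", "near CMGs"): 0.49,
}


def reduction_pct(old, new):
    """Relative reduction from `old` to `new`, in percent."""
    return 100 * (1 - new / old)


if __name__ == "__main__":
    tofud = LATENCY_US[("TofuD", "near CMGs")]
    # Baselines assumed to be each generation's fastest setting.
    print(f"vs Tofu:  {reduction_pct(LATENCY_US[('Tofu', 'direct descriptor')], tofud):.0f}%")   # 46%
    print(f"vs Tofu2: {reduction_pct(LATENCY_US[('Tofu2', 'cache injection ON')], tofud):.0f}%")  # 31%
    print(f"far-CMG penalty: {LATENCY_US[('TofuD', 'far CMGs')] - tofud:.2f} us")                # 0.05 us
    # Measured 38.1 GB/s against 6 TNIs x 6.8 GB/s of link bandwidth.
    print(f"TofuD injection efficiency: {100 * 38.1 / (6 * 6.8):.0f}%")                         # 93%
```

The last line is consistent with the approximately 93% efficiency reported for TofuD.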

The efficiency of Tofu2 is slightly lower than that of Tofu and TofuD, mainly because of data alignment overhead. Tofu and TofuD use 128-bit datapaths with 16-byte data alignment, whereas Tofu2 uses a 256-bit datapath with 32-byte alignment.

E. Intra-node Latency of the Tofu Barrier

TofuD extends the Tofu barrier for intra-node use. This subsection presents the evaluated latencies of the intra-node Tofu barrier. First, the latency of each component was measured from the waveform of a simple test that uses only one BCH and two BGs connected in series. A BCH plus a start-and-end BG took approximately 0.48 μs, and a relay BG took nearly 0.13 μs.

Next, intra-node synchronization latencies using the Tofu barrier were evaluated with test programs. The number of BCHs to be synchronized varied from 4 to 48. When the number of BCHs exceeded the number of TNIs, some TNIs used multiple BCHs. The test programs used the reduce-broadcast tree algorithm for intra-TNI synchronization and the recursive doubling algorithm for inter-TNI synchronization. Table VI shows the total number of BGs used per node and the number of communication stages for each test program. In these test programs, one process operated all BCHs. Therefore, the synchronization start times varied less than under actual usage, where each BCH is operated by a different process.

Figure 8 shows the evaluated results and the estimated latencies. The minimum latencies were estimated by assuming that the relay-BG component grows in proportion to log2 of the number of BCHs. However, once there was more than one BCH per TNI, the evaluated results were worse than the estimated minimum latencies. The waveforms showed that all BCHs and BGs were processed serially.
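Recursive doubling, used here for the inter-TNI part, needs log2(n) exchange stages for n participants. This is why the relay-BG component of the minimum-latency estimate grows with log2 of the number of BCHs.

The Python sketch below simulates the algorithm in software, not in the BG hardware. It pairs the simulation with a simple lower-bound model built from the measured 0.48 μs and 0.13 μs component latencies. The model's form, in particular the use of ceil(log2 n) stages, is an assumption and not the paper's exact estimation method. The paper's estimate uses the stage counts of Table VI.

```python
import math


def recursive_doubling_allreduce(values, op=lambda a, b: a + b):
    """Software simulation of recursive doubling for n = 2**k participants.

    In stage s, participant i combines its partial result with that of
    partner i XOR 2**s, so all participants hold the full reduction after
    log2(n) stages. `op` should be commutative and associative.
    Returns (per-participant results, number of stages).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("participant count must be a power of two")
    partial, stages, distance = list(values), 0, 1
    while distance < n:
        partial = [op(partial[i], partial[i ^ distance]) for i in range(n)]
        distance <<= 1
        stages += 1
    return partial, stages


# Component latencies measured in the single-BCH test described above.
T_BCH_START_END_US = 0.48  # a BCH plus its start-and-end BG
T_RELAY_BG_US = 0.13       # one relay BG


def estimated_min_latency_us(num_bchs):
    """ASSUMED lower-bound model: one BCH/start-and-end pass plus one relay-BG
    hop per stage, with ceil(log2(num_bchs)) stages."""
    stages = math.ceil(math.log2(num_bchs)) if num_bchs > 1 else 0
    return T_BCH_START_END_US + T_RELAY_BG_US * stages


if __name__ == "__main__":
    results, stages = recursive_doubling_allreduce(list(range(1, 9)))
    print(results, stages)  # eight copies of 36, after 3 stages
    for n in (4, 8, 16, 48):
        print(n, round(estimated_min_latency_us(n), 2))
```

Passing `max` instead of addition turns the same exchange pattern into a max-reduction. This mirrors how a BG can reduce a value while it synchronizes.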
Across BCHs, the BCH and start-point BG overlapped for 0.19 μs of their 0.48 μs latency, and the remaining 0.29 μs was serialized. Latencies estimated by assuming serial processing of the BG and the BCH were close to the evaluated results. The results therefore show a latency penalty when multiple BCHs from the same TNI are allocated to the same communicator. To avoid this penalty, the MPI library should use the Tofu barrier as follows. If a node has six or fewer processes, the MPI library should give each process one BCH from a different TNI. If a node has more than six processes, the MPI library should allocate one BCH to each of six groups of processes. The processes in each group share one BCH and synchronize within the group through memory.

TABLE VI. CONFIGURATIONS OF THE TEST PROGRAMS OF THE TOFU BARRIER
(fields: number of start-and-end points; number of TNIs; max. number of BCHs per TNI; max. number of BGs per TNI; number of communication stages)

Fig. 8. Estimated and evaluated results of the Tofu barrier test programs (latency in μs versus number of BCHs per node; series: estimated latencies assuming serialization, evaluated results from waveform, estimated minimum latencies)

VI. CONCLUSION

In this paper, we introduced a new, highly scalable interconnect called Tofu Interconnect D, which will be used in the post-K machine. The letter D represents high density node and dynamic packet slicing for dual-rail transfer. We described the design of TofuD, covering the package structure of the node and the rack, the link configuration between nodes, the injection rate per node, the increased communication resources, and a new packet transfer technique. We also presented the evaluation results of TofuD. The one-way 8-byte Put latency was 0.49 μs, 31% lower than that of Tofu2. The injection rate per node was 38.1 GB/s, approximately 83% of that of Tofu2.
The link efficiency was as high as approximately 93%. Additionally, the evaluation results revealed the constraints on the intra-node usage of the Tofu barrier that must be observed to avoid a performance penalty.
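The intra-node constraints on the Tofu barrier follow from the BCH allocation policy recommended in Subsection E. A minimal Python sketch of that policy and of the serialization penalty it avoids is given below; it is an illustration under stated assumptions, not the MPI library's implementation. The names `allocate_bch`, `naive_bch_per_tni`, and `serialized_start_latency` are hypothetical; six TNIs per node is taken from the "six" threshold in the text; processes are assigned to groups round-robin, which is an assumption since the grouping is not specified; and the latency helper assumes that each additional BCH on the same TNI adds the 0.29 μs serialized portion of the 0.48 μs start-point latency.

```python
import math

NUM_TNI = 6      # TNIs per node, matching the "six" threshold in the text
T_START = 0.48   # BCH plus start-point BG latency (microseconds)
T_SERIAL = 0.29  # part of T_START that is serialized between BCHs on one TNI


def allocate_bch(num_procs):
    """Return {tni: [ranks]} following the recommended policy.

    Hypothetical helper. With at most NUM_TNI processes, every process gets
    its own BCH on a different TNI. Otherwise the processes form NUM_TNI
    groups (round-robin here, an assumption); each group shares one BCH and
    synchronizes internally via memory.
    """
    if num_procs < 1:
        raise ValueError("need at least one process")
    groups = {}
    for rank in range(num_procs):
        groups.setdefault(rank % NUM_TNI, []).append(rank)
    return groups


def naive_bch_per_tni(num_procs):
    """BCHs per TNI if every process got its own BCH, spread evenly over TNIs."""
    return math.ceil(num_procs / NUM_TNI)


def serialized_start_latency(bch_on_same_tni):
    """Assumed start-point latency when k BCHs of one communicator share a TNI."""
    if bch_on_same_tni < 1:
        raise ValueError("need at least one BCH")
    return T_START + (bch_on_same_tni - 1) * T_SERIAL


groups = allocate_bch(48)
print(len(groups), len(groups[0]))                                 # 6 8
print(round(serialized_start_latency(naive_bch_per_tni(48)), 2))  # 2.51
print(serialized_start_latency(1))                                 # 0.48
```

Under these assumptions, giving each of 48 processes its own BCH would place eight BCHs on every TNI, whereas the recommended policy keeps one BCH per TNI; the cost of the intra-group memory synchronization is not modeled here.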

REFERENCES
[1] RIKEN Center for Computational Science, "About the Project." [online]. Available at: … [Accessed: 06 May …].
[2] Y. Ajima, S. Sumimoto, and T. Shimizu, "Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers," IEEE Computer, vol. 42, no. 11, pp. 36-40.
[3] Y. Ajima, Y. Takagi, T. Inoue, S. Hiramoto, and T. Shimizu, "The Tofu Interconnect," IEEE 19th Annual Symposium on High Performance Interconnects (HOTI), pp. ….
[4] H. Miyazaki, Y. Kusano, N. Shinjo, F. Shoji, M. Yokokawa, and T. Watanabe, "Overview of the K computer System," Fujitsu Scientific and Technical Journal, vol. 48, no. 3, pp. ….
[5] Y. Ajima et al., "Tofu Interconnect 2: System-on-Chip Integration of High-Performance Interconnect," in Proceedings of the 29th International Conference on Supercomputing (ISC14), pp. ….
[6] Y. Ajima et al., "The Tofu Interconnect 2," IEEE 22nd Annual Symposium on High-Performance Interconnects (HOTI), pp. ….
[7] InfiniBand Trade Association, InfiniBand Architecture Specification Volume 1 Release 1.2.1.
[8] Oak Ridge Leadership Computing Facility, "Summit." [online]. Available at: … [Accessed: 06 May …].
[9] J. Dongarra, "Report on the Sunway TaihuLight System." [online]. Available at: … [Accessed: 06 May …].
[10] B. Vinaik and R. Puri, "Oracle's Sonoma Processor: Advanced Low-cost SPARC Processor for Enterprise Workloads," HotChips 27.
[11] M. S. Birrittella et al., "Intel Omni-path Architecture: Enabling Scalable, High Performance Fabrics," IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI), pp. 1-9.
[12] "Intel Aurora Fact Sheet." [online]. Available at: … [Accessed: 15 May …].
[13] G. Faanes et al., "Cray Cascade: a Scalable HPC System based on a Dragonfly Network," in Proceedings of the International Conference on High Performance …
[14] D. Chen et al., "The IBM Blue Gene/Q Interconnection Network and Message Unit," in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC 2012), Article 26.
[15] D. Chen et al., "Looking under the hood of the IBM Blue Gene/Q network," 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-12.


More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Tackling IPv6 Address Scalability from the Root

Tackling IPv6 Address Scalability from the Root Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat

More information

User-level Fairness Delivered: Network Resource Allocation for Adaptive Video Streaming

User-level Fairness Delivered: Network Resource Allocation for Adaptive Video Streaming User-level Fairness Delivered: Network Resoure Alloation for Adaptive Video Streaming Mu Mu, Steven Simpson, Arsham Farshad, Qiang Ni, Niholas Rae Shool of Computing and Communiations, Lanaster University

More information

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

1. Introduction. 2. The Probable Stope Algorithm

1. Introduction. 2. The Probable Stope Algorithm 1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground

More information

High-level synthesis under I/O Timing and Memory constraints

High-level synthesis under I/O Timing and Memory constraints Highlevel synthesis under I/O Timing and Memory onstraints Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn, Eri Martin To ite this version: Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn,

More information

White paper Advanced Technologies of the Supercomputer PRIMEHPC FX10

White paper Advanced Technologies of the Supercomputer PRIMEHPC FX10 White paper Advanced Technologies of the Supercomputer PRIMEHPC FX10 Next Generation Technical Computing Unit Fujitsu Limited Contents Overview of the PRIMEHPC FX10 Supercomputer 2 SPARC64 TM IXfx: Fujitsu-Developed

More information

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq Volume 4 Issue 6 June 014 ISSN: 77 18X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om Medial Image Compression using

More information

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department

More information

Dynamic Backlight Adaptation for Low Power Handheld Devices 1

Dynamic Backlight Adaptation for Low Power Handheld Devices 1 Dynami Baklight Adaptation for ow Power Handheld Devies 1 Sudeep Pasriha, Manev uthra, Shivajit Mohapatra, Nikil Dutt and Nalini Venkatasubramanian 444, Computer Siene Building, Shool of Information &

More information

Improved Circuit-to-CNF Transformation for SAT-based ATPG

Improved Circuit-to-CNF Transformation for SAT-based ATPG Improved Ciruit-to-CNF Transformation for SAT-based ATPG Daniel Tille 1 René Krenz-Bååth 2 Juergen Shloeffel 2 Rolf Drehsler 1 1 Institute of Computer Siene, University of Bremen, 28359 Bremen, Germany

More information

An Approach to Physics Based Surrogate Model Development for Application with IDPSA

An Approach to Physics Based Surrogate Model Development for Application with IDPSA An Approah to Physis Based Surrogate Model Development for Appliation with IDPSA Ignas Mikus a*, Kaspar Kööp a, Marti Jeltsov a, Yuri Vorobyev b, Walter Villanueva a, and Pavel Kudinov a a Royal Institute

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

Improved flooding of broadcast messages using extended multipoint relaying

Improved flooding of broadcast messages using extended multipoint relaying Improved flooding of broadast messages using extended multipoint relaying Pere Montolio Aranda a, Joaquin Garia-Alfaro a,b, David Megías a a Universitat Oberta de Catalunya, Estudis d Informàtia, Mulimèdia

More information

ICC: An Interconnect Controller for the Tofu Interconnect Architecture

ICC: An Interconnect Controller for the Tofu Interconnect Architecture : An Interconnect Controller for the Tofu Interconnect Architecture August 24, 2010 Takashi Toyoshima Next Generation Technical Computing Unit Fujitsu Limited Background Requirements for Supercomputing

More information