High performance CUDA based CNN image processor
GEORGE VALENTIN STOICA, RADU DOGARU, ELENA CRISTINA STOICA
Department of Applied Electronics and Information Engineering
University Politehnica of Bucharest
1-3, Iuliu Maniu Blvd., Sector 6, Bucharest
ROMANIA

Abstract: - Cellular neural networks (CNNs) have been adopted as a solution in various fields due to their powerful yet simple architecture. Practical implementations using VLSI or FPGA are very efficient but difficult to use in the development or simulation stages, when widespread, cost effective, easy to learn, high performance solutions are required. GPU based, and more specifically CUDA based, simulators can provide the computing power required for developing, simulating and running CNNs. This paper investigates solutions to optimize the utilization of NVIDIA's Kepler architecture to achieve performance of up to 9. million cell iterations/s.

Key-Words: - CUDA enabled GPU, high performance CNN simulator, image processing

1 Introduction

Developing and simulating cellular neural networks helps in finding the right genes for specific problems or in discovering potential new applications. Speeding up the simulation is desirable, but this should come with minimal development and implementation costs. A CUDA enabled GPU seems the perfect solution: its massively parallel architecture matches the CNN architecture, its throughput-oriented design can supply the computing power required for running CNNs, and its compatibility with current programming languages (e.g. C, C++, Python, Fortran) and libraries or middleware (e.g. OpenACC, Matlab, OpenCL) eases the migration of applications from CPU to GPU platforms. The availability and cost of CUDA enabled GPUs also favor this route for implementing high performance, high productivity solutions [1], [2]. There is an increased interest in adopting CUDA as a high performance, high productivity platform; combined with the continuous development of CUDA enabled GPUs, this requires continuous research and investigation into efficient implementations for specific problems [3].
Previous work related to CNN implementations on GPUs uses earlier CUDA architectures (e.g. Tesla or Fermi architectures), with notable results over CPU implementations or dedicated image processing libraries (e.g. OpenCV), where typical accelerations of 7-2 were obtained [4], [5], [9], [10]. Although the CNN specific data-parallel computation model fits the GPU architecture, high performance implementations must consider the GPU resources and their specific limitations. As visible in Fig. 1, there are some notable differences when comparing GPU with CPU architectures: smaller cache memory and simpler control units, which lead to higher global memory access latency, and customizable memory types (shared memory, constant memory, texture memory, registers). This paper deals with such aspects and proposes a new implementation model for the discrete time CNN image processor on the CUDA platform, using NVIDIA's more recent Kepler architecture.

[Fig. 1: CPU vs. GPU architectural differences [6]]

This paper analyses the implementation of the CNN image processor on NVIDIA's Kepler architecture. The discrete time CNN model as described in [7], [8] is presented. Memory types (e.g. global memory, shared memory, texture memory) and access patterns are analyzed to find the optimal configuration for the implementation of the discrete CNN model. The memory access pattern of the CNN simulation makes this a memory-bandwidth bound problem [5]. Specific techniques can be applied to improve the performance (e.g. the use of shared memory, memory cache, coalesced memory reads as presented in [3]), but we cannot go beyond some limiting factors:
- it would be desirable to keep the whole image in the fast, low latency, on-chip shared memory, but the image does not fit;
- it would be desirable to pack all the reads from within a block of threads into a single coalesced memory read, but a read transaction is limited to 128 bytes, and even so there is still a significant read latency of 200-400 clock cycles [6];
- it would be desirable to avoid global memory reads and writes for the cell states, but the initial image and the final result are placed in the global memory.

Our approach is to consider the compute to global memory access (CGMA) ratio, defined as the number of floating-point calculations performed for each access to the global memory: by increasing this ratio we can improve the performance of the implementation model.

2 The CNN image processor

The discrete time CNN model is described by the following equation [5]:

$$x_{ij}(t+1) = x_{ij}(t) + h\Big(-x_{ij}(t) + \sum_{C(k,l)\in S_r(i,j)} A(i,j;k,l)\,y_{kl}(t) + \sum_{C(k,l)\in S_r(i,j)} B(i,j;k,l)\,u_{kl}(t) + z\Big) \tag{1}$$

where $x_{ij}(t)$ represents the state of a cell at $t$, $C(k,l)$ is an element in the $S_r(i,j)$ neighborhood, $A(i,j;k,l)$ and $B(i,j;k,l)$ are the feed-back and feed-forward templates, $u_{kl}(t)$ is the input image, $z$ is the offset and $y_{kl}(t)$ is the output, calculated according to the following formula:

$$y_{ij}(t) = 0.5\,\big(|x_{ij}(t)+1| - |x_{ij}(t)-1|\big) \tag{2}$$

or using the equivalent form:

$$y_{ij}(t) = \begin{cases} 1, & x_{ij}(t) > 1 \\ x_{ij}(t), & |x_{ij}(t)| \le 1 \\ -1, & x_{ij}(t) < -1 \end{cases} \tag{3}$$

Assuming that the image is constant during the iterations (i.e. $u_{kl}(t) = u_{kl}$), (1) can be divided in two parts: the feed-forward part and the feed-back part. The feed-forward part must be computed only once, at the beginning of the iterative process:

$$g_{ij} = \sum_{C(k,l)\in S_r(i,j)} B(i,j;k,l)\,u_{kl} + z \tag{4}$$

The CNN process can then be expressed as follows:

$$x_{ij}(t+1) = x_{ij}(t) + h\Big(-x_{ij}(t) + \sum_{C(k,l)\in S_r(i,j)} A(i,j;k,l)\,y_{kl}(t) + g_{ij}\Big) \tag{5}$$

G(A,B,z) are called genes, and a specific combination of values for A, B and z determines the behavior of the CNN (i.e. a specific image filter): sharpening, softening, edge detection, threshold, dithering, etc.

2.1 The implementation model

Efficient GPU programming patterns are based on dividing the problem into a large number of threads, each thread executing the same code but on different data.
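The data-parallel decomposition can be pictured with a small CPU-side sketch. It is only an illustration in plain Python, not the paper's CUDA code: the names `cell_of_thread` and `launch` are hypothetical, and the nested loops stand in for threads that a GPU would run concurrently. Each (block, thread) pair is mapped to exactly one cell, and threads that fall outside the image are skipped.

```python
def cell_of_thread(block_idx, thread_idx, block_dim):
    """Map a (block, thread) index pair to the (row, col) of the cell it updates."""
    block_row, block_col = block_idx
    thread_row, thread_col = thread_idx
    block_height, block_width = block_dim
    return (block_row * block_height + thread_row,
            block_col * block_width + thread_col)


def launch(height, width, block_dim, per_cell):
    """Run per_cell(i, j) once for every cell of a height x width image.

    The four nested loops emulate a 2D grid of 2D blocks of threads;
    on a GPU the iterations would execute in parallel.
    """
    block_height, block_width = block_dim
    # Round up so that partially filled blocks cover the image border.
    grid_rows = (height + block_height - 1) // block_height
    grid_cols = (width + block_width - 1) // block_width
    for block_row in range(grid_rows):
        for block_col in range(grid_cols):
            for thread_row in range(block_height):
                for thread_col in range(block_width):
                    i, j = cell_of_thread((block_row, block_col),
                                          (thread_row, thread_col),
                                          block_dim)
                    if i < height and j < width:  # guard for border blocks
                        per_cell(i, j)
    return grid_rows, grid_cols
```

Calling `launch(5, 7, (2, 4), f)` with any function `f` visits each of the 35 cells once, using a grid of 3 x 2 blocks; the block shape (2, 4) is an arbitrary choice for the example.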
Rather than dividing the problem into a few large blocks, as is customary in multithreaded CPU implementations, the GPU allows (and benefits from) computing each cell in a separate thread, thus obtaining thousands of threads that are efficiently managed by the GPU control unit. A two dimensional structure of blocks of threads processes a two dimensional region of cells.

[Fig. 2: CPU sequential and GPU parallel implementation models of the CNN. Both models load the image, normalize it and compute g, iterate the cell state computation, denormalize and save the image; the GPU model computes all cells in parallel and synchronizes after each iteration.]

3 High performance CNN

The implementation model described in the previous section is based on an iterative process: in each iteration (t), each thread computes the new cell state based on the previous cell state, on the state of its neighborhood and on the corresponding feed-forward constant value. Based on (5), assuming we have a 3x3 matrix A and the states are stored in the global memory, each thread performs 3x3 reads and one write to global memory, and performs 3x3 floating point multiplications and additions, each pair packed into a single FMA (fused multiply add) operation, plus two additions. The CGMA ratio for a single cell iteration, CGMA_CNN_GM (6), is the number of these floating-point operations divided by the number of global memory accesses.

In order to increase the CGMA ratio, and thus the efficiency, we can reduce the number of global memory reads by organizing the memory reads at block level and splitting the cell iteration in two parts: each thread reads the corresponding cell state from the global memory and saves it into shared memory and, after a synchronization point, each thread computes the new cell state reading data from the shared memory. This approach increases the CGMA ratio to CGMA_CNN_SM (7). Note that the reads from shared memory are ignored while computing the CGMA ratio, since shared memory access has much lower latency than reads from the global memory. Also note that (7) does not include the special case of a border cell, when the corresponding thread must perform another read for the cell outside the block, or the case of a corner cell, when two additional reads are required.

In a similar way we can obtain the CGMA ratio for the case when the data is placed in the texture memory (textures are also stored in the global memory), CGMA_CNN_TM (8).

[Fig. 3: Execution time (T_GM) using global memory for 1024x1024 image size. Axes: execution time (µs), horizontal (pixels), vertical (pixels).]

We simulate the three cases described above on the same 1024x1024 pixels image. Working on a gray scale image, each pixel is one byte of data containing the gray level in the [0, 255] range of integer values. Before starting the iterative process described in Section 2.1, the data must be normalized, i.e. the [0, 255] integer pixel values are transformed into [-1.0, 1.0] floating point values corresponding to the initial value of the state. At this moment we can also compute the constant g according to (4). Separate CUDA kernels are executed by the GPU, using only global memory, global memory and shared memory, and texture memory and shared memory respectively.
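The sequential (CPU) pipeline of normalization, feed-forward computation, iteration and denormalization can be sketched in plain Python as below. This is a minimal reference sketch under stated assumptions, not the authors' implementation: the state update is assumed to have the forward-Euler form x(t+1) = x(t) + h(-x(t) + sum(A*y) + g), the input image u is assumed to equal the initial state, cells outside the image are assumed to contribute 0, gray level 0 is assumed to map to -1.0, and all function names are hypothetical.

```python
def output(x):
    """Piecewise-linear cell output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def normalize(image):
    """Map integer gray levels [0, 255] to floats in [-1.0, 1.0] (assumed: 0 -> -1.0)."""
    return [[p / 127.5 - 1.0 for p in row] for row in image]


def denormalize(state):
    """Map cell states back to integer gray levels through the saturated output."""
    return [[int(round((output(x) + 1.0) * 127.5)) for x in row] for row in state]


def template_sum(template, grid, i, j):
    """Weighted sum over the 3x3 neighborhood of (i, j); outside cells count as 0 (assumed)."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for dk in (-1, 0, 1):
        for dl in (-1, 0, 1):
            k, l = i + dk, j + dl
            if 0 <= k < rows and 0 <= l < cols:
                total += template[dk + 1][dl + 1] * grid[k][l]
    return total


def feed_forward(B, u, z):
    """Constant part g = sum(B * u) + z, computed once before the iterations."""
    return [[template_sum(B, u, i, j) + z for j in range(len(u[0]))]
            for i in range(len(u))]


def step(A, x, g, h):
    """One iteration of the assumed update x + h * (-x + sum(A * y) + g)."""
    y = [[output(v) for v in row] for row in x]
    return [[x[i][j] + h * (-x[i][j] + template_sum(A, y, i, j) + g[i][j])
             for j in range(len(x[0]))]
            for i in range(len(x))]


def run_cnn(image, A, B, z, h, iterations):
    """Normalize, compute g, iterate the state, and denormalize the result."""
    u = normalize(image)
    g = feed_forward(B, u, z)
    x = [row[:] for row in u]  # initial state taken from the image (assumption)
    for _ in range(iterations):
        x = step(A, x, g, h)
    return denormalize(x)
```

With an all-zero A, a B template whose only non-zero entry is a 1.0 at the center, z = 0 and h = 1, the update reduces to x = u, so `run_cnn` returns the input image unchanged; this makes a convenient sanity check before trying real genes.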
Experimental results focus on measuring the execution time on the GPU.

[Fig. 4: Execution time (T_SM) using global memory and shared memory for 1024x1024 image size. Axes: execution time (µs), horizontal (pixels), vertical (pixels).]

[Fig. 5: Execution time (T_TM) using texture memory for 1024x1024 image size. Axes: execution time (µs), horizontal (pixels), vertical (pixels).]
As presented in Fig. 3, 4 and 5, the calculated CGMA ratio is consistent with the execution time. Comparing (6) and (7), for example, we can notice that using shared memory to store intermediate reads from the global memory gives about an order of magnitude increase of the CGMA ratio, which is confirmed by the experimental results presented in Fig. 3 and 4. Note that the execution time axis values in Fig. 3 are correspondingly larger than the values presented in Fig. 4.

Using shared memory combined with global memory or with texture memory produces similar results, but two important observations must be made. The first observation is that reads from global memory can be coalesced, meaning that reads from threads within a block of threads are packed into a single transaction if they address consecutive bytes in memory. By convention, in our experiments the two dimensional image is stored in the global memory in row major order. In this case reads from a horizontal block of threads are packed into a single transaction, and the memory access latency is reduced compared with the vertical block configuration [3]. The second observation is that texture reads are not coalesced but can benefit from locality optimizations: reads are faster if neighboring memory locations are accessed. Fig. 5 shows that, irrespective of the use of vertical, square or horizontal blocks, the execution time is consistent when compared with the access patterns from the global memory presented in Fig. 3 or Fig. 4. A deeper investigation shows that the best performance is obtained when using horizontal blocks of threads and global memory, as presented in Table 1.

[Table 1: Execution time (µs) comparison for shared memory (T_SM) and texture memory (T_TM) depending on image size. Columns: image size, T_SM, T_TM.]

4 The new model

A new approach is proposed in order to further improve the CGMA ratio.
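To get a feel for how much global memory traffic can be saved, the sketch below counts the cell states a single block must read when it advances k iterations in one step, which is the idea this section develops. It assumes a 3x3 template, so each extra iteration needs one more outer layer of neighbors around the M x N block; writes and border effects of the whole image are ignored, and the function names and the 16 x 16 block in the usage note are hypothetical choices for illustration.

```python
def reads_per_step(m, n, k):
    """Cell states read by one m x n block to advance k iterations:
    the block itself plus k outer layers of neighbors."""
    return (m + 2 * k) * (n + 2 * k)


def reads_per_iteration(m, n, k):
    """Global memory reads of the block, averaged over the k iterations of a step."""
    return reads_per_step(m, n, k) / k


def updates_per_step(m, n, k):
    """Cell updates computed in one step, including the redundant updates
    on the outer layers that earlier iterations need.

    Iteration s (1-based) must produce the block plus (k - s) outer layers.
    """
    return sum((m + 2 * (k - s)) * (n + 2 * (k - s)) for s in range(1, k + 1))
```

For a 16 x 16 block, `reads_per_iteration` gives 324 reads with one iteration per step, 200 with two and 144 with four, while `updates_per_step(16, 16, 2)` shows the price paid: 580 cell updates instead of 512, because the first outer layer is also computed once.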
In the existing CNN implementation model there is an incremental process in which each iteration is computed in one step and consists of the following operations: reading the current state from memory (global memory or texture memory), computing the new state, saving the new state back into memory and synchronizing among all blocks of threads, as described in Fig. 6.

[Fig. 6: One iteration per step. Each step reads from global memory/texture memory, saves to shared memory, synchronizes between the threads within a block, computes the new state and writes to global memory; steps are synchronized between kernel calls.]

The CGMA ratio can be increased by reducing the global memory operations if we combine more iterations into a single step. One iteration per step must read the state of the block plus the outer layer of the neighborhood, and compute the new cell state only for the cells within the block. Two iterations per step must read the state of the block plus the two outer layers of the neighborhood, and compute two iterations for the new cell state of the block and one iteration for the cells from the first outer layer, as presented in Fig. 7. More iterations can be performed in a single step with additional layers to be read and computed.

[Fig. 7: One and two iterations per step. An MxN block surrounded by one neighborhood layer (one iteration per step) and by two neighborhood layers (two iterations per step).]

The CGMA ratio at block level for one iteration per step and an MxN block can be calculated as follows:
$$\mathrm{CGMA}_{\mathrm{Block\_1IpS}} = \frac{3 \times 3 \times MN}{MN + 2(M+2) + 2N} \tag{9}$$

In a similar way, the CGMA ratio at block level for one iteration in the case of two iterations per step, CGMA_Block_2IpS (10), can be calculated; in this case the block reads the MN cells of the block plus the two outer layers, of 2(M+2) + 2N and 2(M+4) + 2(N+2) cells respectively.

Assuming a CNN process consisting of T iterations, using two iterations per step requires only T/2 steps, so fewer reads from and writes to the global memory produce a better CGMA ratio, which impacts the execution time, as presented in Table 2.

[Table 2: Execution time for one, two and four iterations per step, for 1024x1024 pixels image size and T2 iterations. Columns: one iteration/step, two iterations/step, four iterations/step. Rows: execution time (ms); speed-up; execution time per cell and iteration T_cit (ns); cell iterations/s (x10^6).]

5 Conclusion

CUDA enabled GPU platforms provide developers with a different architecture when compared with the traditional CPU. Highly parallel computing power, large but high latency global memory, low latency but limited cache and shared memory, and locality optimized and cached texture memory can be efficiently used and combined to implement high performance algorithms.

Measurements were performed on the following hardware/software architecture: Windows 7/32 bit operating system, NVIDIA CUDA Toolkit v5.5, Intel Core 2 Duo E6320 CPU running at 1.86 GHz, 2GB DDR2 DRAM, NVIDIA GeForce GTX 650 Ti Boost GPU using the Kepler architecture (compute capability 3.0), 768 cores in four multiprocessors with a 980 MHz base clock, 1GB GDDR5 DRAM with 144.2 GB/s bandwidth.

References:
[1] R. Dogaru, I. Dogaru, High Productivity Cellular Neural Network Implementation on GPU using Python, Proceedings of the Workshop on Information Technology and Bionics Symposium in Memory of Tamas Roska, Budapest, Hungary, June 2015.
[2] R. Dogaru, I. Dogaru, A Low Cost High Performance Computing Platform for Cellular Nonlinear Networks using Python for CUDA, 20th International Conference on Control Systems and Science, 2015.
[3] G.V. Stoica, R. Dogaru, C.E. Stoica, Speeding-up Image Processing in Reaction-Diffusion Cellular Neural Networks using CUDA enabled GPU Platforms, International Conference on Electronics, Computers and Artificial Intelligence, Bucharest, Oct. 2014, Vol. 2.
[4] K.V. Kalgin, Implementation of algorithms with a fine-grained parallelism on GPUs, Numerical Analysis and Applications, Vol. 4, No. 1, pp. 46-55, 2011.
[5] E. Laszlo, P. Szolgay and Z. Nagy, Analysis of a GPU based CNN implementation, 13th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA), Turin, Aug. 29-31, 2012.
[6] CUDA Programming Guide.
[7] T. Roska and L.O. Chua, The CNN universal machine: an analogic array computer, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 40, no. 3, 1993.
[8] L.O. Chua and L. Yang, Cellular Neural Network: Theory, IEEE Transactions on Circuits and Systems, vol. 35, no. 10, 1988.
[9] R. Dolan and G. DeSouza, GPU-Based Simulation of Cellular Neural Networks for Image Processing, Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, 2009.
[10] S. Potluri, A. Fasih, L.K. Vutukuru, F. Al Machot, K. Kyamakya, CNN Based High Performance Computing for Real Time Image Processing on GPU, Nonlinear Dynamics and Synchronization (INDS) & 16th Int'l Symposium on Theoretical Electrical Engineering (ISTET), Klagenfurt, Austria, 2011, pp. 1-7.
More informationPrioritized Traffic Recovery over GMPLS Networks
Pioitized Taffic Recovey ove GMPLS Netwoks 2005 IEEE. Pesonal use of this mateial is pemitted. Pemission fom IEEE mu be obtained fo all othe uses in any cuent o futue media including epinting/epublishing
More informationMulti-azimuth Prestack Time Migration for General Anisotropic, Weakly Heterogeneous Media - Field Data Examples
Multi-azimuth Pestack Time Migation fo Geneal Anisotopic, Weakly Heteogeneous Media - Field Data Examples S. Beaumont* (EOST/PGS) & W. Söllne (PGS) SUMMARY Multi-azimuth data acquisition has shown benefits
More informationFifth Wheel Modelling and Testing
Fifth heel Modelling and Testing en Masoy Mechanical Engineeing Depatment Floida Atlantic Univesity Boca aton, FL 4 Lois Malaptias IFMA Institut Fancais De Mechanique Advancee ampus De lemont Feand Les
More informationParallel processing model for XML parsing
Recent Reseaches in Communications, Signals and nfomation Technology Paallel pocessing model fo XML pasing ADRANA GEORGEVA Fac. Applied Mathematics and nfomatics Technical Univesity of Sofia, TU-Sofia
More informationSimulation and Performance Evaluation of Network on Chip Architectures and Algorithms using CINSIM
J. Basic. Appl. Sci. Res., 1(10)1594-1602, 2011 2011, TextRoad Publication ISSN 2090-424X Jounal of Basic and Applied Scientific Reseach www.textoad.com Simulation and Pefomance Evaluation of Netwok on
More informationComputer Science 141 Computing Hardware
Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431
More informationTopological Characteristic of Wireless Network
Topological Chaacteistic of Wieless Netwok Its Application to Node Placement Algoithm Husnu Sane Naman 1 Outline Backgound Motivation Papes and Contibutions Fist Pape Second Pape Thid Pape Futue Woks Refeences
More informationCardiac C-Arm CT. SNR Enhancement by Combining Multiple Retrospectively Motion Corrected FDK-Like Reconstructions
Cadiac C-Am CT SNR Enhancement by Combining Multiple Retospectively Motion Coected FDK-Like Reconstuctions M. Pümme 1, L. Wigstöm 2,3, R. Fahig 2, G. Lauitsch 4, J. Honegge 1 1 Institute of Patten Recognition,
More informationDYNAMIC STORAGE ALLOCATION. Hanan Samet
ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu
More informationHierarchically Clustered P2P Streaming System
Hieachically Clusteed P2P Steaming System Chao Liang, Yang Guo, and Yong Liu Polytechnic Univesity Thomson Lab Booklyn, NY 11201 Pinceton, NJ 08540 Abstact Pee-to-pee video steaming has been gaining populaity.
More informationXFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers
XFVHDL: A Tool fo the Synthesis of Fuzzy Logic Contolles E. Lago, C. J. Jiménez, D. R. López, S. Sánchez-Solano and A. Baiga Instituto de Micoelectónica de Sevilla. Cento Nacional de Micoelectónica, Edificio
More informationColor Correction Using 3D Multiview Geometry
Colo Coection Using 3D Multiview Geomety Dong-Won Shin and Yo-Sung Ho Gwangju Institute of Science and Technology (GIST) 13 Cheomdan-gwagio, Buk-ku, Gwangju 500-71, Republic of Koea ABSTRACT Recently,
More informationA Novel Parallel Deadlock Detection Algorithm and Architecture
A Novel Paallel Deadlock Detection Aloithm and Achitectue Pun H. Shiu 2, Yudon Tan 2, Vincent J. Mooney III {ship, ydtan, mooney}@ece.atech.ed }@ece.atech.edu http://codesin codesin.ece.atech.eduedu,2
More informationUser Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)
PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,
More informationCOMPARISON OF CHIRP SCALING AND WAVENUMBER DOMAIN ALGORITHMS FOR AIRBORNE LOW FREQUENCY SAR DATA PROCESSING
COMPARISON OF CHIRP SCALING AND WAVENUMBER DOMAIN ALGORITHMS FOR AIRBORNE LOW FREQUENCY SAR DATA PROCESSING A. Potsis a, A. Reigbe b, E. Alivisatos a, A. Moeia c,and N. Uzunoglu a a National Technical
More informationBehavioral Modeling of a C-Band Ring Hybrid Coupler Using Artificial Neural Networks
RADIOENGINEERING, VOL. 19, NO. 4, DECEMBER 010 645 Behavioal Modeling of a C-Band Ring Hybid Couple Using Atificial Neual Netwoks Edem DEMIRCIOGLU 1, Muat H. SAZLI 1 R&D Satellite Design Depatment, Tuksat
More informationDEADLOCK AVOIDANCE IN BATCH PROCESSES. M. Tittus K. Åkesson
DEADLOCK AVOIDANCE IN BATCH PROCESSES M. Tittus K. Åkesson Univesity College Boås, Sweden, e-mail: Michael.Tittus@hb.se Chalmes Univesity of Technology, Gothenbug, Sweden, e-mail: ka@s2.chalmes.se Abstact:
More informationDrag Optimization on Rear Box of a Simplified Car Model by Robust Parameter Design
Vol.2, Issue.3, May-June 2012 pp-1253-1259 ISSN: 2249-6645 Dag Optimization on Rea Box of a Simplified Ca Model by Robust Paamete Design Sajjad Beigmoadi 1, Asgha Ramezani 2 *(Automotive Engineeing Depatment,
More informationNew Algorithms for Daylight Harvesting in a Private Office
18th Intenational Confeence on Infomation Fusion Washington, DC - July 6-9, 2015 New Algoithms fo Daylight Havesting in a Pivate Office Rohit Kuma Lighting Solutions and Sevices Philips Reseach Noth Ameica
More informationTHE THETA BLOCKCHAIN
THE THETA BLOCKCHAIN Theta is a decentalized video steaming netwok, poweed by a new blockchain and token. By Theta Labs, Inc. Last Updated: Nov 21, 2017 esion 1.0 1 OUTLINE Motivation Reputation Dependent
More informationGoal. Rendering Complex Scenes on Mobile Terminals or on the web. Rendering on Mobile Terminals. Rendering on Mobile Terminals. Walking through images
Goal Walking though s -------------------------------------------- Kadi Bouatouch IRISA Univesité de Rennes I, Fance Rendeing Comple Scenes on Mobile Teminals o on the web Rendeing on Mobile Teminals Rendeing
More informationOn the Conversion between Binary Code and Binary-Reflected Gray Code on Boolean Cubes
On the Convesion between Binay Code and BinayReflected Gay Code on Boolean Cubes The Havad community has made this aticle openly available. Please shae how this access benefits you. You stoy mattes Citation
More informationAssessment of Track Sequence Optimization based on Recorded Field Operations
Assessment of Tack Sequence Optimization based on Recoded Field Opeations Matin A. F. Jensen 1,2,*, Claus G. Søensen 1, Dionysis Bochtis 1 1 Aahus Univesity, Faculty of Science and Technology, Depatment
More informationEmbeddings into Crossed Cubes
Embeddings into Cossed Cubes Emad Abuelub *, Membe, IAENG Abstact- The hypecube paallel achitectue is one of the most popula inteconnection netwoks due to many of its attactive popeties and its suitability
More informationTopic -3 Image Enhancement
Topic -3 Image Enhancement (Pat 1) DIP: Details Digital Image Pocessing Digital Image Chaacteistics Spatial Spectal Gay-level Histogam DFT DCT Pe-Pocessing Enhancement Restoation Point Pocessing Masking
More informationA Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components
A Neual Netwok Model fo Stong and Reteving 2D Images of Rotated 3D Object Using Pncipal Components Tsukasa AMANO, Shuichi KUROGI, Ayako EGUCHI, Takeshi NISHIDA, Yasuhio FUCHIKAWA Depatment of Contol Engineeng,
More informationHISTOGRAMS are an important statistic reflecting the
JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 D 2 HistoSketch: Disciminative and Dynamic Similaity-Peseving Sketching of Steaming Histogams Dingqi Yang, Bin Li, Laua Rettig, and Philippe
More informationART GALLERIES WITH INTERIOR WALLS. March 1998
ART GALLERIES WITH INTERIOR WALLS Andé Kündgen Mach 1998 Abstact. Conside an at galley fomed by a polygon on n vetices with m pais of vetices joined by inteio diagonals, the inteio walls. Each inteio wall
More informationA Full-mode FME VLSI Architecture Based on 8x8/4x4 Adaptive Hadamard Transform For QFHD H.264/AVC Encoder
20 IEEE/IFIP 9th Intenational Confeence on VLSI and System-on-Chip A Full-mode FME VLSI Achitectue Based on 8x8/ Adaptive Hadamad Tansfom Fo QFHD H264/AVC Encode Jialiang Liu, Xinhua Chen College of Infomation
More information4.2. Co-terminal and Related Angles. Investigate
.2 Co-teminal and Related Angles Tigonometic atios can be used to model quantities such as
More informationGravitational Shift for Beginners
Gavitational Shift fo Beginnes This pape, which I wote in 26, fomulates the equations fo gavitational shifts fom the elativistic famewok of special elativity. Fist I deive the fomulas fo the gavitational
More informationAdaptation of TDMA Parameters Based on Network Conditions
Adaptation of TDMA Paametes Based on Netwok Conditions Boa Kaaoglu Dept. of Elect. and Compute Eng. Univesity of Rocheste Rocheste, NY 14627 Email: kaaoglu@ece.ocheste.edu Tolga Numanoglu Dept. of Elect.
More informationAn Improved Resource Reservation Protocol
Jounal of Compute Science 3 (8: 658-665, 2007 SSN 549-3636 2007 Science Publications An mpoved Resouce Resevation Potocol Desie Oulai, Steven Chambeland and Samuel Piee Depatment of Compute Engineeing
More informationMobility Pattern Recognition in Mobile Ad-Hoc Networks
Mobility Patten Recognition in Mobile Ad-Hoc Netwoks S. M. Mousavi Depatment of Compute Engineeing, Shaif Univesity of Technology sm_mousavi@ce.shaif.edu H. R. Rabiee Depatment of Compute Engineeing, Shaif
More informationAnnales UMCS Informatica AI 2 (2004) UMCS
Pobane z czasopisma Annales AI- Infomatica http://ai.annales.umcs.pl Annales Infomatica AI 2 (2004) 33-340 Annales Infomatica Lublin-Polonia Sectio AI http://www.annales.umcs.lublin.pl/ Embedding as a
More information