Architectures for high-performance FPGA implementations of neural models

Size: px
Start display at page:

Download "Architectures for high-performance FPGA implementations of neural models"

Transcription

1 INSTITUTE OFPHYSICS PUBLISHING JOURNAL OFNEURALENGINEERING J. Neurl Eng. 3 (2006) 2 34 doi:0.088/ /3//003 Architectures for high-performnce FPGA implementtions of neurl models Rndll K Weinstein nd Roert H Lee,2 Lortory for Neuroengineering, School of Electricl nd Computer Engineering, Georgi Institute of Technology, Atlnt, GA 30332, USA 2 Wllce H Coulter Deprtment of Biomedicl Engineering, Emory University, Atlnt, GA 30322, USA E-mil: rlee2@emory.edu Received 7 August 2005 Accepted for puliction 2 Novemer 2005 Pulished 9 Decemer 2005 Online t stcks.iop.org/jne/3/2 Astrct As the complexity of neurl models continues to increse (lrger popultions, vried ionic conductnces, more detiled morphologies, etc) trditionl softwre-sed models hve difficulty scling to rech the performnce levels desired. This pper descries the use of FPGAs, or field progrmmle gte rrys, to esily implement wide vriety of neurl models with the performnce of custom nlogue circuits or computer clusters, the reconfigurility of softwre, nd t cost rivlling personl computers. FPGAs rech this level of performnce y enling the design of neurl models s prllel processed dt pths. These rchitectures provide for wide rnge of single-comprtment, multi-comprtment nd popultion models to e redily converted to FPGA implementtions. Generlized rchitectures re descried for the efficient modelling of first-order, nonliner differentil eqution in throughput mximizing or ltency minimizing dt-pth configurtions. The homogeneity of popultion nd multicomprtment models is exploited to form deep pipelines for improved performnce. Limittions of FPGA rchitectures nd future reserch res re explored.. Introduction Tody s neurl circuit modeller is often constrined y the limited performnce of conventionl personl computers. While computing power might doule every 8 months ccording to Moore s lw, neurl modellers often chnge the computtionl requirements y orders of mgnitude. Popultion models cn lwys simulte more neurons or morphologicl models cn lwys simulte more comprtments. The electrophysiologist cn lwys simulte dditionl conductnces, while the systems iologist cn dd dditionl cellulr processes to model. In generl, current neurl circuit modellers mke sustntil trdeoffs y limiting complexity to reduce processing requirements. FPGAs, or field progrmmle gte rrys, offer high performnce, configurle, sclle pltform for enling current nd next genertion neurl models. FPGAs re specilized integrted circuits tht re designed with reconfigurle computtionl primitives cple of implementing ritrry clcultions nd logic. They re widely used in consumer nd industril products for ccelerting processor intensive lgorithms. Engineers designing networking, rdr, wireless communictions, video processing, erospce, militry, nd test equipment, medicl imging nd other computtionlly intensive pplictions often utilize FPGAs s hrdwre coprocessors (Prdeep et l 2005,Xioynget l 2004), DSP processors (Kmlizd et l 2003, Wlkeet l 2000) or strem processors (Krishnmurthy et l 2002,Leeet l 999). FPGAs hve een less commonly used in io-relted fields, with severl exceptions. Protein (Oliver et l 2005)nd DNA (Brown et l 2004) sequencing re strting to use FPGAs to reduce processing time. Rel-time processing, registrtion nd other imge nlyses from confocl microscopy re enled y FPGAs (Budge et l 2004, Rest et l 2004). Most modelling pplictions on FPGAs hve een limited to studying neurl networks consisting of reduced neurons (Blke et l 997, Botros nd Adul-Aziz 994, Chngjin nd Hmmerstrom 2003, Omondi nd Rjpkse 2002). More complex models hve een implemented in nlogue VLSI (Brgg et l 2002, Jung et l 200). Recently, in our lortory, severl implementtions of conductnce-sed neurl models /06/0002+4$ IOP Pulishing Ltd Printed in the UK 2

2 R K Weinstein nd R H Lee hve emerged including Hodgkin nd Huxley s (952) nd Booth nd Rinzel s (995) models (Grs et l 2004), nd ten-comprtment motoneuron model (Weinstein nd Lee 2005). However, ech of these designs ws somewht specific to the model eing implemented. This pper expnds on those efforts y descriing generlized lgorithms nd rchitectures tht provide migrtion pth for current softwre-sed neurl models into FPGA-sed implementtions. 2. Methods 2.. FPGA hrdwre All designs were trgeted towrds Xilinx XtremeDSP Development Kit-II sed on Nlltech BenONE crrier ord consisting of one DIME-II module slot nd populted with Nlltech BenADDA expnsion ord. The BenADDA is preconfigured with Xilinx Virtex-II FPGA (XC2V3000-4fg676), two 4 it 65 MS s nlogue-to-digitl converters (ADC), two 4 it 60 MS s digitl-to-nlogue converters (DAC) nd megyte of on ord SRAM. All model designs were constructed using System Genertor (ver. 6.3i), n dd-on toolkit for Mthworks Simulink (ver. 6.2). System Genertor provides lirry of locks tht cn e converted into n HDL (hrdwre description lnguge) for synthesis. For simple locks such s multiplexors, logic gtes nd registers, the tool does direct trnsltion into the HDL. For more complex structures such s memories, multipliers nd dders, the tool relies on the Xilinx CORE Genertor (CoreGen). CoreGen comines user specified design constrints (it width, depth, etc) with timing constrints (ltency) nd re constrints (prllel versus seril). The Xilinx ISE Foundtion 6.3i pckge ws utilized for synthesis, plce nd routing, nd genertion of itstrem for progrmming. For certin results, Synplicity s Simplify Pro (ver. 7.0), n lterntive 3rd-prty synthesis tool ws employed for comprison. System Genertor provides unique interfce for FPGA digitl design through its rich lirry of synthesizle locks. Clocking is implicitly defined through the setting of smple periods (of ritrry units) for ech lock. Reset sttes re esily defined through the interfce. Additionlly, utomted support for uses nd explicit declrtion of fixed-point type formt simplify wht would tke considerle mount of effort when progrmming in trditionl HDL. System Genertor comines oth n interfce helpful to the trditionl hrdwre designer while hiding underlying detils to the neurl modeller FPGA uilding locks The vst mjority of processing within the dt pth involves four locks: ddsu, mult, cmult nd lookup tles. The former three encompss the rithmetic opertions (ddition, sutrction, multipliction, multipliction y constnt) nd the ltter is for ll other opertions. Division is sent from this list s there is yet to e high-speed, low ltency implementtion. The disdvntge is miniml s divisions cn generlly e mpped to multipliction of the inverse of the denomintor. (The denomintor is very often constnt or prmeter in neurl models, mking for trivil implementtion.) Ech lock, when mpped into the FPGA, cn tke on numer of different forms, ech with its own trdeoffs. Ech lock cn e chrcterized y its throughput, ltency, re, it width nd sign formt. The throughput is mesure of the numer of opertions tht cn e performed y the unit in given mount of time. The ltency of n opertion is the dely represented s mximum numer of cycles the lock requires to propgte n input to n output. When lock is pipelined (i.e. roken into multiple suopertions ech of one cycle durtion) or cple of completing one opertion per cycle regrdless of ltency, there is often negtive correltion etween throughput nd ltency. Resources within n FPGA re utilized in vrying wys depending on the prmeters of the lock. Additionl pipeline stges require dditionl registers per lock, while wider dt pth requires more logic. Different rchitectures cn sve re while scrificing performnce, such s in the cse of sequentil multiplier using single ccumultor to sum prtil products (Yo nd Swrtzlnder 993). Optimlly, the dt pth will exploit s much prllelism in the model s possile. In the simple eqution + cd, the multipliction of nd cn occur in prllel to the multipliction of c nd d. Then the ddition of the two terms cn follow. The metric used for determining prllelism is not simply the numer of opertions. Insted, the ltency of the opertion hs to e considered. In the eqution y = + c + d + ke + f where k is constnt, the multipliction of nd c, the constnt multipliction, nd pirwise sum re done in prllel (see figure ). Additionl rithmetic steps re done ised towrds the pths of lesser ltency. If ech of the ddition opertions hs ltency of one cycle, then the skewing of the ddition steps wy from the multipliers enles lnced tree of three to five cycles of ltency per pth. If the numer of opertions ws the metric for lncing expression trees, the dditions cn e relnced shifting the ltency per pth rnge to etween 2 cycles ( y) nd 6cycles(k, e y) Lookup tles Often, the neurl models require computtions tht re difficult, either resource hevy or too slow, for use in neurl model simultion. These cn e trigonometric functions, exponentils, squre roots or ny other expression with difficult-to-evlute closed-form solution. When the difficultto-evlute expression is function of single input, lookup tle provides n re efficient nd high performnce wy to estimte the output of the function. For n exmple, the stedy stte of gting vrile cn e estimted vi sigmoid or Boltzmnn s eqution s defined y: m = +exp ( ) () V θ σ where θ is the hlf ctivtion voltge of the gte, σ is mesure of the slope of the sigmoid nd V is the memrne potentil. This eqution is difficult to solve directly for two resons. 22

3 FPGA neurl model rchitectures () () Figure. Exmple expression trees for the eqution + c + d + ke + f where k is constnt. Two opernd multiply opertions re shown with two units of dely while constnt multiplies hve four units of dely. All dders re given one unit dely. Additionl pipeline registers cn e dded ritrrily to ech opertion. () The tree is lnced with respect to the numer of opertions per pth. () The tree is lnced with respect to the numer of cycles of ltency per pth. First, the exponentil hs no simple closed-form solution tht is efficient in n FPGA. Second, the inverse function is s difficult s division, which is generlly solved itertively, not directly like multipliction. Division y σ cn e simplified y refrming the expression s multipliction y new prmeter, /σ. Since eqution (2) is difficult to evlute directly in hrdwre, it is good cndidte for lookup tle. A ROM indexed y V cn produce suitle estimtion of m, thus removing the need for ny of the rithmetic opertions within the eqution. A simple mpping is required to convert the V input to n ddress for the ROM. For the generl cse: 2 n dder(x) = (x min(x)) (2) min(x) +mx(x) where n is the numer of its of ddressility in the lookup tle. The implementtion requires sutrction lock nd multipliction y constnt to perform the liner trnsform. The output of this multipliction should e set to sturte to void overflows when ddressing the tle. This mpping cn e shred for multiple lookup tles tht use the sme input. In Virtex-II FPGA, lookup tles re chosen to utilize SelectRAM, or lock RAM within System Genertor. A XC2V3000 FPGA contins 96 SelectRAM of which ech contins 8 kits of configurle RAM. Ech SelectRAM cn e configured s its, k 8 its, 2k 9 its, 4k 4 its, 8k 2 its or 6k it. Prtil SelectRAMs re wsted, so ech lookup tle cn expnd to fill the full RAM. In generl, lookup tles fit within the k 8 it configurtion Arithmetic performnce Generting optiml models often requires trdeoffs etween pipeline depth nd clock rte. In generl, deeper pipeline enles high clock rtes. In cses where the logic is fully utilizing the pipeline, deeper pipeline should trnslte to higher performnce, up until the point re on the FPGA ecomes limited. In the cse where the overll execution ltency is to e minimized independent of the numer of clock cycles, the clock rte is no longer the determinte of performnce, ut rther the timing delys through the pipeline. This dely is clculted s the product of the clock period (T ) nd one plus the pipeline depth ( + d). This is conservtive estimte s it ssumes tht the opertion uses the entirety of the clock period prior to nd fter the internl pipeline. In order to determine the influence of ltency on opertion throughput nd re, ech opertion ws synthesized nd mpped in the trget Xilinx FPGA, nd is shown in figure 2. Disy chined logic locks (ritrry length chin of six locks) were utilized to otin verge performnce results per opertion. To mitigte ny performnce hit (y wire or logic dely) cused y interfcing the inputs nd outputs of the opertions, doule-uffered registers were used to mp directly to input nd output pins. All opertions were set t 4 it 2 s complement signed numers with the fixed point set t the 3th it. This llows rnge of ± with pproximtely four deciml plces of ccurcy. Synthesis ws performed with the stndrd Xilinx Synthesis Technology (XST) uilt into ISE. Following plce nd route nd genertion of itstrem, re utiliztion ws ssessed nd the criticl pth from the log files ws noted. The inverse of the criticl pth period ws used s the pek clock frequency nd is shown in figure 2(). The re per opertion ws split into two components, registers, or flip-flops, nd look-up tles (LUTs). Ech vlue plotted (see figure 2()) is the verge re per opertion; the se overhed (from the doule-uffering of dt converters) ws sutrcted from totl utiliztion of registers nd LUTs nd the result divided y six to otin the verge. System Genertor provides n optimiztion flg, Pipeline to Gretest Extent Possile which ws set for the dder opertions ut not set for the constnt multiplier nd stndrd multiplier tests. We did performnce comprisons for ech opertion with this flg set nd unset nd found negligile differences. When set, however, the tool restricts the rnge of ltencies tht re progrmmle, in this cse, limiting to t lest three cycles for multiplictions nd one cycle for dditions. Tle summrizes the results for optimizing to two different design gols (pek throughput nd minimum ltency) post synthesis nd plce & route. Are is depicted s the numer of slices utilized. In the Virtex-II rchitecture, the logic fric is roken up into CLBs, or Configurle Logic Blocks. Essentilly, ech CLB cn output 8 its of informtion. Within ech CLB, there exist four slices nd two tristte uffers. Ech slice contins two it registers, two it 23

4 R K Weinstein nd R H Lee () () Numer of LUTs Numer of FFs Cycles per Opertion 4 5 (c) (d) Clock Freq (MHz) Op Dely (ns) Cycles per Opertion Adder Cmult Mult (logic) Mult (HW) 5 Figure 2. Are versus performnce trdeoff. (), () Performnce nd (c), (d) re utiliztion of sic rithmetic opertions. The enchmrks were performed with 4 it opernds nd 4 it output. The inputs nd outputs were doule uffered nd ssigned to externl pins on the FPGA. All performnce results re derived post synthesis nd plce & route. () Clock frequency is found s the mximum operting frequency when synthesized t for ech cycle ltency nd opertion. () The opertion dely here is shown s the dely (in ns) for n output to djust following chnge in n input. (c), (d) Are utiliztion is split etween flip-flops nd lookup tles, wherey two of ech mke slice. The enlrged mrkers designte the proposed delys for ech lock to minimize the re versus performnce trdeoff. These plots lso demonstrte the performnce nd re dvntge of using dders nd hrdwre emedded multipliers versus constnt multipliers nd logic-sed multipliers. When throughput is to e mximized ove ll else, then dders nd logic-sed multipliers re preferred. Tle. Pek performnce of opertions. Trget Mx throughput Min ltency Adder Depth 2 cycles 0 cycles Frequency 2.2 MHz Dely 9.5 ns 2.9 ns Are 4 slices 6 slices Cmult Depth 4 cycles 0 cycles Frequency 20.9 MHz Dely 33. ns 2.4 ns Are 98 slices 59 slices Mult Depth 5 cycles 0 cycles Frequency 69.7 MHz Dely 29.5 ns 0.5 ns Are 35 slices 0 slices The pek throughput of the multiplier is chieved when implementing in logic while the minimum ltency is chieved using n emedded multiplier lock. lookup tles (LUTs), nd dedicted rithmetic crry chins nd SOP (sum of products) logic. Ech LUT cn e configured s one 4 it ddressle lookup tle, one 6 it RAM or 6 it shift register. (The XC2V3000 hs 3584 CLB s.) These slices my e prtilly utilized in this design ut my e shred etween multiple opertions in resource-constrined design. The slice count in tle shows the sum of ll fully nd prtilly utilized slices. These dt show tht performnce cn e optimized for either throughput or ltency depending upon the requirements, nd tht different design choices will hve lrge impct on overll model performnce. When the resources nd model rchitecture llow for deep pipeline, ech opertion cn e hevily pipelined mximizing throughput, where the frequency is the mximum operting frequency sed on the criticl pth period. When pipelining is not desired (see Single-Cycle Architecture), pek performnce is chieved with no cycle ltency nd miniml dely per opertion. The lock option flg Pipeline to Gretest Extent Possile hd no noticele effect on performnce under these conditions nd ws disled. While we hve not repeted this study for dt widths greter thn 4 its, it is expected tht similr trends will follow. Throughput performnce is generlly mximized when using lrger pipelines. It is expected tht the optiml throughput would e found when ltency is set to higher vlues s the it width increses. Are ecomes especilly constrined when dt pths ecome wider. The Resources Estimtion Tool, included s prt of System Genertor is n invlule resource for the FPGA modeller when implementing re-constrined designs (Shi et l 2004). We cn utilize forml pproch for defining ech opertion s discrete-time trnsfer function sed on the cycle ltency from input to output. Bsed on the timing results depicted in figure 2, ech opertion cn e represented y the following expressions, where x, y re intermedite vlues, sttes or prmeters, k is constnt, nd p is the stge in the execution pipeline. Add(x[p],y[p]) = x[p ] + y[p ] Su(x[p],y[p]) = x[p ] y[p ] CMult(k, x[p]) = kx[p 4] Mult(x[p],y[p]) = x[p 2] y[p 2]. These mppings redefine rithmetic opertions for n FPGA implementtion tking into ccount opertionl dely. The multiplier delys re vlid for it widths 8 (the size of the uilt in multipliers). Alterntive mppings re defined for ddition nd sutrction sed on prticulr rchitecture nd design constrints. (3) 24

5 FPGA neurl model rchitectures 2.5. Externl interfcing nd dt collection FPGA models execute t extremely high throughputs. Roughly speking, the performnce level (defined s simulted time/execution time) is the time step multiplied y the numer of models nd the FPGA clock frequency then divided y the pipeline depth. This numer is mximized only when the FPGA is completely utilized. All of the models presented here re intended s exmples nd s such re not very complex. Consequently, they ll hve on-chip execution times of less thn ms (even if they were ll plced on the chip simultneously). Thus, for the ske of convenience, the dt presented here re generlly from emultion of the FPGA directly in Simulink. 3. The se rchitecture 3.. FitzHugh Ngumo For the ese of presenttion we will present the rchitecture s n exmple implementtion of simple neuron model. However, the ides cn e pplied to ny neuron model. The FitzHugh Ngumo model (FitzHugh 96, Ngumo et l 962) is reduced, dimensionless representtion of the Hodgkin nd Huxley model (Hodgkin nd Huxley 952). This model mkes the following ssumptions: () the ctivtion gte of the sodium chnnel hs extremely fst kinetics nd therefore reches the stedy-stte vlue instntneously, nd (2) the potssium chnnel gte hs similr, ut reverse kinetics (timescle nd gte chrcteristics) to the inctivting gte of the sodium chnnel. The Hodgkin nd Huxley equtions centred round four ordinry differentil equtions of voltge (V m ), sodium ctivtion (m), sodium inctivtion (h) nd potssium ctivtion (n) cn e reduced to simple potentil stte nd recovery stte. When non-dimensionlized, the following coupled differentil equtions emerge: du = u dt 3 u3 w + I (4) dw = ε( 0 + u w) (5) dt where u is the potentil of the system nd w is the recovery stte. The prmeters ε, 0 nd modulte the shpe of the spike nd I is the input (in dimensionless current) to the system. This model, despite eing drsticlly simplified, hs chrcteristics tht mke it stereotypicl of neurl circuit models. Ech eqution is first-order ordinry differentil eqution with eqution (4) hving nonliner term. The equtions re coupled nd cnnot e solved nlyticlly. This system enles demonstrtion of our techniques for FPGA model development in n esy-to-understnd exmple Generting the dt pth Differentil equtions of the stndrd form cn e split into two terms: the differentil term or the time vrying stte vrile, nd the intermedite clcultion, or eqution for the rte of chnge of the stte vrile. In equtions (4) nd (5), the left-hnd side is the differentil term nd the right-hnd side is denoted y the intermedite term. It is the intermedite term tht will e converted into dt pth for clcultion. Defining the functions f nd g from equtions (4) nd (5), respectively, s follows: f(u,w)= u 3 u3 w + I g(u, w) = ε( 0 + u w) mkes cler the delinetion etween the stte nd the intermedite; the generted dt pth ecomes independent of ny prticulr numericl solving techniques. First- nd higher order, fixed time step nd vrile time step solvers cn e implemented round the dt pth without ny modifiction to the design. Additionlly this isoltes the dt pth from ny prticulr simultion or protocol requirements, such s strting, stopping nd resetting. As will e descried lter in this pper, this seprted dt pth provides generl cse for rpid mpping into vrious rchitectures including popultion nd multicomprtment modelling ODE to difference eqution Ech eqution defined in the continuous time domin must e mpped to discrete time for numericl nlysis. Simultion on generl-purpose computer cn utilize the following discrete time representtion y mens of forwrd-euler integrtion, where n is the itertion step: u[n +]= u[n]+ t ( u[n] 3 u[n]3 w[n]+i ) w[n +]= w[n]+ t ε( 0 + u w). The ove equtions cn redily e clculted in generlpurpose processor where instructions re executed sequentilly for ech itertion of the loop. These equtions re not explicit with respect to processing in n FPGA, where ech opertion hs timing requirements tht must e considered. Additionlly, there is no explicit prllelism defined. Utilizing the commuttive nd ssocitive property of ddition nd sutrction, we reorder the rithmetic opertions to enle prllelism sed on opertion ltency s descried in the expression tree in figure. f(u,w)= [ ((u w) + I) ( (u u) ( 3 u))] g(u, w) = [ε (( 0 w) + ( u))]. In generl, multiplictions should e performed s erly s possile in the clcultion s the ltency is gretest. This will skew the resulting expression tree to force fster dditions nd constnt multiplictions outside the mximum ltency pth, when possile. To formlly define the ove expressions in the opertion spce of the FPGA, we cn use the mppings defined in eqution (3) to the redefined functions f nd g nd otin: f(u[p],w[p]) = Su(Add(Su(u[p],w[p]), I[p]),..., Mult(Mult(u[p],w[p]), CMult(/3,u[p]))) g(u[p],w[p]) = Mult(ε[p], Add(Su( 0 [p],w[p]),..., Mult( [p],u[p]))). 25

6 R K Weinstein nd R H Lee () u[p] 2 w[p] Fix_2_8 Fix_2_8 xlddsu z -- Su z -2 () Mult z -2 () Fix_2_7 3 I UFix 8 Fix 9 UFix_8_6 Fix 7 xlddsu z -+ Add Fix_2_8 z -2 () Mult xlddsu z -- Su Fix_0_7 f(u[p],w[p]) Mult2 UFix over 3 () 0 2 w[p] 3 5 u[p] UFix_8_5 Fix_2_8 UFix_8_5 Fix_2_8 xlddsu z -- Su z -2 () Mult Fix 8 Fix_7_3 Fix_2_8 xlddsu z -+ Add 4 e UFix_8_7 z -2 () Mult Fix_9_8 g(u[p],w[p]) Figure 3. Simulink lock digrm of intermedite clcultions () f(u,w)nd () g(u, w). Ech opertion is identified y the lel on the lock. All fixed-point dt types re lelled on the wires interconnecting the locks, where the first portion (Fix/UFix) defines the vlue to e signed or unsigned, respectively. The second term is the numer of totl its in the representtion nd the lst term defines the numer of frctionl its, or the numer of its to the left of the deciml plce. Dt ports for the susystem re the numered ovl locks, where the inputs re prmeters or sttes nd the output is the intermedite clcultion. Delys through the system re expressed in the z domin where the superscript is the ltency in cycles from output ck to input. Evluting these mppings yields equtions (6) nd (7). This representtion is useful to descrie the overll dely through the dt pth, in this exmple, six cycles. Since there is skew etween the shortest ltency pths nd the longest ltency pths (from one cycle to six cycles), dditionl pipeline registers cn e dded in the shorter delys without incresing the pipeline depth, possily enling incresed throughput with fster clock rte. Additions nd sutrctions re set to hve no ltency, ut this cn e redily chnged y reindexing the prmeters nd sttes y the dditionl dely. [ ] ((u[p ] w[p ]) + I[p ]) f(u[p],w[p]) = ((u[p 6] u[p 6]) (/3u[p 4])) (6) g(u[p],w[p]) [ ( )] (0 [p 3] w[p 3]) + = ε[p 3]. (7) ( [p 6] u[p 6]) These equtions define the required cycle ltency for the input to rech the output synchronously. Once implemented, this dt pth cn e used for well-performing implementtions of single-comprtment models, multi-comprtment models, popultion models, etc. The dt pth cn e constructed identiclly in ll of the ove cses. Figure 3 shows the System Genertor dt pth corresponding to equtions (6) nd (7). For these different rchitectures, only the implementtion of the stte vriles will chnge s is expounded upon in the following sections. 4. Single model, single unit cse 4.. Multi-cycle rchitecture The generl model simultor will execute single version of model ccording to set protocol. The modeller will often run simultions to mnully tune prmeters, trying to replicte prticulr ehviour. Sometimes this model will hve prticulr performnce requirements such s reltime execution. For these cses, we developed generl rchitecture for running series of differentil equtions with ritrry dely s multi-cycle processor. We employed two synchronous clock domins, one providing slower outer loop to interfce with the outside world nd fster inner loop for the dt pth processing. In this wy, it is very much like multi-cycle computer rchitecture, overclocking the internl pipeline in hidden fshion from the outside. The originl equtions for the stte vriles, u nd w cn e expressed s difference equtions round the functions f nd g. We use forwrd-euler integrtion s numericl solver due to its ese of implementtion. Additionlly, numericl ccurcy cn e improved with smller step sizes, resonle trdeoff given the high performnce of n FPGA. This rchitecture is esily extendle to higher order ODE solvers including Rung Kutt, predictor corrector, or even vrile time-step solvers if desired. du = f(u,w) u[n +]= u[n]+ t f(u[n],w[n]) (8) dt 26

7 FPGA neurl model rchitectures () () f(u,w) Fix_0_7 UFix_8_ ts f(u,w) Fix_0_7 u[n] Fix_2_8 UFix_8_ ts Fix_0_9 z -2 () Mult xlusmp 7 Up Smple xldsmp 7 z - Down Smple Fix_2_8 Fix_0_9 () Mult xlddsu Fix_2_8 + Add 2 u[p] xlddsu Fix_2_8 + xlregister d z - Fix_2_8 q Add Register u[n] Figure 4. Simulink lock digrm of stte clcultions for single unit models. () The multi-cycle pproch to single unit forwrd-euler integrtion. The intermedite clcultion is scled y the time constnt nd dded to the previous stte. The output of the stte, u[n], is clocked once for every seven cycles of u[p]. The Up Smple lock is set to copy the slower clocked smples cross ll seven fst clock cycles while the Down Smple lock is implemented s register t the slower rte. () By removing the delys within the pipeline, up nd down smpling ecomes unnecessry, requiring only register to store stte etween itertions. There is only one clock rte in this scenrio. dw = g(u, w) w[n +]= w[n]+ t g(u[n],w[n]). dt (9) Equtions (8) nd (9) representing the stte clcultion cn e comined with the dt pth formultion in equtions (6) nd (7) to generte the full expression for the model. The dt pth needs to e modified to include the time-step multipliction dding nother two to four delys to the dt pth. The totl dely for the dt pth is incresed to seven clock cycles. The trnsltion of the stte eqution into hrdwre is only single register clocked t the slower clock rte. Equtions (0) nd () re the comined dt pth nd stte expression where n is the itertion of the outer, slower clock nd p is the itertion of the pipeline, or fster clock. u[n +,p] = u[n, p]+ [ ] ((u[n, p 2] w[n, p 2]) + I[n, p 2]) t ((u[n, p 7] u[n, p 7]) (/3u[n, p 5])) (0) w[n +,p] = w[n, p]+ [ t ε[n, p 3] ( (0 [n, p 3] w[n, p 3]) + ( [n, p 6] u[n, p 6]) )]. () In this rchitecture, simplifiction emerges enled y the slower clocked stte. All execution pths within the dt pth must settle y the time the next vlue is clocked into the stte register. Therefore, for pths of fewer cycles thn the criticl pth (in this cse, the pth with the longest ltency), there is no difference if the inputs rrive erlier thn required. Therefore, ll pths cn receive their input on the previous outer cycle nd ltch the new vlue t the following outer cycle. All timing requirements of the pipeline with respect to insertion dely in the pipeline ecome unnecessry. The new equtions follow from the simplifiction: [ ] ((u[n] w[n]) + I[n]) u[n +]= u[n]+ t ((u[n] u[n]) (/3u[n])) [ ( )] (0 [n] w[n]) + w[n +]= w[n]+ t ε[n]. ( [n] u[n]) The forwrd-euler method is reltively simple to implement within System Genertor, tking only four locks s shown in figure 4(). The stte t ech itertion is stored in single register clocked t the output rte. We use comintion of n up smple nd down smple lock (the clock multiplier/divider is set to the mximum dely through the dt pth, including ny clcultion within the stte). The up smple lock is configured to copy the vlue from ech input period to ll corresponding output periods. This is not necessry for hrdwre genertion, i.e. it does not trnslte directly to hrdwre, ut does llow the softwre to verify correct clocking of dt through the system. The down smple lock is configured to copy the lst frme of the input to the output, which in hrdwre is register clocked t the output rte. The reminder of the stte consists of multiplier lock t the output of the dt pth to scle the function y the time step nd n dder to dd to the previous vlue. The previous vlue is t the output of the up smple lock. More complex integrtion lgorithms cn e implemented s n extension to this se cse. The multipliction y the time step cn lso e incorported within the dt pth depending on the prticulr integrtion lgorithm possily sving multipliction step. This cn e shown for the function g(u, w) where the constnt multipliction of ε cn e sustituted y the constnt multipliction of the product ε t. This is only possile in the trivil cse of Euler integrtion with fixed step sizes. 27

8 R K Weinstein nd R H Lee Tle 2. Single model rchitecture design comprison. Frequency Output frequency Ltency Are (MHz) (MHz) (ns) (slices) Single-cycle Multi-cycle Single-cycle rchitecture The multi-cycle rchitecture enles reduction in stte registers while still utilizing fully-pipelined dt pth. It provides mens of integrting pipeline of ritrry depth into n integrtion stte susystem producing only one output per time step. There re three drwcks with this pproch: wsted re, reduced performnce nd high power consumption. First, re is not utilized efficiently s pipeline delys within the intermedite clcultions use registers tht could e used for dditionl logic or extr dt storge. Ech dely requires numer of registers equl to the it width of tht clcultion. Significnt re svings cn e relized y removing those extr delys. Second, performnce is reduced for two resons: () extr delys contriute n dditionl time dely in the form of setup nd hold time. The register setup time, or the minimum mount of dely etween the dt ecoming stle prior to the edge of the clock signl, for n XC2V3000 speed grde 4 FPGA is 370 ps. The hold time, or the minimum durtion following the clock edge for the dt to e redy to red is 90 ps. The sum of those vlues constitutes window round the clock edge where dt must e stle nd is wsted within the smple period. Long pipelines cn ccumulte this 460 ps ded-time in the period for ech dely in the pth. (2) The overll dely through the pipeline is equl to the product of the depth nd cycle period. Ignoring setup nd hold times, the dely through the pipeline is idelly the sum of ll the rithmetic comintoril logic delys. If the totl logic cn e eqully distriuted (dely-wise) etween n ritrry numer of registers, then ltency/throughput is independent of the pipeline depth. Prcticlly, the period is function of the longest comintoril pth etween registers. Therefore, shorter comintionl pths must execute within lrger clock periods, reducing efficiency. As the numer of pipeline registers increses, the more difficult it is to mintin symmetry etween comintoril pth delys. Third, power consumption nd clock frequency re directly proportionl for given model design. The power consumption of device is not generlly n issue for the modeller, ut does constrin the design of the device itself; the pek clock frequency of n FPGA is prtilly constrined y the limits of het dissiption s power consumption increses. Achieving the sme or incresed throughput t slower clock rte is generlly preferred. Therefore, to utilize miniml re nd power while chieving pek performnce requires the reduction of the pipeline depth to the minimum chievle. The mjority of neurl models cn e modified to execute in single cycle per time step itertion y chnging ll ltencies to zero. (Note tht one register lwys remins due to integrtion, resulting in the single-cycle designtion.) For multiplier locks, the Pipeline to Gretest Extent Possile flg must e unchecked. This chnge cn e mde to ll rithmetic locks. Then the up-smple nd down-smple locks cn e chnged to single register to complete the single-cycle design pproch. The results of comprison etween the two design pproches re shown in tle 2 with the re utiliztion nd performnce results determined post synthesis nd plce & route trgeting n XC2V3000-4fg676 FPGA. The toplevel design depicted in figure 5 ws used for testing nd synthesizing the single-cycle nd multi-cycle models. In the single-cycle version, the entire dt pth is executed within ech clock cycle nd requires only 7 ns to complete. When delys re distriuted in the multi-cycle, totl of 7 for the longest pths, the ltency is tripled. The mximum clock frequency is incresed y lmost 2.5 times, not enough to compenste for the dditionl cycles required per itertion. The mximum frequency supported y the XtremeDSP-II Development Bord is 20 MHz cpping the pek multi-cycle model throughput. In this exmple, the single-cycle model enles 87% improvement in performnce with 34% reduction in re over the multi-cycle model. In generl, the single-cycle method is preferred over the multi-cycle method when ll the locks within the dt pth cn e set to zero ltency. When tht is not the cse, the multi-cycle method is suitle fllck technique. 5. Multiple model, single unit cse Deep pipelines llow for multiple simultneous processes executing within the dt pth, where the numer of processes equls the depth of the pipeline. In other words, if the dt pth requires ltency of ten cycles until the first output ppers, ten simultneous models cn e executed without loss of performnce. The dt pth would produce the ten models interleved t the inner, fster clock rte. In contrst, within the single-model, multi-cycle rchitecture, the pipeline produces n output t the outer, slower clock frequency. Ultimtely, the throughput of ech model does not chnge, ech output for the sme model will e t the slower frequency, ut the ggregte ndwidth of the system will sustntilly improve. Two scenrios re common cndidtes for multiple model, single unit simultion: () there re set numer of models you re interested in simulting, for exmple, ll the neurons in prticulr nuclei or circuit or (2) when the numer of simultneous simultions is flexile nd more is etter, such s in utomted prmeter serching or popultion modelling. The first scenrio pplies more constrints to the model nd produces very deterministic output. Only the ville re on the FPGA limits the second scenrio. Within these rchitectures, chnge in the numer of models simulted requires strightforwrd modifiction to the model design. 5.. Pipelining the dt pth The structure of the rithmetic opertions in the dt pth in the multiple model cse is identicl to tht of the single 28

9 FPGA neurl model rchitectures u[p].5 I UFix_8_6 w[p] I Fix_0_7 f(u[p],w[p]) f(u,w) f(u,w) dudt Fix_2_8 u[n] Fix_2_8 u[p] xlregister z - Fix_2_8 xlregister d z - Fix_2_8 Fix_2_ d q q xlcst force Reinterpret Register Register DAC doule u Scope k =2 0 UFix_8_5 0 w e UFix_8_5 w[p] UFix_8_7 e u[p] Fix_9_8 g(u[p],w[p]) Fix_2_8 w[n] g(u,w) Fix_2_8 w[p] dwdt xlregister z - Fix_2_8 xlregister d z - Fix_2_8 Fix_2_ d q q xlcst force Reinterpret Register2 Register3 DAC2 doule g(u,w) System Genertor Figure 5. Top-level view of the FitzHugh Ngumo model. Prmeters re defined in Xilinx constnt locks (on the left) moving into the intermedite clcultion. The sttes re evluted next, with feedck pths to the inputs of the intermedite clcultions. The outputs, u nd w re doule-uffered for performnce nd scled for nlogue output vi the on-ord dt converters (on the right). model cse. The differences lie in distriuting ltencies nd mnging the prmeters in the pipeline. Ltencies, or dditionl clock cycle delys, re inserted to mximize throughput of the system. As the longest comintoril pth etween ny two registers is the sole determinte of clock frequency nd therefore model throughput, cre must e tken to distriute the dely s uniform s possile throughout the pipeline. The following lgorithm descries n pproch to distriuting the ltency within the dt pth:. The trget numer of simultneous models will set the depth of the susequent pipeline generted. 2. The longest, weighted rithmetic pth is isolted nd delys re judiciously dded such tht the totl dely does not exceed the depth of the pipeline. The longest, weighted rithmetic pth is defined s strting from single endpoint (prmeter of the system or stte) nd terminting with the completion of the differentil of the stte. () Delys re dded in n initil pss providing rtio of 4:2: cycles (see eqution (3)) of ltency to constnt multipliers, hrdwre multipliers nd dditions/sutrctions, respectively. Nonhrdwre emedded multipliers hve similr dely requirements to constnt multipliers. () Add n dditionl dely for ech opernd of multiplier tht is greter thn 8 its. (c) Tles, oth ROM (red-only memory) nd singlend multi-port RAM (rndom ccess memory) locks require one unit of dely regrdless of it-width or ddressility when fit into one SelectRAM. Additionl glue logic is required when more thn one SelectRAM is required which could enefit from n dditionl dely. (d) When the numer of delys in the pipeline exceeds the numer of simultneous models, remove delys evenly throughout the pth until the numer of delys equls the numer of models. 3. Repet on ll other pths tking cre to never use more ltency cycles thn the criticl pth. Adding extr delys such tht ll pths re lnced is not necessry nd will wste FPGA resources. This lgorithm post processes the expression trees mking up the dt pth solving the intermedite clcultion, s defined ove, with the timing informtion required to interfce with the stte solvers nd prmeter susystems. The leves of these expression trees re the prmeters nd the stte inputs. Ech lef hs n insertion dely ssocited with it defined s the sum of the delys long the pth from the lef to the root of the tree, including ny clcultion tht my occur within the stte solver (ex. multipliction y the time constnt). For exmple, in figure (), k,,, c, d, e nd f re mpped to k[p 5], [p 3], [p 4], c[p 4], d[p 4], e[p 5] nd f [p 4], respectively. Synchroniztion of the pths within the expression trees is therefore ccomplished y providing delyed version of sttes nd prmeters to the leves consistent with the insertion dely of the prticulr lef node Pipelining the sttes Executing n models simultneously requires the continuous storge of n sets of informtion within pipeline tht is n stges deep. In the previous work (Grs et l 2004), ll informtion ws stored within the pipeline vi dely locks, requiring creful synchroniztion of the expressions. Sttes were implemented s simple dely lock with n cycles of ltency. This rchitecture recognizes the sttes nd prmeters s forming sis set of model informtion. All intermedites 29

10 R K Weinstein nd R H Lee u stte u intermedite z - z - z - z - z - z - z - z - z - z over 3 z -2 Mult z -2 Mult2 () () xlddsu z -- Su I[n,p-4] z -2 Mult xlddsu z -+ Add du/dt =f(u,w)= ((u-w)+i)-/3*u^3 () xlddsu z -- Su ts f(u[n,p-2],w[n,p-2]) z -2 Mult3 () u[n,p] xlddsu + u[n+,p] u integrte w intermedite 3 [n,p-7] 2 0[n,p-6] z -2 Mult4 xlddsu z -- Su2 () xlddsu z -+ Add2 4 e[n,p-4] z -2 () Mult ts 2 g(u[n,p-2],w[n,p-2]) z -2 Mult6 () xlddsu w[n+,p] + w integrte dw/dt =g(u,w)= e*((0-w)+*u) w[n,p] w stte z - z - z - z - z - z - z - z - z - z Pipeline Stge Figure 6. Simulink lock digrm of multiple unit model. Ten itertions of the FitzHugh Ngumo model re run simultneously through the constructed ten stge pipeline. All locks re depicted to e in scle with respect to cycle ltency. Since the dt pth requires only seven cycles per itertion, the stte register chin must e t lest seven stges within this rchitecture. It ws chosen to e ten for this exmple. cn e shown to e function of the sttes nd prmeters. Therefore, only the sttes must e explicitly stored within ech time step. The delys of the previous work re replced with chin of n registers. This hs two enefits: first, ech register cn now contin n initil vlue for the stte tht cn e unique for ech model simulted. By convention, the til of the chin (lst output) contins the initil stte of the first model nd the hed register of the chin stores the stte for the lst model. Second, the outputs of the n registers represent the set of ll delyed versions of the stte. These outputs re now ccessile to e routed ck into the expression trees t the dely required for the prticulr opertion. For exmple, if chnge on prticulr stte lef of the expression tree hs n insertion dely of six cycles, then the input of tht stte should e tpped six registers deep in the stte pipeline. This llows the stte informtion to follow cycle y cycle the intermedite logic within the expression tree. This lgorithm pplied to the FitzHugh Ngumo model is shown in figure Shred versus unique prmeters When executing set of models, prmeters cn e either sttic cross ll models or ritrrily vrying cross ll models. In the simple cse where prmeter is sttic, it cn e represented s System Genertor constnt lock nd hs no prticulr timing requirements. Unique prmeters per model cn redily e exploited through the use of circulr uffer of length n counting through ech prmeter per cycle. These prmeters must e synchronously ville reltive to the trget model t the correct insertion point within the model. Given n input I, delyed y d cycles, s represented in the difference eqution s I[p d], within pipeline of depth n, the circulr uffer must e initited with prmeters forwrd rotted y n d steps in order to mintin synchroniztion. Therefore, fter n d increments of the pipeline, the prmeter I will e inserted into the pipeline such tht d cycles lter, the output for tht model will pper t the root of the expression tree. In System Genertor, this circulr uffer is implemented s count limited counter rnging from 0 to n ddressing ROM with the prmeter vlues pre-initilized. The prerottion of the circulr uffer cn e ccomplished in two wys, either through chnge in the initil vlue of the counter or y rottion in the initil vlues in the ROM. The former method requires dedicted counter per unique prmeter set in the model. Therefore, the ltter method is preferred s rottion in the ROM requires no extr resources llowing one counter to e shred for ll prmeter tles. This pproch ws used for the current input I, for the trces illustrted in figure 7. The ROM mcro in System Genertor ecomes synthesized s synchronous memory such tht the output is registered. Rotting the uffer y n d indices compenstes for this one-cycle ltency. When n is reltively smll (n < 5), it is dvisle to use distriuted RAM resources when ville. In distriuted RAM, ech slice cn store up to 32 its of dt. When n is lrge or when logic resources re limited, these prmeter tles cn e kept in lock RAM. The decision to use lock RAM or distriuted RAM lrgely depends on the prticulr limiting resource constrint in model. 30

11 FPGA neurl model rchitectures Figure 7. Output trces from cycle-ccurte, fixed-point simultion within Simulink of ten concurrently executing Fitzhugh Ngumo models. The trces were generted from the model shown in figure 6. All prmeters were kept constnt with the exception of I, which ws vried from 0.99 to 2.97 with step of 0.22 illustrted from top trce to ottom trce. The first 3000 points re shown. Note tht this simultion would execute in pproximtely 0.25 ms on the FPGA, nd 50 such designs could run concurrently for totl of 500 models. 6. Coupled, multiple unit cse Coupled, multiple unit models re strightforwrd extension to the isolted multiple model, single unit cse. Coupling cn occur etween comprtments in morphologiclly complex neuron model, s electricl connection etween neurons in the form of gp junctions, or s chemicl connections, or synpses within popultion. This cse dels exclusively with homogenous popultions of neurons or neurl comprtments with some form of coupling or reltime interction. A prticulr exmple of ten-comprtment motoneuron designed in this mnner is descried in the previous work within this lortory (Weinstein nd Lee 2005). A coupling is defined s stte vrile from one unit cting s term or fctor of n intermedite clcultion of different unit. A unit cn e coupled to ll other units of model in the cse of fully interconnected popultion model. In contrst, unit might only e coupled to its neighours in the cse of liner multi-comprtment chin. These two cses re considered the generl cses of unit coupling, where there is regulrity etween the connections. When modelling non-generl cses where few connections re creted, it is often simpler to mp the scenrio ck into generl cse when possile. This my not turn out to e the most optiml implementtion. For exmple, in model of 20 neurons such tht ech neuron is coupled to nother to form pirs of hlf-centre oscilltors, ech neuron will tke input from only one other neuron nd output to only one neuron. If djcent neurons within the pipeline re coupled, then odd neurons will couple to the next neuron nd even neurons will couple to the previous neuron. The following descries two such pproches to generting this coupling logic. The first pproch is literl conversion of the coupling lgorithm into System Genertor locks. An even odd test cn e ccomplished y using the LSB (lest significnt it) of the prmeter set ddress counter s select line into 2: multiplexer. The counter will go from 0 to 9 in this exmple, toggling the LSB t ech cycle. The inputs of the multiplexer will e the voltge sttes from the djcent units. Following the convention where the first model is t the til of the register chin, when the multiplexer select line is 0, the unit is odd nd requires the stte from the following unit. The stte used will e tpped from the stte register chin t the point of the insertion dely of the lef node plus one. When the multiplexer select line is one, then the unit is even, nd the previous unit s stte is used, which will e tpped t the insertion dely minus one. Any prmeters cting on the coupled stte cn e processed s usul with 20 element ROM. The second pproch generlizes the coupling nd removes the multiplexer from the implementtion. This pproch provides for two inputs to ech model, one from the previous unit nd one from the next unit. Very often neurl models hve n intensity prmeter in the form of mximl current or conductnce. These prmeters cn e set to zero for the cses where there is no coupling nd the proper vlue when there is coupling. Two prmeter tles will e required, one for the even units nd one for the odd. This pproch, while wsteful in resources, simplifies design s only the stndrd rithmetic locks re required. Models requiring full interconnection etween ll units re resonle nd strightforwrd to design ut re generlly resource constrining. In the cse of synpses for fully interconnected popultion of n neurons, given recurrent connections, the logic requirements include n 2 synpses with n implementtions of the synptic mechnism including t lest n stte solvers, n prmeter tles of n depth hold the synptic weights, nd n dders in tree with log 2 (n) levelstosum the synptic input. As n ecomes lrger, the synpses tke on n ever-incresing percentge of FPGA resources, quickly limiting the scle of popultion models. Future work is needed to consider lterntive design pproches for incresing the size of neurl popultion models. 7. Discussion This pper serves to provide the methods for implementing stereotypicl neurl circuit model elements in n FPGA. While designing model in itself is strightforwrd, there re some key limittions nd res of future work to mke it s esy to use s softwre simultion. This discussion documents the chllenges of converting floting-point clcultions to fixed-point representtions, interfcing the model to externl systems, nd the limits of sclility within n FPGA. 7.. Precision determintion System Genertor provides no mens for hndling rel or floting-point numers. Insted, ll prmeters, sttes nd intermedite clcultions utilize fixed-point numers, defined y sign, numer of totl its nd numer of frctionl its. Before executing in hrdwre, it is necessry to set ech opertion to hve sufficient precision to void overflows, 3

12 R K Weinstein nd R H Lee Tle 3. Clcultions to determine rnge vlues per opertion type. R = R R 2 High rnge vlue (H ) Low rnge vlue (L ) Addition + H + H 2 L + L 2 Sutrction mx(h L 2, L H 2 ) min(h L 2, L H 2 ) Multipliction mx(l L 2, L H 2, H L 2, H H 2 ) min(l L 2, L H 2, H L 2, H H 2 ) Division / mx(l /L 2, L /H 2, H /L 2, H /H 2 ) min(l /L 2, L /H 2, H /L 2, H /H 2 ) The division ssumes tht the rnge of the inputs does not cross zero. underflows or functionl mismtches due to quntiztion errors. Excessive precision should e voided s re utiliztion nd performnce will suffer. Optiml precision for ll opertions is difficult if not n impossile gol nd is still n ctive re of reserch. On the other end of the scle, the numer of its required to gurntee full precision continuously increses fter ech opertion. (Full precision here mens precision sed on the rgument precisions rther thn the rguments themselves.) Full precision for multipliction, s implemented in System Genertor, is the sum of the numer of its of ech opernd. An ddition hs full precision when the numer of integer its of the output is the mximum of the integer its of ech opernd. Similrly, the numer of frctionl its of the output is the mximum of the numer of frctionl its of ech opernd. This is worst-cse, pessimistic pproch to determining precision through the pipeline y ssuming tht ll the full rnge of vlues re vlid for ech opertion nd tht mximum frctionl precision is lwys necessry. As n exmple, 7 it multiply (7 it unsigned or 8 it signed inputs) requires just one emedded multiplier. A 34 it multiply requires 4 emedded multipliers to clculte the prtil products nd dders to sum the two prtil products. When moving to 5 it multiplier, the resources jump to 9 emedded multipliers (Shi et l 2004). Excess precision will require dditionl ltency to mintin the sme throughput nd wste logic resources tht could e used for dditionl prllel opertions. We hve found severl techniques to reduce the required precision ut hve yet to report on generl lgorithm for determining the optiml precision. First, we cn reduce the numer of integer its per opertion y ounding the prmeters nd sttes within prcticl rnges. For exmple, for prticulr neuron firing, the memrne potentil might rnge from mv to mv requiring one sign it, seven integer its nd numer of frctionl its to chieve the desired resolution. Those signed seven integer its llow rnge of 27 to 26. Full precision uses the full rnge of the fixed-point representtion, ut in relity, only the usle rnge is required. We define S to denote signed numer (versus n unsigned, positive only vlue) nd R i = (L i, H i ), where L i nd H i re the low nd high vlues of the usle rnge of the ith lef node. When L i nd H i re of different signs or oth negtive, the numer is signed nd requires n extr it to denote the sign. Formlly, the sign, S i, nd the numer of integer its, Z i, given rnge R i re determined y:, L i < 0 S i =, H i < 0 0, otherwise Given : mx int = mx L i, H i { log2 (mxint) + mxint > 0 Z i = 0, otherwise. (2) Using the rnge nottion, R i, insted of the full precision to represent ech vlue, the numer of integer its, Z i, required throughout the pipeline is generlly reduced. We reclculte these rnge vlues t the output of ech opertion using the expressions in tle 3. Without further priori knowledge of the rnge of intermedite vlues within the pipeline, this provides n pproch to determining the numer of integer its required throughout the dt pth, voiding ny overflow conditions. Underflow conditions re more difficult to optimize round. An underflow occurs when the frction precision of n output of n opertion is truncted from full precision such tht smll numer is represented s zero. Underflows my or my not cuse ny discrepncies in simultion, depending on where in the dt pth they occur. In the cse of Hodgkin nd Huxley style potssium chnnel with n 4 kinetics, given n ctivting gte with f frctionl its of precision, the output vi two cscding multiplictions will require 4f frctionl its when utilizing full precision. Relisticlly, fourth-order kinetics requires no dditionl precision over first-order expression, in which cse n underflow would e cceptle. An dder tree comining ll currents clculted per chnnel would lso e possile cndidte for llowing underflows s very smll currents re diminished y lrger trnsients or lekge pths. Other clcultions, such s scling y the time step when performing integrtion, must e free from underflows to mintin functionl ehviour. The remining clss of errors, quntiztion errors, is sustntilly more difficult to reson through intuitively nd requires more rigorous pproch. This cn e through error nlysis techniques such s propgting reltive nd solute error through the dt pth. The inherent feedck in the system vi the integrtion steps mkes this nlysis exceedingly difficult. Alterntively, simultions cn help isolte the differences etween floting- nd fixed-point representtions of the model. There is still considerle work to e done in investigting wys of nlyzing fixed-point neurl models to minimize quntiztion error. 32

Digital Signal Processing: A Hardware-Based Approach

Digital Signal Processing: A Hardware-Based Approach Digitl Signl Processing: A Hrdwre-Bsed Approch Roert Esposito Electricl nd Computer Engineering Temple University troduction Teching Digitl Signl Processing (DSP) hs included the utilition of simultion

More information

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers Wht do ll those bits men now? bits (...) Number Systems nd Arithmetic or Computers go to elementry school instruction R-formt I-formt... integer dt number text chrs... floting point signed unsigned single

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Questions About Numbers. Number Systems and Arithmetic. Introduction to Binary Numbers. Negative Numbers?

Questions About Numbers. Number Systems and Arithmetic. Introduction to Binary Numbers. Negative Numbers? Questions About Numbers Number Systems nd Arithmetic or Computers go to elementry school How do you represent negtive numbers? frctions? relly lrge numbers? relly smll numbers? How do you do rithmetic?

More information

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers Wht do ll those bits men now? bits (...) Number Systems nd Arithmetic or Computers go to elementry school instruction R-formt I-formt... integer dt number text chrs... floting point signed unsigned single

More information

Parallel Square and Cube Computations

Parallel Square and Cube Computations Prllel Squre nd Cube Computtions Albert A. Liddicot nd Michel J. Flynn Computer Systems Lbortory, Deprtment of Electricl Engineering Stnford University Gtes Building 5 Serr Mll, Stnford, CA 945, USA liddicot@stnford.edu

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Computer Arithmetic Logical, Integer Addition & Subtraction Chapter

Computer Arithmetic Logical, Integer Addition & Subtraction Chapter Computer Arithmetic Logicl, Integer Addition & Sutrction Chpter 3.-3.3 3.3 EEC7 FQ 25 MIPS Integer Representtion -it signed integers,, e.g., for numeric opertions 2 s s complement: one representtion for

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures Other Issues Stck Mnipultion support for procedures (Refer to section 3.6), stcks, frmes, recursion mnipulting strings nd pointers linkers, loders, memory lyout Interrupts, exceptions, system clls nd conventions

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-169 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

Accelerating 3D convolution using streaming architectures on FPGAs

Accelerating 3D convolution using streaming architectures on FPGAs Accelerting 3D convolution using streming rchitectures on FPGAs Hohun Fu, Robert G. Clpp, Oskr Mencer, nd Oliver Pell ABSTRACT We investigte FPGA rchitectures for ccelerting pplictions whose dominnt cost

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

OUTPUT DELIVERY SYSTEM

OUTPUT DELIVERY SYSTEM Differences in ODS formtting for HTML with Proc Print nd Proc Report Lur L. M. Thornton, USDA-ARS, Animl Improvement Progrms Lortory, Beltsville, MD ABSTRACT While Proc Print is terrific tool for dt checking

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

5 Regular 4-Sided Composition

5 Regular 4-Sided Composition Xilinx-Lv User Guide 5 Regulr 4-Sided Composition This tutoril shows how regulr circuits with 4-sided elements cn be described in Lv. The type of regulr circuits tht re discussed in this tutoril re those

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-208 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

ECEN 468 Advanced Logic Design Lecture 36: RTL Optimization

ECEN 468 Advanced Logic Design Lecture 36: RTL Optimization ECEN 468 Advnced Logic Design Lecture 36: RTL Optimiztion ECEN 468 Lecture 36 RTL Design Optimiztions nd Trdeoffs 6.5 While creting dtpth during RTL design, there re severl optimiztions nd trdeoffs, involving

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables George Boole IT 3123 Hrdwre nd Softwre Concepts My 28 Digitl Logic The Little Mn Computer 1815 1864 British mthemticin nd philosopher Mny contriutions to mthemtics. Boolen lger: n lger over finite sets

More information

Course Administration

Course Administration /4/7 Spring 7 EE 363: Computer Orgniztion Arithmetic for Computers Numer Representtion & ALU Avinsh Kodi Deprtment of Electricl Engineering & Computer Science Ohio University, Athens, Ohio 457 E-mil: kodi@ohio.edu

More information

Engineer-to-Engineer Note

Engineer-to-Engineer Note Engineer-to-Engineer Note EE-245 Technicl notes on using Anlog Devices DSPs, processors nd development tools Contct our technicl support t dsp.support@nlog.com nd t dsptools.support@nlog.com Or visit our

More information

Unit 5 Vocabulary. A function is a special relationship where each input has a single output.

Unit 5 Vocabulary. A function is a special relationship where each input has a single output. MODULE 3 Terms Definition Picture/Exmple/Nottion 1 Function Nottion Function nottion is n efficient nd effective wy to write functions of ll types. This nottion llows you to identify the input vlue with

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

Digital Design using HDLs EE 4755 Final Examination

Digital Design using HDLs EE 4755 Final Examination Nme Solution Digitl Design using HDLs EE 4755 Finl Exmintion Thursdy, 8 Decemer 6 :3-4:3 CST Alis The Hottest Plce in Hell Prolem Prolem Prolem 3 Prolem 4 Prolem 5 Prolem 6 Exm Totl (3 pts) ( pts) (5 pts)

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

A Reconfigurable Arithmetic Datapath Architecture

A Reconfigurable Arithmetic Datapath Architecture A Reconfigurle Arithmetic Dtpth Architecture Reiner W. Hrtenstein, Riner Kress, Helmut Reinig University of Kiserslutern Erwin-Schrödinger-Strße, D-67663 Kiserslutern, Germny Fx: 49 631 205 2640, emil:

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Introduction to hardware design using VHDL

Introduction to hardware design using VHDL Introuction to hrwre esign using VHDL Tim Güneysu n Nele Mentens ECC school Novemer 11, 2017, Nijmegen Outline Implementtion pltforms Introuction to VHDL Hrwre tutoril 1 Implementtion pltforms Microprocessor

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementry Figure y (m) x (m) prllel perpendiculr Distnce (m) Bird Stndrd devition for distnce (m) c 6 prllel perpendiculr 4 doi:.8/nture99 SUPPLEMENTARY FIGURE Confirmtion tht movement within the flock

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

Concepts Introduced. A 1-Bit Logical Unit. 1-Bit Half Adder (cont.) 1-Bit Half Adder

Concepts Introduced. A 1-Bit Logical Unit. 1-Bit Half Adder (cont.) 1-Bit Half Adder oncepts Introduced A -Bit Logicl Unit sic rithmetic/logic unit clocks ltches nd ip-ops registers SRAMs nd RAMs nite stte mchines Below is -it logicl unit tht performs AN nd OR opertions Both the AN nd

More information

Basics of Logic Design Arithmetic Logic Unit (ALU)

Basics of Logic Design Arithmetic Logic Unit (ALU) Bsics of Logic Design Arithmetic Logic Unit (ALU) CPS 4 Lecture 9 Tody s Lecture Homework #3 Assigned Due Mrch 3 Project Groups ssigned & posted to lckord. Project Specifiction is on We Due April 9 Building

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

Tilt-Sensing with Kionix MEMS Accelerometers

Tilt-Sensing with Kionix MEMS Accelerometers Tilt-Sensing with Kionix MEMS Accelerometers Introduction Tilt/Inclintion sensing is common ppliction for low-g ccelerometers. This ppliction note describes how to use Kionix MEMS low-g ccelerometers to

More information

LAB L Hardware Building Blocks

LAB L Hardware Building Blocks LAB L Hrdwre Building Blocks Perform the following groups of tsks: LL1.v 1. In previous l we creted the 2-to-1 mux shown in the left prt of the figure elow nd found tht it cts s n if sttement. c c 0 1

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

TOWARDS GRADIENT BASED AERODYNAMIC OPTIMIZATION OF WIND TURBINE BLADES USING OVERSET GRIDS

TOWARDS GRADIENT BASED AERODYNAMIC OPTIMIZATION OF WIND TURBINE BLADES USING OVERSET GRIDS TOWARDS GRADIENT BASED AERODYNAMIC OPTIMIZATION OF WIND TURBINE BLADES USING OVERSET GRIDS S. H. Jongsm E. T. A. vn de Weide H. W. M. Hoeijmkers Overset symposium 10-18-2012 Deprtment of mechnicl engineering

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

Network Interconnection: Bridging CS 571 Fall Kenneth L. Calvert All rights reserved

Network Interconnection: Bridging CS 571 Fall Kenneth L. Calvert All rights reserved Network Interconnection: Bridging CS 57 Fll 6 6 Kenneth L. Clvert All rights reserved The Prolem We know how to uild (rodcst) LANs Wnt to connect severl LANs together to overcome scling limits Recll: speed

More information

Enginner To Engineer Note

Enginner To Engineer Note Technicl Notes on using Anlog Devices DSP components nd development tools from the DSP Division Phone: (800) ANALOG-D, FAX: (781) 461-3010, EMAIL: dsp_pplictions@nlog.com, FTP: ftp.nlog.com Using n ADSP-2181

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples COMPUTER SCIENCE 123 Foundtions of Computer Science 6. Tuples Summry: This lecture introduces tuples in Hskell. Reference: Thompson Sections 5.1 2 R.L. While, 2000 3 Tuples Most dt comes with structure

More information

Fault injection attacks on cryptographic devices and countermeasures Part 2

Fault injection attacks on cryptographic devices and countermeasures Part 2 Fult injection ttcks on cryptogrphic devices nd countermesures Prt Isrel Koren Deprtment of Electricl nd Computer Engineering University of Msschusetts Amherst, MA Countermesures - Exmples Must first detect

More information

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases Overview Mobile computing nd dtbses Generl issues in mobile dt mngement Dt dissemintion Dt consistency Loction dependent queries Interfces Detils of brodcst disks thlis klfigopoulos Network rchitecture

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

M-Historian and M-Trend

M-Historian and M-Trend Product Bulletin Issue Dte June 18, 2004 M-Historin nd The M-Historin mnges the collection nd rchiving of trend dt, nd enles the presenttion of rchived trend dt in the ssocited softwre component. M-Historin

More information

Qubit allocation for quantum circuit compilers

Qubit allocation for quantum circuit compilers Quit lloction for quntum circuit compilers Nov. 10, 2017 JIQ 2017 Mrcos Yukio Sirichi Sylvin Collnge Vinícius Fernndes dos Sntos Fernndo Mgno Quintão Pereir Compilers for quntum computing The first genertion

More information

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it.

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it. 6.3 Volumes Just s re is lwys positive, so is volume nd our ttitudes towrds finding it. Let s review how to find the volume of regulr geometric prism, tht is, 3-dimensionl oject with two regulr fces seprted

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Principles nd Prdigms Chpter 11 (version April 7, 2008) Mrten vn Steen Vrije Universiteit Amsterdm, Fculty of Science Dept. Mthemtics nd Computer Science Room R4.20. Tel: (020) 598 7784

More information

Engineer-to-Engineer Note

Engineer-to-Engineer Note Engineer-to-Engineer Note EE-295 Technicl notes on using Anlog Devices DSPs, processors nd development tools Visit our Web resources http://www.nlog.com/ee-notes nd http://www.nlog.com/processors or e-mil

More information

Stained Glass Design. Teaching Goals:

Stained Glass Design. Teaching Goals: Stined Glss Design Time required 45-90 minutes Teching Gols: 1. Students pply grphic methods to design vrious shpes on the plne.. Students pply geometric trnsformtions of grphs of functions in order to

More information

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers Mth Modeling Lecture 4: Lgrnge Multipliers Pge 4452 Mthemticl Modeling Lecture 4: Lgrnge Multipliers Lgrnge multipliers re high powered mthemticl technique to find the mximum nd minimum of multidimensionl

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

Engineer-to-Engineer Note

Engineer-to-Engineer Note Engineer-to-Engineer Note EE-204 Technicl notes on using Anlog Devices DSPs, processors nd development tools Visit our Web resources http://www.nlog.com/ee-notes nd http://www.nlog.com/processors or e-mil

More information

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search. CS 88: Artificil Intelligence Fll 00 Lecture : A* Serch 9//00 A* Serch rph Serch Tody Heuristic Design Dn Klein UC Berkeley Multiple slides from Sturt Russell or Andrew Moore Recp: Serch Exmple: Pncke

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

IZT DAB ContentServer, IZT S1000 Testing DAB Receivers Using ETI

IZT DAB ContentServer, IZT S1000 Testing DAB Receivers Using ETI IZT DAB ContentServer, IZT S1000 Testing DAB Receivers Using ETI Appliction Note Rel-time nd offline modultion from ETI files Generting nd nlyzing ETI files Rel-time interfce using EDI/ETI IZT DAB CONTENTSERVER

More information

Many analog implementations of CPG exist, typically using operational amplifier or

Many analog implementations of CPG exist, typically using operational amplifier or FPGA Implementtion of Centrl Pttern Genertor By Jmes J Lin Introuction: Mny nlog implementtions of CPG exist, typiclly using opertionl mplifier or trnsistor level circuits. These types of circuits hve

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

Lexical analysis, scanners. Construction of a scanner

Lexical analysis, scanners. Construction of a scanner Lexicl nlysis scnners (NB. Pges 4-5 re for those who need to refresh their knowledge of DFAs nd NFAs. These re not presented during the lectures) Construction of scnner Tools: stte utomt nd trnsition digrms.

More information

Simrad ES80. Software Release Note Introduction

Simrad ES80. Software Release Note Introduction Simrd ES80 Softwre Relese 1.3.0 Introduction This document descries the chnges introduced with the new softwre version. Product: ES80 Softwre version: 1.3.0 This softwre controls ll functionlity in the

More information

On Computation and Resource Management in Networked Embedded Systems

On Computation and Resource Management in Networked Embedded Systems On Computtion nd Resource Mngement in Networed Embedded Systems Soheil Ghisi Krlene Nguyen Elheh Bozorgzdeh Mjid Srrfzdeh Computer Science Deprtment University of Cliforni, Los Angeles, CA 90095 soheil,

More information

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE 3.1 Scheimpflug Configurtion nd Perspective Distortion Scheimpflug criterion were found out to be the best lyout configurtion for Stereoscopic PIV, becuse

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

Approximation by NURBS with free knots

Approximation by NURBS with free knots pproximtion by NURBS with free knots M Rndrinrivony G Brunnett echnicl University of Chemnitz Fculty of Computer Science Computer Grphics nd Visuliztion Strße der Ntionen 6 97 Chemnitz Germny Emil: mhrvo@informtiktu-chemnitzde

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

Improved Clock-Gating through Transparent Pipelining

Improved Clock-Gating through Transparent Pipelining 2. Improved Clock-Gting through Trnsprent Pipelining Hns M. Jcobson IM T.J. Wtson Reserch Center, Yorktown, NY. hnsj@us.ibm.com STRCT This pper re-exmines the well estblished clocking principles of pipelines.

More information

Video-rate Image Segmentation by means of Region Splitting and Merging

Video-rate Image Segmentation by means of Region Splitting and Merging Video-rte Imge Segmenttion y mens of Region Splitting nd Merging Knur Anej, Florence Lguzet, Lionel Lcssgne, Alin Merigot Institute for Fundmentl Electronics, University of Pris South Orsy, Frnce knur.nej@gmil.com,

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

6.2 Volumes of Revolution: The Disk Method

6.2 Volumes of Revolution: The Disk Method mth ppliction: volumes by disks: volume prt ii 6 6 Volumes of Revolution: The Disk Method One of the simplest pplictions of integrtion (Theorem 6) nd the ccumultion process is to determine so-clled volumes

More information

Computer-Aided Multiscale Modelling for Chemical Process Engineering

Computer-Aided Multiscale Modelling for Chemical Process Engineering 17 th Europen Symposium on Computer Aided Process Engineesing ESCAPE17 V. Plesu nd P.S. Agchi (Editors) 2007 Elsevier B.V. All rights reserved. 1 Computer-Aided Multiscle Modelling for Chemicl Process

More information

Grade 7/8 Math Circles Geometric Arithmetic October 31, 2012

Grade 7/8 Math Circles Geometric Arithmetic October 31, 2012 Fculty of Mthemtics Wterloo, Ontrio N2L 3G1 Grde 7/8 Mth Circles Geometric Arithmetic Octoer 31, 2012 Centre for Eduction in Mthemtics nd Computing Ancient Greece hs given irth to some of the most importnt

More information

Virtual Machine (Part I)

Virtual Machine (Part I) Hrvrd University CS Fll 2, Shimon Schocken Virtul Mchine (Prt I) Elements of Computing Systems Virtul Mchine I (Ch. 7) Motivtion clss clss Min Min sttic sttic x; x; function function void void min() min()

More information

Approximate computations

Approximate computations Living with floting-point numers Stndrd normlized representtion (sign + frction + exponent): Approximte computtions Rnges of vlues: Representtions for:, +, +0, 0, NN (not numer) Jordi Cortdell Deprtment

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds A Sprse Grid Representtion for Dynmic Three-Dimensionl Worlds Nthn R. Sturtevnt Deprtment of Computer Science University of Denver Denver, CO, 80208 sturtevnt@cs.du.edu Astrct Grid representtions offer

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

1 Quad-Edge Construction Operators

1 Quad-Edge Construction Operators CS48: Computer Grphics Hndout # Geometric Modeling Originl Hndout #5 Stnford University Tuesdy, 8 December 99 Originl Lecture #5: 9 November 99 Topics: Mnipultions with Qud-Edge Dt Structures Scribe: Mike

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

Address/Data Control. Port latch. Multiplexer

Address/Data Control. Port latch. Multiplexer 4.1 I/O PORT OPERATION As discussed in chpter 1, ll four ports of the 8051 re bi-directionl. Ech port consists of ltch (Specil Function Registers P0, P1, P2, nd P3), n output driver, nd n input buffer.

More information