A MultiCore Numerical Framework for Characterizing Flow in Oil Reservoirs


The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Leonard, Christopher R. et al. "A MultiCore Numerical Framework for Characterizing Flow in Oil Reservoirs." In Papers of the 19th High Performance Computing Symposium (HPC 2011), Boston, Massachusetts, USA, April 4-6. Society for Modeling & Simulation International.
Version: Author's final manuscript
Accessed: Wed Apr 24 21:51:12 EDT 2019
Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike 3.0
Christopher R. Leonard, Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA
David W. Holmes, Department of Mechanical Engineering, James Cook University, Angus Smith Drive, Douglas, QLD 4811, Australia
John R. Williams, Civil and Environmental Engineering and Engineering Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA
Peter G. Tilke, Department of Mathematics and Modeling, Schlumberger-Doll Research Center, 1 Hampshire Street, Cambridge, MA

Keywords: Parallel computation, multicore, smoothed particle hydrodynamics, lattice Boltzmann method, enhanced oil recovery

Abstract

This paper presents a numerical framework that enables scalable, parallel execution of engineering simulations on multicore, shared-memory architectures. Distribution of the simulations is done by selective hash-tabling of the model domain, which spatially decomposes it into a number of orthogonal computational tasks. These tasks, the size of which is critical to optimal cache blocking and consequently performance, are then distributed for execution to multiple threads using the previously presented task management algorithm, H-Dispatch. Two numerical methods, smoothed particle hydrodynamics (SPH) and the lattice Boltzmann method (LBM), are discussed in the present work, although the framework is general enough to be used with any explicit time integration scheme. The implementation of both SPH and the LBM within the parallel framework is outlined, and the performance of each is presented in terms of speedup and efficiency. On the 24-core server used in this research, near-linear scalability was achieved for both numerical methods, with utilization efficiencies up to 95%. To close, the framework is employed to simulate fluid flow in a porous rock specimen, which is of broad geophysical significance, particularly in enhanced oil recovery.

1.
INTRODUCTION

The extension of engineering computations from serial to parallel has had a profound effect on the scale and complexity of problems that can be modeled in continuum and discontinuum mechanics. Traditionally, such parallel computing has almost exclusively been undertaken with distributed-memory parallel architectures, such as clusters of single-processor machines. A number of authors have reported on parallel particle methods (of which smoothed particle hydrodynamics, SPH, is an example) demonstrating scalability on such architectures, for example Walther and Sbalzarini [1] and Ferrari et al. [2]. The lattice Boltzmann method (LBM) has also been a popular candidate for distributed computing, which is unsurprising given the naturally parallel characteristics of its traditionally regular, orthogonal grid and local node operations. For example, Vidal et al. [3] presented results incorporating five billion LBM nodes with a speedup efficiency of 75% on 128 processors. Götz et al. [4] simulated dense particle suspensions with the LBM and a rigid body physics engine on an SGI Altix system with 8192 cores (based on dual-core processors). At 7800 processors an efficiency of approximately 60% was achieved in a simulation featuring 15.6 billion LBM nodes and 4.6 million suspended particles. Bernaschi et al. [5] utilized a GPU implementation of a multicomponent LBM and, in 2D simulations of 4.2 million nodes, achieved a speedup factor of 13 over their benchmark CPU performance. Of particular relevance to this study is the work of Zeiser et al. [6], in which a parallel cache blocking strategy was used to optimally decompose the space-time of their LBM simulations. In addition, this strategy was purported to be cache-oblivious, so that the decomposed blocks were automatically matched to the cache size of the hardware used, minimizing the latency of memory access during the simulation. Shared-memory multicore processors have emerged in the last five years as a relatively inexpensive commercial off-the-shelf hardware option for technical computing.
Their development has been motivated by the current clock-speed limitations that are hindering the advancement, at least in terms of pure performance, of single processors [7]. However, as a comparatively young technology, there exists little published work ([8] is one example) addressing the implementation of numerical codes on shared-memory multicore processors. With the expense of, and high demand for, compute time on cluster systems, multicore represents an attractive and accessible HPC alternative, but the known challenges of software development on such architectures (i.e. thread safety and memory bandwidth issues [9, 10, 11, 12]) must be addressed. Multicore technologies are, importantly, beginning to infiltrate all levels of computing, including within each node of modern cross-machine clusters. As such, the development of scalable programming strategies for multicore will have widespread benefit. A variety of approaches to programming on multicore have been proposed to date. Commonly, concurrency tools from traditional cluster computing like MPI [13] and OpenMP [14] have been used to achieve fast take-up of the new technology. Unfortunately, fundamental differences in the cross-machine and multicore architectures mean that such approaches are rarely optimal for multicore and result in poor scalability for many applications. In response to this, Chrysanthakopoulos and coworkers [15, 16], based on earlier work by Stewart [17], have implemented multicore concurrency libraries using port-based abstractions. These mimic the functionality of a message passing library like MPI, but use shared memory as the medium for data exchange, rather than exchanging serialized packets over TCP/IP. Such an approach provides flexibility in program structure, while still capitalizing on the speed advantages of shared memory. Perhaps as a reflection of the growing importance of multicore software development, a number of other concurrency libraries have been developed, such as Axum and Cilk++. In addition, the latest .NET Framework includes a Task Parallel Library (TPL) which provides methods and types, with varying degrees of abstraction, that can be used with minimal programmatic difficulty to distribute tasks on multiple threads. In an earlier paper [18] we have shown that a programming model developed using the port-based techniques described in [15, 16] provides significant performance advantages over approaches like MPI and OpenMP. Importantly, it was found that the H-Dispatch distribution model facilitated adjustable cache blocking, which allowed performance to be tuned via the computational task size. In this paper, we apply the proposed programming model to the parallelization of both particle-based methods and fixed-grid numerical methods on multicore.
The unique challenges in the parallel implementation of both methods will be discussed and the performance improvements will be presented. The layout of this paper is as follows. In Section 2 a brief description of the multicore distribution model, H-Dispatch, is provided. Both the SPH and LBM numerical methods are outlined in Section 3 and the relevant aspects of their implementation in the multicore framework, including thread safety and cache memory efficiency, are discussed. Section 4 presents performance test results from both the SPH and LBM simulators as run on a 24-core server and, finally, an application of the multicore numerical framework to a porous media flow problem relevant to enhanced oil recovery is presented in Section 5.

2. MULTICORE DISTRIBUTION

One of the hindrances to scalable, cross-machine distribution of numerical methods is the communication of ghost regions. These regions correspond to neighboring sections of the problem domain (resident in memory on other cluster nodes) which are required on a cluster node for the processing of its own subdomain. In the LBM this is typically a 'layer' of grid points that encapsulates the local subdomain, but in SPH the layer of neighboring particles required is equal to the radius of the compact support zone. In 3D it can be shown that, depending on the subdomain size, the communicated fraction of the problem domain can easily exceed 50%. In this situation Amdahl's Law [7], and the fact that traditional cross-machine parallelism using messaging packages is a serial process, dictates that this type of distributed-memory approach will scale poorly. If a problem is divided into spatial subdomains for multicore distribution, ghost regions are no longer necessary because adjacent data is readily available in shared memory. Further, the removal of the relatively slow network communications required in cluster computing allows for an entirely new programming paradigm.
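The ghost-layer overhead quoted above is easy to verify with a short calculation (this sketch is illustrative and not from the paper): for an n-cubed subdomain with a one-cell-deep ghost layer on every face, the fraction of the node's working set that consists of communicated ghost data grows rapidly as the subdomain shrinks.

```python
def ghost_fraction(n, ghost=1):
    """Fraction of an n^3 subdomain's working set that is ghost data,
    assuming a ghost layer `ghost` cells deep on every face."""
    total = (n + 2 * ghost) ** 3
    return (total - n ** 3) / total

# For small subdomains more than half the working set is communicated
# data, consistent with the >50% figure cited in the text; large
# subdomains amortize the cost.
assert ghost_fraction(6) > 0.5
assert ghost_fraction(100) < 0.06
```

For n = 6 the ghost share is 296/512, about 58%, so the serialized communication of ghost regions dominates exactly when the domain is finely decomposed for many processors.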
Subdomains can take on any simple shape or size, and threaded programming means many small subdomains can be processed on each core from an events queue rather than needing to approximate a single large, computationally balanced domain for each processor. Consequently, dynamic domain decomposition becomes unnecessary and a particle's position in a domain can be determined by something as simple as a spatial hash, allowing advection to proceed with minimal management. Such characteristics mean that multicore is perfectly suited to the parallel implementation of particle methods; however, shared-memory challenges such as thread safety and bandwidth limitations must be addressed. The decomposition of the spatial domain of a numerical method creates a number of computational tasks. Multicore distribution of these tasks requires the use of a coordination tool to manage them onto processing cores in a load-balanced way. While such tasks could easily be distributed using a traditional approach like scatter-gather, here the H-Dispatch programming model of [18] has been used because of its demonstrated advantages for performance and memory efficiency. A schematic illustrating the functionality of the H-Dispatch programming model is shown in Figure 1. The figure shows three enduring threads (corresponding to three processing cores) that remain active through each time step of the analysis. A simple problem space with nine decomposed tasks is distributed across these threads by H-Dispatch. The novel feature of H-Dispatch is the way in which tasks are distributed to threads. Rather than a scatter or push of tasks from the manager to threads, here threads request tasks when free. H-Dispatch manages requests and distributes cells to the requesting threads accordingly. It is this pull mechanism that enables the use of a single thread per core: as threads only request a task when free, there is never more than one task at a time associated with a given enduring thread (and its associated local variable memory). Additionally, when all tasks in the problem space have been dispatched and processed, H-Dispatch identifies step completion (i.e. synchronization) and the process can begin again.

Figure 1. Schematic representation of the H-Dispatch programming model [18] used to distribute tasks to cores. An enduring processing thread is available for each core (three in this simplified representation), and H-Dispatch coordinates tasks to threads in a load-balanced way over a number of time steps.

The key benefit of such an approach from the perspective of memory usage is the ability to maintain a single set of local variables for each enduring thread. The numerical task associated with analysis on a subdomain will inevitably require local calculation variables, often with significant memory requirements (particularly in the case of particle methods). Overwriting this memory with each allocated cell means the number of local variable sets will match the core count, rather than the total cell count. Considering that most problems will be run with core counts in the 10's or 100's, but cell counts in the 1,000's or 10,000's, this can significantly improve the memory efficiency of a code. Additionally, in managed languages like C#.NET and Java, because thread-local variable memory remains active throughout the analysis, it is not repeatedly reclaimed by the garbage collector, a process that holds all other threads until completion and degrades performance markedly (see [18]).

3. NUMERICAL METHODS

The multicore numerical framework featured in this paper has been designed in a general fashion so as to accommodate any explicit numerical method, such as SPH, LBM, the discrete element method (DEM), the finite element method (FEM) or finite difference (FD) techniques. It is worth noting that it could be adapted to accommodate implicit, iterative schemes with the correct data structures for thread safety, but that is not the focus of this work.
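The pull mechanism of Figure 1 can be sketched in a few lines. The paper's implementation uses port-based C#.NET abstractions; the following minimal Python analogue (all names are illustrative, not from the paper) shows only the core idea: enduring worker threads that pull tasks from a shared queue when free, so each thread ever holds one task and one reusable set of local variables, and joining the threads marks step completion.

```python
import queue
import threading

def run_step(tasks, n_threads, process):
    """Pull-based dispatch: enduring threads request work when free,
    so at most one task (and one local-variable set) lives per thread."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        local_buffer = {}  # one enduring set of local variables per thread
        while True:
            try:
                task = q.get_nowait()  # "pull": request a task only when free
            except queue.Empty:
                return                 # no tasks left for this thread
            r = process(task, local_buffer)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # all tasks dispatched and processed: step synchronization
    return results

# nine decomposed tasks on three "cores", as in Figure 1
out = run_step(range(9), 3, lambda task, buf: task * task)
assert sorted(out) == [i * i for i in range(9)]
```

Note the contrast with scatter-gather: no task is pushed to a busy thread, so load balancing emerges from the requests themselves.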
Instead, this study will focus on SPH and the LBM; the performance of the multicore framework with an FD scheme has been reported previously [18].

3.1. Smoothed Particle Hydrodynamics

SPH is a mesh-free Lagrangian particle method which was first proposed for the study of astrophysical problems by Lucy [19] and Gingold and Monaghan [20], but is now widely applied to fluid mechanics problems [21]. A key advantage of particle methods such as SPH (see also dissipative particle dynamics (DPD) [22]) is their ability to advect mass with each particle, thus removing the need to explicitly track phase interfaces for problems involving multiple fluid phases or free-surface flows. However, the management of free particles brings with it the associated computational cost of performing spatial reasoning at every time step. This requires a search algorithm to determine which particles fall within the compact support (i.e. interaction) zone of a particle, and then processing of each interacting pair. Nevertheless, in many circumstances this expense can be justified by the versatility with which a variety of multiphysics phenomena can be included. SPH theory has been detailed widely in the literature, with various formulations having been proposed. The methodology of authors such as Tartakovsky and Meakin [23, 24] and Hu and Adams [25] has been shown to perform well for the case of multiphase fluid flows. Their particle number density variant of the conventional SPH formulation removes erroneous artificial surface tension effects between phases and allows for phases of significantly differing densities. Such a method has been used for the performance testing in this work. The discretized particle number density SPH equation for some field quantity, A, is given as

    A_i = Σ_j (A_j / n_j) W(|r_i − r_j|, h),    (1)

along with its gradient,

    ∇A_i = Σ_j (A_j / n_j) ∇W(|r_i − r_j|, h),    (2)

where n_i = Σ_j W(|r_i − r_j|, h) is the particle number density term, while W is the smoothing function (typically a Gaussian or some form of spline), h is the smoothing length and r_i and r_j are position vectors.
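Equations (1) and (2) are simple summations over neighboring particles. The following brute-force numpy sketch (illustrative only; a production code would restrict the sums to neighboring cells rather than forming an all-pairs distance matrix, and the Gaussian kernel is just one common choice of W) implements Eq. (1) and the number density term:

```python
import numpy as np

def w_gauss(r, h):
    """Gaussian smoothing kernel W(r, h) with 3D normalization."""
    return np.exp(-(r / h) ** 2) / (np.pi ** 1.5 * h ** 3)

def pair_distances(pos):
    """|r_i - r_j| for all particle pairs (O(N^2) sketch)."""
    return np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)

def number_density(pos, h):
    """n_i = sum_j W(|r_i - r_j|, h)."""
    return w_gauss(pair_distances(pos), h).sum(axis=1)

def interpolate(A, pos, h):
    """Eq. (1): A_i = sum_j (A_j / n_j) W(|r_i - r_j|, h)."""
    n = number_density(pos, h)
    return (w_gauss(pair_distances(pos), h) * (A / n)[None, :]).sum(axis=1)

pos = np.random.default_rng(0).uniform(0.0, 1.0, (200, 3))
h = 0.2
n = number_density(pos, h)
# Sanity check: interpolating the field A_j = n_j reproduces n_i exactly,
# since each summand reduces to W(|r_i - r_j|, h).
assert np.allclose(interpolate(n, pos, h), n)
```

The gradient of Eq. (2) follows the same pattern with ∇W in place of W.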
These expressions are applied to the Navier-Stokes conservation equations to determine the SPH equations of motion. Computing density directly from (1) gives

    ρ_i = m_i Σ_j W(|r_i − r_j|, h),    (3)

where this expression conserves mass exactly, much like the summation density approach of conventional SPH. An appropriate term for the particle velocity rate has been provided by Morris et al. [26], and used by Tartakovsky and Meakin [23], where,
    dv_i/dt = −(1/m_i) Σ_j (P_i/n_i² + P_j/n_j²) (dW_ij/dr) ê_ij
              + (1/m_i) Σ_j [(μ_i + μ_j)(v_j − v_i) / (n_i n_j |r_i − r_j|)] (dW_ij/dr) + F_i,    (4)

in which P is the particle pressure, v is the particle velocity, μ is the dynamic viscosity and F_i is the body force applied on the i-th particle. Surface tension is introduced into the method via the superimposition of pairwise interparticle forces following Tartakovsky and Meakin [24],

    F_ij = s cos(3π|r_i − r_j| / (2h)) (r_i − r_j)/|r_i − r_j|,   for |r_i − r_j| ≤ h,
    F_ij = 0,                                                     for |r_i − r_j| > h,    (5)

wherein s is the strength of the force between particles i and j, while h is the interaction distance of a particle. By defining s as being stronger between particles of the same phase than between particles of a different phase, surface tension manifests naturally as a result of force imbalances at phase interfaces. Similarly, s can be defined to control the wettability properties of a solid. Solid boundaries in the simulator are defined using rows of virtual particles similar to those used by Morris et al. [26], and no-slip boundary conditions are enforced for low Reynolds number flow simulations using an artificially imposed boundary velocity method developed in [27] and shown to produce high-accuracy results.

3.2. Multi-Core Implementation of SPH

Because particle methods necessitate the recalculation of interacting particle pairs at regular intervals, algorithms that reduce the number of candidate interacting particles to check are critical to numerical efficiency. This is achieved by spatial hashing, which assigns particles to cells or 'bins' based on their Cartesian coordinates. With a cell side length greater than or equal to the interaction depth of a particle, all candidates for interaction with some target particle will be contained in the target cell, or one of the immediately neighboring cells. The storage of particle cells is handled using hash table abstractions such as the Dictionary<TKey, TValue> class in C#.NET [28], and parallel distribution is performed by allocation of cell keys to processors from an events queue. In cases where data is required from particles in adjacent cells, it is addressed directly using the key of the relevant cell.
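The cell binning just described can be sketched as follows. This Python stand-in for the Dictionary-based storage is illustrative (the paper's code is C#.NET); it hashes particles to integer cell keys and gathers interaction candidates from the target cell and its 26 neighbors in 3D:

```python
import math
from collections import defaultdict

def hash_particles(positions, cell_size):
    """Bin particles into cells keyed by integer Cartesian coordinates.
    With cell_size >= the interaction depth, all neighbors of a particle
    lie in its own cell or one of the immediately adjacent cells."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        cells[key].append(idx)
    return cells

def neighbour_candidates(cells, key):
    """Particles in the target cell and its 26 adjacent cells."""
    kx, ky, kz = key
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out.extend(cells.get((kx + dx, ky + dy, kz + dz), []))
    return out

pts = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (2.5, 0.1, 0.1)]
cells = hash_particles(pts, 1.0)
cand = neighbour_candidates(cells, (0, 0, 0))
# particle 2 sits two cells away, so it is correctly excluded
assert sorted(cand) == [0, 1]
```

Distributing the cell keys themselves as the unit of work is what lets the same hashing serve both neighbor search and parallel decomposition.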
With the described particle-cell decomposition of the domain, care must be taken to avoid common shared-memory problems like race conditions and thread contention. To circumvent the problems associated with using locks (coarse-grained locking scales poorly, while fine-grained locking is tedious to implement and can introduce deadlocking conditions [29]), the SPH data can be structured to remove the possibility of thread contention altogether. By storing the present and previous values of the SPH field variables, the necessary gradient terms can be calculated as functions of values in previous memory, while updates are written to the current value memory. This reduces the number of synchronizations per time step from two (if the gradient terms are calculated before synchronizing, followed by the update of the field variables) to one, and a rolling memory algorithm switches the indices of previous and current data with successive time steps. An important advantage of the use of spatial hashing to create particle cells is the ease with which the cell size can be used to optimize cache blocking. By adjusting the cell size, the associated computational task can be resized to fit in cache close to the processor (e.g. the L1 or L2 cache levels). It can be shown that cells fitting completely in cache demonstrate significantly better performance (15 to 30%) than those that overflow cache and cause an increase in cache misses, because cache misses require that data then be retrieved from RAM with a greater memory latency.

3.3. The Lattice Boltzmann Method

The lattice Boltzmann method (LBM) (see [30] for a review) has been established in the last 20 years as a powerful numerical method for the simulation of fluid flows. It has found application in a vast array of problems including magnetohydrodynamics, multiphase and multicomponent flows, flows in porous media, turbulent flows and particle suspensions. The primary variables in the LBM are the particle distribution functions, f_i(x, t), which exist at each of the lattice nodes that comprise the fluid domain.
These functions relate the probable amount of fluid particles moving with a discrete speed in a discrete direction at each lattice node at each time increment. The particle distribution functions are evolved at each time step via the two-stage, collide-stream process as defined in the lattice Bhatnagar-Gross-Krook (LBGK) equation [31],

    f_i(x + c_i Δt, t + Δt) = f_i(x, t) − (1/τ)[f_i(x, t) − f_i^eq(x, t)] + A G·c_i,    (6)

in which x defines the node coordinates, Δt is the explicit time step, c_i = Δx/Δt defines the lattice velocities, τ is the relaxation time, f_i^eq(x, t) are the nodal equilibrium functions, G is a body force (e.g. gravity) and A is a mass-conserving constant. The collision process, which is described by the first two terms on the RHS of (6), monotonically relaxes the particle distribution functions towards their respective equilibria. The redistributed functions are then adjusted by the body force term, after which the streaming process propagates them to their nearest-neighbor nodes. Spatial discretization in the LBM is typically based on a periodic array of polyhedra, but this is not mandatory [32]. A choice of lattices is available in two and three dimensions with an increasing number of velocities and therefore symmetry. However, the benefits of increased symmetry can be offset by the associated computational cost, especially in 3D. In the present work the D3Q15 lattice is employed, whose velocity set comprises the rest velocity, six axial vectors and eight diagonal vectors,

    c_i ∈ c {(0,0,0), (±1,0,0), (0,±1,0), (0,0,±1), (±1,±1,±1)}.    (7)

The macroscopic fluid variables, density, ρ = Σ_i f_i, and momentum flux, ρu = Σ_i f_i c_i, are calculated at each lattice node as velocity moments of the particle distribution functions. The definitions of the fluid pressure and viscosity are by-products of the Chapman-Enskog expansion (see [34] for details), which shows how the Navier-Stokes equations are recovered in the near-incompressible limit with isotropy, Galilean invariance and a velocity-independent pressure. An isothermal equation of state, p = c_s² ρ, in which c_s = c/√3 is the lattice speed of sound, is used to calculate the pressure directly from the density, while the kinematic viscosity,

    ν = (2τ − 1) Δx² / (6 Δt),    (8)

is evaluated from the relaxation and discretization parameters. The requirement of positive viscosity in (8) mandates that τ > 1/2, and to ensure near-incompressibility of the flow the computational Mach number is limited, Ma = u/c_s ≪ 1. The most straightforward approach to handling wall boundary conditions is to employ the bounce-back technique. Although it has been shown to be generally first-order in accuracy [35], as opposed to the second-order accuracy of the lattice Boltzmann equation at internal fluid nodes [30], its operations are local and the orientation of the boundary with respect to the grid is irrelevant. A number of alternative wall boundary techniques [36, 37] that offer generalized second-order convergence are available in the LBM; however, these come at the expense of the locality and simplicity of the bounce-back condition.
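To make the collision stage of Eq. (6) concrete, the sketch below applies BGK relaxation at a single D3Q15 node. The body-force term is omitted and a standard second-order equilibrium (with c_s² = 1/3 in lattice units) is assumed, since the paper does not reproduce f_i^eq; the weights 2/9, 1/9 and 1/72 are the usual D3Q15 values.

```python
import numpy as np

# D3Q15 lattice: rest velocity, six axial vectors, eight diagonals
axial = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
diag = [[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
C = np.array([[0, 0, 0]] + axial + diag, dtype=float)   # shape (15, 3)
W = np.array([2 / 9] + [1 / 9] * 6 + [1 / 72] * 8)      # lattice weights

def f_eq(rho, u):
    """Second-order equilibrium distributions (c_s^2 = 1/3, lattice units)."""
    cu = C @ u
    return rho * W * (1.0 + 3.0 * cu + 4.5 * cu ** 2 - 1.5 * (u @ u))

def collide(f, tau):
    """First two RHS terms of Eq. (6): relax f toward its equilibrium."""
    rho = f.sum()            # density: zeroth velocity moment
    u = (f @ C) / rho        # velocity: first moment over density
    return f - (f - f_eq(rho, u)) / tau

f = f_eq(1.0, np.array([0.05, 0.0, 0.0])) + 1e-3  # slightly off equilibrium
g = collide(f, tau=0.8)
# collision conserves mass and momentum at the node
assert np.isclose(g.sum(), f.sum())
assert np.allclose(g @ C, f @ C)
```

Streaming then simply copies each post-collision g_i to the neighbor node at x + c_i Δt, which in the present framework is the step that writes into the second ("future") copy of the distribution functions.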
In the present work, the immersed moving boundary (IMB) method of Noble and Torczynski [38] is employed to handle the hydrodynamic coupling of the fluid and structure. In this method the LBE is modified to include an additional collision term which is dependent on the proportion of the nodal cell that is covered by solid, thus improving the boundary representation and smoothing the hydrodynamic forces calculated at an obstacle's boundary nodes as it moves relative to the grid. Consequently, it overcomes the momentum discontinuity of bounce-back and link-bounce-back-based [39] techniques and provides adequate representation of non-conforming boundaries at lower grid resolutions. It also retains two critical advantages of the LBM, namely the locality of the collision operator and the simple linear streaming operator, and thus facilitates solutions involving large numbers of irregular-shaped, moving boundaries. Further details of the IMB method and the coupling of the LBM to the DEM, including an assessment of mixed boundary conditions in various flow geometries, can be found in Owen et al. [40].

3.4. Multi-Core Implementation of the LBM

Two characteristic aspects of the LBM often result in it being described as a naturally parallel numerical method. The first feature is the regular, orthogonal discretization of space, which is typical of Eulerian schemes, and can simplify domain decomposition. The second feature is the use of only local data to perform nodal operations, which consequently results in the particle distribution functions at a node being updated using only the previous values. However, it should be noted that the inclusion of additional features such as flux boundary conditions and non-Newtonian rheology, if not implemented carefully, can negate the locality of operations. The obvious choice for decomposition of the LBM domain is to use cubic nodal bundles, as shown schematically in Figure 2. The bundles are analogous to the particle cells that were used in SPH, and similarly H-Dispatch is used to distribute bundle keys to processors. Data storage is handled using a Dictionary of bundles, which are in turn Dictionaries of nodes. This technique is used, as opposed to a master Dictionary of all nodes, to overcome problems that can occur with Collection limits (approximately 90 million entries on the 64-bit server used here). By definition, the LBM nodal bundles can be used to perform cache blocking just as the particle cells were in SPH. With the correct bundle size, the associated computational task can be stored sequentially in processor cache and the latency associated with RAM access can be minimized. Similar techniques for the LBM have been reported in [41, 42] and extended to perform decomposition of space-time [6] (as opposed to just space) in a way that is independent of cache size (a recursive algorithm is used to determine the optimal block size). To ensure thread safety, two copies of the LBM particle distribution functions at each node are stored. Nodal processing is undertaken using the current values, which are then overwritten and propagated to the future data sets of the appropriate neighbor nodes. Techniques such as SHIFT [43] have been presented which employ specialized data structures that remove the need for storing two copies of the particle distribution functions; however, this is at the expense of the flexibility of the code. Note that the collide-push sequence implemented here can easily be reordered to a pull-collide sequence, with each having their own subtle conveniences and performance benefits depending on the data structure and hardware employed [44, 42].

Figure 2. Schematic representation of the decomposition of the LBM domain into nodal bundles. Data storage is handled using a Dictionary of bundles, which are in turn Dictionaries of nodes.

4. PARALLEL PERFORMANCE OF METHODS

The parallel performance of SPH and the LBM in the multicore numerical framework was tested on a 24-core Dell PER900 server with Intel Xeon CPUs (2.40 GHz) running 64-bit Windows Server Enterprise. Here, two metrics are used to define the scalability of the simulation framework, namely SpeedUp = t_1proc / t_nproc and Efficiency = SpeedUp / N. Obviously, idealized maximum performance corresponds to a speedup ratio equal to the number of cores, at which point efficiency would be 100%. Figure 3 graphs the increasing speedup of the SPH solver with increasing cores. The test problem simulated flow through a porous geometry determined from microtomographic images of oil reservoir rock (see Section 5). Approximately 1.4 million particles were used in the simulation and the execution duration was defined as the time in seconds taken to complete a time step, averaged over 100 steps. For the double-search algorithm a speedup of approximately 22 was achieved with 24 cores, which corresponds to an efficiency of approximately 92%. This, in conjunction with the fact that the processor scaling response is near-linear, is an excellent result.

Figure 3. Multicore, parallel performance of the SPH solver contrasting the counterintuitive scalability of the single-search and double-search algorithms.

The results in Figure 3 also provide an interesting insight into the comparative benefits of minimizing computation or minimizing memory commit when using multicore hardware. The single-search result is attained with a version of the solver that performs the spatial reasoning once per time step and stores the results in memory for use twice per time step. Conversely, the double-search results are achieved when the code is modified to perform the spatial search twice per time step, as needed. Intuitively, the single-search approach requires a greater memory commit but the double-search approach requires more computation. However, it is counterintuitive to see that double-search significantly outperforms single-search, especially as the number of processors increases. This can be attributed to the better cache blocking of the second approach and the smaller amount of data experiencing latency when loaded from RAM to cache. The fact that such performance gains only manifest when more than 10 cores are used suggests that, for fewer than 10 cores, RAM pipeline bandwidth is sufficient to handle a global interaction list. As in the SPH testing, the LBM solver was assessed in terms of speedup and efficiency. Figure 4 graphs speedup against the number of cores for a 3D duct flow problem. Periodic boundaries were employed on the inflow and outflow surfaces and the bounce-back wall boundary condition was used on the remaining surfaces. A constant body force was used to drive the flow.

Figure 4. Multicore, parallel performance of the LBM solver for varying bundle and domain sizes.

The side length, in nodes, of the bundles was varied between 10 and 50 and the difference in performance for each size can clearly be seen in Figure 4. Optimal performance is achieved at a side length of 20, where the speedup and efficiency are approximately 22 and 92%, respectively, on 24 cores. This bundle size represents the best cache blocking scenario for the tested hardware. As the bundle size is increased, performance degrades due to the computational tasks becoming too large for storage in cache, which results in slow communication with RAM for data. The bundle side length of 20 was then transferred to identical problems using larger domains, and the results of these tests are also included in Figure 4. As in the smaller problem, near-linear scalability is achieved: at 24 cores the speedups are approximately 23 and 22, and the efficiencies approximately 95% and 92%, for the two larger domains (the largest being 400³ nodes), respectively. This is an important result, as it suggests that the optimum bundle size for 3D LBM problems can be determined in an a priori fashion for a specific architecture. The multicore framework, using the SPH solver, was subsequently applied to numerically determine the porosity-permeability relationship of a sample of Dolomite. The structural model geometry was generated from X-ray microtomographic images of the sample, which were taken from a 4.95 mm diameter cylindrical core sample, 5.43 mm in length, with an image resolution of 2.8 μm. Current hardware limitations prevent the full voxelated image set from being analyzed in one numerical experiment; therefore sub-blocks (see the insets in Figure 5) were taken from the full image set to carry out flow testing and map the porosity-permeability relationship.

5.
APPLIED PERMEABILITY EXPERIMENT

The permeability of reservoir rocks to single and multiple fluid phases is of importance to many enhanced oil recovery procedures. Traditionally, these data are determined experimentally from cored samples of rock. To be able to perform these experiments numerically would present significant cost and time savings, and it is therefore the focus of the present study.

Figure 5. Results of the Dolomite permeability tests undertaken using SPH and the multicore framework. Experimental data [45] are included for comparison.

The assemblage of SPH particles was initialized with a density of approximately one particle per voxel. Fluid particles (i.e. those located in the pore space) were assigned the properties of water (ρ₀ = 10³ kg m⁻³, ν = 10⁻⁶ m² s⁻¹) and boundary particles located further than 6h from a boundary surface were deleted for efficiency. The four domain surfaces parallel to the direction of flow were designated no-flow boundaries and the inflow and outflow surfaces were specified as periodic. Due to the incompatibility of the two rock surfaces at these boundaries, periodicity could not be applied directly. Instead, the experimental arrangement was replicated by adding a narrow volume of fluid at the top and bottom of the domain. Finally, all simulations were driven from rest by a constant body force equivalent to an applied pressure differential across the sample. The results of the permeability tests are graphed in Figure 5 and experimental data [45] relevant to the grain size of the sample are included for comparison. It can be seen that the numerical results lie within the experimental band, suggesting that the presented numerical procedure is appropriate. As expected, each sub-block exhibits a different porosity, and by analyzing a range of sub-blocks a porosity-permeability curve for the rock sample can be defined.

6. CONCLUDING REMARKS

In this paper a parallel, multicore framework has been applied to the SPH and LBM numerical methods for the solution of fluid-structure interaction problems in enhanced oil recovery. Important aspects of their implementation, including spatial decomposition, data structuring and the management of thread safety, have been briefly discussed. Near-linear speedup over 24 cores was found in testing, and peak efficiencies of 92% in SPH and 95% in LBM were attained at 24 cores. The importance of optimal cache blocking was demonstrated, in particular in the LBM results, by varying the distributed computational task size via the size of the nodal bundles. This minimized the cache misses during execution and the latency associated with accessing RAM. In addition, it was found that the optimal nodal bundle size in the 3D LBM could be transferred to larger problem domains and achieve similar performance, suggesting an a priori technique for determining the best computational task size for parallel distribution. Finally, the multicore framework with the SPH solver was applied in a numerical experiment to determine the porosity-permeability relationship of a sample of Dolomite (i.e. a candidate reservoir rock). Due to hardware limitations, a number of sub-blocks of the complete microtomographic image of the rock sample were tested. Each sub-block was found to have a unique porosity and corresponding permeability, and when these were superimposed on relevant experimental data the correlation was excellent. This result provides strong support for the numerical experimentation technique presented. Future work will extend testing of the multicore framework to 64-core and 256-core server architectures.
However, the next major numerical development lies in the extension of the fluid capabilities in SPH and the LBM to multiple fluid phases. This will allow the prediction of the relative permeability of rock samples, which is essential to drainage and imbibition processes in enhanced oil recovery.

Acknowledgements

The authors are grateful to the Schlumberger-Doll Research Center for their support of this research.

References

[1] J. H. Walther and I. F. Sbalzarini. Large-scale parallel discrete element simulations of granular flow. Engineering Computations, 26(6).
[2] A. Ferrari, M. Dumbser, E. F. Toro, and A. Armanini. A new 3D parallel SPH scheme for free surface flows. Computers & Fluids, 38(6).
[3] D. Vidal, R. Roy, and F. Bertrand. A parallel workload balanced and memory efficient lattice-Boltzmann algorithm with single unit BGK relaxation time for laminar Newtonian flows. Computers & Fluids, 39(8).
[4] J. Götz, K. Iglberger, C. Feichtinger, S. Donath, and U. Rüde. Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores. Parallel Computing, 36(2-3).
[5] M. Bernaschi, L. Rossi, R. Benzi, M. Sbragaglia, and S. Succi. Graphics processing unit implementation of lattice Boltzmann models for flowing soft systems. Physical Review E, 80(6):066707.
[6] T. Zeiser, G. Wellein, A. Nitsure, K. Iglberger, U. Rüde, and G. Hager. Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method. Progress in Computational Fluid Dynamics, 8(1-4).
[7] M. Herlihy and N. Shavit. The art of multiprocessor programming. Morgan Kaufmann.
[8] L. Valiant. A bridging model for multicore computing. Lecture Notes in Computer Science, 5193:13-28.
[9] K. Poulsen. Software bug contributed to blackout. Security Focus.
[10] J. Dongarra, D. Gannon, G. Fox, and K. Kennedy. The impact of multicore on computational science software. CTWatch Quarterly, 3(1).
[11] C. E. Leiserson and I. B. Mirman. How to Survive the Multicore Software Revolution (or at Least Survive the Hype). Cilk Arts, Cambridge.
[12] N. Singer. More chip cores can mean slower supercomputing, Sandia simulation shows. Sandia National Laboratories News Release. Available: ore.html
[13] W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming With the Message-Passing Interface. MIT Press, Cambridge.
[14] M. Curtis-Maury, X. Ding, C. D. Antonopoulos, and D. S. Nikolopoulos. An evaluation of OpenMP on current and emerging multithreaded/multicore processors. In M. S. Mueller, B. M. Chapman, B. R. de Supinski, A. D. Malony, and M. Voss, editors, OpenMP Shared Memory Parallel Programming, Lecture Notes in Computer Science, 4315/2008, Springer, Berlin/Heidelberg.
[15] G. Chrysanthakopoulos and S. Singh. An asynchronous messaging library for C#. In Proceedings of the Workshop on Synchronization and Concurrency in Object-Oriented Languages, 89-97, San Diego.
[16] X. Qiu, G. Fox, G. Chrysanthakopoulos, and H. F. Nielsen. High performance multi-paradigm messaging runtime on multicore systems. Technical report, Indiana University. Available: ptliupages/publications/ccrapril16open.pdf
[17] D. B. Stewart, R. A. Volpe, and P. K. Khosla. Design of dynamically reconfigurable real-time software using port-based objects. IEEE Transactions on Software Engineering, 23.
[18] D. W. Holmes, J. R. Williams, and P. G. Tilke. An events based algorithm for distributing concurrent tasks on multi-core architectures. Computer Physics Communications, 181(2).
[19] L. B. Lucy. A numerical approach to the testing of the fission hypothesis. Astronomical Journal, 82.
[20] R. A. Gingold and J. J. Monaghan. Smoothed particle hydrodynamics: Theory and application to non-spherical stars. Monthly Notices of the Royal Astronomical Society, 181.
[21] G. R. Liu and M. B. Liu. Smoothed Particle Hydrodynamics: a meshfree particle method. World Scientific, Singapore.
[22] M. Liu, P. Meakin, and H. Huang. Dissipative particle dynamics simulations of multiphase fluid flow in microchannels and microchannel networks. Physics of Fluids, 19:033302.
[23] A. M. Tartakovsky and P. Meakin. A smoothed particle hydrodynamics model for miscible flow in three-dimensional fractures and the two-dimensional Rayleigh-Taylor instability. Journal of Computational Physics, 207.
[24] A. M. Tartakovsky and P. Meakin. Pore scale modeling of immiscible and miscible fluid flows using smoothed particle hydrodynamics. Advances in Water Resources, 29.
[25] X. Y. Hu and N. A. Adams. A multi-phase SPH method for macroscopic and mesoscopic flows. Journal of Computational Physics, 213.
[26] J. P. Morris, P. J. Fox, and Y. Zhu. Modeling low Reynolds number incompressible flows using SPH. Journal of Computational Physics, 136.
[27] D. W. Holmes, J. R. Williams, and P. G. Tilke. Smooth particle hydrodynamics simulations of low Reynolds number flows through porous media. International Journal for Numerical and Analytical Methods in Geomechanics, doi: /nag.898.
[28] J. Liberty and D. Xie. Programming C# 3.0: 5th Edition. O'Reilly Media, Sebastopol.
[29] A. R. Adl-Tabatabai, C. Kozyrakis, and B. Saha. Unlocking concurrency. Queue, 4(10):24-33.
[30] S. Chen and G. D. Doolen. Lattice Boltzmann method for fluid flows. Annual Review of Fluid Mechanics, 30.
[31] H. Chen, S. Chen, and W. H. Matthaeus. Recovery of the Navier-Stokes equations using a lattice-gas Boltzmann method. Physical Review A, 45(8):R5339-R5342.
[32] X. He and G. D. Doolen. Lattice Boltzmann method on a curvilinear coordinate system: Vortex shedding behind a circular cylinder. Physical Review E, 56(1):434-440.
[33] U. Frisch, B. Hasslacher, and Y. Pomeau. Lattice-gas automata for the Navier-Stokes equation. Physical Review Letters, 56(14).
[34] S. Hou, Q. Zou, S. Chen, G. Doolen, and A. C. Cogley. Simulation of cavity flow by the lattice Boltzmann method. Journal of Computational Physics, 118(2).
[35] R. Cornubert, D. d'Humières, and D. Levermore. A Knudsen layer theory for lattice gases. Physica D, 47(1-2).
[36] T. Inamuro, M. Yoshino, and F. Ogino. A non-slip boundary condition for lattice Boltzmann simulations. Physics of Fluids, 7(12).
[37] D. R. Noble, S. Chen, J. G. Georgiadis, and R. O. Buckius. A consistent hydrodynamic boundary condition for the lattice Boltzmann method. Physics of Fluids, 7(1).
[38] D. R. Noble and J. R. Torczynski. A lattice-Boltzmann method for partially saturated computational cells. International Journal of Modern Physics C, 9(8).
[39] A. J. C. Ladd. Numerical simulations of particulate suspensions via a discretized Boltzmann equation. Part 1. Theoretical foundation. Journal of Fluid Mechanics, 271.
[40] D. R. J. Owen, C. R. Leonard, and Y. T. Feng. An efficient framework for fluid-structure interaction using the lattice Boltzmann method and immersed moving boundaries. International Journal for Numerical Methods in Engineering, doi: /nme.2985.
[41] T. Pohl, F. Deserno, N. Thurey, U. Rüde, P. Lammers, G. Wellein, and T. Zeiser. Performance evaluation of parallel large-scale Lattice Boltzmann applications on three supercomputing architectures. In SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, 21-33, Washington.
[42] G. Wellein, T. Zeiser, G. Hager, and S. Donath. On the single processor performance of simple lattice Boltzmann kernels. Computers & Fluids, 35(8-9).
[43] J. Ma, K. Wu, Z. Jiang, and G. D. Couples. SHIFT: An implementation for lattice Boltzmann simulation in low-porosity porous media. Physical Review E, 81(5):056702.
[44] T. Pohl, M. Kowarschik, J. Wilke, K. Iglberger, and U. Rüde. Optimization and profiling of the cache performance of parallel Lattice Boltzmann codes. Parallel Processing Letters, 13(4).
[45] R. M. Sneider and J. S. Sneider. New oil in old places. Search and Discovery, 10007. Available: der/
More informationLecture #15 Lecture Notes
Lecture #15 Lecture Notes The ocean water column s very much a 3D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal
More informationCOMPARISON OF TWO MODELS FOR HUMAN EVACUATING SIMULATION IN LARGE BUILDING SPACES. University, Beijing , China
COMPARISON OF TWO MODELS FOR HUMAN EVACUATING SIMULATION IN LARGE BUILDING SPACES Bn Zhao 1, 2, He Xao 1, Yue Wang 1, Yuebao Wang 1 1 Department of Buldng Scence and Technology, Tsnghua Unversty, Bejng
More informationChapter 1. Introduction
Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal
More informationEvaluation of an Enhanced Scheme for Highlevel Nested Network Mobility
IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hghlevel Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.
More informationPerformance Study of Parallel Programming on Cloud Computing Environments Using MapReduce
Performance Study of Parallel Programmng on Cloud Computng Envronments Usng MapReduce WenChung Shh, ShanShyong Tseng Department of Informaton Scence and Applcatons Asa Unversty Tachung, 41354, Tawan
More informationRealtime Faulttolerant Scheduling Algorithm for Distributed Computing Systems
Realtme Faulttolerant Schedulng Algorthm for Dstrbuted Computng Systems Yun Lng, Y Ouyang College of Computer Scence and Informaton Engneerng Zheang Gongshang Unversty Postal code: 310018 P.R.CHINA {ylng,
More informationLocal Quaternary Patterns and Feature Local Quaternary Patterns
Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract  Ths paper presents
More informationLoop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation
Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop
More information