THE IMPACT OF SMT/SMP DESIGNS ON MULTIMEDIA SOFTWARE ENGINEERING - A WORKLOAD ANALYSIS STUDY

Yen-Kuang Chen, Rainer Lienhart, Eric Debes, Matthew Holliman, and Minerva Yeung
Microprocessor Research Labs, Intel Corporation

ABSTRACT

This paper presents the study of running several core multimedia applications on a simultaneous multithreading (SMT) architecture and derives design principles for multimedia software engineering. The multimedia workloads range from memory-bound to computation-bound kernels. A performance metric to evaluate effective SMT performance gain is introduced, and compared to similar metrics on symmetric multiprocessor (SMP) systems. In addition, we analyze and compare SMT versus SMP systems, and highlight the advantages in the studied applications. The results indicate that sharing the cache in SMT processors can provide better cache locality and thus better performance, although sharing the cache can introduce cache conflicts and reduce the actual cache size available for each logical processor. We also propose mutually beneficial prefetching, a technique to schedule threads so that they prefetch data for each other in order to reduce cache miss penalty.

1. INTRODUCTION

While processors nowadays are much faster than they used to be, the rapidly growing complexity of such designs also makes achieving significant additional gains more difficult. Consequently, processors/systems that can run multiple software threads have received increasing attention as a means of boosting overall performance. In this paper, we first characterize the workloads of video decoding, encoding, watermarking, and machine learning on current superscalar architectures, and then we characterize the same workloads on simultaneous multithreading (SMT) architectures. Specifically, we use Intel Xeon processors with Hyper-Threading Technology, which is one implementation of the SMT architecture. Our goal is to provide a better understanding of performance improvements on SMT processors. Figure 1 shows a high-level view of an SMT processor and compares it to a dual-processor system.
In an SMT processor, one physical processor exposes two logical processors. Similar to a dual-core or dual-processor system, an SMT processor appears to an application as two processors. Two applications or threads can be executed in parallel. The major difference between SMT processors and dual-processor systems is the different amounts of duplicated resources. In Intel Xeon processors with Hyper-Threading Technology, only a small amount of the hardware resources are duplicated, while the front-end logic, execution units, out-of-order retirement engine, and the memory hierarchy components are shared. Thus, compared to processors without Hyper-Threading Technology, the die size is increased by less than 5% [7].

Simultaneous multi-threading architectures may increase the latency of some single-threaded applications, but have the benefit of increasing overall throughput of multithreaded and multi-process applications. Multimedia applications tend to exhibit large amounts of computation and parallelism. We select a few representative multimedia applications as our workloads and characterize their performance on SMT. Although the workloads are well optimized for Pentium 4 processors, due to the inherent constitution of the algorithms, almost all multimedia applications cannot fully utilize all the execution units available in the microprocessor. Some of the modules are memory-bound, while some are computation-bound.

There is also no common metric yet to measure the efficiency of multithreaded applications on Hyper-Threading Technology. We propose a metric that helps us to clearly identify available performance improvements. By understanding the performance improvements that are possible in multimedia applications, we learn a number of techniques in algorithms and applications to achieve better performance on Hyper-Threading Technology.

The paper is organized as follows. Section 2 describes our workloads, while Section 3 explains the performance speedup of our multi-threaded media workloads on SMT processor and dual-processor systems. In Section 4, assuming the amount of parallelism is the same on SMT processors and systems with multiple processors, we formulate a common performance metric to measure effective SMT speedup. In Section 5, after examining the performance numbers, we provide our observations and describe some techniques to increase application performance on processors with Hyper-Threading Technology. Section 6 concludes this work.

Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries. Xeon processor is a trademark of Intel Corporation or its subsidiaries in the United States and other countries. Pentium is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Table 1: MPEG decoding kernel characterization on a 2 GHz Pentium 4 processor (9 Mb/s MPEG-2, 720x480)

Kernel | IPC | UPC | MMX, SSE, SSE-2 per instruction | Cond. branch/instr. | Mispred. cond./instr. | Mispred. cond./clock | L1 misses/instr. | FSB activity
VLD | | | | /9 | 1/120 | 1/158 | 1/ | %
IDCT | | | | /141 | 1/2585 | 1/4381 | 1/ | %
MC | | | | /17 | 1/142 | 1/592 | 1/ | %

Figure 1: High-level diagram of (a) an SMT processor and (b) a dual-processor system. (In (a), two logical processors, each with its own architectural state (registers), share the execution resources and cache(s); in (b), each physical processor has its own architectural state, execution resources, and cache(s). Both connect to main memory over the system bus.)

Figure 2: Block diagram of an MPEG decoder. (Bitstream, VLD & IQ, IDCT, motion compensation with reference frames, pictures.)

2. WORKLOADS

2.1. MPEG Video Encoding/Decoding

The MPEG decoding pipeline consists of the major operations of Variable-Length Decoding (VLD), Inverse Quantization (IQ), Inverse Discrete Cosine Transform (IDCT), and Motion Compensation (MC) [4], as shown in Figure 2. Table 1 shows a high-level summary of the MPEG-2 decoder's behavior.

The first stage of the decoding pipeline, VLD/IQ, is characterized by substantial data dependency, limiting opportunities for instruction, data, and thread-level parallelism.

The next stage, IDCT, is completely computation-bound. The kernel is dominated by MMX/SSE/SSE2 (Streaming SIMD Extension) operations. Because 90% of the instructions are executed in the MMX/SSE/SSE2 unit, the integer execution unit is idle most of the time in the IDCT module.

The final stage of the decoding pipeline, MC, is memory intensive compared to the other modules in the pipeline. Although the out-of-order execution core in the Intel Xeon processor can tolerate some memory latency, the module shows an equal distribution of time between computation and memory latency because there are too many memory operations.

All these modules are well optimized, but still cannot utilize all of the execution units available in the microprocessors.
While the Intel Pentium 4 and Intel Xeon processors can execute multiple uops in one cycle, the uops retired per cycle (UPC) is only 0.74 in our optimized video decoder. After multi-threading these workloads on SMT processors, they show better utilization of the execution units.

2.2. Watermark Detection

Another application that we studied is video watermark detection. Our watermark detector has two basic stages: MPEG video decoding (as described in Section 2.1) and image-domain watermark detection. The watermark detection scheme is optimized with the Intel IPL for the image manipulations used during watermark detection.

2.3. Support Vector Machines

Machine learning plays a key role in automatic content analysis of multimedia data [10]. A common task is to predict the output $y$ for an unseen input sample $x$ given a training set $\{(x_i, y_i)\}_{i \in \{1, \ldots, N\}}$ consisting of input $x_i$ and its desired output $y_i$. In other words, the goal is to learn the functional relationship $F: y = F(x)$ between input $x$ and output $y$. Predicting qualitative output is called classification, while predicting quantitative output is called regression.

In our investigations, we concentrate on two recent machine learning techniques for classification: Support vector machines (SVMs) [1] and Boosting [3]. Both techniques are quite different. Together, however, they show off execution features of many machine learning algorithms, and are thus good workloads to be analyzed.

The evaluation of trained SVMs is very structured and can, thus, be multithreaded at multiple levels (see Figure 3):

- On the lowest level, the dimensionality $K$ of the input data can be very large. Typical values of $K$ range from a few hundred to several thousand. Thus, the vector multiplication in the linear, polynomial, and sigmoid kernels as well as the $L_2$ distance in the radial basis function kernel can be multithreaded.
- On the next level, the evaluation of each expression in the sum is independent of the others.
- Finally, in an application several samples are tested and each evaluation can be done in parallel.

In the experimental results section we will examine the effects of the different levels of parallelism.

$$F(x) = \operatorname{sgn}\left( \sum_{i=1}^{N} y_i \alpha_i \Phi(x, x_i) + b \right)$$

where $x, x_i \in \mathbb{R}^K$, $y_i \in \{-1, +1\}$, $\alpha_i, b \in \mathbb{R}$, and $\Phi(x, x_i)$ is one of the following kernels:

Linear: $\Phi_L(x, x_i) = x^T x_i$
Polynomial of degree $d$: $\Phi_P(x, x_i) = (x^T x_i + 1)^d$
Sigmoid kernel: $\Phi_S(x, x_i) = \tanh(a\, x^T x_i + c)$
Radial basis function: $\Phi_{RBF}(x, x_i) = \exp(-\|x - x_i\|^2 / \sigma^2)$

Figure 3: Support Vector Machine (SVM) classification algorithm [1]

Figures 4 and 5 show two different SVM implementations. The first implementation will result in mutually beneficial prefetching, while the latter implementation will make better use of the caches.
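To make the levels of parallelism in Figure 3 concrete, the following is a minimal, hedged sketch of the SVM decision function with the linear, polynomial, and RBF kernels. It is not the paper's implementation: the `TrainedSvm` struct and all function names are hypothetical, the RBF normalization by $\sigma^2$ is an assumption, and the sign convention at zero is arbitrary. The OpenMP pragma marks the middle level of parallelism (independent terms of the sum); without OpenMP enabled the code simply runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a trained SVM (names are illustrative only).
struct TrainedSvm {
    std::vector<std::vector<float>> support_vectors;  // x_i, each of dimension K
    std::vector<float> labels;                        // y_i in {-1, +1}
    std::vector<float> alphas;                        // alpha_i
    float bias = 0.0f;                                // b
};

// Lowest level of parallelism: one pass over the K components.
inline float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) sum += a[k] * b[k];
    return sum;
}

inline float linear_kernel(const std::vector<float>& x, const std::vector<float>& xi) {
    return dot(x, xi);
}

inline float polynomial_kernel(const std::vector<float>& x,
                               const std::vector<float>& xi, int degree) {
    return std::pow(dot(x, xi) + 1.0f, static_cast<float>(degree));
}

// Assumed form: exp(-||x - x_i||^2 / sigma^2).
inline float rbf_kernel(const std::vector<float>& x,
                        const std::vector<float>& xi, float sigma) {
    assert(x.size() == xi.size());
    float dist2 = 0.0f;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const float d = x[k] - xi[k];
        dist2 += d * d;
    }
    return std::exp(-dist2 / (sigma * sigma));
}

// Middle level: the N terms of the sum are independent and are combined
// with a reduction. Returns +1 or -1 (zero is mapped to +1 by assumption).
template <typename Kernel>
int classify(const TrainedSvm& svm, const std::vector<float>& x, Kernel kernel) {
    float sum = svm.bias;
    const int n = static_cast<int>(svm.support_vectors.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += svm.labels[i] * svm.alphas[i] * kernel(x, svm.support_vectors[i]);
    }
    return sum >= 0.0f ? +1 : -1;
}
```

Kernels with extra parameters can be passed as lambdas, for example `classify(svm, x, [](const std::vector<float>& a, const std::vector<float>& b) { return rbf_kernel(a, b, 2.0f); })`. The highest level of parallelism, over samples, would wrap calls to `classify` in another parallel loop.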
    #include <ipp.h>        // Intel IPP: Ipp32f, ippsDotProd_32f

    const int NUM_SUPP_VEC = 1000;   // Number of support vectors
    const int NUM_VEC_DIM = 24*24;   // Feature vector size; 24 by 24 pixel window
    // 1D signal scanned by sliding window for faces of size 24 by 24 pixels
    const int SIGNAL_SIZE = 320*240;
    const int NUM_SAMPLES = SIGNAL_SIZE-NUM_VEC_DIM+1;

    Ipp32f supportvector[NUM_SUPP_VEC][NUM_VEC_DIM];
    Ipp32f coeffs [NUM_SUPP_VEC];
    Ipp32f samples[SIGNAL_SIZE];     // input signal array
    Ipp32f result [NUM_SAMPLES];     // stores classification result

    // LINEAR KERNEL
    float linear_kernel(const Ipp32f* psrc1, int len, int index) {
        Ipp32f result;
        ippsDotProd_32f(psrc1, supportvector[index], len, &result);
        return result * coeffs[index];
    }

    // non-blocking, mutually beneficial prefetching code
    int main() {
        #pragma omp parallel for
        for (int i=0; i<NUM_SAMPLES; i++) {
            float sum=0;
            #pragma omp parallel for reduction (+:sum)
            for (int j=0; j<NUM_SUPP_VEC; j++) {
                float tmp = linear_kernel(&samples[i], NUM_VEC_DIM, j);
                sum += tmp;
            }
            result[i] = sum;
        }
    }

Figure 4: Standard multi-threaded SVM implementation (with mutually beneficial prefetching)

    #include <algorithm>    // std::min

    // blocking code, block size = supvecblocksize
    int main() {
        int blocksize =...;
        for (int ii=0; ii<NUM_SAMPLES; ii+=blocksize) {
            for (int j=0; j<NUM_SUPP_VEC; j+=1) {
                int loopend_i = std::min(NUM_SAMPLES, ii+blocksize);
                #pragma omp parallel for
                for (int i=ii; i<loopend_i; i++) {
                    float tmp = linear_kernel(&samples[i], NUM_VEC_DIM, j);
                    result[i] += tmp;
                }
            }
        }
    }

Figure 5: Re-arranged multi-threaded code with higher cache localities (blocked SVM)

2.4. Boosting

Boosting is a powerful learning concept. It combines the performance of many weak classifiers to produce a powerful committee [3]. A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. Many of them smartly combined, however, result in a strong classifier, which often outperforms most monolithic strong classifiers such as SVMs and Neural Networks.

Different variants of boosting are known, such as Discrete AdaBoost, Real AdaBoost, LogitBoost, and Gentle AdaBoost [3]. All of them are almost identical from a workload perspective.
Therefore, we will look only at the standard two-class Discrete AdaBoost algorithm as shown in Figure 6. Learning is based on $N$ training examples $\{(x_i, y_i)\}_{i \in \{1, \ldots, N\}}$ with $x_i \in \mathbb{R}^K$ and $y_i \in \{-1, +1\}$. $x_i$ is a $K$-component vector. Each component encodes a feature relevant for the learning task at hand. The desired two-class output is encoded as -1 and +1. In the case of face detection, the input pattern $x_i$ could be a raw bitmap, and an output of +1 or -1 would indicate whether or not the input pattern contains a complete face.

1. Given $N$ examples $(x_1, y_1), \ldots, (x_N, y_N)$ with $x_i \in \mathbb{R}^K$, $y_i \in \{-1, +1\}$.
2. Start with weights $w_i = 1/N$, $i = 1, \ldots, N$.
3. Repeat for $m = 1, \ldots, M$:
   a. Fit the classifier $f_m(x) \in \{-1, +1\}$ using weights $w_i$ on the training data $(x_1, y_1), \ldots, (x_N, y_N)$.
   b. Compute $err_m = E_w[1_{(y \neq f_m(x))}]$, $c_m = \log((1 - err_m) / err_m)$.
   c. Set $w_i \leftarrow w_i \exp(c_m \cdot 1_{(y_i \neq f_m(x_i))})$, $i = 1, \ldots, N$, and renormalize weights so that $\sum_i w_i = 1$.
4. Output the classifier $F(x) = \operatorname{sgn}\left( \sum_{m=1}^{M} c_m f_m(x) \right)$.

Figure 6: Two-class Discrete AdaBoost algorithm: Training (steps 1 to 3) and evaluation (step 4) [3]

Each sample is initially assigned the same weight (step 2). Next, a weak classifier $f_1$ is trained on the weighted training data (step 3a). Its weighted training error and scaling factor $c_m$ are computed (step 3b). The weights are increased for training samples that have been misclassified (step 3c). All weights are then normalized, and the process of finding the next weak classifier continues for another $M-1$ times. The final classifier $F(x)$ is the sign of the weighted sum over the individual weak classifiers $f_m$ (step 4).

All training steps (2 and 3) in Figure 6 can be partitioned to run in parallel:

Step 2: Every initial weight assignment is independent of the others, and can thus be done in parallel.

Step 3a: This step is computationally most demanding. Fortunately, most weak classifiers can be trained in parallel. In our research, we use stumps as weak classifiers. A stump is a simple threshold classifier of the form

$$f(x) = \begin{cases} \dfrac{\sum y_{\{x_k \le \mathit{threshold}\}}}{|x_k \le \mathit{threshold}|} & \text{if } (x_k \le \mathit{threshold}) \\[2ex] \dfrac{\sum y_{\{x_k > \mathit{threshold}\}}}{|x_k > \mathit{threshold}|} & \text{else} \end{cases}$$

where $x_k$ denotes the $k$th component of vector $x$ and $|\cdot|$ the size of the set $\{\cdot\}$. The threshold for the best classification performance must be calculated for all $K$ components of $x$ in order to find the best weak classifier:

$$(k, \mathit{threshold})_{best} = \arg\max_{k,\, \mathit{threshold}} \{ E[1_{(y = f_m(x))}] \}.$$

This search over $k$ is independent and can thus be easily parallelized.

Step 3b: In this step there are only two minor dependences: the count of the misclassified samples and the final calculation of $c_m$ based on this count. The needed classification of each training sample, in contrast, is totally independent. Note that the misclassified samples can be counted independently in multiple threads. Only the final accumulation of all partial counts has to be done serially in order to calculate $c_m$ (reduction operation).

Step 3c: The picture here is similar to 3b. Weight updating is independent in the samples.
Only the accumulation of the new weights must either be performed synchronized or partially in each thread (reduction operation). It is followed by a non-parallel calculation of the inverse of the weight sum as the new normalization factor. Normalization of each weight is independent in the samples.

Evaluation can be multithreaded at multiple levels. On a sample level, the different weak classifiers can be evaluated and scaled independently. However, often many samples have to be evaluated. Therefore, on this higher level, each test sample can be evaluated individually, too. In face detection, for instance, a window slides over the whole image at multiple resolutions. At each position the classifier is evaluated. This can be easily parallelized.

3. PERFORMANCE CHARACTERISTICS

This section shows the performance analysis of our applications on multithreading architectures. In general, our results show that SMT offers an average 16% to 25% performance improvement. This is very cost effective, as the 5% area cost for SMT is far less than the cost of doubling the hardware for dual-processor systems.

Our SMT and dual-processor system has two 2.0 GHz Intel Xeon processors with Hyper-Threading Technology, running Windows XP. Each processor has a 512 KB second-level cache. To contrast the performance with single-thread performance on the system experimentally in a lab setting, we disable one physical CPU and the support of Hyper-Threading Technology for the other CPU. To contrast the performance of dual processors, we disable the support of Hyper-Threading Technology for both CPUs.

Table 2 shows our experimental results. We achieve consistently more than 10% higher performance on the processor with Hyper-Threading Technology across several workloads. The speedups reported in the second column are the workload speedups on a single processor with Hyper-Threading Technology. The speedups in the third column are the workload speedups on dual-processor systems. (We will explain the last column in the next section.)
Multithreaded performance is better due to more efficient use of the execution units. To verify that resource utilization is better balanced on a processor with Hyper-Threading Technology, we compare UPC for single-threaded and multi-threaded applications. UPC increases from 1.05 to 1.33 in video encoding, from 0.78 to 0.85 in video decoding, and from 1.01 to 1.21 in watermark detection, confirming the more efficient resource utilization possible with Hyper-Threading Technology, as shown in Table 3.

4. EFFECTIVE SMT SPEEDUP

Before we continue our discussion on SMT speedup, this section formulates our performance evaluation criterion of the workloads on SMT processors. As mentioned earlier, SMT processors only duplicate a small amount of the resources while dual-processor systems duplicate almost every processor resource. Thus, it makes little sense to compare speedups on SMT processors with those on systems with two processors [8]. In order to have a fair performance evaluation of multi-threaded applications on SMT architectures, we define a metric to measure the effective speedup. This metric can be extended to measure not only the effective speedup of SMT processors, but also that of other multi-threading capable processors or systems, such as chip multi-processor (CMP) architectures.

For example, consider two workloads with a 1.17x and 1.11x speedup on Hyper-Threading Technology (the blocked versions of the support vector machine workload with RBF kernel and linear kernel, as shown in Table 2). The same two workloads show speedups on dual-processor systems of 1.78x and 1.54x, respectively. Obviously the first workload has more parallelism in the code and thus exhibits better speedup on both SMT and SMP systems. However, if we take the ratio of the Hyper-Threading speedup over the dual-processor speedup, we observe that $\frac{1.17}{1.78} < \frac{1.11}{1.54}$ (about 0.66 versus 0.72), which seems to suggest that the second workload would better exploit the SMT architecture. In fact, counter-intuitively, we observed that the SMT system actually better exploits the parallelism in the first workload. Thus comparing ratios of SMT/SMP performance does not necessarily tell the whole story when it comes to assessing the gains achieved via the SMT architecture. This motivates us to define a new metric that is independent of the amount of parallelism available in the code, in order to assess the gains achieved from SMT for a given workload.
According to Amdahl's Law, the speedup of a program on a multi-threaded system is:

$$\mathit{Speedup} = \frac{1}{(1 - \mathit{parallelism}) + \dfrac{\mathit{parallelism}}{\mathit{ParallelSpeedup}}},$$

where $\mathit{Speedup}$ is the overall application speedup, $\mathit{parallelism}$ is the fraction of code that can be executed in parallel, and $\mathit{ParallelSpeedup}$ is the effective speedup achieved for the parallel section. The formulation is true for both SMT and SMP.

We would like to use the effective speedup as the metric to compare the performance of an application on different multi-threading architectures. By measuring the application speedup on an SMT processor and combining that with knowledge of the amount of available thread-level parallelism, we can quantify the effective speedup due to SMT, which we denote as $\mathit{ParallelSpeedup}_{MT}$. Since computational resources are shared between the parallel threads, this value is unpredictable and dependent upon the workload itself. We thus determine it as follows. We assume that parallelism is identical for both SMP and SMT systems. Given an application's overall speedup on an SMP system, we estimate the amount of parallelism achievable under the simplifying assumption that the speedup of the parallel code is approximately a factor of two on dual-processor systems. Under these assumptions, the effective SMT speedup of the workload can be expressed as the following:

$$\mathit{ParallelSpeedup}_{MT} = \frac{2\,\mathit{Speedup}_{MT}\,\mathit{Speedup}_{DP} - 2\,\mathit{Speedup}_{MT}}{\mathit{Speedup}_{MT}\,\mathit{Speedup}_{DP} + \mathit{Speedup}_{DP} - 2\,\mathit{Speedup}_{MT}},$$

where $\mathit{Speedup}_{MT}$ and $\mathit{Speedup}_{DP}$ are the overall application speedups on the SMT processor and the dual-processor system.

Using the above equation, we measure the effective SMT speedups as shown in the last column in Table 2. We consistently achieve at least 15-30% higher performance on the SMT processor for threaded code across several workloads. For the SVM workloads with mutual prefetching (discussed in the next section), even better results are obtained.
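As a worked check of the metric, the sketch below solves Amdahl's Law twice under the assumptions stated above: first for the parallel fraction implied by the dual-processor speedup (with a parallel-section speedup of exactly 2), then for the parallel-section speedup implied by the SMT speedup. The function names are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Parallel fraction p implied by a dual-processor speedup, assuming the
// parallel section runs exactly 2x faster on two processors:
//   Speedup_DP = 1 / ((1 - p) + p / 2)   =>   p = 2 * (1 - 1 / Speedup_DP)
inline double parallel_fraction(double speedup_dp) {
    assert(speedup_dp > 1.0 && speedup_dp <= 2.0);
    return 2.0 * (1.0 - 1.0 / speedup_dp);
}

// Effective SMT speedup of the parallel section, assuming the same p:
//   Speedup_MT = 1 / ((1 - p) + p / PS_MT)
//   =>   PS_MT = p / (1 / Speedup_MT - (1 - p))
inline double effective_smt_speedup(double speedup_mt, double speedup_dp) {
    const double p = parallel_fraction(speedup_dp);
    const double denom = 1.0 / speedup_mt - (1.0 - p);
    assert(denom > 0.0);  // otherwise the two measurements are inconsistent
    return p / denom;
}
```

With the two blocked SVM workloads quoted in this section, `effective_smt_speedup(1.17, 1.78)` evaluates to about 1.20 and `effective_smt_speedup(1.11, 1.54)` to about 1.16, so the ordering is the reverse of what the plain SMT/SMP ratio suggests.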
Our metric shows that a multi-threaded workload that has a 1.17x Hyper-Threading speedup and a 1.78x dual-processor speedup (SVM RBF) uses Hyper-Threading more efficiently than another workload that has a 1.11x Hyper-Threading speedup and a 1.54x dual-processor speedup (SVM linear). The first obtains a 20% improvement for threaded code, while the latter obtains a 16% speedup for threaded sections. Thus, despite the initial intuition that the latter workload benefits more from Hyper-Threading, in fact we see that the former workload shows a greater relative performance gain.

5. DISCUSSION

There are a number of factors that can influence the effective SMT speedup. For example, because two logical processors share one physical SMT processor, the effective sizes of the caches for each logical processor, largely influenced by the cache footprint of each application, seem smaller. Thus, it is important for multithreaded applications to make judicious use of the cache and comprehend possible thrashing issues. For instance, when considering code size optimization, excessive loop unrolling should be avoided. Please refer to [6] for more details.

While sharing caches may decrease the effective cache sizes seen by some applications running on processors with Hyper-Threading Technology, sharing caches can provide better cache locality between the two logical processors for other applications. As many applications are memory-bound, having good cache locality can provide a significant increase in application performance. For example, in our study we identified two cases in which cache sharing between logical processors on one physical SMT processor produced very beneficial data locality effects. This is the reason why we observe a large speed-up for some threaded workloads on an SMT processor.

Table 2: Speedups of our media workloads on systems with Hyper-Threading Technology and systems with dual processors, and the effective Hyper-Threading speedup.

Workload | Speedup, SMT vs. single-thread | Speedup, dual-processors vs. single-processor | Effective Hyper-Threading speedup
Video Encoder [2, 4] | | |
Video Decoder [2, 4] | | |
Video Watermarking [2] | | |
SVM linear [1] (blocked) | | |
SVM RBF [1] (blocked) | | |
SVM linear [1] (mutually beneficial prefetching) | | |
SVM RBF [1] (mutually beneficial prefetching) | | |
Boosting Training [3] | | |
Boosting Detection [3] | | |

Table 3: The workload characteristics of our applications on single-threaded processors and processors with Hyper-Threading Technology

Event | MPEG encoding, single-thread | MPEG encoding, Hyper-Threading | MPEG decoding, single-thread | MPEG decoding, Hyper-Threading | Video watermarking, single-thread | Video watermarking, Hyper-Threading
Clockticks (millions) | 13,977 | 11,688 | 7,467 | 6,687 | 23,942 | 20,162
Instructions retired (millions) | 11,253 | 11,674 | 3,777 | 3,921 | 17,728 | 17,821
Uops retired (millions) | 14,735 | 15,539 | 5,489 | 5,667 | 24,120 | 24,333
IPC (instructions per clock) | | | | | |
UPC (uops per clock) | | | | | |
Floating-point/SIMD (millions) | 6,226 | 6,220 | 1,119 | 1,120 | 5,334 | 5,341
L1 cache misses (millions) | | | | | |
Front Side Bus utilization rate | 8.5% | 8.5% | 14.7% | 16.4% | 14.2% | 22.3%

Event | SVM (linear) with mutually beneficial prefetching, single-thread | SVM (linear) with mutually beneficial prefetching, Hyper-Threading | SVM (RBF) with mutually beneficial prefetching, single-thread | SVM (RBF) with mutually beneficial prefetching, Hyper-Threading | Boosting Training, single-thread | Boosting Training, Hyper-Threading
Clockticks (millions) | 12,758 | 6,901 | 13,943 | 8,627 | 5,995 | 4,852
Instructions retired (millions) | 3,158 | 3,162 | 4,367 | 4,357 | 4,077 | 4,215
Uops retired (millions) | 5,393 | 5,423 | 7,563 | 7,571 | 5,178 | 5,677
IPC (instructions per clock) | | | | | |
UPC (uops per clock) | | | | | |
Floating-point/SIMD (millions) | 1,778 | 1,767 | 2,890 | 2,888 | 1, |
L1 cache misses (millions) | | | | | |
Front Side Bus utilization rate | 47.4% | 56.2% | 43.3% | 43.1% | 0.3% | 0.1%

5.1. Dynamic Slice Scheduling

As shown in Figure 7, a picture in a video bit stream can be divided into slices of macroblocks. Each slice, consisting of blocks of pixels, is a unit that can be decoded independently. Here we compare two methods to decode the pictures in parallel:

1. Static partitioning: In this method, one thread is statically assigned the first half of the picture, while another thread is assigned the other half of the picture (as shown in Figure 7(a)). Assuming that the complexity of the first half and second half are similar, these two threads will finish the task at roughly the same time. However, some areas of the picture may be easier to decode than others. This may lead to one thread being idle while the other thread is still busy.

2. Dynamic partitioning: In this method, slices are dispatched dynamically. A new slice is assigned to a thread when the thread has finished its previously assigned slice. In this case, we don't know which slices will be assigned to which thread. Instead, the assignment depends on the complexity of the slices assigned. As a result, one thread may decode a larger portion of the picture than the other if its assignments are easier than those of the other thread. The execution time difference between two threads, in the worst case, is the decoding time of the last slice.

Figure 7: Two slice-based task partitioning schemes between two threads: (a) static scheduling and (b) dynamic scheduling. (Each panel shows the picture slices assigned to Thread 1 and Thread 2.)

Figure 8: Cache localities, during (a) motion compensation, in (b) static partitioning, and in (c) dynamic partitioning. (Each panel shows Frame t and Frame t+1; (b) all local cache hits, (c) some local cache misses.)

The foremost advantage of the dynamic scheduling scheme is its good load balance between the two threads. Because some areas of the picture may be easier to decode than others, one thread under the static partitioning scheme may be idle while another thread still has a lot of work to do. In the dynamic partitioning scheme, we have very good load balance. As we assign a new slice to a thread only when it has finished its previous slice, the execution time difference between the two threads, in the worst case, is the decoding time of a slice.

We now illustrate the advantage of sharing caches in our application. On dual-processor systems, each processor has a private cache. Thus, there may be a drawback to dynamic partitioning in terms of cache locality. Figure 8 illustrates the cache locality in multiple frames of video.
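As a brief aside on the two partitioning schemes compared in this section, the sketch below shows how they might be expressed with OpenMP loop schedules. It is an illustration under stated assumptions, not the decoder used in the study: `decode_slice` is a hypothetical stand-in whose running time depends on a per-slice cost, and the use of OpenMP schedules is a substitution for whatever dispatch mechanism the actual decoder uses. With two threads, `schedule(static)` gives each thread one contiguous half of the slices, while `schedule(dynamic, 1)` hands the next slice to whichever thread becomes free. Without OpenMP enabled, both functions run serially and return the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for decoding one slice: the loop length models
// slices that are harder or easier to decode. Returns a checksum.
inline long decode_slice(int slice_index, const std::vector<int>& slice_cost) {
    long work = 0;
    for (int n = 0; n < slice_cost[slice_index]; ++n) work += n % 7;
    return work;
}

// Static partitioning: each of the two threads is assigned a contiguous
// half of the slices before decoding starts.
inline long decode_picture_static(const std::vector<int>& slice_cost) {
    long total = 0;
    const int num_slices = static_cast<int>(slice_cost.size());
    #pragma omp parallel for schedule(static) num_threads(2) reduction(+:total)
    for (int s = 0; s < num_slices; ++s) total += decode_slice(s, slice_cost);
    return total;
}

// Dynamic partitioning: a thread receives a new slice only when it has
// finished its previously assigned slice.
inline long decode_picture_dynamic(const std::vector<int>& slice_cost) {
    long total = 0;
    const int num_slices = static_cast<int>(slice_cost.size());
    #pragma omp parallel for schedule(dynamic, 1) num_threads(2) reduction(+:total)
    for (int s = 0; s < num_slices; ++s) total += decode_slice(s, slice_cost);
    return total;
}
```

Both variants produce the same decoded result; they differ only in which thread touches which slice, which is exactly what determines load balance and, on systems with private caches, cache locality.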
During motion compensation, the decoder uses part of the previous picture, the referenced part of which is roughly co-located in the previous reference frame, to reconstruct the current frame. It is faster to decode the picture when the co-located part of the picture is still in the cache. In the case of a dual-processor system, each thread is running on its own processor, each with its own cache. If the co-located part of the picture in the previous frame is decoded by the same thread, it is more likely that the local cache will have the pictures that have just been decoded. Since we dynamically assign slices to different threads, it is more likely that the co-located portion of the previous picture may not be in the local cache when each thread is running on its own physical processor and cache, as shown in Figure 8(c). More cache misses may incur more bus transactions. In contrast, the cache is shared between logical processors on a processor with Hyper-Threading Technology and thus cache localities are preserved. We obtain the best of both worlds with dynamic scheduling: there is load balancing between the threads, and there is the same effective cache locality as for static scheduling on a dual-processor system.

5.2. Mutually Beneficial Prefetching

Although the Intel Xeon processor can execute multiple uops in one cycle, the number of retired uops per cycle (UPC) is only 0.42 and 0.54 for the single-threaded SVM workloads [1]. Low UPC in the single-threaded workload indicates the underutilization of the execution units available in the microprocessor [5] due to the high rate of L1/L2 misses. The large number of L1/L2 misses results from the large working set of the SVM workloads. The size of the whole set of support vectors is around 2 MBytes, which is larger than the L2 caches of most modern microprocessors. Although out-of-order execution can reduce the problem of data dependency on cache misses, memory latencies are too long to be hidden in this particular workload. As a result, execution units are under-utilized.

Sharing execution units therefore increases the utilization rate of the resources and thus improves the workload's throughput. With Hyper-Threading Technology, the UPC increases to 0.79 and 0.88, respectively.

Both threads are prefetching data for each other on a single SMT processor. In [9], Wang et al. use one thread to prefetch data for another thread on SMT systems when caches are shared. In their work, the prefetching thread does not generate useful results. In the SVM workloads, both threads require the same support vectors for different incoming samples at roughly the same time. Thus, when one thread runs faster than the other thread, it will access the L2 or main memory to get the next support vector. So, when the other thread catches up in the execution, the support vector is already in L1. In this case, both threads are prefetching data for each other on a single SMT processor. We call this mutual prefetching.

Because of this mutual prefetching effect, the dual-threaded workload on the SMT processor has fewer L1/L2 misses than the single-threaded workload and its performance is thus much better. In fact, the speed-up is almost the same as that achieved on a dual-processor system (see Table 2).

6. CONCLUSIONS

In this paper, the characteristics of several key multimedia applications have been presented and their performance on a simultaneous multi-threading (SMT) architecture studied. A metric to evaluate the effective speedup due to SMT has been defined, and an example shows that simply comparing workload performance on SMT vs. SMP systems rather than using the metric can give misleading impressions of the relative performance on the SMT architecture.
Our results show that the effective speedup achieved on the SMT architecture we studied gives very consistent results across several workloads. The differences between SMT and SMP architectures, and in particular the impact of sharing the cache in SMT architectures, have been discussed. On SMT, sharing the cache provides cache locality between threads. This interesting characteristic has been exploited to reduce the impact of cache misses by scheduling threads to prefetch data for each other. Using this technique, some workloads show speed-ups on SMT systems competitive with those seen on SMP systems, despite the small hardware cost associated with the SMT architecture.

7. ACKNOWLEDGEMENTS

We would like to thank Doug Carmean for his valuable comments on this work. We would like to thank Sergey Zheltov, Alexander Knyazev, Stanislav Bratanov, Roman Belenov, and Valery Kuriakin for their exceptional efforts in developing the multi-threaded media workloads used in this study. Additionally, we thank Mike Upton, Per Hammarlund, Russell Arnold, Shihjong Kuo, and George K. Chen for valuable discussions during this work.

8. REFERENCES

[1] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery 2.
[2] Y.-K. Chen, M. Holliman, E. Debes, S. Zheltov, A. Knyazev, S. Bratanov, R. Belenov, and I. Santos, "Media Applications on Hyper-Threading Technology," Intel Technology Journal.
[3] Y. Freund and R. E. Schapire, "Experiments with a New Boosting Algorithm," in Proc. of Int'l Conf. on Machine Learning.
[4] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, MA: Kluwer.
[5] G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel, "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal.
[6] Intel Corp., Intel Pentium 4 Processor Optimization Reference Manual, Order Number: (also available on-line: pentium4/manuals/ pdf)
[7] D. Marr, F. Binns, D. L. Hill, G. Hinton, D. A. Koufaty, J. A. Miller, and M. Upton, "Hyper-Threading Technology Microarchitecture and Performance," Intel Technology Journal.
[8] E. Palmer, "Hyper-Threading Characterization," private communications, Apr.
[9] H. Wang, P. Wang, R. D. Weldon, S. Ettinger, H. Saito, M. Girkar, S. Liao, and J. Shen, "Speculative Precomputation: Exploring the Use of Multithreading Technology for Latency," Intel Technology Journal.
[10]

* Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit or call (U.S.) or

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Performance Evaluation

Performance Evaluation Performance Evaluaton [Ch. ] What s performance? of a car? of a car wash? of a TV? How should we measure the performance of a computer? The response tme (or wall-clock tme) t takes to complete a task?

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden Optmzng for Speed Er Hagersten Uppsala Unversty, Sweden eh@t.uu.se What s the potental gan? Latency dfference L$ and mem: ~5x Bandwdth dfference L$ and mem: ~x Repeated TLB msses adds a factor ~-3x Execute

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

RADIX-10 PARALLEL DECIMAL MULTIPLIER

RADIX-10 PARALLEL DECIMAL MULTIPLIER RADIX-10 PARALLEL DECIMAL MULTIPLIER 1 MRUNALINI E. INGLE & 2 TEJASWINI PANSE 1&2 Electroncs Engneerng, Yeshwantrao Chavan College of Engneerng, Nagpur, Inda E-mal : mrunalngle@gmal.com, tejaswn.deshmukh@gmal.com

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier

Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier Floatng-Pont Dvson Algorthms for an x86 Mcroprocessor wth a Rectangular Multpler Mchael J. Schulte Dmtr Tan Carl E. Lemonds Unversty of Wsconsn Advanced Mcro Devces Advanced Mcro Devces Schulte@engr.wsc.edu

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES 1 Fetosa, R.Q., 2 Merelles, M.S.P., 3 Blos, P. A. 1,3 Dept. of Electrcal Engneerng ; Catholc Unversty of

More information

Learning-based License Plate Detection on Edge Features

Learning-based License Plate Detection on Edge Features Learnng-based Lcense Plate Detecton on Edge Features Wng Teng Ho, Woo Hen Yap, Yong Haur Tay Computer Vson and Intellgent Systems (CVIS) Group Unverst Tunku Abdul Rahman, Malaysa wngteng_h@yahoo.com, woohen@yahoo.com,

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

THE low-density parity-check (LDPC) code is getting

THE low-density parity-check (LDPC) code is getting Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space low-densty party-check (LDPC) codes

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Comparison Study of Textural Descriptors for Training Neural Network Classifiers

Comparison Study of Textural Descriptors for Training Neural Network Classifiers Comparson Study of Textural Descrptors for Tranng Neural Network Classfers G.D. MAGOULAS (1) S.A. KARKANIS (1) D.A. KARRAS () and M.N. VRAHATIS (3) (1) Department of Informatcs Unversty of Athens GR-157.84

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

SUMMARY... I TABLE OF CONTENTS...II INTRODUCTION...

SUMMARY... I TABLE OF CONTENTS...II INTRODUCTION... Summary A follow-the-leader robot system s mplemented usng Dscrete-Event Supervsory Control methods. The system conssts of three robots, a leader and two followers. The dea s to get the two followers to

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Classification Based Mode Decisions for Video over Networks

Classification Based Mode Decisions for Video over Networks Classfcaton Based Mode Decsons for Vdeo over Networks Deepak S. Turaga and Tsuhan Chen Advanced Multmeda Processng Lab Tranng data for Inter-Intra Decson Inter-Intra Decson Regons pdf 6 5 6 5 Energy 4

More information

Design and Implementation of an Energy Efficient Multimedia Playback System

Design and Implementation of an Energy Efficient Multimedia Playback System Desgn and Implementaton of an Energy Effcent Multmeda Playback System Zhjan Lu, John Lach, Mrcea Stan, Kevn Skadron, Departments of Electrcal and Computer Engneerng and Computer Scence, Unversty of Vrgna

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information
