Optimisation of Statistical Feature Extraction Algorithms

C. Lombard*, W.A. Smit and J.P. Maré
Kentron Dynamics Machine Vision Group

* P.O. Box 742, Centurion, 0046, South Africa. cecilia.lombard@kentron.co.za

Keywords: statistical feature extraction, real-time implementation, optimisation.

Abstract

This paper describes the process of optimising statistical feature extraction algorithms for use in object recognition. The focus is on the real-time implementation of these algorithms on applicable processors. Different processors were evaluated, of which the TigerSHARC was chosen for discussion in this paper. One single-window and two double-window features are discussed for the recognition of objects of interest. It is demonstrated that a large improvement in execution time can be obtained by implementing several optimisation techniques in C, some of them seemingly inconsequential. The further improvement that the use of assembly language can make is also demonstrated.

1. Introduction

Before object recognition on an image can be implemented in a system, the algorithm must be implementable in real time. In [1] and [2] possible features are discussed for detecting point objects in simulated images, and example IR (infrared) images are shown in Figure 1. Of the features originally tested, only three are taken as examples for the purpose of this paper. Ways of optimising the code used for feature extraction, and benchmarking of the old and new code, are also discussed.

The three example features selected are:
(1) Maximum Grey Level [2] (a single-window feature),
(2) Average Gradient Strength [2] (a double-window feature) and
(3) Variance Ratio (a double-window feature).

They are reviewed in section 2 to provide a basis for the optimisation discussion that follows in section 3.

2. Features

Two classes of features were used, namely single-window and double-window features. Double-window features are calculated using parameters derived from both an inner (target) and an outer (local background) window, while single-window features are calculated by operating only on the target window. Note that the outer window is "donut"-shaped, i.e. it excludes the region of the inner window.

Figure 1: Simulated IR images illustrating the objects with low and high cluttered backgrounds. (All images courtesy of Kentron's SIMIS environment.)

Figure 2: The feature extraction procedure, showing the direction of movement of the sliding window(s) across the image. [Diagram labels: Outer Window, Inner Window, Image.]

The feature extraction procedure is shown in Figure 2. The sliding window(s) move across a grey-scale image from pixel to pixel, from left to right and from top to bottom. At each new pixel position the three features are calculated over the window(s).

2.1. Maximum Grey Level

This feature searches through the inner window for the highest grey level value. Thus, in the IR example, if a part of the object is significantly warmer than the rest of the object and the background, the value at that point will be the value assigned to this feature. An example image, and the features obtained from that image, are shown in Figure 3.
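As a minimal sketch of the direct computation of this feature for a single window position, the following C function scans every pixel of the inner window. The function name, the row-major 8-bit image layout and the parameter names are assumptions made for illustration; they are not taken from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct (unoptimised) Maximum Grey Level for one window position.
 * Assumed layout: row-major 8-bit grey-scale image, img_w pixels per row.
 * (top, left) is the top-left corner of the inner window and win is its
 * side length. The caller must keep the window inside the image. */
uint8_t max_grey_level(const uint8_t *img, size_t img_w,
                       size_t top, size_t left, size_t win)
{
    uint8_t best = 0;
    for (size_t r = 0; r < win; ++r) {
        /* Address the start of this window row once, then walk along it. */
        const uint8_t *row = img + (top + r) * img_w + left;
        for (size_t c = 0; c < win; ++c) {
            if (row[c] > best)
                best = row[c];
        }
    }
    return best;
}
```

Calling this function once per pixel position reproduces the sliding-window procedure of Figure 2, at the cost of re-reading almost the same pixels for every position.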

Figure 3: An image and the three features obtained from it. [Panels: a. IR image; b. Maximum Grey Level; c. Average Gradient Strength; d. Variance Ratio.]

2.2. Average Gradient Strength

This feature, described in [2], relies on the occurrence of sharper internal detail in man-made objects than in natural objects, even if the average intensity of the man-made and natural objects is similar. The average gradient strength of the local background is subtracted from the average gradient strength of the object region to prevent large regions of background that exhibit a larger than normal variation from yielding a high value for this feature. In [2] the feature is calculated as

$$F_{ij} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} G_{in}(k,l) \; - \; \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} G_{out}(k,l) \qquad (1)$$

where

$$G_{in}(k,l) = G^{h}_{in}(k,l) + G^{v}_{in}(k,l),$$
$$G^{h}_{in}(k,l) = \left| f(k,l) - f(k,l+1) \right|,$$
$$G^{v}_{in}(k,l) = \left| f(k,l) - f(k+1,l) \right|,$$

and $G_{out}(k,l)$ is defined similarly. Here $n_{out}$ is the number of pixels in $N_{out}(i,j)$ and $n_{in}$ is the number of pixels in $N_{in}(i,j)$, where $N_{in}$ and $N_{out}$ respectively denote the target and local background windows.

2.3. Variance Ratio

This simple feature is given by

$$F_{ij} = \frac{\sigma_{in}}{\sigma_{out}} \qquad (2)$$

where $\sigma_{out}$ and $\sigma_{in}$ respectively denote the standard deviation values calculated for the local background and target windows.

3. Optimisation

3.1. Feature extraction

In the direct implementation, every feature value is calculated from every pixel in the sliding window; for double-window features every pixel in both windows is used. Since adjoining windows overlap completely except for one column or row, many of the calculations are repeated. If information from the calculation of the previous (adjoining) value of a feature is saved and transferred, it can be used for the new calculation, saving a large amount of processing time. This was implemented by doing, for each row, the complete calculation for the first window, then calculating the next value in the row from that value, the third value from the second, and so forth.
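The row-wise reuse described above can be illustrated with a running window total. The sketch below is an assumption-laden example rather than the authors' code: the names `column_sum` and `window_sums_row`, the square window and the row-major 8-bit image are all hypothetical. Only the first window in a row is summed in full; every later total is derived from its left neighbour.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of one window column: win pixels starting at (top, col). */
static uint32_t column_sum(const uint8_t *img, size_t img_w,
                           size_t top, size_t col, size_t win)
{
    uint32_t s = 0;
    for (size_t r = 0; r < win; ++r)
        s += img[(top + r) * img_w + col];
    return s;
}

/* Window totals for every horizontal position in one row of windows.
 * out must hold (img_w - win + 1) values; requires win <= img_w. */
void window_sums_row(const uint8_t *img, size_t img_w,
                     size_t top, size_t win, uint32_t *out)
{
    uint32_t total = 0;

    /* Complete calculation for the first window in the row only. */
    for (size_t c = 0; c < win; ++c)
        total += column_sum(img, img_w, top, c, win);
    out[0] = total;

    /* Each next value: add the new right column, drop the old left one. */
    for (size_t left = 1; left + win <= img_w; ++left) {
        total += column_sum(img, img_w, top, left + win - 1, win);
        total -= column_sum(img, img_w, top, left - 1, win);
        out[left] = total;
    }
}
```

For a window of side w, each step after the first touches 2w pixels instead of w*w, which is where the saving comes from.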
When the window is shifted to the next pixel in a row, the only change to the window is that a new column of pixels on the right of the window needs to be taken into account, and that an old column of pixels on the left of the window needs to be removed. This overlap between adjoining windows is shown in Figure 4.

Figure 4: The new window that includes the new column and excludes the old column. [Diagram labels: Old Column, Sliding Inner Window, New Column, Image.]

3.1.1. Maximum Grey Level

In the direct method, each time this feature is calculated, every pixel inside the window is checked against the running maximum. In the less processor-intensive implementation the previous maximum and its position are passed to the new calculation. If the old maximum lies in the overlap region, it is compared only with the values in the new column to find the new maximum. If the old maximum lies in the discarded column of the previous window, the whole of the new window is searched.

3.1.2. Average Gradient Strength

This feature calculates the sums of the variations between consecutive pixels over both the inner and outer windows, in both the vertical and the horizontal directions for each. These sums are then used to calculate the feature value. Because the four sums are linear combinations of the values in the sliding windows, the sums obtained for the previous pixel can be used as a basis for calculating the value for the new pixel. By the same reasoning as in section 3.1.1, the gradients associated with the new column(s) need to be added to the previous gradient total and those of the old column(s) subtracted. The outer window must have two columns added and two columns subtracted because of its "donut" shape.

3.1.3. Variance Ratio

The formula for calculating the standard deviation,

$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

with $n$ the number of values in the window, $x_i$ the grey level pixel values in the window and $\bar{x}$ the average of the values in the window, presents a problem. The non-linearity of the square in the formula, coupled with the fact that the average changes from window to window, makes an optimisation similar to the ones used for the other features impossible without an approximation. The approximations implemented for the total calculation were found to be too inaccurate (they also became more and more inaccurate the farther from the starting point in a row). The only optimisation that could be used was the calculation of the average value of the new window using the previous average.

Another optimisation technique that was evaluated was to use the ratio of the variances, and not the ratio of the standard deviations, of the two windows. This removes the calculation of two square roots for every feature value calculated.

3.2. Processor-specific optimisation

3.2.1. Number format

The TigerSHARC processor is a native floating-point processor; in other words, non-floating-point numbers are simulated with floating-point numbers. This means that extra processing power is required to handle these numbers.
If assembly language optimisation is used, however, four 8-bit integers could be processed in parallel instead of one 32-bit floating-point number.

3.2.2. Indexing

When using numerous for-loops with memory indexing inside the loops, it makes sense to minimise any calculations needed to address a specific memory location. For example, using two indexes to address a value in a two-dimensional matrix, such as the image, seems natural, but the processor uses only one index; every time a double index is used it has to be converted to a single index, which uses unnecessary processing power.

Another place in for-loops (in C) where processing power can be saved is the test for ending the loop. The syntax for a for-loop in C is as follows:

    for (x = a; x < last; x++)

where 'a' is the start value of the index 'x', the test is 'x < last', and each time the loop executes 'x' is incremented by one ('x++'). If 'last' is a calculation, for example '5*a-3', that calculation is executed once for every iteration of the loop, but if 'last' is a pre-calculated variable the calculation itself is executed only once.

3.2.3. General functions

Several math functions in C were written for the general case. When calculating the variance ratio, for example, a square needs to be calculated. This was originally done with the power function in C's math library. The power function is general in the sense that it can handle any exponent, not just a power of two. It therefore needs added logic, which increases the processing overhead enormously.

3.3. Assembly language optimisation

From the benchmarks it was determined that the most processor-intensive feature to calculate is the variance ratio. For this reason it was decided to focus on the variance ratio when implementing the assembly language optimisation. The formula for the standard deviation, on which the variance ratio is based, is discussed in section 3.1.3. In terms of code the math then looks something like this:

a. For an area, calculate the average:
    Avg = (sum of pixels) / (number of pixels)
b. Calculate the standard deviation of the window as follows:
    Pixel_std_dev = (pixel_value - Avg)^2
    Std_Dev = Sqrt((sum of Pixel_std_dev's) / (number of pixels - 1))

The development of a decent assembly implementation of the variance ratio subroutine relies on the following steps:

- Find the assembly instructions required to implement the function.
- Optimize for multi-function instructions (i.e., within a CPU (Central Processing Unit) core, optimize for multiple arithmetic units and for the use of SIMD (Single Instruction/Multiple Data) where possible).
- Add software pipelines where applicable.
- Exploit the CPU architecture to account for multiple cores, and optimize the use of memory and the I/O (Input/Output) subsystem.
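Steps a. and b. above, combined with the C-level techniques of section 3.2, might look as follows. This is a sketch under stated assumptions: the function name `window_variance`, the row-major 8-bit image and the rectangular window parameters are hypothetical. It returns the variance, so the square root of step b. is left to the caller (or omitted when the ratio of variances is used instead).

```c
#include <stddef.h>
#include <stdint.h>

/* Variance of a rows x cols window whose top-left pixel is (top, left).
 * Requires rows * cols >= 2. */
float window_variance(const uint8_t *img, size_t img_w,
                      size_t top, size_t left, size_t rows, size_t cols)
{
    /* Loop bounds and base address are calculated once, outside the loops. */
    const size_t n = rows * cols;
    const uint8_t *first = img + top * img_w + left;

    /* a. Average: each row is walked with a single index. */
    float sum = 0.0f;
    for (size_t r = 0; r < rows; ++r) {
        const uint8_t *p = first + r * img_w;
        for (size_t c = 0; c < cols; ++c)
            sum += (float)p[c];
    }
    const float avg = sum / (float)n;

    /* b. Squared deviations: d * d replaces the general power function. */
    float sq_sum = 0.0f;
    for (size_t r = 0; r < rows; ++r) {
        const uint8_t *p = first + r * img_w;
        for (size_t c = 0; c < cols; ++c) {
            const float d = (float)p[c] - avg;
            sq_sum += d * d;
        }
    }
    return sq_sum / (float)(n - 1);
}
```

As in the text, the window is passed over twice, once for the average and once for the squared deviations.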

3.3.1. Assembly Instructions Required

For the purpose of this exercise the processing is divided into two subroutines, i.e. the average calculation and the standard deviation calculation. Both code efficiently in assembler, although it is necessary to pass over the window twice. Note that the code assumes that the data is available in internal memory. It is not concerned with the availability of I/O resources to move that data; the CPU sequencer will sort that out.

a. Average

    // Average Subroutine
    // Author : WA Smit
    // Date   : 5 Sept 2003
    // Syntax : Avg(Pointer to offset in image, num_rows, Width)
    //          Returns sum of the rows; C has to divide by number of pixels.
    // Description : this routine sums the number of rows as assigned, and
    //          across the width as assigned. It returns the sum of a number
    //          of pixels equal to (Num_Rows x Width) pixels.

    // Save regs
    [J6+J]=XR0;;
    [J6+J]=XR;;
    [J6+J]=J2;;
    [J6+J]=J4;;
    // Calculate number of pixels, set up loop
    XR0=XR8*XR2;;
    J4=XR4;;                    // Setup DAG
    J2=;;
    XR4=0;;                     // Zero sum reg
    XR8=0;;                     // Zero data reg
    LC0=XR0;;
    AVG_LOOP:
    XR=[J4,+J2];XR4=XR4+XR;;
    IF NLC0E JUMP AVG_LOOP;;
    // Value is returned in XR4
    J2=[J6-J];;
    J4=[J6-J];;
    XR=[J6-J];;
    XR=[J6-J];;
    Return;;

The cycle budget is then as follows:

    Save registers                                      - 4 cycles
    Set up DAG's (Data Address Generators; see note)    - 3 cycles
    Zero assembly variables                             - 2 cycles
    Set up loop                                         - 1 cycle
    *** Inner loop start
    Fetch data word and add to window total             - 1 cycle * number of pixels
    *** Inner loop end
    Restore registers                                   - 4 cycles

Note: In [3] the DAG is called the IALU (Integer Arithmetic Logic Unit).

The total number of cycles required is then:

    Cycles overhead: 14 cycles
    Inner loop: cycles required = 1 * number_pixels

(Note that the inner loop by inference uses a software pipeline. Refer to the std_dev description below for a description of a software pipeline.)

The above routine does not explicitly accommodate the optimisation for calculating the average value described in section 3.1 and Figure 4 above.
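For readers more at home in C, the summation performed by the Average routine can be modelled roughly as below. The function name `strided_sum` and its `stride` parameter are hypothetical: the stride stands in for the address increment used by the assembly loop and is an assumption of this sketch, not a documented parameter of the routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C model of the summation loop: add up count 32-bit words,
 * stepping stride words between fetches, and return the total.
 * As with the assembly routine, the caller divides by the pixel count. */
int32_t strided_sum(const int32_t *data, size_t count, size_t stride)
{
    int32_t total = 0;                  /* plays the role of the sum register */
    for (size_t i = 0; i < count; ++i) {
        total += *data;                 /* fetch word, add to total */
        data += stride;                 /* advance the address      */
    }
    return total;
}
```

With a stride of 1 the function totals a contiguous block of pixels. With a stride equal to the image width and a count equal to the window height it totals a single column, which is the quantity needed when a window total is updated from its neighbour.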
The Average routine can nonetheless be used for that optimisation if the calling parameters are changed slightly. When the total number of cycles needed to complete the variance ratio subroutine was calculated for the results (section 4), the above optimisation was included.

b. Standard Deviation

This routine is in essence the same as the Average routine, with the difference that the calculation per pixel is more complex.

    // StdDev Subroutine
    // Author : WA Smit
    // Date   : 5 Sept 2003
    // Syntax : StdDev(Pointer to offset in image, number of pixels, Average)
    //          Returns the sum of the squared deviations over the rows; C has
    //          to divide by number of pixels and get the square root.
    // Description : this routine calculates the standard deviation of the
    //          number of rows as assigned, and across the width as assigned.
    //          It returns the standard deviation of a number of pixels equal
    //          to (Num_Rows x Width) pixels.

    // Save regs
    [J6+J]=XR0;;
    [J6+J]=XR;;
    [J6+J]=XR2;;
    [J6+J]=J2;;
    [J6+J]=J4;;
    // Setup loop
    J4=XR;;                     // Setup DAG
    J2=;;
    XR4=0;;                     // Zero sum reg
    XR0=[J4+J2];;               // Start the pipeline
    LC0=XR8;;
    STD_LOOP:
    XR0=[J4+J2];XR=XR0-XR2;;
    XR2=XR*XR; XR4=XR4+XR2;;
    IF NLC0E JUMP STD_LOOP;;
    XR4=XR4+XR2;                // End the pipeline
    // Value is returned in XR4
    J2=[J6-J];;
    J4=[J6-J];;
    XR2=[J6-J];;
    XR=[J6-J];;
    XR0=[J6-J]; Return;;

The cycle budget is then as follows:

    Save registers                  - 5 cycles
    Set up DAG's                    - 2 cycles
    Set up assembly variables       - 2 cycles
    Set up loop                     - 1 cycle
    *** Inner loop start
    Fetch data word
    Subtract window_average
    Multiply result with self and add to window total
    *** Inner loop end
    Return std_dev                  - 1 cycle
    Restore regs                    - 5 cycles

Number of cycles overhead: 16 cycles

Inner loop: It is clear that the inner loop requires some optimization. It is proposed that the inner loop use two steps. In the first step the data is fetched and the subtraction is done. The second step is then a multiply-add to complete the processing. It is further proposed that a software pipeline be used, thereby ensuring an average throughput of 2 cycles per pixel for the inner loop. A software pipeline is needed because data fetched from memory only becomes available for processing in the next cycle. The pipeline then looks something like this:

    Fetch_n;   Sub_empty
    Fetch_n+1; Sub_n
    Mult_n;    Add_n
    Fetch_n+2; Sub_n+1
    Mult_n+1;  ...
    Etc.

The inner loop cycles then become: 2 * number of pixels.

3.3.2. Multi-function Instructions

Already done.

3.3.3. Software Pipelines

Already done.

3.3.4. CPU Optimizations

The selected processor is a superscalar CPU with two independent cores. In theory the number of cycles required should be half of what is required for a single core. In practice there are doubts about the ability of the CPU's I/O subsystem to support all the data transfers required. When the issue is pursued, the following is found.

Required per cycle:

Average
    Data words: 2 * 32 bit (one for each core; the accumulated total is stored in a register in each core).
    Instructions: 2 * 128 bit. This, however, applies to the first iteration of each loop only, as the instructions thereafter reside in the instruction cache of each core, i.e. it can be ignored.

Standard deviation
    Much the same situation as Average.
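The per-core running totals described above can be modelled in C as two independent accumulators, each fed from its own block of data. Plain C cannot express the processor's parallel instruction issue, so this sketch only illustrates the data split; the function name, the use of `float` data and the equal split between the two blocks are assumptions.

```c
#include <stddef.h>

/* Sum of squared deviations kept as one running total per "core".
 * Each accumulator reads only from its own block, so the two data
 * streams never compete for the same memory. */
float sq_dev_sum_two_cores(const float *block_x, const float *block_y,
                           size_t n_per_core, float avg)
{
    float total_x = 0.0f;   /* accumulator held by the first core  */
    float total_y = 0.0f;   /* accumulator held by the second core */

    for (size_t i = 0; i < n_per_core; ++i) {
        const float dx = block_x[i] - avg;   /* fetch and subtract */
        const float dy = block_y[i] - avg;
        total_x += dx * dx;                  /* multiply-add       */
        total_y += dy * dy;
    }
    return total_x + total_y;   /* per-core totals are combined once */
}
```

Because the two totals are only combined after the loop, neither accumulator depends on the other inside the loop, which is the property that lets the work be halved in theory.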
It appears then that the CPU efficiency depends only on the ability of the programmer to arrange the data in memory in such a way that each core has free access to its data. It is therefore advised that the internal memory blocks be arranged as follows:

    Block 0 : Program
    Block 1 : Image
    Block 2 : Image

It is further proposed that the image size be restricted to a size that fits into a single memory block and that these two blocks be swapped between the DMA (Direct Memory Access) subsystem and the cores. The cores can share a single 128-bit bus to move their data. Under these conditions full efficiency can be achieved.

3.3.5. Cycle Calculations

When all the cycles required to execute the assembly portions of the variance ratio function are added, and the optimization for the average routine is included, it is found that the assembly version executes four times faster than the C version. If the data variables are reduced to 8-bit variables, and SIMD in the selected processor's CPU cores is exploited, it should be possible to achieve a speedup of 10 to 16 times, depending on the availability of data on the CPU internal busses.

4. Results

Several optimisation techniques were tested with the different functions; they are indicated by number in the results table (Table 1):

1. Calculating feature values from previous values within a row.
2. Removing the square roots when calculating the variance ratio feature.
3. Using floating-point numbers for all important variables.
4. Minimizing indexing calculations.
5. Replacing general functions with simple, direct implementations.
6. Assembly language optimisation.

The main function calls three functions, each of which calculates one feature. A 250 MHz clock was assumed.

It was found that when the square roots are removed from the variance ratio function the execution time decreases, but the differentiation between object and non-object points also decreases. The square roots were therefore reimplemented, but with one square root instead of two, and the times were found to increase very little.

5. Summary

A large improvement in the execution time of the feature extraction algorithm was obtained after implementing several optimisation techniques. The final execution time obtained for a 300 x 300 image is still fairly long, but a large improvement is expected if the assembly language optimisation is applied to the whole algorithm. Techniques to designate areas with high probabilities of containing objects, before calculating the features in those areas, could also be implemented and tested.

6. References

[1] Lombard C., van Wyk B.J. and Maré J.P., 2002, Detection of Infrared, Ground-Based Point Objects: A Case Study, Proceedings of the Thirteenth Annual Symposium of the Pattern Recognition Association of South Africa, Nov.
[2] Kwon H., Der S.Z. and Nasrabadi N.M., 2002, Adaptive multisensor target detection using feature-based fusion, Society of Photo-Optical Instrumentation Engineers, Vol. 41, No. 1.
[3] Analog Devices, ADSP-TS101 TigerSHARC Processor Programming Reference, Revision 1.0, Jan.
Table 1: Benchmarking results obtained with different optimisation techniques.

Nr. | Optimisation technique used (Main Function; Maximum Grey Level Function; Average Gradient Strength Function; Variance Ratio Function) | Image size (pixels) | Clock cycles | Time (with 250 MHz clock)
1   | none; none; none; none | 75 x  |  | s
2   |                        | x     |  | s
3   | 3,3,                   | x     |  | s
4   | 3,4,3,4,3 3,4          | 75 x  |  | s
5   | 3,4,3,4,3 3,4,5        | 75 x  |  | ms
6   | 3,4,3,4,3,4 3,4,5      | 75 x  |  | ms
7   | 3,4,3,4,3,4 2,3,4,5    | 75 x  |  | ms
8   | 3,4,3,4,3,4 2,3,4,5    | 76 x  |  | ms
9   | 3,4,3,4,3,4,2,3,4,5    | 75 x  |  | ms
10  | 3,4,3,4,3,4,2,3,4,5    | 00 x  |  | ms
11  | 3,4,3,4,3,4,2,3,4,5    | 300 x |  | s
12  | 3,4,3,4,3,4,3,4,5      | 75 x  |  | ms
13  | 3,4,3,4,3,4,3,4,5      | 00 x  |  | ms
14  | 3,4,3,4,3,4,3,4,5      | 300 x |  | s
15  | 3,4,3,4,3,4,2,3,6      | 300 x |  | s
16  | n/a; n/a; n/a; ,2,3,4,5 | 300 x |  | s
17  | n/a; n/a; n/a; ,2,3,6   | 300 x |  | s


CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT 2.1 BRIEF OUTLINE The classification of digital imagery is to extract useful thematic information which is one

More information

Cache Performance II 1

Cache Performance II 1 Cache Performance II 1 cache operation (associative) 111001 index offset valid tag valid tag data data 1 10 1 00 00 11 AA BB tag 1 11 1 01 B4 B5 33 44 = data (B5) AND = AND OR is hit? (1) 2 cache operation

More information

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set. Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean the sum of all data values divided by the number of values in

More information

Instruction Set Reference

Instruction Set Reference .1 QUICK LIST OF INSTRUCTIONS This chapter is a complete reference for the instruction set of the ADSP-2100 family. The instruction set is organized by instruction group and, within each group, by individual

More information

Lesson 12 - Operator Overloading Customising Operators

Lesson 12 - Operator Overloading Customising Operators Lesson 12 - Operator Overloading Customising Operators Summary In this lesson we explore the subject of Operator Overloading. New Concepts Operators, overloading, assignment, friend functions. Operator

More information

Implementation Of Harris Corner Matching Based On FPGA

Implementation Of Harris Corner Matching Based On FPGA 6th International Conference on Energy and Environmental Protection (ICEEP 2017) Implementation Of Harris Corner Matching Based On FPGA Xu Chengdaa, Bai Yunshanb Transportion Service Department, Bengbu

More information

Estimating Multimedia Instruction Performance Based on Workload Characterization and Measurement

Estimating Multimedia Instruction Performance Based on Workload Characterization and Measurement Estimating Multimedia Instruction Performance Based on Workload Characterization and Measurement Adil Gheewala*, Jih-Kwon Peir*, Yen-Kuang Chen**, Konrad Lai** *Department of CISE, University of Florida,

More information

An Efficient Vector/Matrix Multiply Routine using MMX Technology

An Efficient Vector/Matrix Multiply Routine using MMX Technology An Efficient Vector/Matrix Multiply Routine using MMX Technology Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection

More information

A Bayes Learning-based Anomaly Detection Approach in Large-scale Networks. Wei-song HE a*

A Bayes Learning-based Anomaly Detection Approach in Large-scale Networks. Wei-song HE a* 17 nd International Conference on Computer Science and Technology (CST 17) ISBN: 978-1-69-461- A Bayes Learng-based Anomaly Detection Approach Large-scale Networks Wei-song HE a* Department of Electronic

More information

SIGNAL COMPRESSION. 9. Lossy image compression: SPIHT and S+P

SIGNAL COMPRESSION. 9. Lossy image compression: SPIHT and S+P SIGNAL COMPRESSION 9. Lossy image compression: SPIHT and S+P 9.1 SPIHT embedded coder 9.2 The reversible multiresolution transform S+P 9.3 Error resilience in embedded coding 178 9.1 Embedded Tree-Based

More information

Parallel Algorithms for the Third Extension of the Sieve of Eratosthenes. Todd A. Whittaker Ohio State University

Parallel Algorithms for the Third Extension of the Sieve of Eratosthenes. Todd A. Whittaker Ohio State University Parallel Algorithms for the Third Extension of the Sieve of Eratosthenes Todd A. Whittaker Ohio State University whittake@cis.ohio-state.edu Kathy J. Liszka The University of Akron liszka@computer.org

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Ensemble registration: Combining groupwise registration and segmentation

Ensemble registration: Combining groupwise registration and segmentation PURWANI, COOTES, TWINING: ENSEMBLE REGISTRATION 1 Ensemble registration: Combining groupwise registration and segmentation Sri Purwani 1,2 sri.purwani@postgrad.manchester.ac.uk Tim Cootes 1 t.cootes@manchester.ac.uk

More information

CHAPTER 9 INPAINTING USING SPARSE REPRESENTATION AND INVERSE DCT

CHAPTER 9 INPAINTING USING SPARSE REPRESENTATION AND INVERSE DCT CHAPTER 9 INPAINTING USING SPARSE REPRESENTATION AND INVERSE DCT 9.1 Introduction In the previous chapters the inpainting was considered as an iterative algorithm. PDE based method uses iterations to converge

More information

Texture Sensitive Image Inpainting after Object Morphing

Texture Sensitive Image Inpainting after Object Morphing Texture Sensitive Image Inpainting after Object Morphing Yin Chieh Liu and Yi-Leh Wu Department of Computer Science and Information Engineering National Taiwan University of Science and Technology, Taiwan

More information

This article was origally published a journal published by Elsevier, and the attached copy is provided by Elsevier for the author s benefit and for the benefit of the author s stitution, for non-commercial

More information

PERFORMANCE ANALYSIS OF ALTERNATIVE STRUCTURES FOR 16-BIT INTEGER FIR FILTER IMPLEMENTED ON ALTIVEC SIMD PROCESSING UNIT

PERFORMANCE ANALYSIS OF ALTERNATIVE STRUCTURES FOR 16-BIT INTEGER FIR FILTER IMPLEMENTED ON ALTIVEC SIMD PROCESSING UNIT PERFORMANCE ANALYSIS OF ALTERNATIVE STRUCTURES FOR -BIT INTEGER FIR FILTER IMPLEMENTED ON ALTIVEC SIMD PROCESSING UNIT Grzegorz Kraszewski Białystok Technical University, Department of Electric Engineering

More information

Workload Characterization Techniques

Workload Characterization Techniques Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency Math 14 Introductory Statistics Summer 008 6-9-08 Class Notes Sections 3, 33 3: 1-1 odd 33: 7-13, 35-39 Measures of Central Tendency odd Notation: Let N be the size of the population, n the size of the

More information

HUE PRESERVING ENHANCEMENT ALGORITHM BASED ON WAVELET TRANSFORM AND HUMAN VISUAL SYSTEM

HUE PRESERVING ENHANCEMENT ALGORITHM BASED ON WAVELET TRANSFORM AND HUMAN VISUAL SYSTEM International Journal of Information Technology and Knowledge Management July-December 011, Volume 4, No., pp. 63-67 HUE PRESERVING ENHANCEMENT ALGORITHM BASED ON WAVELET TRANSFORM AND HUMAN VISUAL SYSTEM

More information

Computer Organization CS 206 T Lec# 2: Instruction Sets

Computer Organization CS 206 T Lec# 2: Instruction Sets Computer Organization CS 206 T Lec# 2: Instruction Sets Topics What is an instruction set Elements of instruction Instruction Format Instruction types Types of operations Types of operand Addressing mode

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Contextual Analysis (2) Limitations of CFGs (3)

Contextual Analysis (2) Limitations of CFGs (3) G53CMP: Lecture 5 Contextual Analysis: Scope I Henrik Nilsson University of Nottgham, UK This Lecture Limitations of context-free languages: Why checkg contextual constrats is different from checkg syntactical

More information

Introduction to MiniSim A Simple von Neumann Machine

Introduction to MiniSim A Simple von Neumann Machine Math 121: Introduction to Computing Handout #19 Introduction to MiniSim A Simple von Neumann Machine Programming languages like C, C++, Java, or even Karel are called high-level languages because they

More information

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Introduction All processors offer some form of instructions to add, subtract, and manipulate data.

More information

UNIT I BASIC STRUCTURE OF COMPUTERS Part A( 2Marks) 1. What is meant by the stored program concept? 2. What are the basic functional units of a

UNIT I BASIC STRUCTURE OF COMPUTERS Part A( 2Marks) 1. What is meant by the stored program concept? 2. What are the basic functional units of a UNIT I BASIC STRUCTURE OF COMPUTERS Part A( 2Marks) 1. What is meant by the stored program concept? 2. What are the basic functional units of a computer? 3. What is the use of buffer register? 4. Define

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad nc. Application Note AN1801 Rev. 0.2, 11/2003 Performance Differences between MPC8240 and the Tsi106 Host Bridge Top Changwatchai Roy Jenevein risc10@email.sps.mot.com CPD Applications This paper discusses

More information

Chapter 12. CPU Structure and Function. Yonsei University

Chapter 12. CPU Structure and Function. Yonsei University Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

How to Do Word Problems. Building the Foundation

How to Do Word Problems. Building the Foundation Building the Foundation The notion that Mathematics is a language, is held by many mathematicians and is being expressed on frequent occasions. Mathematics is the language of science. It is unique among

More information

Gibbs Sampling. Stephen F. Altschul. National Center for Biotechnology Information National Library of Medicine National Institutes of Health

Gibbs Sampling. Stephen F. Altschul. National Center for Biotechnology Information National Library of Medicine National Institutes of Health Gibbs Sampling Stephen F. Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Optimization in High-Dimensional Space Smooth and simple landscapes

More information

SYSTOLIC IMPLEMENTATION OF SAMPLE-BY-SAMPLE CONJUGATE GRADIENT ALGORITHM

SYSTOLIC IMPLEMENTATION OF SAMPLE-BY-SAMPLE CONJUGATE GRADIENT ALGORITHM SYSOLIC IMPLEMENAION OF SAMPLE-BY-SAMPLE CONJUGAE GRADIEN ALGORIHM Ram Baghaie Helski University of echnolog Laboratory of elecommunications echnolog P.O. Box 3 15 HU, Fland ABSRAC In this paper, we consider

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision

University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision report University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision Web Server master database User Interface Images + labels image feature algorithm Extract

More information

Reviewing High-Radix Signed-Digit Adders

Reviewing High-Radix Signed-Digit Adders .9/TC.4.39678, IEEE Transactions on Computers Reviewg High-Radix Signed-Digit s Peter Kornerup University of Shern Denmark Abstract Higher radix values of the form β = r have been employed traditionally

More information

Textural Features for Image Database Retrieval

Textural Features for Image Database Retrieval Textural Features for Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington Seattle, WA 98195-2500 {aksoy,haralick}@@isl.ee.washington.edu

More information

CS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page.

CS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page. CS 433 Homework 4 Assigned on 10/17/2017 Due in class on 11/7/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Accelerating 3D Geometry Transformation with Intel MMX TM Technology

Accelerating 3D Geometry Transformation with Intel MMX TM Technology Accelerating 3D Geometry Transformation with Intel MMX TM Technology ECE 734 Project Report by Pei Qi Yang Wang - 1 - Content 1. Abstract 2. Introduction 2.1 3 -Dimensional Object Geometry Transformation

More information

write-through v. write-back write-through v. write-back write-through v. write-back option 1: write-through write 10 to 0xABCD CPU RAM Cache ABCD: FF

write-through v. write-back write-through v. write-back write-through v. write-back option 1: write-through write 10 to 0xABCD CPU RAM Cache ABCD: FF write-through v. write-back option 1: write-through 1 write 10 to 0xABCD CPU Cache ABCD: FF RAM 11CD: 42 ABCD: FF 1 2 write-through v. write-back option 1: write-through write-through v. write-back option

More information

Numerical Algorithms

Numerical Algorithms Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

A Genetic Algorithm for the Number Partitioning Problem

A Genetic Algorithm for the Number Partitioning Problem A Algorithm for the Number Partitiong Problem Jordan Junkermeier Department of Computer Science, St. Cloud State University, St. Cloud, MN 5631 USA Abstract The Number Partitiong Problem (NPP) is an NPhard

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

Computer Architecture 2/26/01 Lecture #

Computer Architecture 2/26/01 Lecture # Computer Architecture 2/26/01 Lecture #9 16.070 On a previous lecture, we discussed the software development process and in particular, the development of a software architecture Recall the output of the

More information

Vector an ordered series of scalar quantities a one-dimensional array. Vector Quantity Data Data Data Data Data Data Data Data

Vector an ordered series of scalar quantities a one-dimensional array. Vector Quantity Data Data Data Data Data Data Data Data Vector Processors A vector processor is a pipelined processor with special instructions designed to keep the (floating point) execution unit pipeline(s) full. These special instructions are vector instructions.

More information

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Orlando HERNANDEZ and Richard KNOWLES Department Electrical and Computer Engineering, The College

More information

DSP Platforms Lab (AD-SHARC) Session 05

DSP Platforms Lab (AD-SHARC) Session 05 University of Miami - Frost School of Music DSP Platforms Lab (AD-SHARC) Session 05 Description This session will be dedicated to give an introduction to the hardware architecture and assembly programming

More information

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1 Matrix-Vector Multiplication by MapReduce From Rajaraman / Ullman- Ch.2 Part 1 Google implementation of MapReduce created to execute very large matrix-vector multiplications When ranking of Web pages that

More information

Vector: A series of scalars contained in a column or row. Dimensions: How many rows and columns a vector or matrix has.

Vector: A series of scalars contained in a column or row. Dimensions: How many rows and columns a vector or matrix has. ASSIGNMENT 0 Introduction to Linear Algebra (Basics of vectors and matrices) Due 3:30 PM, Tuesday, October 10 th. Assignments should be submitted via e-mail to: matlabfun.ucsd@gmail.com You can also submit

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR UNIT I Digital Systems: Binary Numbers, Octal, Hexa Decimal and other base numbers, Number base conversions, complements, signed binary numbers, Floating point number representation, binary codes, error

More information

University of California at Berkeley. D. Patterson & R. Yung

University of California at Berkeley. D. Patterson & R. Yung 1 University of California at Berkeley College of Engineering Computer Science Division { EECS CS 152 Fall 1995 D. Patterson & R. Yung Computer Architecture and Engineering Midterm I Solutions Question

More information

COSC 6385 Computer Architecture. - Memory Hierarchies (II)

COSC 6385 Computer Architecture. - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available

More information

Local Feature Detectors

Local Feature Detectors Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,

More information

6.004 Tutorial Problems L14 Cache Implementation

6.004 Tutorial Problems L14 Cache Implementation 6.004 Tutorial Problems L14 Cache Implementation Cache Miss Types Compulsory Miss: Starting with an empty cache, a cache line is first referenced (invalid) Capacity Miss: The cache is not big enough to

More information

DESIGNER S NOTEBOOK Proximity Calibration and Test by Kerry Glover August 2011

DESIGNER S NOTEBOOK Proximity Calibration and Test by Kerry Glover August 2011 INTELLIGENT OPTO SENSOR Number 37 DESIGNER S NOTEBOOK Proximity Calibration and Test by Kerry Glover August 2011 Overview TAOS proximity sensors are very flexible and are used in many applications from

More information

Cache Justification for Digital Signal Processors

Cache Justification for Digital Signal Processors Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose

More information