Image and speech recognition Exercises

Size: px

Start display at page:

Download "Image and speech recognition Exercises"

Anna Harper
6 years ago
Views:

1 Image and speech recognition Exercises Włodzimierz Kasprzak v.3, 5 Project is co-financed by European Union within European Social Fund Exercises. Content. Pattern analysis (3). Pattern transformation (4) 3. Pattern classification (7) 3A. Pattern sequence recognition (3) 4. Iconic processing (7) 5. Boundary-based image segmentation (5) 6. Region-based image segmentation (3) 7. Object recognition (3) 8. Speech pre-processing (4) 9. Speech features (3). Phonetic model (3). Word and sentence recognition ()

2 E. Pattern analysis Project is co-financed by European Union within European Social Fund Task. 4 spectrogram images (of resolution 4 frames 6 frequencies) for spoken words "Koniec (top row) and "OK" (bottom row) are given: A) Propose three types of feature vectors (of small size) for a direct classification of these images. B) For one feature type provide an approximation of class prototypes and run a classification process by a geometric minimum-distance classifier. W. Kasprzak: IASR Exercise 4

3 Task. The intensity distribution in the image is modelled as a stochastic variable with the pdf p X (A X B) and cumulated density F X ( x ) = p X ( X x). Let two images of resolution 4 4 are given: () Show the pdf-s of intensity variables Assuming, that two original analogue-valued variables are available, get their pdf-s. () What are the means and variances of all these distributions? W. Kasprzak: IASR Exercise 5 Task.3 Signal digitalization problem: original analogue value f j, discrete value: h j. Digitalization error: n j = f j h j. The signal-to-noise ratio (SNR) expresses the digitalization quality. () Estimate the SNR value in case of signal digitalization with B bits. () What is the change of the signal-to-noise ratio if the number of bits is increased by? W. Kasprzak: IASR Exercise 6

4 E. Pattern transformation Project is co-financed by European Union within European Social Fund Task. (PCA) A set of -D features is given below. Define the PCA problem and obtain the particular PCA-based transformation for given set of features feature (,) (,4) (3,5) (,) (3,) (4,-) W. Kasprzak: IASR Exercise 8

5 Task. (LDA) A set of -D features from classes is given below. Define the LDA problem and obtain the particular LDA-based transformation for given set of features feature (,) (,4) (3,5) (,) (3,) (4,-) class W. Kasprzak: IASR Exercise 9 Task.3 (ICA) The following 3 discrete-time source signals are available, S = "Triangle signal", S ="Step signal", S3="Noise : time source S S S () Compute the normalized kurtosis of each source signal. () Make 3 instantaneous mixtures of sources by using the matrix: A 3 3 = (3) Compute the correlation factor of pairs of sources and pairs of mixtures. W. Kasprzak: IASR Exercise

6 Task.4* () (ICA quality measures) Assume that we have tested a blind signal separation method and the following data are available. The (usually unknown) 3 sources: Time Source S S S The 3 separated (output) signals: Time Source Y Y Y W. Kasprzak: IASR Exercise The (usually unknown) mixing matrix: The final de-mixing matrixw : Task.4* () W 3 3. =.. Express the quality of separated results if: 3 =... Only the outputs can be used.. If both the outputs and sources can be used. 3. If both the mixing and de-mixing matrix can be used. A 3 W. Kasprzak: IASR Exercise

7 E3. Pattern classification Project is co-financed by European Union within European Social Fund Task 3. Define and solve the optimisation problem required for a linear potential functions classifier (define and solve it analytically). Consider the case of -D feature vectors and classes if following learning samples (feature vectors) are given: i c i = (c i, c i ) (; ) (; 3) (; 4) (; ) (; ) (3; ) class Ω Ω Ω Ω Ω Ω W. Kasprzak: IASR Exercise 3 4

8 Task 3. Let learning samples from 3 classes are given in the R space (-D feature vectors): (x,y) (;,5) (.5; 3) (; 3,5) (; 3) (4; 4) (3; 5) (; ) (; ) (3; ) (;) Class (A) Design the following classifiers:. the geometric minimum-distance classifier,. the Bayes classifier, 3. the k-nn classifier (k=4, the 4-NN classifier). (B) Give classification results obtained by these three classifiers for a feature (;,5). W. Kasprzak: IASR Exercise 3 5 Task 3.3 Let the single observation of unknown state parameter x is z = x + w, where noise: w N(, σ ). A) What value takes the ML estimate? B) What value takes the MAP estimate? C) When is the MAP estimate equivalent to the ML one? W. Kasprzak: IASR Exercise 6

9 Let a sequence of observations of unknown (constant) state parameter x is given, as: z(t) = x + w(t), t=,...,n; where noise w N(,σ ). Task 3.4 A) What is the LS (least square) estimator? B) What is the MMSE (minimum mean square error) estimator? W. Kasprzak: IASR Exercise 7 Task 3.5 Define and solve the optimisation problem for a linear SVM: (A) Define the primary problem analytically and solve it by Matlab; B) Define the dual problem and solve it like in case A); C) Using a custom method (e.g. the brute-force method) check candidates for support vectors and define the separating plane; D)Illustrate the support vectors and separating plane in graphical form. Following learning samples (feature vectors) are given: i c i = (c i, c i ) (; ) (; 3) (; 4) (; ) (; ) (3; ) class Ω Ω Ω Ω Ω Ω Compare the result with the classifier designed in task 3.. W. Kasprzak: IASR Exercise 3 8

10 Task 3.6 Simulate the first iteration of the backpropagation learning algorithm of a two-layer perceptron (one hidden layer), with 3 neurones in the hidden layer and two inputs and outputs, if: the learning rate is.; sigmoid activation function: z=θ ( y) = initial weight matricesw () andw () : + exp( y) = first learning sample:x=[,]; required output: f(x ) = [, ]. The activation values can be approximated from the table: () W () W = y z=θ(y) y z=θ(y) W. Kasprzak: IASR Exercise 3 9 Task 3.7 Clustering. Let 5 training samples are available, as given in the table. () Skip the class information and simulate the process of k-means clustering show the code-book results after first iterations if running the clustering for (a) and (b) 4 classes. () Compare the clustering results with the class information given in the last column of the table. W. Kasprzak: IASR Exercise 3 Index Class

11 E3A. Pattern sequence recognition Project is co-financed by European Union within European Social Fund Task 3A. In the enclosed problem graph the decision costs are associated with links, while the letters accepted by nodes are given in brackets. Simulate the prior path search process for this graph (from start node S to goal node J8) by the method of: dynamic programming. W. Kasprzak: IASR Exercise 3A

12 Task 3A. Let an image-based text recognition system considers letters from a limited alphabet: { a, i, e, o, u, b, d, f, r}. Design the problem graph (model) needed for dynamic programming-based recognition of words road and four, that tolerates selected letter errors: substitution errors of vowels ( e instead of o, u instead of a and vice versa) and consonants ( f instead of r, b instead of d and vice versa ), deletion errors of inner-word letters and inner-word inclusion errors ( i can be added). For simplicity assume single-letter errors only, i.e. that an error cannot follow directly after another error. Give the decision graph in dynamic programming for both models if the following observation sequence is given: t letter # r o i u r W. Kasprzak: IASR Exercise 3A 3 Task 3A.3 Simplify the structure of problem graph defined in task 3A. by adding observation costs for every node s, i.e. turn the problem graph into a discrete Hidden Markov Model (with scalar costs of symbol emission per state). Remark: skip the requirement of single-letter errors only. W. Kasprzak: IASR Exercise 3A 4

13 E4. Iconic processing Project is co-financed by European Union within European Social Fund The ideal colours (blue, green, red, yellow, magenta, white, dark) and specific skin colours ( dark skin and light skin ) are defined in the YUV space as follows (usually applied in camera s colour calibration): Task 4. W. Kasprzak: IASR Exercise 4 6

14 Task 4. () Assume that in a computer program we represent all the colour coefficients in terms of 8 bits using unsigned integer values from the interval [, 55]:. Perform transformations between RGB and YUV colour spaces: for blue, green and red colours.. Find the linear Y-based normalization of the skin colour. Solution.. Transform the coefficients Y(U-8)(V-8) into RGB. Colors: YUV RGB Red (85,, 93) Green (7,, 95) Blue (66, 76, 3) W. Kasprzak: IASR Exercise 4 7 Task 4. Let an 8-level image of resolution 64 x 64 has the following histogram: Pixel value w Histogram value h(w) p w (w)? Get the pdf of variable w. Perform histogram stretching. Assume an appropriate cut-off threshold. Give graphical illustration of both distributions before and after the stretching operation. W. Kasprzak: IASR Exercise 4 8

15 Task 4.3 Let a 8-level image of resolution 64 x 64 has the following histogram: Pixel value w Histogram value h(w) p w (w) Perform histogram equalization. Give graphical illustration of both distributions before and after the equalization operation. Determine the number of distinct levels in the output image. W. Kasprzak: IASR Exercise 4 9 Task 4.4 Let the binary image of some character pattern be given, as specified below. The image coordinate system is also shown. Simulate the moment-based normalization procedure for this pattern. Determine intermediate transformations and show intermediate results. W. Kasprzak: IASR Exercise 4 3

16 Let the following image be given.. Determine the threshold for image binarization using the Otsu method. Show intermediate results.. Perform image thresholding. Task W. Kasprzak: IASR Exercise 4 3 Apply different edge operators (Roberts cross, central differences, Sobel, Scharr) to the two images given below. Obtain the images of edge strengths and orientations. Compare the results. ) ) Task W. Kasprzak: IASR Exercise 4 3

17 Simulate 3 different edge thinning procedures as applied to the results of the Sobel operator for the first image in task Task 4.7 W. Kasprzak: IASR Exercise 4 33 E5. Boundary-based image segmentation Project is co-financed by European Union within European Social Fund

18 Task 5. Simulate the process of chain detection by the method of "hysteresis thresholds" for the edge image below. Assume the following thresholds: lower T =, upper T = 3, T = 3 o. The edge orientation number O NUM = 3. y \ x : (s, r) 3 4, 8, 8, 6, 3, 8 34, 8 35, 9, 3 6, 37, 8 36, 9 33, 4 3, 4, 5, 8 3, 5, 3,, 7, 8 Output data: ) Chain image with segment indices starting from. ) Every segment has a start edge and an end edge. 3) Every edge element has a successor element (or zero). W. Kasprzak: IASR Exercise 5 35 Task 5. Let 8 following edge elements [e i =(x i, y i ), r(e i )] be given (where the number of discrete directions is set to: O NUM = 8), where r( ) orientation index: i x i y i r(e i ) Obtain points in Hough space (designed for straight line detection) that correspond to given edge elements. What lines can been detected in this particular Hough space. Give them in parametric line form: y = ax + b. W. Kasprzak: IASR Exercise 5 36

19 Task 5.3 Fill the Hough accumulator (designed for circle centre detection) if following edge elements are given: i x i y i (r(e)) o.. o o o o o Find the candidate for circle s centre point (x c, y c ). W. Kasprzak: IASR Exercise 5 37 Task 5.4 Let the following points in the Hough accumulator, created for circle centre detection, are given: i a i b i Get the line equation (in Hough space) best approximated (by the LSE approach) by above observations. To which circle centre point in image space this line corresponds? W. Kasprzak: IASR Exercise 5 38

20 Task 5.5 Simulate the contour recognition process by a generalized Hough Transform, given a rectangle-like contour model with smooth vertices and with basic size 3. The particular edge image is shown below 8 orientations are distinguished (O NUM =8) (orientations are given only). Define the contour model in terms of 8 border points, alternatively for two scales (s=,) and two rotations (ϕ= o, -45 o ). Solution: Centre point C = (4, 4). s=,ϕ= -45 o. y C x W. Kasprzak: IASR Exercise 5 39 E6. Region-based image segmentation Project is co-financed by European Union within European Social Fund

21 4 Let the following image of size 4 4 image is given. Simulate step-by-step the combined region detection procedure (i.e. with SPLIT-and-MERGE steps and with region merging steps) applied to this image with the thresholds: for split-and-mergeθ=4, for region mergingθ A = 3, and for region merging with adaptive thresholdθ(f) = 3 - F /6 * ( 3 - ). Task 6. W. Kasprzak: IASR Exercise Let a texture f is given (of size 5 5 and 4 intensity levels). Define the intensity co-occurrence matrices (ICM) G(,) and G(,) (i.e. distance one for horizontal and vertical direction) for the texture below. Calculate the contrast features for such ICM. Calculate the combined ICM G() from G(,) and G(,). Shift the texture cyclically by one column (f shifted ) or produce a mirror texture (f mirror ) w.r.t. the Y axis. Are there any differences of the ICM s for the shifted or mirrored texture? Task 6. W. Kasprzak: IASR Exercise 6 = 3 3 f = 3 3 f shifted = 3 3 f mirror

22 43 Let a texture f is given (of size 5 5 and 4 intensity levels). Obtain the histograms of sums and differences for the displacement vector v = (,) T. Shift the texture cyclically by one column and produce a mirror texture along the Y axis. Are there any differences of the sum and difference histograms if compared to the previous histograms? Task 6.3 W. Kasprzak: IASR Exercise 6 = 3 3 f = 3 3 f shifted = 3 3 f mirror E7. Object recognition Project is co-financed by European Union within European Social Fund

23 Task 7. (Calibration of the camera) Let 3 points in 3D space are specified in global coordinates (X i, Y i, Z i ): Point X i Y i Z i P - 7 P - 6 P3-5 Let the corresponding image points (x i, y i ) are: Image point x i y i p - 4 p 3 p3 Assuming, there is no rotation between global coordinates and camera coordinates and an unary scaling coefficient: W. Kasprzak: IASR Exercise 7 45 Task 7. (). Show, that the 3 pairs of points are sufficient for camera calibration;. Check, if it leads to a system of linear equations that have an unique solution (i.e. perform LSE minimization and check the system s determinant) 3. How many points are required if there is a non-zero rotation around the Z axis? W. Kasprzak: IASR Exercise 7 46

24 Task 7. Assume, a single observation vector is given,z() = [4, ] T, of a hidden constant state vector, s = [x, x ] T, for a system with the following observation model: z(t) = As+ w(t), t=,...,n; where. A=. Gaussian distributions of p(s) N([, ] T,P) and p(w) N(,R), with covariance matrices:.. P= R=.3.3 A) Derive the analytic solution of iterative MAP estimation. B) Use the above data to compute the MAP estimate s MAP (). W. Kasprzak: IASR Exercise 7 47 Task 7.3 () Propose a general object recognition strategy for combined topdown and bottom-up matching of model with image data, and express it in terms of A* search. Solution A) Define a model (problem graph) that contains two hierarchies:. a vertical (is-part-of) hierarchy. a horizontal (is-specialization-of) hierarchy. The IMAGE DATA correspond to leafs of such problem graph. B) Define several INFERENCE steps (i.e. ways of matching DATA to MODEL NODES) to create modified concepts Q(A) and INSTANCES I(A) of model concepts A W. Kasprzak: IASR Exercise 7 48

25 Task 7.3 () Solution (cont.) C) For a GOAL concept define costs for A* search: the costs of already matched parts and the expected costs of matching yet unmatched parts. For an instance of some intermediate GOAL define expected remaining costs of reaching an instance of the ULTIMATE GOAL concept. D) Consider the following iterative strategy:. Select an intermediate goal node A create Q(A).. Make top-down model expansion for Q(A) 3. Make bottom-up model instantiation based on DATA. 4. If I(A) is ultimate goal then RETURN(I(A)) else infer next goal A from I(A): create Q(A ), set A=A and repeat from step. W. Kasprzak: IASR Exercise 7 49 E8. Speech pre-processing Project is co-financed by European Union within European Social Fund

26 Task 8. The Euler formula. Using the Euler formula compute the frequencybased decomposition of the following function:, if t < π / f ( t) =, if π / < t < 3π /, if 3π / < t π Solution. We assume that the function f(t) is a π periodic function. Then the Fourier coefficients can be computed using the following formulas: π a k = f ( t)cos(πf kt) dt π π b k = f ( t)sin(πf kt) dt π a k = π π π f ( t)cos(πf kt) dt= [ cos(πf kt) dt+ cos(πf kt) dt+ cos(πf kt) dt] π 3π π π 3π W. Kasprzak: IASR Exercise 8 5 Task 8. () Solution (cont.) Observe - the base frequency is: b k a k sin( kt) = ( π k sin( kt) k sin( kt) + k π 3π π π 3π, a k = 4 kπ sin( ), kπ if k is even otherwise ) = f = π [ sin( kπ ) sin( 3 kπ )] kπ cos( kt) π cos( kt) 3π cos( kt) π = ( ) [ cos( = kπ ) cos( kπ π π )] = [ ] = π k k k kπ kπ W. Kasprzak: IASR Exercise 8 5

27 Task 8. Prove the following statements:. "The DFT transformation matrixd M is a symmetric and unitary matrix (i.e.d M D M* = MI).". "If the input signal x(t) has a real part only, then the vector F of Fourier coefficients is a symmetric conjugate around M/, i.e. F M/-k = F* M/+k, for k =,..., M/ -. " 3. "Furthermore for a real-valued x(t) if M is an even number then F and F M/ are real numbers too." W. Kasprzak: IASR Exercise 8 53 Task 8.3 Simulate the steps of FFT computation (with decimation in frequency domain) for M = 8 and the following signal x[n]: n x[n] W. Kasprzak: IASR Exercise 8 54

28 The following signal x[n] is given: Task 8.4 n signal x[n] n signal x[n] ) Compute this signal s auto-correlation factors r k, k=,,. ) Estimate the fundamental frequency of this signal, assuming a sampling frequency of sample / ms. W. Kasprzak: IASR Exercise 8 55 E9. Speech features Project is co-financed by European Union within European Social Fund

29 Task 9. A) Get the Fourier coefficients for the signal x[n], given below: n x[n] B) Get the mean value of x[n] and generate the zero-mean signal: x [n] = x[n] mean(x[n]). Compute the DFT(x ). Compare results with A). C) By padding with 8 zeros the above signal is extended to 6 samples: x[n] = [x[n] ], i.e. n x[n] Compute the DFT(x). Compare results with A). D) Obtain the mean value of x[n] and generate the zero-mean signal: x [n] = x[n] mean(x[n]). Compute the DFT(x ). Compare results with A). W. Kasprzak: IASR Exercise 9 57 Task 9. Compute MFC features for the following signal frame: n x[n] Assume that the sampling rate is 8 khz. Use 3 triangle filters uniformly located according to the Mel-scale. W. Kasprzak: IASR Exercise 9 58

30 59 Compute the set of 4 LPC features for the following signal frame: Define and solve the linear system given by a Toeplitz matrix: Task 9.3 W. Kasprzak: IASR Exercise 9 n x[n] = m m m m m m m r r r a a a r r r r r r r r r r r r M M L M M L L 3 E. Phonetic speech model Project is co-financed by European Union within European Social Fund

31 Task. Using the 4 features: F, F, tongue location and mouth opening, find the expected locations of cluster centers of vowel samples. W. Kasprzak: IASR Exercise 6 Task. Give (A) the phonetic description (pronunciation) in terms of phonemes and (B) the expected spectrograms, of following spoken words: () "ok.", () "three", (3) "end". Solution. A) Word ok three end Pronunciation /ou/ /k c / /k h / /ei/ /T/ /9r/ /i:/ /E/ /n/ /d/ W. Kasprzak: IASR Exercise 6

32 Task.3 Create a ten-digit word recogniser specify the following:. the pronunciations of considered words,. the classification of phonemes into 8 context categories, 3. the number of sub-phonemes for each considered phoneme, 4. all sub-phonemes needed for signal frame classification. W. Kasprzak: IASR Exercise 63 E. Word and sentence recognition Project is co-financed by European Union within European Social Fund

33 Task. () Define the HMM model for phonetic representation of spoken words "yes and no" in terms of sub-phonemes given below. Assume a discrete and scalar form of the observation model with single class-per-state only. The observations are summarized in a sub-phoneme probability matrix, given on next page. By the Viterbi search method find the best path in this observation matrix, w.r.t. the yes - and no - models. Specify step-by-step actions and results of the Viterbi searchprocess. W. Kasprzak: IASR Exercise 65 Task. () W. Kasprzak: IASR Exercise 66

34 Task. A) Design a Markow Model (HMM), required for the recognition of the written string "wtorek" in images. Assume a discrete vector form of the emission (output) model. B) Describe and compute a particular decision space that is built by a Viterbi search for given word model, if (during the segmentation process) five letters (segments) have been detected with following probabilities resulting from single-letter classification: letter \ segment w o r a t k..... e..... W. Kasprzak: IASR Exercise 67 Thank you Project is co-financed by European Union within European Social Fund 68

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++

Dietrich Paulus Joachim Hornegger Pattern Recognition of Images and Speech in C++ To Dorothea, Belinda, and Dominik In the text we use the following names which are protected, trademarks owned by a company