Wavetable Matching of Pitched Inharmonic Instrument Tones

Size: px

Start display at page:

Download "Wavetable Matching of Pitched Inharmonic Instrument Tones"

Owen Wood
6 years ago
Views:

1 Wavetable Matching of Pitched Inharmonic Instrument Tones Clifford So and Andrew Horner Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong and Abstract Wavetable matching is the process of finding the parameters needed to resynthesize a musical instrument tone using wavetable synthesis. The most important parameters are the wavetable basis spectra. Previous wavetable matching has often assumed the original tone was harmonic or nearly harmonic. This assumption does not hold for instruments such as the plucked strings. A semi-automated process has recently been proposed to group the partials based on their normalized frequency deviations, and then match each group. However, the user must adjust the group size manually to find the best match. The method is also inefficient at matching harmonic instruments. This paper introduces a new method for hierarchically grouping partials with similar normalized frequency deviations. Ordinary wavetable matching then finds the basis spectra for individual groups. The new method is fully automatic and requires no prior knowledge about the inharmonicity of the tone. Results for 8 instrument tones show the new method improves the perceived match on harmonic tones and pitched inharmonic tones compared to ordinary wavetable matching. 0. Introduction The term wavetable synthesis is currently often used synonymously with sampling synthesis in the music industry. We will use the classical computer music meaning of wavetable synthesis throughout this paper, where an oscillator table is loaded with

2 several partials and indexed depending on the fundamental frequency. Several weighted wavetables can be summed to produce spectral changes in multiple wavetable synthesis. When the partials are disjoint (i.e. each partial only appears in only one of the wavetables), this is called group additive synthesis []-[3]. Recent work has shown how to match wavetable parameters for resynthesizing instrument tones [4]-[7]. One of the most successful of these methods, wavetable matching, picks spectral snapshots from the original tone to determine the wavetables (additive synthesis of the spectral snapshots loads the wavetables). This approach is intuitive, and guarantees an exact match at the timepoints of the selected spectral snapshots and excellent matches at neighboring points as well. Figure shows the multiple wavetable block diagram, where the output of each wavetable is scaled by an amplitude envelope and summed into the output signal. Wavetable matching occurs in the frequency domain and determines the spectra of the wavetables, called the basis spectra. Previous work on wavetable matching has used genetic algorithm (GA) optimization to determine the best time points to select the basis spectra. GAs are combinatorial optimization techniques modeled on the idea of evolving solutions through processes analogous to natural selection and breeding [8][9]. These evolutionary processes guide the parameter search using populations of potential solutions. To judge the fitness of a particular combination of basis spectra, the fitness function uses a least-squares solution to find the amplitude envelopes of each basis spectra, and an average relative amplitude error (RAE) to measure the spectral difference between the original and matched tones: RAE = N frames N frames! j = N hars! k = ( b N hars! k = # b" 2 k, j k, j ) b 2 k, j () where the N frames is the number of analysis frames, N hars is the number of partials, b k!, j is the kth partial amplitude of the synthesized tone at the jth analysis frame, and b, is the k j original tone s amplitude on the same partial and analysis frame. A relative amplitude 2

3 error of zero indicates an exact match, while an error of 0. indicates an average relative difference of 0%. Equation is not a perfect perceptual error measure, but it is reasonably accurate and computationally inexpensive. The GA uses the relative amplitude error as the fitness value of the candidate basis spectra, mixing and matching basis spectra combinations from the best individuals in the population, returning the best set of basis spectra as the solution. The cost of computing the relative amplitude error for each set of basis spectra can be held down by limiting the match points to a small number of representative test spectra (usually about 20). However, wavetable synthesis makes a basic assumption: the partials are harmonics, i.e., they have frequencies which are at integer multiples of the fundamental frequency ( f k = kf). This causes problems when modeling inharmonic instruments, such as the plucked strings. Wavetable matching generates a dynamic spectrum as close to the original as possible. However, it does not take into account the tone s inharmonicity (the amount of frequency deviation between the partials and integer multiples of the fundamental frequency). The degree of inharmonicity is dependent on the instrument [0][]. Figure 2 shows the normalized frequency deviations (NFD) for four acoustic instruments based on McAulay- Quatieri analysis [2]. We define the NFD for partial k as the following: NFD k = N frames! j= b ' f k, j % j ( & kf k, N frames! j= b k, j $ " # (2) where F is the fixed fundamental frequency (specified by the user prior to spectrum analysis), and f k, j is the actual time-varying frequency of partial k at analysis frame j. A value of 0.0 represents a +% frequency deviation. The equation weights the frequency deviation by the relative amplitude of each frame (i.e., high amplitude frames contribute more to the overall deviation than low amplitude frames). Figure 2 shows that the 3

4 plucked yangqin and zheng tones have larger and more variable frequency deviations than the trumpet and saxophone tones. Lee and Horner [3] introduced a group additive synthesis method for matching piano tones with stretched partials. Piano tones have simpler a inharmonicity than plucked string tones, because the amount of average frequency deviation increases in a fairly continuous way from partial to partial on the piano. This is unlike the zheng tone, where a partial may be very flat and its neighbor very sharp (see Figure 2). Such deviations require a special model to represent them accurately. So and Horner [4] recently introduced an inharmonic wavetable matching technique that classifies partials with similar normalized frequency deviations. The method then uses ordinary wavetable matching for each class. However, the method is only semi-automatic and requires user intervention to determine the class size. The user inputs the value by trial-and-error seeking the best match. The method is also inefficient for matching harmonic tones, using much more wavetables than ordinary wavetable matching without any significant improvement. This paper introduces a new wavetable matching technique that adapts wavetable matching to both harmonic and pitched inharmonic tones without prior knowledge about the inharmonicity of the tone. The new method uses a hierarchical grouping technique. Section describes the new method, Section 2 gives results for 8 instrument tones, and Section 3 concludes the paper.. Wavetable Matching of Pitched Inharmonic Tones A new version of wavetable matching is presented in this section. The idea is to hierarchically group the original tone s partials based on their normalized frequency deviation (NFD), and then determine the basis spectra and amplitude envelopes of individual groups using ordinary GA wavetable matching. The user specifies the number of wavetables (N tabs ), and the algorithm minimizes the relative amplitude error (RAE) and 4

5 normalized frequency deviation error (NFDE). The procedure is fully automatic and requires no intervention from the user about the inharmonicity of the tone, compared to the previous version of the algorithm [4]. Figure 3 outlines the method. First, a short-time Fourier analysis such as the phase vocoder [5][6] transforms the tone from the time domain to the frequency domain. If the tones has severe pitch changes we apply McAulay-Quatieri [2] analysis. Beauchamp [7] gives more details about these procedures. Next, a group decomposition tree of the partials is formed according to the normalized frequency deviation (NFD k ). Then, wavetables are distributed among the groups, forming a wavetable allocation tree. Ordinary GA-wavetable matching is applied to each group on each level of the tree. The result is a set of basis spectra and amplitude envelopes, along with their frequency error (NFDE) and amplitude error (RAE) for each group. The final parameters are selected from the level with the lowest overall frequency and amplitude error. Section. discusses the hierarchical grouping method for the partials. Section.2 shows how to distribute the wavetables among the various groups. Section.3 shows how to find the level with the lowest overall error, and Section.4 describes how to resynthesize the signal.. Hierarchical Grouping of the Partials A good grouping algorithm is critical to the overall procedure, because the individual frequency deviations of the partials will be replaced with a single group frequency deviation. Therefore, partials with similar deviations should be grouped together. The goodness of a grouping can be measured by its normalized frequency deviation error (NFDE), which is defined as: NFDE = N! groups i= NFDE 2 i = N groups!!( NFDk # GNFDi ) i= k" group i 2 (3) 5

6 where N groups is the total number of groups, NFDE i is the normalized frequency deviation error of group i, and GNFD i is the average NFD k for group i, which is defined as GNFD i = NFD k" group = i H i! k" group NFD i k (4) where H i is the number of partials in group i. The lower the NFDE, the better the grouping. We propose a hierarchical grouping algorithm that minimizes the NFDE for a given number of groups N groups. The steps are as follows: i) Let array T be a set of ordered pairs where the first number represents the partial number, and the second its frequency deviation NFD k, where k N hars. Sort T by the frequency deviation values in ascending order (i.e., the flatest deviation will be first in T, and the sharpest last). ii) Construct a set of edges E that connect all the adjacent ordered pairs of T, where e x = (t x, t x+ ) for x < N hars. Set the number of current existing groups c =. iii) Connected ordered pairs are in the same group. Pick the group m with the largest NFDE m. iv) Split group m into left and right groups by breaking one edge. Test all the edges in group m to find which edge gives the largest NFDE m reduction after its removal. That is, find the edge in group m that gives the largest difference: NFDE! ( NFDE + NFDE ). m left right v) c = c +. Stop the algorithm if N groups = c, otherwise go to step iii and continue. An example run of the algorithm is shown in Figures 4, 5, and 6 for a hypothetical instrument with ten partials. Figure 4 shows amplitude versus normalized frequency deviation for each partial. Figure 5 shows the sorted partials along the x-axis and the connected edges between them. The amplitude information is not shown in the graph, as it is unnecessary in the grouping algorithm (though it will be used in the wavetable allocation algorithm). Each step in Figure 5 shows the successive removal of edges: e 4 is removed first, e 7 is removed second, and so on. Figure 6 shows the corresponding 6

7 hierarchical grouping tree with the relative amplitude () of each group. When N groups =, no edge is removed and the partials form a single group {, 8, 2, 5, 9, 0, 3, 6, 7, 4}, and thus = 00%. When N groups = 2, removing e 4 results in the largest frequency error reduction and the groups {, 8, 2, 5} ( = 53.7%) and {9, 0, 3, 6, 7, 4} ( = 46.3%). When N groups = 3, removing e 7 results in the largest error reduction and breaks the second group into {9, 0, 3} ( = 2%) and {6, 7, 4} ( = 25.3%), and so on. At the bottom of the tree, with N groups = N hars = 0, no edge remains and every partial is in a group by itself..2 Allocating the Wavetables After forming the group decomposition tree, ordinary GA wavetable matching on each group in the tree determines the basis spectra and amplitude envelopes. But how to best distribute the wavetables among groups, given the total number of wavetables (N tabs )? We set the number of wavetables in group i (#wt i ) to be proportional to the relative amplitude ( i ) of group i, that is # wt! N (5) i i tabs where the relative amplitude ( i ) comes from the group decomposition tree. The number of group wavetables #wt i is rounded to the nearest integer. Equation 5 distributes the wavetables according to the relative amplitude of each group. Proportional allocation of the wavetables insures that groups with higher relative amplitudes have higher matching accuracy, helping to reduce the overall relative amplitude error (RAE) in Equation. Figure 7 shows the wavetable allocation tree for the example from Figures 4-6 with N tabs = 5. Since the maximum number of wavetables is five in this case, the wavetable allocation tree contains only five levels (with five groups in the leaf level). The single group at the root includes all five wavetables, whereas each leaf will contain only one wavetable. The groups in intermediate levels will be allocated a proportional number of wavetables. We also restrict #wt i, such that 7

8 #wt i i (6) The number of wavetables allocated in group i must be smaller than or equal to the number of partials in group i. It is unnecessary to have more wavetables than partials in a group since additive synthesis yields an exact match (a "perfect group"). Such an example occurs in the leftmost group of the forth level (N groups = 4) of the wavetable allocation tree in Figure 7. According to Equation 5, the leftmost group ( = 37.7%) has the highest relative amplitude and would contain two wavetables, while the second strongest group ( = 25.3%) would contain one wavetable. However, there is only one partial in the 37.7% group, and one wavetable is enough to perfectly resynthesize this group. The surplus wavetable is passed to the second strongest group (which contains three partials). If the second strongest group already has enough wavetables, the surplus wavetable is passed to the next strongest candidate, and so on. Each group in the wavetable allocation tree (i.e., each box in Figure 7) is then passed to a genetic algorithm for optimization of the wavetables and amplitude envelopes as shown in Figure 8. This yields the amplitude error (RAE) and frequency error (NFDE) of the group. Identical groups from different levels only require one optimization. The total amplitude error (RAE) and frequency error (NFDE) can then be calculated for each level of the tree using all the partials..3 Selecting the Best Grouping After obtaining the amplitude error (RAE) and frequency error (NFDE) for each level, we need to determine which level gives the best grouping. To do this, we plot RAE versus NFDE for all levels of the tree. Figure 9 shows a plot of the example in Figure 7. The curve starts at the lower-right corner for the root and extends to the upper-left corner of the leaf level. In this example, there are five points corresponding to the five levels of the tree. The number of points in the tree is always equal to N tabs. We need to find the point that gives the lowest amplitude and frequency error. However, the point with the lowest amplitude error may not be the point with the lowest frequency 8

9 error. The frequency error always decreases with more groups, but the amplitude error may not (in the example in Figure 9 the amplitude error decreases up to three groups, and increases with more groups). One method of reconciling these two criteria is to select the point closest to the origin. Measuring the normalized Euclidian distance to the origin can do this. We term this distance the matching quality (MAQ) of a point, and defined it as: 2 2 & NFDE # & RAE # MAQ = $! + $! (7) % maxnfde " % maxrae " where the maxnfde and maxrae are the maximum NFDE and RAE of the line. A normalized distance is preferable to an absolute distance because the NFDE and RAE are on different scales. In the example in Figure 9, the values of NFDE are ten times smaller than RAE. That means the RAE will dominate the MAQ if the absolute values are used. For the example in Figure 9, the third level point (N groups = 3) is selected as the best grouping since it has the lowest MAQ (it is a little better than the fourth level point)..4 Resynthesizing the Signal To resynthesize the signal, we synthesize each group x i (t) using the ordinary multiple wavetable synthesis and sum all the group signals to produce the final synthetic signal x(t). To synthesize the group signal x i (t), every wavetable in group i uses a common timevarying fundamental frequency g i (t) defined as: ' & b # $ k, t NFDk ( t)! k( Groupi g = ( + ) = $ +! i ( t) F vi ( t) F (8) $! ' bk, t % k( Groupi " where F is the fixed fundamental frequency (specified by the user prior to spectrum analysis), and v i (t) is the time-varying NFD of group i. v i (t) is equal to the weighted-nfd of the group's partials. The weighting ensures that the resulting NFD will not be adversely affected by low amplitude partials, which are often noisy (the same situation as 9

10 in Equation 2). We do not use the GNFD i defined in Equation 4 to calculate g i (t), because GNFD i is only a single value rather than a time-varying value. We synthesize the group signal x i (t) using multiple wavetable synthesis. # wt i! x ( t) = w ( t) y ( t (9) i j= i, j i, j ) where y i,j (t) is the time-varying j-th wavetable for group i, w i,j (t) is the amplitude envelope of the j-th wavetable for group i, and #wt i is the number of wavetables in group i. Finally, we sum the signals of each group x i (t) to produce the resynthesized signal x(t). N groups! x( t) = x i ( t) (0) i= 2. Results This section compares the results of the new method to the original wavetable matching method. Eighteen instrument tones have been tested and are categorized as either harmonic tones (overall inharmonicity %) or pitched inharmonic tones (overall inharmonicity > %). We define the inharmonicity of a tone as its normalized frequency deviation error (NFDE) when N tabs =. Seven of the eighteen instruments are harmonic. They are mainly the woodwinds, brass and bowed strings (saxophone, shakuhachi, trumpet, violin, erhu, guitar, and piano). The remaining eleven instruments are inharmonic. They are mostly plucked string and pitched percussion (pipa, liuqin, zheng, qin, yangqin, glockenspiel, bell, qing, pitched gong, ling, and cello). Non-pitched instruments, such as the cymbal, are not included because they are unsuitable for wavetable matching, and Section 2. shows the harmonic results. Section 2.2 shows the inharmonic results. Section 2.3 shows the listening test results. 0

11 2. Harmonic Results We judge the quality of the matched tone by its amplitude error (RAE) and frequency error (NFDE). Figures 0 and show the amplitude error results. The tones are listed from left to right in order of decreasing inharmonicity. For each tone in the original method, the amplitude error decreases exponentially with an increasing number of wavetables (see Figure 0). In the new method, however, the amplitude error sometimes drops more rapidly than in the original wavetable method (see Figure ). The most obvious example is the guitar, which drops to nearly zero when N tabs = 5. Another example is the shakuhachi. When N tabs is large, there are more perfect groups. Recall that if #wt i = i (the number of wavetables equals the number of partials in group i), additive synthesis generates a perfect match. With more perfect groups, the overall amplitude error drops faster. The dropping rate actually depends on the relative amplitudes of the perfect groups. Stronger perfect groups reduce the error faster (as in the guitar and shakuhachi). Figures 2 and 3 show the frequency error results. The tones are listed in the same order. For every tone in the original method, the NFDE must stay the same because there are no groups (see Figure 2). On the other hand, the new method improves the NFDE with increasing N tabs (see Figure 3). Groups allow better NFD resolution. All the frequency errors shown in Figures 2 and 3 are below %, indicating that all the tones are near-harmonic and the original method is a good enough model. Nevertheless, the new method further improves the frequency error, although the improvement may not be perceptible. To summarize the harmonic tones results, the new method slightly improves on the original wavetable method for the same amount of resources. 2.2 Inharmonic Results Figures 4 and 5 show the amplitude errors (RAE) for each tone. The tones are listed from left to right by decreasing inharmonicity. The amplitude error of the original

12 wavetable method drops exponentially in Figure 4. It reaches zero for the qing, bell and ling when N tabs = 2 = N hars (it is the same as additive synthesis). On the other hand, the amplitude error of the new method drops significantly faster in Figure 5. The bell error drops the fastest, and approaches zero when N tabs = 6. The pipa error drops the slowest, and is similar to the original method. The bell has more perfect groups and they are stronger than in the other instruments. The pipa contains perfect groups, but they are all weak. Figures 6 and 7 show the resulting frequency errors (NFDE) for each tone. The tones are listed in the same order from left to right. Figure 6 shows that the frequency errors of all the tones stay the same with the original wavetable method. This is because the original wavetable method is unable to divide the partials into groups to give better NFD resolution. On the other hand, the new method lowers the overall frequency error (see Figure 7). 2.3 Listening Test Results To further evaluate our new method versus the original method, we carry out a listening test. The qin, yangqin and violin tones are matched with 7 wavetables in our test with to 7 number of groups according to the hierarchical grouping method in Section.. 3 subjects are given the 7 synthesized tones and the real tone (filtered to 3 partials) and asked to vote for the best 2 tones out of the 7 tones that sound closest to the real tone. They can also vote for the same tone twice if it sounds distinguishably the best. They are given a high quality headphone for the test and they can listen to the tones as many times as they want for the judgement. Figure 8, 9 and 20 show the vote results for the qin, yangqin and violin respectively. Note that the group indicates the original method and the circled group indicates the error-balanced group indicated from our MAQ function. The number of votes generally conform to our MAQ function. The subjects highly prefer the tones near the MAQ group. While nearly no votes are given to the number of group. That shows a clear quality 2

13 improvement over the original method. Another interesting result is that the votes are so concentrated in the qin (Figure 8) but spread across in the violin (Figure 20). Feedbacks from the subjects are that the qin tones are very easy to hear the difference but all the violin tones sound very close. That conforms to the fact that the violin tone is a nearlyharmonic tone with inharmonicity 0.29%, further improvement to it is more difficult to be heard, contrasting to the qin having initial frequency error 5.8% be reduced to 0.7%. 3. Conclusion The original wavetable matching method cannot effectively simulate pitched inharmonic tones such as the plucked strings. To remedy this problem, a new technique has been introduced to adapt the original method to pitched inharmonic tones. The new method hierarchically separates the partials into groups based on their normalized frequency deviations. Wavetables are then allocated to each group and ordinary wavetable matching is performed on each group. The final parameters are those with the lowest combined amplitude and frequency errors. The new method requires no user intervention, unlike the previous version of the algorithm [4], and is fully automatic. Eighteen instrument tones from the harmonic and pitched inharmonic families were tested. The results show that the new method is as good as the original method for harmonic tones, and better for inharmonic tones. The harmonic matches require fewer wavetables than inharmonic matches which require more wavetables to resolve the frequency error. The new method has its limits, and cannot handle non-pitched instruments such as the cymbal or environmental sounds. Their partials are noisy and unstable, and thus unsuitable for wavetable matching. The new method is well-suited to existing music synthesis hardware running multiple wavetable synthesis in realtime. The new method expands the range of the instruments 3

14 possible in wavetable synthesis from wind and bowed string instruments, to also include plucked string and pitched percussion instruments (which are usually modelled by PCM sampling). The optimal parameters selected by the new method are small in size, and efficiently stored. 4

15 Bibliography [] P. Kleczkowski, Group Additive Synthesis, Computer Music Journal, vol. 3, no., pp (989). [2] N.-M. Cheung and A. B. Horner, Group Synthesis with Genetic Algorithms, Journal of Audio Engineering Society, vol. 44, pp (996 Mar.). [3] S. Oates and B. Eaglestone, Analytical Methods for Group Synthesis, Computer Music Journal, vol. 2, no. 2, pp (997). [4] M.-H. Serra, D. Rubine and R. Dannenberg, Analysis and Synthesis of Tones by Spectral Interpolation, Journal of Audio Engineering Society, vol. 38, pp. -28 (990 Ma.). [5] A. B. Horner, J. Beauchamp, and L. Haken, Methods for Multiple Wavetable Synthesis of Musical Instrument Tones, Journal of Audio Engineering Society, vol. 4, pp (993 May). [6] A. B. Horner and J. Beauchamp, Synthesis of Trumpet Tones Using a Wavetable and Dynamic Filter, Journal of Audio Engineering Society, vol. 43, pp (995 Oct.). [7] A. B. Horner, Computation and Memory Tradeoffs with Multiple Wavetable Interpolation, Journal of Audio Engineering Society, vol. 44, pp (996 June). [8] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press. [9] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Reading, MA: Addison-Wesley. [0] S. Dubnov and N. Tishby, Influence of Frequency Modulating Jitter on Higher Order Moments of Sound Residual with Applications to Synthesis and Classification, Proceedings of the 996 ICMC, pp (996). [] J. C. Brown, Frequency Rations of Spectral Components of Musical Sounds, Journal of Acoustical Society of America, vol. 99, pp (996). [2] R. McAulay and T. Quatieri, Speech Analysis/Synthesis based on a Sinusoidal Representation, IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp (986). 5

16 [3] K. Lee and A. B. Horner, Modeling Piano Tones with Group Synthesis, Journal of the Audio Engineering Society, vol. 47, pp. 0- (999 Mar.). [4] C. So and A. B. Horner, Wavetable Matching of Inharmonic String Tones, Journal of the Audio Engineering Society, vol. 50, pp (2002 Jan./Feb.). [5] J. B. Allen, Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier transform, IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp (986). [6] M. Dolson, The Phase Vocoder: A Tutorial, Computer Music Journal, vol. 0, no. 4, pp (986). [7] J. Beauchamp, Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds, presented at the 94th Convention of the Audio Engineering Society, Journal of the Audio Engineering Society (Abstracts), vol. 4, preprint no. 3479, pp. 387 (993 May). 6

17 List of Figures Figure. Figure 2. Figure 3. Figure 4. Figure 5. Figure 6. Figure 7. Figure 8. Figure 9. Multiple wavetable synthesis block diagram. Normalized frequency deviations of partials of the saxophone (F#4), trumpet (F4), zheng (C#4), and yangqin (C4). Wavetable matching algorithm for pitched inharmonic tones. Normalized frequency deviations for a hypothetical instrument with ten partials. The graph shows amplitude versus normalized frequency deviation (NFD) for each partial (the labels indicate the partial numbers). Increasing the number of groups by the successive removal of edges (for the same example shown in Figure 4). The partials are sorted by their NFD k and labelled on the x-axis. Connected edges are part of the same group. The process stops when all edges are removed, in this case, where N groups = N hars = 0. The first five levels of the hierarchical grouping tree (for the example shown in Figures 4 and 5). Each group is represented by a box with the number of partials () and the relative amplitude () of the group. One group is split at each descending level. The wavetable allocation tree for the example from Figures 4-6 with total number of wavetables (N tabs ) = 5. Each group is represented by a box with the number of partials (), relative amplitude () and the number of allocated wavetables (#wt). Genetic algorithm optimization of the wavetable basis spectra and amplitude envelopes for a group with three partials. The number of wavetables (#wt = 2) is set in the wavetable allocated tree. RAE versus NFDE for the example in Figure 7. maxnfde and maxrae are.0 and.05 respectively. The third level (N groups = 3) gives the best overall matching quality (MAQ). Figure 0. Original wavetable matching method: The plot of instrument inharmonicity versus amplitude error (RAE) on seven near-harmonic tones. Figure. New method: The plot of instrument inharmonicity versus amplitude error (RAE) on seven near-harmonic tones. Figure 2. Original wavetable matching method: The plot of instrument inharmonicity versus frequency error (NFDE) on seven near-harmonic tones. 7

18 Figure 3. New method: The plot of instrument inharmonicity versus frequency error (NFDE) on seven near-harmonic tones. Figure 4. Original wavetable matching method: The plot of instrument inharmonicity versus amplitude error (RAE) on eleven inharmonic tones. Figure 5. New method: The plot of instrument inharmonicity versus amplitude error (RAE) on eleven inharmonic tones. Figure 6. Original wavetable matching method: The plot of instrument inharmonicity versus frequency error (NFDE) on eleven inharmonic tones. Figure 7. New method: The plot of instrument inharmonicity versus frequency error (NFDE) on the eleven inharmonic tones. Figure 8. Number of votes by the subjects for the qin tone with 7-wavetable match from to 7 groups. The 6-groups is selected as the best error-balanced grouping from our MAQ function. Figure 9. Number of votes by the subjects for the yangqin tone with 7-wavetable match from to 7 groups. The 5-groups is selected as the best error-balanced grouping from our MAQ function. Figure 20. Number of votes by the subjects for the violin tone with 7-wavetable match from to 7 groups. The 5-groups is selected as the best error-balanced grouping from our MAQ function. 8

19 f ( ) t.... w ( ) w ( ) t! 2 t w N (t)....!! + x (t) w i (t) amplitude envelope of ith table at time t f ( t ) fundamental frequency at time t x (t) sample output at time t N number of wavetables Figure. Multiple wavetable synthesis block diagram. 9

20 2% Normalized Frequency Deviation (NFD) % 0% -% -2% -3% -4% -5% -6% -7% % Partials Saxophone Trumpet Zheng Yanqin Figure 2. Normalized frequency deviations of partials of the saxophone (F#4), trumpet (F4), zheng (C#4), and yangqin (C4). 20

21 Original Tone Short-time Spectrum Analysis Time-varying frequency spectra Hierarchical grouping of partials based on their NFD k Group decomposition tree Wavetable allocation based on N tabs Wavetable allocation tree For each level in the tree Ordinary GA-based wavetable matching Frequency Error (NFDE) and litude Error (RAE) No Processed all the levels? Yes The parameters with the lowest NFDE and RAE Figure 3. Wavetable matching algorithm for pitched inharmonic tones. 2

22 litude % -8% -6% -4% -2% 0% 2% 4% 6% 8% 0 Normalized Frequency Deviation (NFD) Figure 4. Normalized frequency deviations for a hypothetical instrument with ten partials. The graph shows amplitude versus normalized frequency deviation (NFD) for each partial (the labels indicate the partial numbers). 22

23 NFD k NFD k NFD k e e 3 e 4 e 5 e 6 e 7 e 8 e 9 8 e % -8% -6% -4% -2% 0% 2% 4% 6% 8% e e 2 e 3 e 5 e 6 e 7 e 8 e % -8% -6% -4% -2% 0% 2% 4% 6% 8% e % -8% -6% -4% -2% 0% 2% 4% 6% 8% 3 e e 3 e 5 e 6 e 8 e N groups = N groups = 2 N groups = 3 NFD k e N groups = 9-0% -8% -6% -4% -2% 0% 2% 4% 6% 8% NFD k N groups = N hars = 0-0% -8% -6% -4% -2% 0% 2% 4% 6% 8% Figure 5. Increasing the number of groups by the successive removal of edges (for the same example shown in Figure 4). The partials are sorted by their NFD k and labelled on the x-axis. Connected edges are part of the same group. The process stops when all edges are removed, in this case, where N groups = N hars = 0. 23

24 0 00% N groups = p 53.7% % N groups = 2 p 4 3 p53.7% p 3 3 p2.0% p N groups = % 37.7% p6.0% p2.0% p25.3% N groups = p p37.7% p6.0% p0.4% p20.6% p25.3% Figure 6. The first five levels of the hierarchical grouping tree (for the example shown in Figures 4 and 5). Each group is represented by a box with the number of partials () and the relative amplitude () of the group. One group is split at each descending level. 3 3 N groups = 4 24

25 0 00% N groups = #wt % % N groups = 2 #wt 3 #wt % 3 2.0% % N groups = 3 #wt 3 #wt #wt N groups = % 6.0% 2.0% 25.3% #wt #wt #wt #wt 2 N groups = % 6.0% 0.4% 20.6% 25.3% #wt #wt #wt #wt #wt Figure 7. The wavetable allocation tree for the example from Figures 4-6 with total number of wavetables (N tabs ) = 5. Each group is represented by a box with the number of partials (), relative amplitude () and the number of allocated wavetables (#wt). 25

26 Given: #wt 3 2.0% 2 Wavetable and its amplitude envelope GA wavetable optimization Wavetable 2 and its amplitude envelope Group spectrum with three partials Figure 8. Genetic algorithm optimization of the wavetable basis spectra and amplitude envelopes for a group with three partials. The number of wavetables (#wt = 2) is set in the wavetable allocated tree. 26

27 8% 7% 5th level maxnfde 6% litude Error (RAE) 5% 4% 3% 2% 4th level 3rd level 2nd level Root level maxrae best MAQ % 0% 0.00% 0.05% 0.0% 0.5% 0.20% 0.25% 0.30% 0.35% 0.40% 0.45% 0.50% Frequency Error (NFDE) Figure 9. RAE versus NFDE for the example in Figure 7. maxnfde and maxrae are.0 and.05 respectively. The third level (N groups = 3) gives the best overall matching quality (MAQ). 27

28 40% 35% Piano litude Error (RAE) 30% 25% 20% 5% 0% Erhu Guitar Shakuhachi Violin Trumpet Saxophone 5% 0% 0.854% 0.566% 0.528% 0.404% 0.29% 0.66% 0.30% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 0. Original wavetable matching method: The plot of instrument inharmonicity versus amplitude error (RAE) on seven near-harmonic tones. 28

29 40% 35% Piano litude Error (RAE) 30% 25% 20% 5% 0% Erhu Guitar Shakuhachi Violin Trumpet Saxophone 5% 0% 0.854% 0.566% 0.545% 0.404% 0.29% 0.66% 0.30% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure. New method: The plot of instrument inharmonicity versus amplitude error (RAE) on seven near-harmonic tones. 29

30 .% Frequency Error (NFDE).0% 0.9% 0.8% 0.7% 0.6% 0.5% 0.4% 0.3% 0.2% Erhu Piano Guitar Shakuhachi Violin Trumpet Saxophone 0.% 0.0% 0.854% 0.566% 0.528% 0.404% 0.29% 0.66% 0.30% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 2. Original wavetable matching method: The plot of instrument inharmonicity versus frequency error (NFDE) on seven near-harmonic tones. 30

31 Frequency Error (NFDE).%.0% 0.9% 0.8% 0.7% 0.6% 0.5% 0.4% 0.3% 0.2% 0.% 0.0% Erhu Piano Guitar Shakuhachi Violin Trumpet Saxophone 0.854% 0.566% 0.545% 0.404% 0.29% 0.66% 0.30% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 3. New method: The plot of instrument inharmonicity versus frequency error (NFDE) on seven near-harmonic tones. 3

32 litude Error (RAE) 55% 50% 45% 40% 35% 30% 25% 20% 5% 0% 5% 0% Pipa Qin Zheng Cello Glockenspiel Yangqin Qing Bell Liuqin Gong Ling 5.2% 3.8% 3.7% 2.7% 2.6% 2.3%.8%.8%.5%.5%.4% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 4. Original wavetable matching method: The plot of instrument inharmonicity versus amplitude error (RAE) on eleven inharmonic tones. 32

33 litude Error (RAE) 55% 50% 45% 40% 35% 30% 25% 20% 5% 0% 5% 0% Pipa Qin Zheng Cello Glockenspiel Yangqin Qing Bell Liuqin Gong Ling 5.2% 3.8% 3.7% 2.7% 2.6% 2.3%.8%.8%.5%.5%.4% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 5. New method: The plot of instrument inharmonicity versus amplitude error (RAE) on eleven inharmonic tones. 33

34 Frequency Error (NFDE) 6% 5% 4% 3% 2% % 0% 9% 8% 7% 6% 5% 4% 3% 2% % 0% Qin Zheng Yangqin Pipa Qing Bell Cello Gong Glockenspiel Liuqin Ling 5.2% 3.8% 3.7% 2.7% 2.6% 2.3%.8%.8%.5%.5%.4% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 6. Original wavetable matching method: The plot of instrument inharmonicity versus frequency error (NFDE) on eleven inharmonic tones. 34

35 Frequency Error (NFDE) 6% 5% 4% 3% 2% % 0% 9% 8% 7% 6% 5% 4% 3% 2% % 0% Qin Zheng Yangqin Pipa Qing Bell Cello Gong Glockenspiel Liuqin Ling 5.2% 3.8% 3.7% 2.7% 2.6% 2.3%.8%.8%.5%.5%.4% Instrument Inharmonicity Wavetable 2 Wavetables 3 Wavetables 4 Wavetables 5 Wavetables 6 Wavetables 7 Wavetables 8 Wavetables 9 Wavetables 0 Wavetables Wavetables 2 Wavetables Figure 7. New method: The plot of instrument inharmonicity versus frequency error (NFDE) on the eleven inharmonic tones. 35

36 4 2 Number of Votes Number of Groups Figure 8. Number of votes by the subjects for the qin tone with 7-wavetable match from to 7 groups. The 6-groups is selected as the best error-balanced grouping from our MAQ function. 36

37 4 2 Number of Votes Number of Groups Figure 9. Number of votes by the subjects for the yangqin tone with 7-wavetable match from to 7 groups. The 5-groups is selected as the best error-balanced grouping from our MAQ function. 37

38 4 2 Number of Votes Number of Groups Figure 20. Number of votes by the subjects for the violin tone with 7-wavetable match from to 7 groups. The 5-groups is selected as the best error-balanced grouping from our MAQ function. 38

Optimized Multiple Wavetable Interpolation

Optimized Multiple Wavetable Interpolation JONATHAN MOHR Augustana Faculty University of Alberta Camrose, Alberta CANADA http://www.augustana.ca/ mohrj / XIAOBO LI Department of Computing Science University