University of Pennsylvania Department of Electrical and Systems Engineering Digital Audio Basics

ESE250 Spring 2013
Lab 7: Psychoacoustic Compression
Friday, February 22, 2013

For Lab Session: Thursday, February 28, 2013 in Detkin Lab.
Due: Wednesday, March 13, 2013 by 12:00pm.
Collaboration: Work in lab in teams of 2. Perform individual writeups. See course collaboration policy in the Administrative Handout.
Objective: Appreciate and experience sampling rates; understand aliasing.

Prelab Requirements:
- Download and read this entire lab assignment.
- Complete the prelab questions at the bottom of the document.
- Bring your headphones.
- Optional but highly encouraged: encode a track from one of your music CDs into a .wav file sampled at 44.1 kHz, 16 bits. [1]

Deliverable:
- Names of all lab group members
- Answers to all lab questions (including prelab question)
- Size vs. Quality plots

Handin: All labs will be turned in electronically through the Penn Blackboard website. Go to the assignment submission link and follow the instructions. Please submit all your files as one zip file. Also, remember that your writeup should be a PDF file.

Exit Ticket: Play for your TA a bad quality and low quality encoded audio file.

[1] Make sure you encode it from a CD; re-encoding an existing .mp3 into a .wav will not be useful for the lab.

Motivation

Labs 2 through 6 introduced you to many digital audio concepts, from sampling to compression. In this lab we will use what you've learned to experiment with a simplified lossy audio encoder modeled after an mp3 encoder. The encoder operates on one frame at a time, where each frame consists of 1152 samples. Each frame is processed through 4 stages:

1. Sampling, which consists of reading the audio file (weeks 2 and 6).
2. DFT, where the signal is transformed into (1152 / 2) = 576 frequencies in the frequency domain (week 4).
3. Quantization, where frequencies are binned and, using a psychoacoustic model, some frequencies are removed. This is the lossy step of the process (week 5).
4. Lossless Compression, where, finally, the remaining signal is compressed (week 3).

This encoder differs slightly from a standard mp3 encoder in the quantization step. In this step, a standard mp3 encoder is faced with a difficult challenge: it has a strict limit on the number of bits it can use to describe the frame, but it must maintain as high an audio quality as it can. It iterates until it finds a balance between adhering to the psychoacoustic model and meeting the bit requirement by deciding how many frequencies to keep and how to quantize the amplitudes. Our encoder does not have a strictly enforced bit limit; its only requirement is to produce high quality audio using as few bits as it can. Also, it does not quantize the amplitudes but rather only removes frequencies based on a policy you define.

Another significant difference is that a standard mp3 encoder considers what policy to use for each individual frame and each individual critical band within each frame. Our encoder operates with a global policy over all frames and all bands. This makes it feasible for you to appreciate how different policies work, rather than having to deal with the tedious problem of manually coming up with a policy for each frame and each band.
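
The first two stages described above (reading samples into 1152-sample frames, then transforming each frame into 576 frequencies) can be sketched in a few lines of Python. This is only an illustrative sketch, not the lab's LabView encoder: the function names `split_into_frames` and `dft_half` are made up for this example, the DFT is the naive textbook sum rather than an FFT, and dropping a trailing partial frame is an assumption.

```python
import cmath
import math

FRAME_SIZE = 1152  # samples per frame, as stated in the lab


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split samples into full frames.

    Assumption: a trailing partial frame is simply dropped.
    """
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def dft_half(frame):
    """Naive O(N^2) DFT that returns only the first N/2 coefficients.

    For a real-valued frame the upper half mirrors the lower half,
    which is why 1152 samples yield 576 frequencies.
    """
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2)
    ]


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 44.1 kHz.
    rate = 44100
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    frames = split_into_frames(tone)
    print(len(frames))               # 38 full frames in one second
    print(len(dft_half(frames[0])))  # 576 frequencies per frame
```

The magnitude `abs(c)` of each returned coefficient is the amplitude that the quantization stage would then inspect.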
Lab Procedure Guide

This lab is not particularly complex, but it may take some time to complete. You should keep track of time as you work through the lab and keep in mind that, since many of the experiments are subjective, you don't have to exhaustively search the space for the answer.

Quantization Filters

During quantization, the 576 frequencies are binned into 25 critical bands as explained in lecture 6. One or more quantization filters are then applied. Based on these, some frequencies are thrown out. As we saw in week 6, we can do this because there are frequencies that we can't hear, either because they are too soft or because they are masked by other tones. There are four filters we can independently apply:

a. Limit Number of Critical Bands allows us to choose how many of the 25 critical bands to keep. If fewer than 25 bands are specified, it will remove the higher frequency bands starting from the 25th down (i.e. if we specify 22 bands, we will throw out the 23rd, 24th and 25th bands).
b. Limit Tones Overall disregards the binning of all 576 frequencies. Instead, it keeps the n_All frequencies that have the highest amplitudes.
c. Limit Tones by Band operates on each band individually, and keeps the n_band frequencies that have the highest amplitude in that band.
d. Threshold by Band operates on each band individually, and keeps all the frequencies that are within d decibels of the loudest tone in the band. This allows the total number of frequencies kept to differ between bands based on how many loud frequencies the band has. It also approximates how knowledge of masking may be used to remove masked frequencies.

File encoding

For the first part of the lab, you will encode a .wav file using six different quantization policies (i.e. combinations of quantization filters) and plot the encoded file size vs. audio quality. In the second part, for each quantization policy used, you will pick an encoding at the same quality level and explore what allows the different encodings to be at the same audio quality but vary in file size. Refer to the section LabView Encoder, Decoder and Explorer to learn how to use the encoder and decoder we will use for these experiments.

For each policy, you will need to find four data points relating quality to size: a good quality point, two intermediate points and a bad quality point. Figure 1 shows what plotting these points for one policy may look like.
Good quality point: This point is defined as the smallest file size that still sounds like something you would not mind listening to. It also means that any point that yields a larger file size should have the same audio quality as this point.

Bad quality point: This point is the largest file size that produces an audio quality that is unclear and maybe even unpleasant to listen to. Any point that produces a smaller file size using the same policy should also have bad audio quality.

Two intermediate quality points: These are two points between the good and bad points.

To measure the quality we will use the absolute grade industry standard scale in Figure 2. It is quantized from 1 to 5, where 1 is the quality of the bad point and 5 is the quality of the good point.
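
The four quantization filters (a to d) from the Quantization Filters section can be sketched as plain Python functions. This is a hedged illustration, not the lab's LabView implementation: the function names are invented here, a frame is assumed to be a list of non-negative amplitudes, `bands` is assumed to be a list of `(start, end)` index ranges that together cover the whole frame, and the decibel threshold is assumed to use the amplitude convention of 20·log10 of the ratio.

```python
def limit_bands(amps, bands, n_keep):
    """Filter (a): zero every frequency in the bands above the first n_keep."""
    out = list(amps)
    for start, end in bands[n_keep:]:
        for i in range(start, end):
            out[i] = 0.0
    return out


def limit_tones_overall(amps, n_all):
    """Filter (b): keep the n_all largest amplitudes, ignoring the bands."""
    order = sorted(range(len(amps)), key=lambda i: amps[i], reverse=True)
    keep = set(order[:n_all])
    return [a if i in keep else 0.0 for i, a in enumerate(amps)]


def limit_tones_by_band(amps, bands, n_band):
    """Filter (c): keep the n_band largest amplitudes inside each band."""
    out = [0.0] * len(amps)
    for start, end in bands:
        order = sorted(range(start, end), key=lambda i: amps[i], reverse=True)
        for i in order[:n_band]:
            out[i] = amps[i]
    return out


def threshold_by_band(amps, bands, d_db):
    """Filter (d): keep frequencies within d_db decibels of each band's peak.

    Assumption: decibels are taken on amplitude, i.e. 20 * log10(ratio).
    """
    out = [0.0] * len(amps)
    for start, end in bands:
        peak = max(amps[start:end], default=0.0)
        if peak <= 0.0:
            continue  # silent band: nothing to keep
        floor = peak * 10 ** (-d_db / 20)
        for i in range(start, end):
            if amps[i] >= floor:
                out[i] = amps[i]
    return out


if __name__ == "__main__":
    # Toy frame with 6 frequencies in 3 bands (placeholder values, not lab data).
    amps = [1.0, 0.5, 0.1, 0.8, 0.05, 0.4]
    bands = [(0, 2), (2, 4), (4, 6)]
    print(limit_bands(amps, bands, 2))          # [1.0, 0.5, 0.1, 0.8, 0.0, 0.0]
    print(limit_tones_overall(amps, 2))         # [1.0, 0.0, 0.0, 0.8, 0.0, 0.0]
    print(limit_tones_by_band(amps, bands, 1))  # [1.0, 0.0, 0.0, 0.8, 0.0, 0.4]
    print(threshold_by_band(amps, bands, 10))   # [1.0, 0.5, 0.0, 0.8, 0.0, 0.4]
```

A policy that combines filters can then be written as function composition, for example `limit_tones_overall(threshold_by_band(limit_bands(amps, bands, 22), bands, 6), 100)`, where the parameter values are arbitrary placeholders.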

Figure 1: File size vs. Perceived Quality plot

In order to complete the lab in the allotted time, you should only encode 15 to 20 seconds of audio. You should pick one .wav file and decide which part to encode. If you don't have a .wav file, you can download one from the course documents on the ESE250 schedule webpage. Also, you can multi-task and run the encoder and decoder at the same time, so while you are waiting for the encoder to encode a new file, you can listen to the results of the file encoded previously.

Pick a song and a duration that you wish to encode. Make sure to keep the parameters you just chose constant as you work through the rest of this lab.

Data to Record

For each of the four points in each policy (including the baseline), record:

- Frequencies overall kept (for policy 1, also record the number of bands kept). Remember that this is a parameter of your filters. For example, if you are using the Limit Tones Overall filter, then you record the number of frequencies kept over all bands, whereas if you are using the Limit Tones By Band filter, you record the number of frequencies kept in a band.
- Number of non-zero frequencies reported over the entire snippet encoded.
- Size in bytes (not kilobytes) of the encoded files.
- Quality on the defined scale from 1 (bad) to 5 (good) of the encoded files.

Note: The baseline will only have one point since we are not applying any filters. For policy 4, record the decibel threshold for each point. For policies 3 and 6, also record the number of frequencies per band kept.
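
One way to keep the recorded data organized is a small record type that can be written out as a table. The sketch below is only a suggestion: the class name `DataPoint`, its field names, and the helper `to_csv` are invented for this example, and the values in the usage section are placeholders, not measurements.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class DataPoint:
    policy: int                        # 0 = baseline, 1 to 6 = the lab's policies
    filter_parameter: Optional[float]  # swept value (bands, tones or dB); None for baseline
    nonzero_frequencies: int           # as reported by the encoder for the whole snippet
    size_bytes: int                    # encoded file size in bytes, not kilobytes
    quality: int                       # 1 (bad) to 5 (good)

    def __post_init__(self):
        if not 1 <= self.quality <= 5:
            raise ValueError("quality must be on the 1 to 5 scale")


def to_csv(points):
    """Render the data points as CSV text with one header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(DataPoint)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for point in points:
        writer.writerow(asdict(point))
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder numbers purely to show the format.
    rows = [DataPoint(1, 22, 1000, 2000, 4)]
    print(to_csv(rows))
```

The resulting CSV can be opened in a spreadsheet to build the table and the size vs. quality plot.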

Figure 2: Quality Scale. [Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

Baseline

Before applying any policies, you should quantify how well the encoder does when we completely skip the quantization step. Set all three filters in the encoder to Inactive, and set the input file, start time and duration to the values you decided on at the beginning of the lab. Set an output file and run the encoder. Listen to the encoded snippet using the decoder. The way it sounds will define the best possible audio quality.

Policy 1

First we decide how many critical bands we will keep.

Set the first filter to Limit Number of Critical Bands. Leave filters 2 and 3 set to Inactive. Keep the input file, start time and duration the same as before.

Try values of 24, 23, 22 and 21 for the number of bands to keep for filter 1 to find the four quality vs. size data points defined above [2]. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

[2] For this policy, you may find that even at 21 bands, the quality is not bad enough to be considered a bad point. If so, you may only report the other three points.

Policy 2

Now we start to remove frequencies from each of the remaining bands.

Set the first filter to Limit Number of Critical Bands and set the number of bands to the number found for the good point in Policy 1. Set filter 2 to Limit Tones Overall and leave filter 3 set to Inactive. Keep the input file, start time and duration the same as before.

Choose values between 1 and 200 for the number of frequencies overall to keep for filter 2 as you search for the four quality vs. size data points defined above [3]. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

Policy 3

Here we will try a different frequency removal filter.

Set the first filter to Limit Number of Critical Bands and set the number of bands to the number found for the good point in Policy 1. Set filter 2 to Limit Tones by Band and leave filter 3 set to Inactive. Keep the input file, start time and duration the same as before.

Choose values between 1 and 20 for the number of frequencies per band to keep for filter 2 as you search for the four quality vs. size data points defined above. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

Policy 4

Now, instead of directly limiting the number of frequencies, we threshold the amplitude of the frequencies and remove any frequency with an amplitude below the threshold.

Set the first filter to Limit Number of Critical Bands and set the number of bands to the number found for the good point in Policy 1. Set filter 2 to Threshold by Band and leave filter 3 set to Inactive. Keep the input file, start time and duration the same as before.

For filter 2, choose values between 1 and 10 for the threshold, i.e. the number of decibels below the highest amplitude in the band at which we will still keep a frequency, as you search for the four quality vs. size data points defined above. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

[3] For such a large range you should not do a sequential search. Instead, try a few points such as 200, 100, 50 and 10, and then refine your search around those points if you feel you need to.
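
The coarse-then-refine search suggested in footnote 3 can be sketched as follows. This is one possible reading, not a prescribed procedure: `rate` is a hypothetical callback standing in for "encode with this parameter, listen, and give a 1 to 5 grade", the refinement step is a bisection chosen for this example, and the sketch assumes that quality never gets worse as more frequencies are kept.

```python
def find_good_point(rate, coarse=(200, 100, 50, 10), target=5):
    """Return the smallest parameter whose rating reaches target, or None.

    rate: hypothetical callback mapping a parameter value to a 1-5 grade.
    Assumption: rate is non-decreasing in the parameter.
    """
    probes = sorted(coarse)
    # Coarse pass: first probe (from small to large) that sounds good enough.
    hi = next((n for n in probes if rate(n) >= target), None)
    if hi is None:
        return None  # even the largest probe was not good enough
    below = [n for n in probes if n < hi]
    lo = below[-1] if below else 0  # largest probe known to fall short (0 = untested floor)
    # Refinement pass: bisect between the failing and the passing probe.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rate(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Stand-in listener: pretends anything with 37 or more tones sounds good.
    pretend_listener = lambda n: 5 if n >= 37 else 3
    print(find_good_point(pretend_listener))  # 37
```

Because every call to `rate` means one encode-and-listen cycle, the point of the bisection is to keep the number of listens small; in practice the grades are subjective, so treat the result as a starting point rather than an exact answer.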

Policy 5

Finally, we will combine three frequency removal filters.

Set the first filter to Limit Number of Critical Bands and set the number of bands to the number found for the good point in Policy 1. Set filter 2 to Threshold by Band and set the number of decibels to threshold to the number found for the good point in Policy 4. Set filter 3 to Limit Tones Overall. Keep the input file, start time and duration the same as before.

Choose values between 1 and 200 for the number of frequencies overall to keep for filter 3 as you search for the four quality vs. size data points defined above. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

Policy 6

For the last policy, change the last filter.

Set the first filter to Limit Number of Critical Bands and set the number of bands to the number found for the good point in Policy 1. Set filter 2 to Threshold by Band and set the number of decibels to threshold to the number found for the good point in Policy 4. Set filter 3 to Limit Tones By Band. Keep the input file, start time and duration the same as before.

Choose values between 1 and 20 for the number of frequencies per band to keep for filter 3 as you search for the four quality vs. size data points defined above. Use the decoder to listen to the files and gauge their audio quality. Make sure you update the output file name as you explore the space so that you don't overwrite other files.

Frame exploration

You will now analyze what makes one policy better than another by looking at how each policy decides which frequencies to keep. Focus only on the 5 good points, one for each of the policies above, ignoring Policy 1.

Load explorer.vi. It is similar to the encoder from the previous section, but instead of encoding, it allows you to explore each frame and each band while you dynamically change the filters.
You can explore the encoding by changing the Frame and Band sliders. Each channel reports the number of frequencies with non-zero amplitudes in that Frame-Band pair before and after applying the filters. It also reports the percent of frequencies kept after the filter.
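
The per-band statistic that the explorer reports (non-zero frequencies before and after filtering, and the percent kept) can be written down directly. The sketch below is an assumption about how such a number could be computed, not a description of explorer.vi itself: the function names are invented, a band is a `(start, end)` index range, and a band with no non-zero frequencies before filtering is reported as 0 percent.

```python
def count_nonzero(amps, band):
    """Count the frequencies with non-zero amplitude inside one band."""
    start, end = band
    return sum(1 for a in amps[start:end] if a != 0.0)


def percent_kept(before, after, band):
    """Percent of a band's non-zero frequencies that survive the filters.

    Assumption: a band that was already empty before filtering reports 0.0.
    """
    n_before = count_nonzero(before, band)
    if n_before == 0:
        return 0.0
    return 100.0 * count_nonzero(after, band) / n_before


def sweep_frames(frames_before, frames_after, band):
    """Percent kept in one band for every frame, mimicking a frame sweep."""
    return [percent_kept(b, a, band) for b, a in zip(frames_before, frames_after)]


if __name__ == "__main__":
    # Placeholder amplitudes for two frames, before and after some filter.
    before = [[1.0, 0.5, 0.0, 0.8], [0.2, 0.2, 0.2, 0.2]]
    after = [[1.0, 0.0, 0.0, 0.8], [0.2, 0.0, 0.0, 0.0]]
    print(sweep_frames(before, after, (0, 2)))  # [50.0, 50.0]
```

Comparing the list returned by `sweep_frames` across policies gives a numeric view of whether the percent kept stays constant from frame to frame.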

FOR QUESTIONS: For each policy (2 to 6):

1. Set the input file, start time and duration to the same values used in the previous section.
2. Run the VI.
3. Set the filters and filter parameters to exactly the same values you set for the good point of that policy.
4. Pick one band and examine the percent of frequencies kept as you sweep the frames.
5. Repeat the above step for a few more bands (bands 10 to 18 tend to give better results).

LabView Encoder, Decoder and Explorer

Encoder

encoder.vi is the main encoder application. It takes a .wav file and produces a .mp250 file, a simplified version of a standard mp3. The VI consists of four main parts: File Settings, Filters, Status and Frequency Graphs.

File Settings: Here you specify the path to the input .wav file and the output .mp250 file. You also choose the part of the input file to encode by setting the duration and start time values.

Filters: This part defines which quantization filters to apply. You can choose the filter to use and, if it takes a parameter, define the parameter for that filter. Filter 1 is applied first, then filter 2 and finally filter 3.

Status: When the encoder is running, this section will show information about the snippet being encoded, including total length, number of frames and, at the end, the number of frequencies that have amplitudes greater than zero. As it encodes, it also shows the progress for the left and right channels.

Frequency Graphs: The final section shows what the encoder is doing to each band of each frame as it encodes. It has four frequency graphs, two on the left for the left channel and two on the right for the right channel. The top graphs show the amplitudes of the frequencies in the selected critical band. The bottom graphs show what frequencies remain after applying the selected filters. As the encoder runs, you can look at different bands by dragging the Critical Band Displayed slider. You will see that lower frequency bands have fewer frequencies.

Decoder

Since we are generating .mp250 files, we need a custom decoder that can play these files. The decoder is the simplest of the three VIs. You can run it by opening decoder.vi. You specify the .mp250 file to decode and run it. It reports the status of what it is doing and, once it has finished decoding, it plays the file.

Explorer

The explorer is very similar to the encoder but it has some differences. Unlike the encoder, explorer.vi will not produce a .mp250 file; rather, it will let you explore each frame and each band of the frame to see what is happening to the frequencies. Once you've specified an input file, start time and duration, you can run it and dynamically change the filters, sweep the frames back and forth and go to a chosen critical band.

The Frequency Graphs section has more features than the corresponding section in the encoder. At the top it lets you choose not only the critical band to look at but also the frame. Also, every graph shows how many frequencies with non-zero amplitudes the band has. Finally, the After Filters graphs calculate what percent of frequencies were kept compared to the Before Filters graphs.

Lab Writeup Guidelines

Theory: Pre-lab

A standard MP3 has a bitrate of 128 kbits/s for stereo sound (i.e. there are two channels, each with a bitrate of 64 kbits/s).
The MP3 encoder takes the 44K samples in one second of an audio sample and breaks them into roughly 38 frames of 1152 samples each. Each frame is then converted into 576 double-precision floating point frequency coefficients. At this point, the psychoacoustic models can be applied to eliminate imperceptible frequency components and so reduce the amount of information that needs to be encoded.

1. Roughly how many bits are available to encode an MP3 frame (assume 1 kbit is 1000 bits)?
2. If each frequency coefficient in a frame remains as a double-precision floating point value:
   - How many bits would be required to represent all the coefficients in a frame (a double-precision floating point value requires 64 bits to represent it)?
   - How does this value compare to the available number of bits for a frame calculated above?
3. If we want to represent all 576 values in a frame with equal precision:
   - How many quantization levels could we use?
   - How many bits would be needed per frequency coefficient?
4. If each frequency coefficient in an encoded frame is quantized to one of 8192 values, how many bits would be needed to represent one of these coefficients?
5. Assuming we can represent a zero frequency with no bits:
   - How many non-zero frequency coefficients can we afford to have with the available bits in a frame if each coefficient is quantized to one of 8192 values as in the previous question?
   - What does this mean for our encoder?

Analysis

File Encoding

Question 1: Report your recorded data. This is best presented in a table. You can list separately any data that does not fit well in the table.

Question 2: Plot on one graph all 6 size vs. quality curves (for each policy, the curve defined by the one good, two intermediate and one bad points).

Question 3: Which of the six policies produced the smallest file size for the good point?

Frame Exploration

Question 4: For each policy, does the percent of frequencies kept stay constant as you sweep the frame? Is this the case for the other bands examined? Use what you know about what the policy is doing to justify what you observe.

Conclusion

Question 5: Given what you discovered in the Frame Exploration section and what you know about psychoacoustic models, what properties of the policy you identified in Question 3 allow it to be the best?

Further Thought

This section is not part of your required assignment. Along with each week, we will offer directions and questions for further thought. Due to the nature of this course, we can only begin to glimpse the depth and richness of each of the topic areas. These questions offer some headings to contemplate further depth. These will often be open-ended questions.

- The encoder used for this week's lab is deliberately limited to contain the complexity for both us and you. How would you improve the encoder (e.g. improve the search algorithm, exploit temporal masking)?
- The encoders we have described operate online. That is, they encode each frame as they see it. If you allowed the encoder to consider all frames before and during encoding, how might you be able to improve the encoding?
- Does the MP3 encoding model limit you? Are there additional ways you might improve encoding (achieve a superior quality at some or all target bitrates) by departing from the MP3 standard as you understand it? How would your changes impact requirements for the decoder?