Creating an Encoding Independent Music Genre Classifier


John-Ashton Allen, Sam Oluwalana
December 16, 2011

Abstract

The field of machine learning is characterized by complex problems that are solved by algorithms run over data models. While time can be spent making gains in the algorithmic area, many speculate that with enough data many problems can be solved. This study uses that theory to develop an encoding-independent music genre classifier. In particular, we are given sets $S_1, \dots, S_n$, where each $S_i = \{S_{i1}, \dots, S_{im}\}$ is a set of labeled data corresponding to music encoding $i$ of files in $m$ genres, and the problem is represented as $n$ classification problems. In this study we found that, independent of file encoding, an accuracy in the high 70s could be achieved (an average of 76.93% in our case). However, the average is much higher (in the 80s) for genres with different-sounding music (e.g., Rock and Hip-Hop as opposed to Hip-Hop and Rap).

Set Up

To set up our classification problem, we first collected a corpus of diverse songs with genre tags. We hypothesized that a normal human being can classify the genre of a song based on a clip of the song, so we first converted the collection of songs, which totaled 120 GB of data, to a standard encoding: the mp3 format with a bit rate of 32 KB/s and a sampling frequency of 44,100 Hz. Using this as our base, we then converted the entire corpus to clips of 20 seconds and 10 seconds in the formats mp3, m4a, and wav. In each of these encodings we made sure to strip all metadata, so that the classifiers could not learn decision boundaries from anything besides the encoded song clips. With this data set we moved on to developing the classifiers.

We then wrote code that took these encodings and created feature vectors out of the raw bytes of the files. This resulted in different-sized feature vectors for each of the encodings. We chose block sizes of 1 byte, 2 bytes, 4 bytes, and 8 bytes in order to get an encoding-independent method of analyzing the data. The different block sizes gave different feature vector sizes, with the 1-byte block size giving the largest feature vectors, especially on the lossless encoding, wav. The 8-byte block size gave much smaller feature vectors and faster run times, but with a greater loss of accuracy. We therefore focused our analysis mainly on the 32-bit (4-byte) block size for our feature vectors, to balance the trade-off between space complexity and computational complexity.
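The raw-byte feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the little-endian byte order, the dropping of a trailing partial block, and the scaling to [0, 1] are all assumptions, since the paper only states that files were cut into blocks of 1, 2, 4, or 8 bytes.

```python
def bytes_to_features(data, block_size=4, byteorder="little"):
    """Split raw bytes into fixed-size blocks and read each block as an unsigned int.

    Assumptions (not from the paper): little-endian interpretation, and any
    trailing bytes that do not fill a whole block are dropped.
    """
    usable = len(data) - len(data) % block_size
    return [
        int.from_bytes(data[i:i + block_size], byteorder)
        for i in range(0, usable, block_size)
    ]


def normalize(features, block_size=4):
    """Scale block values to [0, 1] so every block size shares one range (assumed step)."""
    max_value = 2 ** (8 * block_size) - 1
    return [value / max_value for value in features]


def file_to_features(path, block_size=4):
    """Hypothetical helper: read a clip from disk and return its feature vector."""
    with open(path, "rb") as handle:
        return bytes_to_features(handle.read(), block_size)


# The same 16 bytes give shorter vectors as the block size grows.
sample = bytes(range(16))
for size in (1, 2, 4, 8):
    print(size, len(bytes_to_features(sample, size)))
```

The feature vector length is the file size divided by the block size, which is why the 1-byte blocks on wav files produce the largest vectors and the 8-byte blocks the smallest.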

Results & Reflection

Our study started by training a logistic regression classifier on two of our genres to test the feasibility of our project. Trying many different pairs of genres, we found that accuracy using logistic regression was fairly consistent, in the 85%-95% range, much higher than random guessing. However, when we added another genre, accuracy dropped to the 77%-83% range. Adding one more genre brought us to the 73%-80% range, using 4 genres and approximately 5 GB of data. These tests were run on the mp3, m4a, and wav datasets of 20-second clips. We found that the smaller block sizes yielded computations that took several hours and offered very little improvement over the native 32-bit chunks supported by our machine. Accuracy dropped significantly, however, between the 32-bit and 64-bit chunk sizes in our tests.

This led us to investigate the cause of our decreased accuracy when adding genres. Investigating further, we found that for a set of $i$ genres, our best performance was bounded by the weakest classifier for $i - 1$ genres. To see this more clearly, we present an example. Jazz (possibly because it influences many genres) performed poorly pairwise with each genre. Pairwise, Latin and Jazz had a classifier that was correct on average 85% of the time, while a classifier on Latin, Dubstep, and Gangsta performed at about 86% on average. Compared with the 3-way classifier on Latin, Dubstep, and Jazz, which performed at about 76% on average, the effect is clear. Furthermore, results dropped dramatically when attempting to distinguish between Hip-Hop and Rap, as the distinction is not clear to human beings.
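A two-genre logistic regression of the kind used in the feasibility test can be sketched as below. This is a sketch under stated assumptions, not the authors' implementation: the training loop is plain batch gradient descent, the learning rate and epoch count are arbitrary, and `make_toy_clips` generates synthetic stand-in vectors (it is a hypothetical helper, not real song data), so the accuracy it prints says nothing about the paper's results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    # Linear score w . x + b for one feature vector.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(w, b, X, y):
    # Fraction of examples whose predicted label (threshold 0.5) matches y.
    hits = sum((sigmoid(score(w, b, xi)) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


def make_toy_clips(rng, n_per_class, n_features=4):
    """Synthetic stand-in for two genres: features already scaled to [0, 1]."""
    X, y = [], []
    for label, (low, high) in enumerate([(0.1, 0.3), (0.7, 0.9)]):
        for _ in range(n_per_class):
            X.append([rng.uniform(low, high) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_clips(rng, 30)
X_test, y_test = make_toy_clips(rng, 10)
w, b = train_logistic(X_train, y_train)
print("held-out accuracy:", accuracy(w, b, X_test, y_test))
```

The sketch handles only the two-genre case. The paper does not say how the three- and four-genre classifiers were built, so no multi-class extension is shown here.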

Future & Improvements

As was previously speculated, we feel that our tests were far from conclusive. In order to produce more accurate results, we will begin by taking steps to better encode the data that is used for the algorithms. We found that the encodings were of really low quality, and this may have had an effect on the overall performance of our algorithms. We would also utilize Support Vector Machines to get a better, nonlinear decision boundary for our data set, in order to improve boundaries between hard-to-distinguish genres. Memory usage also became a problem during our tests; as such, we would search for more efficient implementations and hope to increase the amount of data that we can analyze through different methods of compression. We found that analyzing clips of different sizes had very little impact on overall performance, so in order to reduce memory usage and increase the number of training examples, future studies should favor smaller clip sizes.

Our study may also benefit from using different features that are encoding independent, such as the beats per minute or the Fourier transform of each song, to compare the songs' component frequencies. This will require a more sophisticated method of storage that will render the underlying bytes of the file less meaningful. Using these more sophisticated features, we feel that we can get drastic improvements over the 79% accuracy obtained using just the raw data of the encoded files. Using the Fourier transform of each song will require decoding the encoded files into their component waves and encoding those waves as a vector over the song's duration at a particular sampling frequency. These improvements in algorithmic design would bolster the amount of work that we get out of a machine's memory and processing power, and as such would be the next logical choices for improvements in the future. However, when taking this approach with these features, the algorithm loses its format independence.
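The Fourier-transform feature proposed above could look roughly like the following sketch. It assumes the clip has already been decoded into a list of PCM samples (the decoding step is format specific and not shown), and it uses the direct O(n^2) definition of the discrete Fourier transform for clarity; a real implementation would more likely use an FFT library. The function names and the synthetic 440 Hz test tone are illustrative assumptions.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. n//2 using the direct DFT definition."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / len(samples)


# Synthetic stand-in for decoded audio: a pure 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(400)]
print(dominant_frequency(tone, sample_rate))
```

The magnitude vector returned by `dft_magnitudes` depends only on the decoded waveform, not on the container bytes, which is why features of this kind require a format-aware decoder in front of the classifier.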