Sharing Music and Contextual Information in Mobile Social Networks


Tore Nygaard-Jensen

Kongens Lyngby 2010
IMM-M.Sc

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone , Fax

Abstract

This project implements a music and context sharing prototype with both a centralized and a decentralized architecture. Information from a set of selected context features is gathered in a user survey with seven users. Data analysis is then carried out on the collected data to find patterns that can enable the creation of new mobile services for end users. The collected data are analyzed to discover significant locations for each user, and significant social groups are obtained using hierarchical clustering. Music information collected from the users is analyzed with collaborative filtering based on implicit ratings. This finds similarities between the users and makes it possible to give them personalized music recommendations. The music sessions of the users are categorized to indicate in which contexts the users listen to music.


Preface

This thesis was prepared at the Informatics and Mathematical Modelling (IMM) institute at the Technical University of Denmark (DTU) in partial fulfillment of the requirements for the degree of Master of Science in Engineering. The thesis was supervised by Associate Professor Jakob Eg Larsen and Ph.D. Michael Kai Petersen, IMM, DTU. The thesis was carried out from October 2009 to May 2010 with a workload of 30 ECTS credits.

Kongens Lyngby, May 2010
Tore Nygaard-Jensen (s032109)


Acknowledgements

I would like to thank my supervisors, Associate Professor Jakob Eg Larsen and Ph.D. Michael Kai Petersen, for providing inspiration for my thesis in the beginning and for constructive feedback during the process. I would also like to thank the people who participated in the user test; the collected data were invaluable for the project. Finally, thanks to my family for their moral support.


Contents

Abstract
Preface
Acknowledgements
1 Introduction: Motivation; Problem Definition; Methodology / Approach; Expected Outcome; Thesis Structure
2 Related Works
3 Analysis: Analysis of Current Online Music Recommendation Services; User Data Collection Method; Context Feature Analysis; Music Ratings; Requirements Specifications
4 Design: Context Data Representation; Persistent Storage; Description of the Data Collecting Prototype; Information about the Mobile Context Toolbox; System Design; Components
5 Implementation: Initial Component Tests; Converting from Decentralized to Centralized System; Mobile Context Toolbox; Mobile Database; Limitations; Problems; Graphical User Interface; External Python Modules; Software; Hardware; Data Analysis; Device Setup for the User Test
6 Test: Power Consumption; Test Specifications; User Test; Information for the Users
7 Experimental Results: User Test Statistics; Data Analysis Results
8 Discussion
9 Conclusion: Future Work
A Results
B from Nokia
C Source Code

Chapter 1 Introduction

1.1 Motivation

In recent years, mobile devices have developed rapidly into multi-sensor devices, and their sensor data contain information about the context of the user. Software such as the Nokia Mobile Web Server changes the architecture of the network that connects the phones, since each mobile device can be considered a web server. Furthermore, online social networks have been migrated from traditional PC environments to mobile devices, often without taking the large differences between the platforms into account. The mobile environment is much more dynamic than the environment of classic online social networks. This could make it possible to share context information in a more dynamic way.

1.2 Problem Definition

The overall problem is to investigate how music and context information can be shared in a mobile social network, and which interaction possibilities this sharing provides. The project will explore the possibilities that the built-in music player of mobile devices provides for displaying the context of the user. Another goal is to integrate the traditional sensors with the music sensor. This would translate a combination of location, people nearby, and music into a user state.

1.3 Methodology / Approach

Initially, a set of context information is chosen and an end-to-end prototype is set up. This gives a small, quick solution that can provide useful feedback. The first iteration only handles near real-time data, to see what can be done with it. The second iteration adds a longer temporal scope to exploit the possibilities that logging provides. The first iteration displays near real-time data, and the information is logged and gathered so it can be investigated further. The data collected in the first iteration are then analyzed to provide inferred information to future mobile applications with music and context related content.

1.4 Expected Outcome

1. An analysis of how music and contextual information can characterize the states of the user, and how this can be appropriately shared in a mobile social network.
2. A prototype that shares and displays near real-time contextual data in a decentralized system and also logs information for further data analysis.
3. Clusters of relative locations and social groups for the users.
4. A categorization of music sessions based on temporal information and the relative locations and social groups of the users.
5. Personalized music recommendations obtained by applying collaborative filtering to implicit ratings from the logged music played by the users.

1.5 Thesis Structure

The report has the following structure. Chapter 2 presents related works. In Chapter 3 the domain of the thesis is analyzed and the requirements for the prototype are specified. Chapter 4 describes the architecture and the design of the different parts of the prototype in detail. Chapter 5 (Implementation) describes how the chosen design was implemented to build a prototype. The tests are described in Chapter 6. Chapter 7 lists, with brief comments, the experimental results calculated from the data collected in the user test. Chapter 8 discusses these results in more depth, along with other aspects of the project. Chapter 9 concludes the project, summarizes what has been accomplished, and puts possible improvements into perspective.


Chapter 2 Related Works

The following works experiment with how people react when using applications that share context information. Bentley et al. [5] logged and transcribed numerous phone calls and analyzed which topics the users actually mentioned. They discovered that a location or an activity was disclosed in 71 percent of the conversations. People already know a lot about the context of the other person, so detailed inferred context information is not necessary.

Bentley et al. [4] tested sharing motion information with close family and friends. The shared information was whether the user was moving or not, combined with the duration of the current motion state. Considering the time of day and the user's normal behavioral pattern, this allowed the recipients to infer a lot. For example, suppose it is afternoon, the husband has a 20-minute drive home from work, and he has been in transit for 15 minutes. The recipient can then infer that he will be home in 5 minutes.

The two works by Bentley et al. use the term micro-coordination for the many small phone calls made during the day to clarify simple questions, usually related to social interaction. Such micro-coordination is currently done with quick phone calls that verify small details. These are usually questions that the mobile device already has the answers to. Smith et al. [27] shared the user's place with close family and friends, making the same assumption that users can infer context from a single context feature. In many cases the questioned users did in fact have the right answer.

Boehm et al. [6] describe the advanced context gathering application IYOUIT, which makes it easy to share personal experiences within communities. It uses existing social services to share with others. IYOUIT has an advanced underlying context management system that keeps track of the information related to the different parts of the application. All context information from the application's context providers is formalized in a semantic ontology, so various relations can be created on top of the raw data. IYOUIT is decentralized in the sense that it delegates responsibilities to online web services through various APIs. It is not decentralized in the sense used in this project, where an architecture is decentralized if the system can run without an active central server.

CrystalMe is a context sharing project that also uses the Mobile Web Server to post information about user context on a mobile website. It shares context information at the end-user level, whereas this project shares it at the application level [15]. When context is shared at the application level, other services can be built on top of it. When the information is only displayed on a mobile web site, the display itself is the service, which is less versatile.

Wang et al. [30] propose a distributed collaborative filtering system for peer-to-peer file sharing. Their implementation is scalable and works even on small network-enabled devices. It uses an ibuddy table that is sent when a file is downloaded from a node and pushed when a file is uploaded. They tested the system on Audioscrobbler data in five iterations of tests with transactions, adding the ibuddy list to the data. Their system shares actual files, whereas the system intended in this project only shares context information.
Recommendations may have to be made without a central database that calculates the relevance of the various music files. In that case, context information collected about each track could be used as a similarity measure between tracks.

Kostakos et al. [19] do not aim to replace face-to-face interaction with computer-based work. Instead, they want to use computers and technology to assist it. Regarding privacy, some of their users did not like to broadcast their information or make it available to strangers. To address this, a restriction could require that a person has been in proximity of a user a fixed number of times before getting access to the user's information. They list applications that have good elements but fall short in some way. Examples are online services such as Friendster and Match, which only exist as regular web sites.

ContextContacts is an application built on top of the ContextPhone project. It is very similar to the application of this project, because it shows location, time spent there, the mode of the phone, and the number of friends or strangers nearby [22]. Bluedating matches two users on the mobile phone, but it relies on explicit input to update user profiles. xtolk, created by Tu, uses the Mobile Web Server to share context information and creates a mobile web site to display the collected information [29].

Table 2.1 shows how the different related works overlap with the project. The column abbreviations are explained in the following:

- Arch. is architecture. To get the label decentralized (D), the system must be able to function without a central server. The letter C indicates a centralized system that relies on a central server.
- C.F. is Collaborative Filtering and indicates whether some kind of recommendations are calculated.
- L.I. means that a location is inferred, i.e. a higher-level label is attached to the location.
- MWS is short for Mobile Web Server and indicates whether the implementation uses the MWS.

Application       Arch.  C.F.  L.I.  MWS
This project      C/D    x     x     x
IYOUIT            C      -     x     -
CrystalMe         D      -     -     x
Dist. Collab.     D      x     -     -
ContextContacts   C
xtolk             D      -     -     x

Table 2.1: Comparison of the work in this project with related works. Arch. is Architecture, where C/D is Centralized/Decentralized, C.F. is Collaborative Filtering, L.I. is Location Inference, and MWS is the Mobile Web Server.


Chapter 3 Analysis

This chapter analyzes current online music recommendation services and examines different ways of collecting user data for context sharing applications. A selected set of context features and feature combinations is investigated to find out which applications they could support. In particular, a section is dedicated to how implicit music ratings are created. The chapter also illustrates how the different data collecting prototypes could share music and context information in both a centralized and a decentralized architecture. After this initial analysis of the application domain, the functional requirements are derived from a list of use cases.

3.1 Analysis of Current Online Music Recommendation Services

Online collaborative filtering of music data is already in use. It is therefore relevant to understand how it currently works, so that the techniques can be adapted appropriately to the mobile domain. Two examples of online collaborative filtering are Last.fm and ilike, which are described in the following sections.

Last.fm

Last.fm is a web site that offers users different music-related services. Each user has a musical profile to which their played tracks are added; Last.fm calls this gathering of music data scrobbling. The web site has an extensive API, and this API reveals which data about the user has been collected or inferred from other data. Users can sign up for upcoming events on the web site, and this registration is stored in the system. The list of friends that a user has accumulated on the web site can also be found.

Last.fm makes it possible to scrobble from many different media players, both on the computer and on portable devices such as iPods and iPhones. Scrobbling means registering the tracks that have been played, so that they can be collected and sent to the user's personal page. More specifically, a track is logged as played when more than half of it has been played. Users can love and ban (like and dislike) tracks, but they are not required to; the recommendation system works without explicit user ratings.

With the music data collected from all the users, Last.fm uses collaborative filtering to find musical neighbors whose taste is similar to that of the target user. Using the same data set, the system gives each user personalized recommendations. It suggests tracks that the user has not heard but that would fit the user's taste, because the musical neighbors liked them. Another feature of the Last.fm site is that users can display which song they are currently streaming. This real-time feature prepares the ground for communication between users, because each user can see on their profile page what their musical neighbors are currently playing.
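The scrobbling rule and the neighbor-based recommendations described above can be illustrated with a minimal Python sketch. It assumes play events are available as hypothetical (user, track, seconds played, track length) tuples and counts a play only when more than half the track was played. The resulting play counts serve as implicit ratings, and cosine similarity picks the musical neighbors. This is a generic user-based collaborative filtering sketch, not Last.fm's actual algorithm. The function names, the sample data and the neighbor count k are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def scrobble(plays):
    """Turn raw play events into implicit ratings (play counts).

    Each event is a (user, track, seconds_played, track_length) tuple;
    a play is counted only if more than half of the track was played.
    """
    counts = defaultdict(Counter)
    for user, track, played, length in plays:
        if length > 0 and played > length / 2:
            counts[user][track] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two play-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbours(counts, target, k=2):
    """Return up to k (similarity, user) pairs with positive similarity."""
    mine = counts.get(target, Counter())
    scored = [(cosine(mine, vec), user)
              for user, vec in counts.items() if user != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(sim, user) for sim, user in scored[:k] if sim > 0]


def recommend(counts, target, k=2):
    """Rank tracks the target has not heard by similarity-weighted play counts."""
    heard = counts.get(target, Counter())
    scores = Counter()
    for sim, user in neighbours(counts, target, k):
        for track, n in counts[user].items():
            if track not in heard:
                scores[track] += sim * n
    return [track for track, _ in scores.most_common()]


plays = [
    ("anna", "t1", 200, 240), ("anna", "t2", 180, 200), ("anna", "t3", 30, 240),
    ("bo", "t1", 230, 240), ("bo", "t2", 150, 200), ("bo", "t4", 190, 210),
    ("carl", "t5", 100, 120),
]
counts = scrobble(plays)
print(recommend(counts, "anna"))  # ['t4']
```

Anna's short play of t3 is not counted. Bo shares her two counted tracks, so his remaining track t4 is recommended. Cosine similarity is used here because it normalizes for how much each user listens, so heavy listeners do not dominate merely by playing more tracks.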
The streaming feature has recently been limited, so there is now a monthly fee for unlimited streaming of music via Last.fm.

ilike

Users can listen to music on the ilike web site. Many of the tracks are 30-second samples, because users have to buy the full MP3 file to hear a track in its entirety. The site also features full-length YouTube videos. ilike has many features in common with Last.fm.

ilike minimizes the effects of the cold start problem in its service. The cold start problem arises when an item cannot be recommended to anyone because it has not been rated yet. It arises similarly when a new user has not rated any items yet and cannot receive recommendations, because the recommender system does not know the user's preferences. To minimize this problem in its similarity algorithms and recommendation system, ilike gives users an initial music survey, whose results are used as guidelines in the initial phase. Initially, ilike just looks at which artists the user likes in iTunes and finds users who like the same artists. When selecting music neighbors, ilike also selects users who listen to similar artists, not only users who listen to the same artists. It therefore does more than a direct track and artist match. ilike states on its web site that it will include location in its user similarity algorithm, with a focus on finding users with the same music taste who live close by. It collects the music information from a sidebar application for iTunes.

Web vs. Mobile

These examples show how web-based music recommender systems are built, which raises the question of how the same ideas could be used on a mobile device. Can the features that the web-based services provide be transferred directly to the mobile platform, and does it make sense to do so at all? The mobile platform has several advantages over the web for creating interesting user experiences, and its usage is also much more dynamic. A mobile application can be started in many different situations, whereas the usage of a web application is more static. The UI, however, is limited, so only essential information ought to be presented. Many of the features that web-based services present will need to be modified for the mobile platform, and some are not suited at all. Long lists of recommendations are not ideal. The service should rather be like a streaming service that is simple, accurate and needs no configuration.
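The cold-start strategy attributed to ilike above seeds a new user's neighbors from the artists they already like and widens the match to similar artists. It might be sketched as follows. The similar-artist map, the artist names and the use of Jaccard overlap are assumptions made for illustration; the source does not describe ilike's actual algorithm.

```python
# Hypothetical similar-artist map; a real service would derive this from usage data.
SIMILAR = {
    "Radiohead": {"Muse"},
    "Muse": {"Radiohead"},
}


def expand(artists, similar):
    """Add artists considered similar to the ones the user already likes."""
    expanded = set(artists)
    for artist in artists:
        expanded |= similar.get(artist, set())
    return expanded


def seed_neighbours(new_user_artists, library, similar):
    """Rank existing users by Jaccard overlap with the expanded artist set.

    library maps each existing user to the set of artists in their library.
    Users with no overlap at all are left out.
    """
    wanted = expand(new_user_artists, similar)
    scores = {}
    for user, artists in library.items():
        artists = set(artists)
        overlap = wanted & artists
        if overlap:
            scores[user] = len(overlap) / len(wanted | artists)
    return sorted(scores, key=lambda user: (-scores[user], user))


library = {
    "anna": {"Muse", "Portishead"},
    "bo": {"Radiohead"},
    "carl": {"Metallica"},
}
print(seed_neighbours({"Radiohead"}, library, SIMILAR))  # ['bo', 'anna']
```

In this example "anna" is only found through the similar-artist expansion. Without it, the new user would get a single neighbor from a direct artist match.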
It should play music similar to what the user has listened to previously. In the mobile domain, what the user is doing at the moment matters a great deal, because it can change rapidly. Services like Last.fm and ilike let people see what each other are currently playing. On a mobile device, it is also possible to compare the users' whereabouts and the users close to them. Features like this add depth to what the users have in common, and such unusual contexts can be displayed to persuade the users to initiate social contact [3].

Kostakos et al. [19] examine what it takes to make people interact with others, in this case to talk face to face. They narrowed it down to two overall criteria: first, common membership in cultural communities, and second, joint perceptual experiences or joint actions. The communities could be created on the basis of the music. The contextual information gathered from the mobile phone could make users aware that they have actions in common. This could be enough to make people feel comfortable initiating contact. The contact does not have to be face to face initially; perhaps just sending a short comment about another user's activity is enough. The principles of social interaction in systems like Last.fm and ilike can plausibly be transferred to the mobile platform, because the platform has the properties needed to bring out shared perceptual experiences and common actions.

3.2 User Data Collection Method

When data must be gathered from the users in a user test, a collection method has to be chosen. The data can be stored locally on the device in different ways or sent to a central server. The choice can be a combination of the following:

1. Logs
2. Local Database
3. Send to Server

The simplest method is to log the data to a text file. The data should then have a format that allows easy parsing for later usage, such as XML or JSON. Another local alternative is a local database, which keeps better track of the data and makes it directly usable via SQL queries. A remote option is to send the gathered data to a central server.

Logs would be gathered and stored on each device and transferred to a central server after a suitable amount of time. This interval could be daily or weekly, for example. The choice depends on how convenient it should be for the user and how certain the test leader wants to be that all the data have been collected correctly. The memory cards of the available test phones provide plenty of space for the log files. Data integrity should be tracked both during storage and when transferring the data to a central server. For systematic logging, every log entry could be appended to a separate log file for each day. To collect the log files from the users, the files could be uploaded to a server while being kept as a backup on the phone.
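The daily log files suggested above can be sketched in a few lines of Python. Each record is appended as one JSON object per line to a file named after the current date. A day's file can then be uploaded while the original stays on the phone as a backup. The file naming scheme and the record fields are hypothetical. The sketch is written for a desktop Python 3 interpreter, not for the runtime on the test phones.

```python
import datetime
import json
import os
import tempfile


def log_path(log_dir, when):
    """One log file per day, e.g. context-2010-03-01.log (hypothetical naming)."""
    return os.path.join(log_dir, "context-%s.log" % when.strftime("%Y-%m-%d"))


def append_log(log_dir, record, when=None):
    """Append one record as a JSON line to the log file of the given day."""
    when = when or datetime.datetime.now()
    os.makedirs(log_dir, exist_ok=True)
    entry = dict(record, timestamp=when.isoformat())
    with open(log_path(log_dir, when), "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_log(path):
    """Parse a log file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log_dir = tempfile.mkdtemp()
day1 = datetime.datetime(2010, 3, 1, 9, 30)
day2 = datetime.datetime(2010, 3, 2, 8, 15)
append_log(log_dir, {"type": "bluetooth", "devices": ["aa:01"]}, day1)
append_log(log_dir, {"type": "music", "artist": "Muse"}, day1)
append_log(log_dir, {"type": "music", "artist": "Air"}, day2)
print(sorted(os.listdir(log_dir)))  # ['context-2010-03-01.log', 'context-2010-03-02.log']
```

One JSON object per line keeps the format easy to parse and append to. A partially written last line corrupts at most one record rather than the whole file.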
A local database imposes more structure on the collected data on the client side and makes it possible to send the data in a more structured way. With a local database, the collected data can be accessed easily on the device without parsing. There are different approaches to sending the collected data to a central server. Data can be sent continuously to reach the central server as quickly as possible, or they can be sent in larger batches. Data should preferably be pushed before the database fills up, and the local database should be cleaned after a successful push or pull. If the data are removed after pushing, little local data processing is possible. If the local database is prone to errors, it is best to upload the collected data as soon as possible. This, however, requires continuous access to a network connection.

3.3 Context Feature Analysis

A list of input features can be selected from the similar web sites above. This list is a subset of the entire set of features that can be extracted from mobile phones; Jensen [17] has already made a range of demonstrated input features available. A reasonable set of context features is selected to limit complexity, noise and data processing. The selection focuses on the features most commonly used in mobile music applications. The features that will be analyzed are:

1. The users nearby
2. Location
3. Music Information

Adding the temporal dimension changes the possibilities of the features, because information can then be inferred from context histories. A context history is essentially a list of context information ordered by time.

Users (around you)

The user feature makes it possible to discover who the user is surrounded by. On its own, this sensor indicates who is surrounding the user at each moment. Over time, it can also show which people the user sees most often. People are identified by their phones, which is done directly using Bluetooth, with its rather short range. People who are repeatedly observed close to a user usually have some kind of relation to the user. These recurring co-occurrences give a measure of familiarity, in the sense that the user is regularly in proximity of the device.
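Familiarity based on recurring co-occurrences can be measured by counting how many periodic Bluetooth scans each device was seen in. The sketch below assumes each scan is available as a set of device addresses. The addresses are made up, and the min_scans threshold is an assumption. The same count could also implement the privacy restriction discussed in Chapter 2, where information is only shared with devices that have been nearby a fixed number of times; may_access is a hypothetical helper for that rule.

```python
from collections import Counter


def familiarity(scans):
    """Count in how many scans each Bluetooth device address was observed.

    scans is a list of collections of addresses, one per periodic scan;
    a device listed twice in the same scan is only counted once.
    """
    seen = Counter()
    for devices in scans:
        seen.update(set(devices))
    return seen


def familiar_devices(scans, min_scans):
    """Devices observed in at least min_scans scans, sorted by address."""
    counts = familiarity(scans)
    return sorted(device for device, n in counts.items() if n >= min_scans)


def may_access(device, scans, min_scans):
    """Hypothetical access rule: share information only with familiar devices."""
    return familiarity(scans)[device] >= min_scans


scans = [{"aa:01", "aa:02"}, {"aa:01"}, {"aa:01", "aa:03"}, {"aa:02"}]
print(familiar_devices(scans, 2))  # ['aa:01', 'aa:02']
```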
Interestingly, these people do not have to be family, friends, or colleagues. They can also be the so-called familiar strangers introduced by Milgram [20]. One user may own more than one Bluetooth-enabled device. Inferring device-to-person relationships, however, usually requires more context features unless a revealing naming convention has been used.

The people surrounding the user also say a lot about the user's social context. The Bluetooth information tells who is immediately around the user, and further information can be retrieved from each discovered person. This could be static personal data or dynamic context data. This gives real-time information to the requesting user, which is a useful feature in itself. It also makes it possible to draw a graph of who the users have been in proximity of, giving an overview of how people are connected in this implicit social network. The mapping could be done in different ways, but each observed connection should be shown on the graph as an edge, creating a graph with all the users. If the same edge occurs multiple times, the repeated occurrences can either be ignored or used as a weight on the edge. The weights then indicate the strength of the relationship between users in terms of co-occurrence frequency.

Users as a context feature can be of different types, since users can be connected in a social network in various ways. The first type is the users in proximity: when friends of a user are close, the user would usually like to know about it. This search is usually done using Bluetooth. If the definition of nearby extends beyond the short Bluetooth range, another location measure must be used. This could be done by getting GPS coordinates from the other users and calculating the Euclidean distance. Another type is the ordinary friends whom the user knows but who are not necessarily physically close. These real friends are harder to identify, because their friendship was built up before monitoring of the user's behavior started.
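The weighted co-occurrence graph described above can be sketched by treating each observation as an undirected edge. Each edge is counted at most once per time slot, so two phones that see each other in the same scan round do not double the weight. The (time slot, observer, seen users) input format is an assumption about how scan results might be aggregated.

```python
from collections import Counter


def cooccurrence_graph(observations):
    """Build a weighted, undirected co-occurrence graph.

    observations is an iterable of (time_slot, observer, seen_users).
    An edge is a frozenset of two users, and its weight is the number of
    distinct time slots in which the pair was observed together.
    """
    pairs = set()
    for slot, observer, seen in observations:
        for other in seen:
            if other != observer:
                pairs.add((slot, frozenset((observer, other))))
    return Counter(edge for _, edge in pairs)


observations = [
    (1, "anna", {"bo"}),
    (1, "bo", {"anna"}),  # same pair, same slot: counted once
    (2, "anna", {"bo", "carl"}),
    (3, "carl", {"anna"}),
    (4, "bo", {"anna"}),
]
graph = cooccurrence_graph(observations)
for edge, weight in sorted(graph.items(), key=lambda item: sorted(item[0])):
    print(sorted(edge), weight)
```

The alternative mentioned in the text, ignoring repeated edges, corresponds to keeping only the keys of the returned Counter.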
The connectedness to real friends may therefore not show up as co-occurrence, but could instead be inferred from the conversations or SMS logs. Rettie [23] compares connectedness with awareness and social presence. Awareness of another person is the most general term, with social presence and connectedness as subsets of it. Social presence is when users are co-located or co-present. Starting an online conversation about what a user is listening to on Last.fm is then equivalent to starting a conversation about a current experience; both require the users to be psychologically involved. Connectedness is different, because it focuses more on maintaining the relationship than on the actual content of the conversation, which can even be asynchronous.

The third type of users is the neighbors, who are similar users. Their similarity can be calculated from different things, usually in a media context, such as songs the users have listened to or movies they have seen and rated. This is how users are typically characterized, but the similarity measurement is not limited to these features. The user dimension is even more informative when combined with other features. The combination shows how the second feature varies over many users and allows common trends to emerge.

It is also important to know how the user information can be used, which depends on the temporal aspect. First, the information can be used immediately to display relevant data at the current location of the user. Second, it can be stored in a database of logs and used, for example, in a user similarity calculation one week later. Real-time usage of user data is especially interesting in a mobile context, because it creates connections between users that would not otherwise have been discovered. With a dynamic application in mind, the focus is on users who are within reasonable proximity of each other and can thus be identified, for example, by Bluetooth. The user data therefore cannot be used on a large scale, because usually only a small number of users are within proximity of each other. Local exchange of information is still possible. The following list contains some examples of applications using the users feature, with temporal scopes from short to long:

1. Displaying a list of users around.
2. Displaying user similarity on a graph after comparison.
3. Logging the gathered information for social profiling.

Location

The location of a user can be determined in various ways using WLAN, GPS or GSM. Wireless local area network (WLAN) information is gathered to infer the approximate location of the user, even though it might not always be possible to pinpoint the exact position. The collected WLANs serve as an indicator of geographical position and thus capture the spatial context. The Global Positioning System (GPS) uses satellites to retrieve precise information about the user's whereabouts.
For monitoring, and in situations where geographical patterns are to be discovered, GPS offers the best precision and immediate monitoring in outdoor environments. The system requires a line of sight to its satellites, so reception is poor in most indoor office environments. GPS can therefore be a viable tool in transit, but it is not the best solution for capturing indoor geographical contexts. Running GPS on a mobile phone consumes a lot of energy and drastically reduces battery life. It can even fall below what is commonly accepted as the minimum, a working day of 7-8 hours. Keeping GPS as a context sensor should therefore be justified.

If GPS coordinates are not available and an absolute location is wanted, mapping services like Wigle.net can look up WLAN IDs and return an absolute location. The Wigle.net service does not always contain enough information to give the wanted result, but it is an option when GPS coordinates are not directly accessible. The Global System for Mobile Communications (GSM) is a third option for collecting location information about a user. Locating can be done at different levels of accuracy, ranging from identifying the connected cell tower to triangulating the position of the device from the signal strength of nearby cell towers. These ways of locating the user are used in many location-based services.

The three location sensors allow applications to know where the user is at the moment. When this is extended with logging, the whereabouts of the users are placed on a timeline. This enables an application to proactively find information that could be helpful to the user at the current position. In some of the larger online social networks, such as Facebook and MySpace, it is popular to set a status message. The location given directly by the sensors, for example a set of GPS coordinates, can be converted into a more human-interpretable format such as a dot on a map or a city name. This answers the question of where the users are. Analyzing the location of each individual user can reveal a number of regular relative locations, or places that the user revisits. Last.fm collects location information from the events that users sign up for. It does not, however, collect location information for tracks scrobbled from mobile devices such as iPods, mainly because these devices lack sensors that can discover their current location. Context sensors on the music playing device can be used to create more exciting services.
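Finding the places a user revisits from WLAN data could, for instance, start from a greedy grouping of scans by the overlap of the access points they contain. The sketch below uses Jaccard similarity with an arbitrary threshold of 0.5 and keeps the first scan of each place as its fingerprint. Both choices are assumptions. This is a simple illustration, not necessarily the method used to find significant locations in this project.

```python
def jaccard(a, b):
    """Overlap between two sets of access point IDs (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_places(scans, threshold=0.5):
    """Greedily group WLAN scans into places.

    Each scan is a collection of access point IDs. A scan joins the first
    place whose fingerprint (its first scan) is similar enough; otherwise
    it starts a new place. Returns a list of [fingerprint, visit_count].
    """
    places = []
    for scan in scans:
        scan = set(scan)
        if not scan:
            continue  # nothing to compare, e.g. no WLANs in range
        for place in places:
            if jaccard(place[0], scan) >= threshold:
                place[1] += 1
                break
        else:
            places.append([scan, 1])
    return places


scans = [
    {"ap1", "ap2"}, {"ap1", "ap2", "ap3"}, {"ap9"},
    {"ap2", "ap1"}, {"ap9", "ap8"},
]
places = find_places(scans)
print([count for _, count in places])  # [3, 2]
```

Because only WLAN IDs are compared, the places found are relative locations. An absolute position would still require a lookup service such as the one mentioned above.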
If the absolute location of the user is known, it can be used to calculate the geographical distance to friends and, based on that distance, to suggest meetings etc. Location information is often coupled with user-generated content, e.g. to share geo-tagged pictures. For the location feature, the following list contains example applications with different temporal scopes, from short to long:
1. Displaying the user's current position on a map.
2. Displaying the user's route on a map (running application).
3. Logging location data to find spatio-temporal behavioral patterns.
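The text leaves open how the distance to friends is computed. For GPS coordinates a common choice is the haversine great-circle distance. The Python sketch below assumes WGS84 latitude/longitude pairs in degrees, a spherical Earth with a mean radius of 6371 km, and a hypothetical helper `friends_within` that keeps the friends closer than a given threshold.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def friends_within(me, friends, max_km):
    """Names of friends no farther than max_km away, nearest first.

    `me` is a (lat, lon) tuple; `friends` maps a name to a (lat, lon) tuple.
    """
    dists = {name: haversine_km(*me, *pos) for name, pos in friends.items()}
    return sorted((n for n, d in dists.items() if d <= max_km), key=dists.get)
```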

Music Information
From the music player, the playlist of the user is captured. This immediately indicates how much music the particular user listens to. Later, a music profile of the user is created to describe which genres and particular bands the user likes. Furthermore, it could be analyzed whether play patterns emerge that reveal the activity of the user during the play session captured from the music player. Additional data could also be collected, but since it can be looked up in an online music database, it does not necessarily have to be extracted while the track is playing if it is only going to be used for later analysis. There is, however, some low-level music information that cannot be recovered afterwards, e.g. insertion or ejection of headphones, changes to equalizer settings, and volume level. The gathered music information can be used in different ways. It can be used immediately, with the intention that it should be seen by others, or to get further information about the artist or the particular track being played. The opposite way of using the information is to log it in a database so it can be used for further analysis once enough data has been gathered. In this way the data becomes part of a raw music profile for the user, which can be used to create a user model, i.e. a formal representation of the user. Jameson [16] gives examples of how to create such formal models of both the context and the user. Henricksen et al. [12] have investigated how to personalize context-aware applications beyond the conventional location-based services that have become widespread. A music profile on its own is rather static, since the gathered information is only used long after it has been collected.
To keep the usage of the music information in this particular application dynamic, the computationally intensive and data-hungry collaborative filtering will not be applied immediately on the mobile device. Instead, a smaller on-the-fly user comparison could be applied, which simply compares the similarity of the users' playlists within a suitable time window. Artist similarity can be found by comparing information from Wikipedia or by using the Last.fm API. For the music information feature, the following list contains example applications with different temporal scopes, from short to long:
1. Displaying artist and song name information.
2. Displaying additional information from an Internet lookup.
3. Logging music information with the intent to perform collaborative filtering.
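A minimal sketch of such an on-the-fly comparison is shown below. It assumes each user's play log is a list of `(timestamp, artist)` pairs and uses the Jaccard overlap of the artists played within the window as the similarity measure. The measure, the 24-hour default window, and the function names are illustrative choices, not taken from the prototype.

```python
from datetime import datetime, timedelta


def recent_artists(plays, now, window):
    """Set of artists played in the interval (now - window, now].

    `plays` is a list of (timestamp, artist) tuples.
    """
    start = now - window
    return {artist for ts, artist in plays if start < ts <= now}


def playlist_similarity(plays_a, plays_b, now, window=timedelta(hours=24)):
    """Jaccard similarity of the artist sets two users played within the window.

    Returns a value in [0, 1]; 0.0 when neither user played anything.
    """
    a = recent_artists(plays_a, now, window)
    b = recent_artists(plays_b, now, window)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Because only sets of artists inside a short window are compared, the computation stays cheap enough to run on the phone, unlike full collaborative filtering over the whole history.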

List of Feature Combinations
The individual features have been described separately, considering both a short and a long temporal perspective. It is also important to know whether the information is going to be used on the device itself or collected in a central database for further data analysis. All possible feature combinations are presented in the following sections.

Users and Location
This combination provides information about where the users are at all times. In the short term, it simply keeps track of the positions of all currently active users. Seen from the perspective of the local device, its own location and the nearby users are available. With these inputs, the first application that comes to mind is a map with the user displayed at the center and the nearby users displayed at their positions around him. If the locations of the other users are not available, a list of nearby users could be compiled. The positions could, however, be queried from the nearby users and then shown on the map.

Users and Music Information
Combining these features is most commonly associated with creating personalized recommendations using collaborative filtering, as done by Last.fm and iLike. This is user profiling based on music information. Having the users and the music information, an application can compare the user's music with that of the nearby users to see if a local similarity match can be found. The application could also simply show what the user is playing right now as a kind of status, and display what the nearby users are playing as well, without any advanced processing of the input data.

Location and Music Information
Seen from the perspective of the local device, all the device knows is its location and the music information gathered from the music player. Not knowing about the users in the vicinity makes the application's possibilities somewhat less social.
This feature combination invites a more location-centered application, in the sense that places could be tagged with music information as the users listen to it, so that people can discover music played by others at the same location. The recommendations could still be time-dependent, because people may listen to different music depending on whether it is morning or evening.

Users, Location and Music Information
This is the combination of all three features. Each log entry tells where a person has been and what music he has been listening to while being there. In a short time frame this allows a fairly accurate status to be set in, e.g., a social network application. Over a longer time period it would allow collaborative filtering on the music information, and it would make it possible to see whether people travel to the same geographical places. Correlating the music information with the location is more challenging, but it could, for example, be examined whether the same person listens to the same music at the same locations. Generalizing over all users would show whether users play the same types of songs in the same relative locations, e.g. classical music at home. With all the selected context features combined (nearby users, the local and other users' locations, and music information), the end-user application could incorporate all of them, for instance by displaying the names of the nearby users together with information about their geographical places and what they are listening to at the moment. The music the user wants to listen to depends on what he is doing at the moment, e.g. exercising (high tempo) or relaxing (low tempo). This can partially be inferred from where the user is and who is around him. If the user location is absolute, an interpretation layer must be placed on top of the low-level input before it becomes meaningful. What is most interesting is how the current location affects the user, and for this purpose absolute location data is not very useful, because it is difficult to compare.
If the absolute locations were mapped to locations relative to each individual, a meaning could be attached to each relative location.

3.4 Music Ratings
For a recommender system to work, it needs a measure of how much the users like the items presented to them. The simplest case is explicit rating, where users review an item and then rate it, e.g. on a 1-5 scale. Implicit rating is where the user's rating is inferred from data about the user related to the item. In a web context, this would be user statistics from a website, where click counts and browsing behavior could be used to rate an item for a specific user. How to implicitly rate a music track in a mobile context depends on the track information gathered. Specific patterns in the usage of the music player could indicate that the user really likes or dislikes a specific track. In general, a track that is repeated within a short period of time is probably liked, whereas a skipped track is probably disliked. It also matters whether a song is chosen actively or comes from a playlist: a song from a playlist is less likely to have been chosen to fit the context than one the user picked actively. Each play of a track should adjust the user's rating of the track. Disliked tracks should be detected in a way that does not confuse skips with normal navigation of the tracks in the music player. One case of unwanted music player usage that should be filtered from the user data is when the music player is playing a track but the volume is zero. Tracks logged while the volume was zero should be disregarded, because they have in fact not been listened to by the user. The play count of a track is a good indicator of how much a user likes it (take, e.g., the top 25 playlist available on most iPods), but it must be weighted for each individual, because people listen to different quantities of music and in different ways, before it can be used directly to infer a music rating.

Music Player Behavior Analysis
To understand what a context observation with a play represents, a few terms must be defined:
Observation - A log element in the system storing context information.
Track - A music file with a duration.
Play - A snapshot of a Track being played in an Observation.
A play proves that a track was loaded in the music player. There is, however, a bug that can make a track seem to be still playing even though it has been stopped. A play can come from different parts of a track and have different meanings.
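The zero-volume filter and the per-user weighting of play counts could look roughly like the following sketch. It assumes a hypothetical `Observation` record holding only a user, a track (or `None` when nothing is loaded), and the player volume, and it normalizes each user's counts by that user's total number of audible plays. The normalization scheme is an assumption; the text only states that play counts must be weighted per individual.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One log element, reduced to the fields needed here (hypothetical schema)."""
    user: str
    track: str | None  # None when no track is loaded in the player
    volume: int        # player volume when observed; 0 means muted


def normalized_play_counts(observations):
    """Per-user share of plays for each track, ignoring zero-volume plays.

    Dividing by each user's total number of audible plays puts heavy and
    light listeners on the same 0..1 scale.
    """
    per_user: dict[str, Counter] = {}
    for o in observations:
        if o.track is None or o.volume <= 0:
            continue  # not actually listened to
        per_user.setdefault(o.user, Counter())[o.track] += 1
    result = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        result[user] = {track: n / total for track, n in counts.items()}
    return result
```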
The possible meanings of a single play are analyzed below.

First Play
A first play is the first play after a period with no music being played. It will most likely be a track that the user likes and has actively chosen, so first plays should have a strongly positive influence on the user's rating of the track.

Normal Play
This is a play in the middle of a track. Normal playing behavior is going on, and it is reasonable to assume that the user approves of the track being played. This is also the most common type of play, because it is assigned to all plays of the tracks in a playlist when the user does not interfere, except the first play of the first song. So a normal play has a positive influence on the user's rating of the track.

Skip Play
Half the length of a track is chosen as the threshold for interpreting a skip. If less than half the track is played, the track is disliked, or the user at least had enough reason to actively avoid it. Observing this behavior should have a negative influence on the inferred user rating of the track. If more than half of the track has been played, it is a softer skip that should not decrease the user's rating of the track: having heard half of the track, the user has listened to a large part of it, which means that he is at least indifferent to it. Last.fm also uses this length as the minimum for counting a track as played.

Long Play
If the play is prolonged beyond the length of the track, it can mean two things: either the user has put the track on repeat, or it is an effect of the player being stopped. So for a long play it is best to be cautious and not assign too much positive value. There should also be a limit to the bonus a track's rating can receive from being played repeatedly.

Play Weighting
To calculate a rating from the plays of a track, numerical values are assigned to the different types of plays observed. N is the number of normal plays, F the number of first plays, and S the number of skip plays. R(x) is the user's rating of item x:

$$R(x) = w_n N + w_f F + w_s S \tag{3.1}$$

where $w_n = 1$, $w_f = 5$, and $w_s = -2$.

To infer the track length in plays without using the time stamps directly, a list of observations is used to count the various types of plays for each track. This information can be used to find the implicit rating of the track for the particular user. To know whether the current play exceeds the normal track duration, the average track length must be calculated. The average track length is the number of observations recorded while a track is played without interruption. It can be found by taking the average over the data from all users.

3.5 Requirements Specifications
Here the requirements for the different parts of the prototype are specified, divided into sections. Figure 3.1 shows an abstract component diagram of the data collecting application. The system is split into three components: the GUI, the Context Collection, and the Communication component. The GUI component handles what the user sees. It uses two interfaces to retrieve the visual content: the Local Contexts interface of the Context Collection component and the Remote Contexts interface of the Communication component. The Communication component uses three interfaces. First, like the GUI component, it gets local contexts from the Context Collection component. Second, it uses an interface of the planned back-end server to upload context data. Third, it uses the Share interface. Similar to the Context Collection component, the Communication component also provides an interface to the GUI component. The Share interface is special: in a centralized architecture it only needs to be used, but in a decentralized architecture each mobile system must also provide it. Figure 3.2 and Figure 3.3 show how nodes are connected in a decentralized and a centralized architecture, respectively. They also illustrate the message flow.

Figure 3.1: Abstract System Component Diagram. It contains the components of the system with their connecting interfaces. The interfaces that are not connected are connected to components in other nodes.
Figure 3.2: Message flow between system nodes in a decentralized system. The mobile nodes both provide and use the Share interface.

Figure 3.3: Message Flow between system nodes in a centralized system. The mobile nodes use the Back-end Upload interface to upload context information and use the Share interface to retrieve it.
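To make the role of the Share interface concrete, here is a small Python sketch of the two architectures. The class and method names (`CentralServer`, `MobileNode`, `latest`, `upload`) are hypothetical, contexts are plain dictionaries, and networking is left out entirely. The point is only that the GUI can ask for remote contexts through the same Share interface whether a central server or each mobile node provides it.

```python
from __future__ import annotations

from typing import Protocol


class Share(Protocol):
    """Share interface: hands out the latest context elements of a user."""

    def latest(self, user: str, n: int) -> list[dict]: ...


def _tail(items: list[dict], n: int) -> list[dict]:
    """Last n items; n <= 0 yields [] (plain items[-0:] would return everything)."""
    return items[-n:] if n > 0 else []


class CentralServer:
    """Back-end node of the centralized architecture.

    Mobile nodes push contexts through upload() (Back-end Upload) and read
    other users' contexts through latest() (Share).
    """

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def upload(self, user: str, context: dict) -> None:
        self._store.setdefault(user, []).append(context)

    def latest(self, user: str, n: int) -> list[dict]:
        return _tail(self._store.get(user, []), n)


class MobileNode:
    """Mobile node of the decentralized architecture: it provides Share itself."""

    def __init__(self, user: str) -> None:
        self.user = user
        self._local: list[dict] = []

    def collect(self, context: dict) -> None:
        self._local.append(context)

    def latest(self, user: str, n: int) -> list[dict]:
        # A node can only answer for its own user.
        return _tail(self._local, n) if user == self.user else []


def remote_contexts(share: Share, user: str, n: int = 1) -> list[dict]:
    """What the GUI's Remote Contexts view requests, independent of architecture."""
    return share.latest(user, n)
```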

Use Cases
The use cases of the system are described in the following sections. They capture the functional requirements of the system by specifying what happens in different scenarios. The following use cases cover both the more hidden parts of the prototype and the data-collecting prototype, where the user is directly involved as the primary actor. Figure 3.4 shows the use case diagram of the system.
Figure 3.4: Use Case Diagram for the Prototype System.

Domain Specific Data Concepts
Different kinds of data are transferred around the system. First, there is the Context element. This is a collection of the features gathered from the phone, describing the different dimensions of the captured user context. Furthermore, another data element is used: the Connection. This element captures that a connection between two entities has been made. It stores when the connection occurred, who the sender was, who the recipient was, and the type of connection, because connections can happen in different contexts. Numerous works note that context histories, compiled by storing context information and thereby using its temporal dimension, are used too rarely. Most context-aware applications simply do not store the data after it has been used to adapt to the preferences of the user in a specific context (e.g. [14], [7] and [24]). In this project a context collecting application will be developed, so this notion of context history is relevant. Salber et al. [24] illustrate this concept with tracing another person by asking colleagues what they last saw the user do, or by looking at the user's meeting list to understand what he has been doing recently. In this way a context history can tell more about the current context of the user than, e.g., the current location alone, because multiple contexts can take place at the same location.
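The two domain concepts and a bounded context history could be modeled as follows. This is a sketch: the field names, the optional GPS tuple, the example connection types, and the default history size of 100 are assumptions rather than the prototype's actual schema.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Context:
    """Snapshot of the captured context features (hypothetical fields)."""
    user: str
    timestamp: datetime
    track: str | None = None
    wlans: list[str] = field(default_factory=list)
    nearby_users: list[str] = field(default_factory=list)
    gps: tuple[float, float] | None = None


@dataclass
class Connection:
    """Records that one entity connected to another, and in which context."""
    timestamp: datetime
    sender: str
    recipient: str
    kind: str  # e.g. "trace" or "remote_context"; the set of types is assumed


class ContextHistory:
    """Keeps the most recent contexts instead of discarding them after use."""

    def __init__(self, maxlen: int = 100) -> None:
        self._items: deque[Context] = deque(maxlen=maxlen)  # oldest dropped first

    def add(self, ctx: Context) -> None:
        self._items.append(ctx)

    def recent(self, n: int) -> list[Context]:
        """Up to n most recent contexts, oldest first."""
        return list(self._items)[-n:] if n > 0 else []
```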

Use Case: Trace User
Goal: To select an observed user and see a list of recent context information from this user.
Summary: The user selects a user from a list of observed users; the system then returns a list of different context information and displays it for the user.
Preconditions: The user needs to have observed nearby users to be able to trace them.
Triggers: The trace is triggered by the user when a user to trace is selected.
Basic Course of Events:
1. The user selects an observed user to trace.
2. The system displays a list of recent context information about the selected user.
Postconditions: A list of context information has been displayed for the user. The state of the system has not changed, except for showing the retrieved information to the user until he chooses not to watch it anymore.
Table 3.1: The Trace User Use Case

Use Case: Watch Stats
Goal: To display different statistics and information collected from a session.
Summary: The system compiles different data from the contexts collected during the current execution of the application. These statistics can all be inferred from the lists of all the contexts collected so far.
Preconditions: No immediate preconditions. If no context information is available, empty spaces will be presented to the user instead of the planned information.
Triggers: The user actively chooses to watch the overview of statistical information.
Basic Course of Events:
1. The user selects to watch the stats in the system.
2. The system displays the session overview to the user.
3. When new context information becomes available, it is added to the statistics where appropriate.
Postconditions: The system has not changed state as a result of the execution of the use case. Only the display of the statistics has changed, which is dictated by the data.
Table 3.2: The Watch Stats Use Case

Use Case: Start Context Collection
Goal: The goal is to start the collection of context data.
Summary: The use case initiates the collection of context data. This involves activating collection from the wanted sensors.
Preconditions: The sensors needed to collect the wanted context information must be available.
Triggers: This use case is triggered by the user as a consequence of starting up the application.
Basic Course of Events:
1. The wanted context sensors get activated.
Postconditions: The wanted context sensors are activated and feeding context information to the system.
Table 3.3: The Start Context Collection Use Case

Use Case: Publish Context Feature Information
Goal: The goal of the use case is to collect context information for a specific context feature.
Summary: The use case is a generalization of a list of use cases. Data from a context feature is collected and made available to the system.
Preconditions: The specific context sensor is activated.
Triggers: The use case is triggered by the use case Start Context Collection.
Basic Course of Events:
1. Specific context information is collected from the sensor.
2. The collected information is made available in the system.
3. When new context feature information is read from the sensor, it is also made available to the system.
Postconditions: The specific context feature is being published to the system.
Table 3.4: The Publish Context Feature Information Use Case

Use Case: Publish Music Information
This use case is a specialization of the use case Publish Context Feature Information with the context feature specified as music player information.

Use Case: Publish Bluetooth Information
This use case is a specialization of the use case Publish Context Feature Information with the context feature specified as Bluetooth information.

Use Case: Publish WLAN Information
This use case is a specialization of the use case Publish Context Feature Information with the context feature specified as WLAN information.

Use Case: Publish GPS Information
This use case is a specialization of the use case Publish Context Feature Information with the context feature specified as GPS information.
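The generalization/specialization relationship between these use cases maps naturally onto a small publish/subscribe structure. In the hypothetical sketch below, `FeaturePublisher` plays the role of Publish Context Feature Information and each subclass only fixes the feature name. Sensor access is omitted, so readings are passed in by hand.

```python
from __future__ import annotations

from typing import Any, Callable

Listener = Callable[[str, Any], None]


class ContextBus:
    """Where published context feature values become available in the system."""

    def __init__(self) -> None:
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, feature: str, value: Any) -> None:
        for listener in self._listeners:
            listener(feature, value)


class FeaturePublisher:
    """Generalized 'Publish Context Feature Information' use case."""

    feature = "generic"

    def __init__(self, bus: ContextBus) -> None:
        self.bus = bus

    def on_sensor_reading(self, value: Any) -> None:
        # Every new reading from the sensor is made available to the system.
        self.bus.publish(self.feature, value)


class MusicPublisher(FeaturePublisher):
    feature = "music"


class BluetoothPublisher(FeaturePublisher):
    feature = "bluetooth"


class WlanPublisher(FeaturePublisher):
    feature = "wlan"


class GpsPublisher(FeaturePublisher):
    feature = "gps"
```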

Use Case: Get Context Information
Goal: To gather the latest context information and store it in the system.
Summary: In this use case, personal context information is gathered from the smartphone. Gathering from the various sensors is triggered when new context events occur. Then a context element is compiled and stored.
Preconditions: The required sensors are active.
Triggers: When one of the context sensors collects a new value, a new context information element should be created.
Basic Course of Events:
1. Specific context information is collected from the sensor.
2. The collected information is made available in the system.
3. When new context feature information is read from the sensor, it is also made available to the system.
Postconditions: New context information has been captured and stored in the system, and the sensors are still active.
Table 3.5: The Get Context Information Use Case

Use Case: Get Remote Context Information
Goal: Retrieve the latest context information collected by a remote user.
Summary: Context information from a remote user is collected. It is collected similarly to the use case Get Context Information, but remotely from another device.
Preconditions: A working network connection and a collecting system running on the device of the remote user.
Triggers: Triggers can be both automatic and manual. It must be decided whether the user should request the information or whether the request should be triggered by, e.g., observed nearby users.
Basic Course of Events:
1. A remote user is specified.
2. The remote user is contacted.
3. The retrieved response is formatted to make it compatible with the local context data.
Postconditions: The latest context information has been retrieved from the remote user.
Table 3.6: The Get Remote Context Information Use Case

Use Case: Get Context History
Goal: To retrieve a recent context history for a given user.
Summary: For a given user, a context history is retrieved remotely. This context history reveals a list of the last/current contexts of the user. The use case can be compared with the use case Get Remote Context Information, just with more context elements being requested.
Preconditions: A user to trace must be specified. A working network connection and a collecting system running on the device of the remote user.
Triggers: Activated by the user when starting the use case Trace User.
Basic Course of Events:
1. A remote user is specified.
2. The remote user is contacted.
3. The retrieved response is formatted to make it compatible with the local context data.
Postconditions: A context history has been retrieved.
Table 3.7: The Get Context History Use Case

Use Case: Aggregate Context Information
Goal: The goal is to combine individual context features into a larger context compilation.
Summary: The use case puts all the latest context feature information together in a context element and makes it available in the system.
Preconditions: The required sensors are active.
Triggers: When one of the context sensors collects a new value, a new context information element should be created.
Basic Course of Events:
1. New context feature information has been published in the system.
2. The new context information is compiled into a context element containing all the latest context feature information published so far.
3. The combined context information element is made available in the system.
Postconditions: New context information has been captured and made available to the system, and the sensors are still active.
Table 3.8: The Aggregate Context Information Use Case
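A minimal sketch of the aggregation step follows, assuming published features arrive as `(feature, value)` pairs as in a publish/subscribe design. `ContextAggregator` is a hypothetical name. It keeps the latest value of each feature and emits a new context element every time any feature changes; feature values are kept under a separate `features` key so that a feature can never overwrite the user or timestamp fields.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


class ContextAggregator:
    """Combines the latest value of every published feature into one element."""

    def __init__(self, user: str) -> None:
        self.user = user
        self._latest: dict[str, Any] = {}
        self.elements: list[dict[str, Any]] = []  # elements made available so far

    def on_feature(self, feature: str, value: Any,
                   when: datetime | None = None) -> dict[str, Any]:
        # Step 1: new context feature information has been published.
        self._latest[feature] = value
        # Step 2: compile all latest feature values into one context element.
        element = {
            "user": self.user,
            "timestamp": when or datetime.now(),
            "features": dict(self._latest),  # copy, so later updates don't leak in
        }
        # Step 3: make the combined element available in the system.
        self.elements.append(element)
        return element
```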

Use Case: Collecting Remote Context Information
Goal: To collect context information from remote users.
Summary: The use case collects context information from remote users as new remote users are discovered.
Preconditions: A new user must have been discovered. A working network connection and a collecting system running on the device of the remote user.
Triggers: When a new remote user has been discovered in the system.
Basic Course of Events:
1. A new aggregate context information element is published in the system.
2. The remote users contained in the context element are requested for context information.
3. The remote context elements are formatted and made available to the system.
Postconditions: The latest context information is retrieved from the remote user, and the retrieved remote context is made available to the system.
Table 3.9: The Collecting Remote Context Information Use Case

Data Representation Requirements
This is a list of requirements that have to be satisfied by the data representation in the system:
1. Unique identifiers for the data.
2. Indication of the user who created the context data.
3. Time stamps on context data collections.
4. Music information.
5. Location information.
6. User information.
7. A structure that allows the representation to be transformed between an internal and an external representation, where internal means both a volatile object and a persistent database representation.
8. A connection logger that logs inbound and outbound connections.
Using connection logging is a way to understand how the users connect to each other. This is different from using, e.g., the Bluetooth scans to see who is around the user.

Logging the connections between the devices is somewhat more informative, because uninvolved strangers will appear in the Bluetooth scans but not in the connection logs, at least not in the incoming connections, since they are not registered users of the system. The connection logs can also be used to see whether users connect to each other in different ways.

Requirements for the Underlying System
Koskela et al. [18] evaluate and compare a centralized mobile system with a decentralized one. The decentralized solution uses Apache Racoon, the open source alternative to the Nokia Mobile Web Server. They conclude that using the mobile web server provides more up-to-date context information but puts more strain on the mobile device. Although this relates more to non-functional requirements, it is still important to know whether it is feasible to use the mobile web server, i.e. whether the system can be implemented without decreasing performance below the minimum user threshold. Salber et al. [24] design a generic context server; the design addresses many important related elements, such as context access, context management, and the infrastructure needed to make a context server work. Henricksen et al. [13] have made a detailed architecture design and described a formal way of modeling user preferences. They have also shown that the user models belong to the context management layer, together with the repository storing the gathered context information. This is a list of requirements for the system connecting the mobile devices:
1. Allow users to access the gathered context information of discovered users.
2. Retrieve data from users and store it.
3. Create a decentralized and a centralized system.

Requirements for the Data Collection Prototype
This is a list of requirements for the data collection prototype:
1. Allow the user to see the latest contexts collected, both local and remote.
2. Allow the user to see statistics about the latest trends in the collected local data.
3. Allow the user to pick a user from a list to trace for context information.
4. Allow the user to select network mode.
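The data representation requirements listed in this section (unique identifiers, the creating user, time stamps, music, location and user information, and a transformable internal/external form) can be illustrated with a round trip between a Python object and a hierarchical XML element using the standard `xml.etree.ElementTree` module. The element and attribute names, the UUID-based identifiers, and the use of WLAN ids as location information are assumptions for this sketch, not the format used by the prototype.

```python
from __future__ import annotations

import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContextRecord:
    """Internal (volatile) representation of one context element."""
    user: str
    timestamp: datetime
    track: str | None = None
    wlans: list[str] = field(default_factory=list)
    nearby_users: list[str] = field(default_factory=list)
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identifier


def to_xml(rec: ContextRecord) -> str:
    """External representation: a small hierarchical XML document."""
    root = ET.Element("context", uid=rec.uid, user=rec.user,
                      timestamp=rec.timestamp.isoformat())
    if rec.track is not None:
        ET.SubElement(root, "music", track=rec.track)
    location = ET.SubElement(root, "location")
    for bssid in rec.wlans:
        ET.SubElement(location, "wlan", id=bssid)
    users = ET.SubElement(root, "users")
    for name in rec.nearby_users:
        ET.SubElement(users, "user", id=name)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> ContextRecord:
    """Parse the external representation back into the internal one."""
    root = ET.fromstring(text)
    music = root.find("music")
    return ContextRecord(
        user=root.get("user"),
        timestamp=datetime.fromisoformat(root.get("timestamp")),
        track=music.get("track") if music is not None else None,
        wlans=[w.get("id") for w in root.iterfind("location/wlan")],
        nearby_users=[u.get("id") for u in root.iterfind("users/user")],
        uid=root.get("uid"),
    )
```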

The non-functional requirements for the data collection prototype are that the GUI should return a prompted result within ten seconds, preferably within one second, and that the battery should last at least approximately eight hours, so the phone does not have to be charged during a work day. The contexts that a user gathers can be categorized in different ways: by owner, by geographical place, by played music track, or by artist. It should be considered which calculations and comparisons can be done on the device and which have to be done on a server with more processing power. The context traces of the users can be compared to see whether they immediately have something in common (heard the same track or the same artist, seen the same users, have been listening to music or have opted not to, short or long geographical distance). If tagging of locations were introduced, users would also be able to discover other users from the traces they leave. Tagging of locations would provide more users to compare, but the project does not focus on this.

Data Analysis
From the analysis of the context features in Section 3.3, several data analysis opportunities arise. Initially, overall statistics about the collected data as a whole can be computed, and average values across all the test participants can be calculated:
1. Entire test period.
2. Number of test participants.
3. Days per test participant.
4. Number of active days per test participant.
5. Number of played tracks.
6. Number of collected context information elements.
After finding these numbers, aggregated plots should be made that show how much music was listened to by all the test participants. The purpose of these two diagrams is to see whether any trends can be observed in when the users listen to music.
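The overall statistics listed above could be computed from the raw observations roughly as follows. The sketch assumes a hypothetical `Obs` record per observation, counts a day as active if it has at least one observation, and counts played tracks as observations in which a track was loaded; the thesis does not define these quantities more precisely here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Obs:
    """One collected context observation (hypothetical minimal schema)."""
    user: str
    timestamp: datetime
    track: str | None  # None when no music was playing


def summary_statistics(observations):
    """Overall test statistics; per-participant values are averaged."""
    by_user = defaultdict(list)
    for o in observations:
        by_user[o.user].append(o)
    days, active = {}, {}
    for user, obs in by_user.items():
        dates = {o.timestamp.date() for o in obs}
        days[user] = (max(dates) - min(dates)).days + 1  # calendar days spanned
        active[user] = len(dates)                          # days with any data
    stamps = [o.timestamp for o in observations]
    n = len(by_user)
    return {
        "test_period_days": (max(stamps).date() - min(stamps).date()).days + 1,
        "participants": n,
        "avg_days_per_participant": sum(days.values()) / n,
        "avg_active_days_per_participant": sum(active.values()) / n,
        "played_tracks": sum(1 for o in observations if o.track is not None),
        "context_observations": len(observations),
    }
```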
First, the long-term trends over all the days are examined, and second, the daily scope, to understand when during the day most of the users listen to music. The same is done for both discovered users and WLANs. Because the mobile device scans for WLANs and Bluetooth at the same time, the numbers of discovered users and discovered WLANs could be expected to be proportional. It is, however, possible for them to change independently, e.g. when staying in one location while a lot of people come by or, conversely, when moving around with the same group of people and discovering new locations. Again, both the long- and short-term patterns will be analyzed. One approach is to cluster all the discovered WLANs, and similarly all the discovered users. The purpose of making these clusters is to discover places that can be identified by the WLANs associated with them, and social groups that consist of the users within them. These cluster networks will be calculated individually for each test participant. Each user can be tracked on the basis of the WLANs he has discovered. Phung et al. [21] use the observed WLANs to create so-called significant locations, also using algorithms to detect when the user is moving. Ashbrook et al. [1] find significant locations using GPS input instead, but the idea is similar. Finding these significant locations, which are relative positions for each user, helps to understand which locations the individual users are in. In the social context, similar user clusters could be found to evaluate which social contexts each user is in. When creating visualizations of the clusters, the size of the nodes and edges should indicate frequencies, i.e. a node representing a WLAN or a user should be big if it has been observed many times. Similarly, the edges should be weighted to indicate how many times the connected nodes have been observed together. In a location setting, Song et al. [28] have used this type of node-size weighting and also made edges thicker when the number of transitions between the nodes was higher. The exact visual representation of the weights will be implementation specific. When these two types of clusters have been found, the changes in social and location context over time should be analyzed. When a larger data set has been acquired, users can be compared on different features.
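The node and edge weights described above can be derived directly from the scan logs. This sketch assumes each scan is a set of WLAN ids (or Bluetooth user ids) seen in one observation and counts how often each id and each pair of ids occur. As a simple stand-in for a proper clustering method, `groups` then returns the connected components that remain after dropping edges seen fewer than `min_weight` times; both the threshold and the use of connected components are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(scans):
    """Node weights (times each id was seen) and edge weights (times seen together).

    `scans` is an iterable of sets of ids, one set per observation.
    Edge keys are sorted (id_a, id_b) tuples.
    """
    nodes, edges = Counter(), Counter()
    for scan in scans:
        ids = sorted(set(scan))
        nodes.update(ids)
        edges.update(combinations(ids, 2))
    return nodes, edges


def groups(nodes, edges, min_weight):
    """Connected components after dropping edges with weight < min_weight."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in edges.items():
        if w >= min_weight:
            parent[find(a)] = find(b)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return sorted(comps.values(), key=sorted)
```

The returned node and edge counters can be used directly as node sizes and edge thicknesses in a visualization.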
Discovering similar users makes it possible to offer each user different kinds of personalized services. In a social context, this could mean establishing a connection between similar users based on otherwise hidden similarities. Segaran [26] describes different ways of applying collective intelligence. This includes how to make recommendations in practice, both by finding similar users and by recommending items. The service does not have to be social. It could simply recommend, e.g., music that similar users have liked. This pure music recommendation approach is possible with the collected information: music neighbors are found, and their ratings are used to suggest tracks that the local user has not listened to yet. Sarwar et al. [25] describe a number of item-based collaborative filtering algorithms, in which the recommendation is computed in the reverse order. All the items are compared with each other first. A user's recommendations are then derived from the items most similar to those the user has already rated. This is especially useful for performance reasons when the user base becomes very large. Relative location neighbors could indicate how many locations users have in common and how much the groups overlap. The relative locations of the users can be combined with other inferred data to say something about the state of the user. This way of categorizing the music sessions of users should therefore be explored.
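The item-based variant can be sketched as follows. This is a minimal sketch, not a reproduction of any specific algorithm from Sarwar et al. They discuss several item-similarity measures, and this sketch uses plain cosine similarity. It predicts a score for each unheard item as a similarity-weighted average of the user's own ratings. It assumes the implicit ratings (e.g. derived from play counts) are given as a nested dictionary. All function names are hypothetical.

```python
import math


def item_similarities(ratings):
    """Cosine similarity between every pair of items that share a listener.

    ratings: {user: {item: rating}}, e.g. implicit ratings from play counts.
    Returns {item: {other_item: similarity}}.
    """
    # Invert to {item: {user: rating}} so items can be compared directly.
    by_item = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating

    norms = {i: math.sqrt(sum(r * r for r in v.values())) for i, v in by_item.items()}
    sims = {item: {} for item in by_item}
    names = sorted(by_item)
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            common = by_item[a].keys() & by_item[b].keys()
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            if dot > 0:  # keep only positively related item pairs
                sims[a][b] = sims[b][a] = dot / (norms[a] * norms[b])
    return sims


def recommend_items(user, ratings, sims):
    """Score unheard items by a similarity-weighted average of the
    user's own ratings of similar items, best first."""
    heard = ratings[user]
    weighted, weights = {}, {}
    for item, rating in heard.items():
        for other, sim in sims.get(item, {}).items():
            if other in heard:
                continue
            weighted[other] = weighted.get(other, 0.0) + sim * rating
            weights[other] = weights.get(other, 0.0) + sim
    return sorted(((weighted[i] / weights[i], i) for i in weighted), reverse=True)
```

The item-item table does not depend on which user asks for recommendations. It can therefore be computed once and reused for every user, which is what makes this ordering attractive for a large user base.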

Chapter 4 Design

This chapter designs the data representation for the prototype system. It describes the data collecting prototype, its various components and the connections between them. This shows how the Mobile Web Server is going to be used as a sharing component in the system. The chapter then describes the system containing multiple instances of the data collecting prototype. Finally, the different types of data analysis specified in the Analysis chapter are divided into modules, and the data analysis approach is designed.

4.1 Context Data Representation

The context data have to be represented so that they fulfill the requirements described in Section and fit with the discovered domain concepts in Section. The right extensible context representation for the prototype must be chosen. The context must be stored in a database and shared as a hierarchical XML file, which is still just an aggregation of sensory input. No knowledge has been inferred from the input data, so it is still just a collection of low-level context data. For testing and prototyping, however, the actual level of the context data is not immediately important, because it is the sharing possibilities that have to be evaluated and prototyped.

Most of the time, context can be represented as XML. The schema for the context is not necessarily fixed, so the representation can easily be extended. Only a few XML parsers that actually work are available for the S60 devices. If the sharing were done in another language, such as the Web Ontology Language (OWL) or the Resource Description Framework (RDF), the parsing would still have to be done with the XML parser on the mobile device. The databases in the system must store the same information that is kept in the internal representation and the XML Schemas. Figure 4.1 shows the class diagrams for the internal data representation, with the two created data classes (the Context and the Connection classes). The Context data class holds the information given by the context features:

- from the GPS, the latitude and longitude;
- from the Bluetooth, a list of user IDs;
- from the music player, the track name and the artist name;
- from the WLAN, the list of discovered WLANs.

Finally, the system itself adds a Globally Unique Identifier (GUID), a user name, and a time stamp. The GUID makes it possible to refer to a particular context, and the user name links the collected context to the user. The time stamp provides a temporal marker, which is also useful when the context is compared with the contexts of others. The context information is thus a snapshot of the context adapter inputs. The XML representation uses the hierarchical structure of XML documents to group semantically connected data elements. To create the document, the attribute names from the class diagrams are used as element names in the XML document. The music and location groupings were made by nesting the relevant elements inside these complex elements.
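The XML representation described here, including the list elements for users and WLANs described in the following paragraph, can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch for modern Python 3, not the Python for S60 runtime the prototype targeted. The element names (`context`, `music`, `location`, `users/user`, `wlans/wlan`) are hypothetical. Which attributes are nested under `location` is also an assumption: here only the GPS coordinates are. The round trip mirrors the object-to-XML-to-object flow of the prototype.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """Snapshot of the context adapter inputs (field names are illustrative)."""
    username: str
    timestamp: float
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    trackname: str = ""
    artistname: str = ""
    users: List[str] = field(default_factory=list)   # Bluetooth user IDs
    wlans: List[str] = field(default_factory=list)   # discovered WLANs
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


def _text(value):
    # Missing values (e.g. no GPS fix) become empty elements.
    return "" if value is None else str(value)


def context_to_xml(ctx):
    root = ET.Element("context")
    for name in ("guid", "username", "timestamp"):
        ET.SubElement(root, name).text = _text(getattr(ctx, name))
    # Semantically connected attributes are nested in complex elements.
    music = ET.SubElement(root, "music")
    ET.SubElement(music, "trackname").text = ctx.trackname
    ET.SubElement(music, "artistname").text = ctx.artistname
    location = ET.SubElement(root, "location")
    ET.SubElement(location, "latitude").text = _text(ctx.latitude)
    ET.SubElement(location, "longitude").text = _text(ctx.longitude)
    # Each list item gets its own element inside a sequence element.
    for list_name, item_name in (("users", "user"), ("wlans", "wlan")):
        seq = ET.SubElement(root, list_name)
        for value in getattr(ctx, list_name):
            ET.SubElement(seq, item_name).text = value
    return ET.tostring(root, encoding="unicode")


def context_from_xml(text):
    root = ET.fromstring(text)

    def number(path):
        value = root.findtext(path)
        return float(value) if value else None

    return Context(
        username=root.findtext("username"),
        timestamp=float(root.findtext("timestamp")),
        latitude=number("location/latitude"),
        longitude=number("location/longitude"),
        trackname=root.findtext("music/trackname") or "",
        artistname=root.findtext("music/artistname") or "",
        users=[e.text for e in root.findall("users/user")],
        wlans=[e.text for e in root.findall("wlans/wlan")],
        guid=root.findtext("guid"),
    )
```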
For the two list attributes, users and WLANs, each individual ID was put in its own element inside the sequence element representing the list. In the running system, both the locally gathered contexts and the registered queries for contexts from other devices should be stored. The queries are stored to link the requestor to the data, so that the request information is in the database and can be analyzed later. This is the purpose of the Connection class shown in Figure 4.1.

Level of Context Information

Low-level context information is the raw sensor input. It can be stored in a plain text format. More structure can be imposed on it by compiling the raw data into the data objects, or by embedding the collected information in an XML document as described in the previous section.


Figure 4.1: Data classes in the prototype system.

High-level context information is context data created from low-level context information through data processing, e.g. a value saying whether a person is moving or not. Jensen [17] includes such a movement context, which is high-level context that could be shared. It has a higher level because it uses lower-level context information to determine whether the user is moving or not. High-level context can be difficult to share, especially if it is individual, like a relative location. The exception is when it is used to compare users from their relative perspectives. In that sense, general high-level context, such as recurring location descriptions like home, work, and cinema, is appropriate and can be used to describe the context of a user.

4.2 Persistent Storage

Persistent storage is needed to make sure that the gathered information is kept even if the application is terminated. This persistent storage will be part of the context management of the system, so a database is designed for the data. The database consists of four tables: context, connections, users, and wlans. The database diagram can be seen in Figure 4.2. The tables are created from the internal class representation, using the equivalent data types in the database. For the list attributes, separate tables are made that store all the list elements and connect them to the same context object. These tables can be used both in the mobile database and in the back-end server database.
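The four-table layout can be sketched with Python's built-in `sqlite3` module. Here `sqlite3` only stands in for the mobile database and the back-end MySQL database. The column names, and the columns of the `connections` table in particular, are hypothetical. They are derived from the attributes listed for the Context and Connection classes. The key idea is that list attributes live in their own tables, linked to the context row by its GUID.

```python
import sqlite3

# Hypothetical column names; sqlite3 stands in for the real databases.
SCHEMA = """
CREATE TABLE context (
    guid       TEXT PRIMARY KEY,
    username   TEXT NOT NULL,
    timestamp  REAL NOT NULL,
    latitude   REAL,
    longitude  REAL,
    trackname  TEXT,
    artistname TEXT
);
-- List attributes get their own tables, linked to the context by its GUID.
CREATE TABLE users (guid TEXT REFERENCES context(guid), userid TEXT NOT NULL);
CREATE TABLE wlans (guid TEXT REFERENCES context(guid), wlanid TEXT NOT NULL);
-- Who requested which context, and when (the Connection class).
CREATE TABLE connections (
    requestor TEXT NOT NULL,
    guid      TEXT REFERENCES context(guid),
    timestamp REAL NOT NULL
);
"""


def store_context(db, guid, username, timestamp, users=(), wlans=(),
                  latitude=None, longitude=None, trackname=None, artistname=None):
    """Insert one context snapshot and its list elements in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO context VALUES (?, ?, ?, ?, ?, ?, ?)",
                   (guid, username, timestamp, latitude, longitude,
                    trackname, artistname))
        db.executemany("INSERT INTO users VALUES (?, ?)",
                       [(guid, u) for u in users])
        db.executemany("INSERT INTO wlans VALUES (?, ?)",
                       [(guid, w) for w in wlans])


def wlans_for(db, guid):
    """Return the WLANs stored for one context, sorted by name."""
    rows = db.execute("SELECT wlanid FROM wlans WHERE guid = ? ORDER BY wlanid",
                      (guid,))
    return [r[0] for r in rows]
```

Usage: `db = sqlite3.connect(":memory:")`, then `db.executescript(SCHEMA)`, then `store_context(db, "g1", "alice", 1.0, users=["bt-01"], wlans=["eduroam", "dtu"])`.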


Figure 4.2: Database diagram showing how the data classes have been stored in a relational database.

4.3 Description of the Data Collecting Prototype

The data collecting prototype is going to display dynamic context information. It is a good starting point for setting up the architecture for this kind of application, which works on multiple layers and has to be divided into several components. The prototype is a modified version of the Mobile Context Toolbox proposed by Jensen [17]. It has a local database that uploads context information to a central back-end server. Other users can retrieve these contexts after they discover each other via Bluetooth. When a mobile device has discovered the unique name of another device, it looks that name up in order to gather contextual information. The originally requesting device has similarly shared its unique name through the data transferred during the Bluetooth protocol. A two-way relation is created in this way. The devices can now query each other on a regular basis, so the context scanner sweeps take longer as the number of discovered devices increases. Eventually a mobile device reaches a limit where it is no longer useful to retrieve data from more users. By the time such data arrives, it has almost lost its relevance as fresh context information. This indicates that the decentralized architecture has a scalability issue, if continuously scanning all the discovered devices is actually a wanted feature. When the Bluetooth sensor discovers new nearby devices, these devices are asked for their own lists of nearby devices. These lists are returned, and the user now knows even more devices. This is the naive approach to building a social network. The total information gathered two hops from the local user will be related, because the context data one step further away will have been created at almost the same time.
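The naive two-hop expansion described above can be sketched as follows. `ask_peer` is a hypothetical callable, not part of the prototype. It stands in for the request that asks a peer for the devices that peer has discovered, which in the decentralized design would be an HTTP request to the peer's mobile web server.

```python
def two_hop_neighbors(local_id, scan, ask_peer):
    """Expand a Bluetooth scan by one extra hop.

    scan: IDs the local device discovered directly.
    ask_peer: callable returning the IDs that a peer has discovered itself
    (a hypothetical stand-in for a request to that peer).
    Returns (direct, indirect); indirect excludes direct and local IDs.
    """
    direct = set(scan) - {local_id}
    indirect = set()
    for peer in sorted(direct):      # one request per directly seen peer
        indirect.update(ask_peer(peer))
    indirect -= direct | {local_id}
    return direct, indirect
```

Each directly discovered peer costs one request per sweep. The time a sweep takes therefore grows with the number of nearby devices, which is the scalability concern raised above.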
This gives a picture of a user's context at a higher resolution than what each individual has gathered. The picture could be put together after the test is over. Alternatively, a query could be created so that it is done online, creating more detailed information collaboratively. The user will be able to select the network mode of the application. In online mode, the application sends the context information to a central server as it arrives. In offline mode, the information is not sent immediately. It is only stored in the mobile database and transferred when the user has a network connection available.

Peer Discovery and Identity Sharing

Different approaches can be taken to discover other users who are also running the application. The first is to create a Bluetooth protocol. A user only retrieves Bluetooth IDs when scanning for people nearby, so the purpose of this protocol would be to give access to the context data of others. The protocol could provide the requestor with the global ID of the discovered user, so that further information can be fetched. This provides the mapping mechanism from Bluetooth ID to global ID. Beach et al. [2] propose a similar identity-sharing protocol, which is used to initiate social activities and to build a context-aware, location-specific music player. The second option is a mapping table of all users running the application (Bluetooth ID to global ID). However, this table has to be administered centrally to make sure that all users know each other, also after new users are added. This contradicts the idea of a decentralized system. Neither the protocol nor the table restricts which Bluetooth ID the user selects for his phone. A third option is to fix the Bluetooth ID so that it is known in advance. It could be the same as the global ID, which would make the protocol or the lookup table unnecessary. However, this restriction can easily become an annoyance for the user if he has, e.g., applied a personal naming scheme to his Bluetooth-enabled devices.

Graphical User Interface

The Graphical User Interface (GUI) is going to be written in Python and will have three panes showing different information: Feed, Stats, and Trace.

- Feed, the first pane, displays the most recent contexts, both local and remote, as they arrive. Because of the limited screen area, only the latest three local and the latest two remote contexts are shown.
- Stats, the second pane, shows statistical music information (the most played tracks and the tracks matching those of other seen users). It also lists the users who have been observed during the current execution of the application.
- Trace, the third pane, allows the user to perform a temporal trace of a specific previously seen user. A temporal trace contains seven context elements. That is the maximum number of elements that fit on the limited display at once; showing more would require a custom scrollable list implementation.

For extra functionality, the program also has a menu that allows the user to change the current network mode of the application. In online mode, the application sends each collected context object to the central server. In offline mode, it does not. The context entries are only stored in the local database, so that they can be sent at a more appropriate point in time. Because of the offline mode, the user does not have to insert a SIM card to use the application.

Figure 4.3 shows the flow of data in the application. First the raw data is collected, and then it is aggregated into an object representing a context observation. This object representation is stored in a local database. When a remote user requests the context entry, it is queried from the database and the context observation is restored to its object form. From this re-established context object, an equivalent XML representation is created. This makes the information available to others. Depending on the system architecture, it is hosted either on the local mobile web server, which serves it when other users request it, or on the central back-end server.

Figure 4.3: Data flow in the mobile application. Raw data enters the system and is stored in a database. When the context information is to be sent, it is transformed to XML and parsed at the receiving node. What is represented is therefore both what the local node has and what the remote user receives.
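The online and offline network modes described above can be sketched as a local outbox. This is a minimal sketch under stated assumptions. An in-memory `sqlite3` database stands in for the mobile database. `send` is any callable that delivers one XML document, e.g. an HTTP request to the back-end server. Both `send` and the class name are hypothetical and are not the prototype's code.

```python
import sqlite3


class ContextUploader:
    """Store every context locally and send it at once only in online mode.

    send: callable delivering one XML document (an assumed stand-in for
    the upload to the central back-end server).
    """

    def __init__(self, send, online=True):
        self.send = send
        self.online = online
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, "
                        "xml TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)")

    def record(self, xml):
        """Always store locally; in online mode, send right away."""
        with self.db:
            self.db.execute("INSERT INTO outbox (xml) VALUES (?)", (xml,))
        if self.online:
            self.flush()

    def flush(self):
        """Send all unsent contexts in order; return how many were sent."""
        rows = self.db.execute(
            "SELECT id, xml FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
        for row_id, xml in rows:
            self.send(xml)  # if this raises, the row stays unsent for a later flush
            with self.db:
                self.db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        return len(rows)
```

Switching from offline to online mode amounts to setting `online = True` and calling `flush()`. Contexts collected without a network connection are then sent in the order they were recorded.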

4.4 Information about the Mobile Context Toolbox

The Mobile Context Toolbox uses the software design pattern called the Observer Pattern [8] to create an event-based flow of information. This flow is bottom-up. Three layers are inherited from the architecture of the Mobile Context Toolbox. Starting from the lowest, they are adapters, context widgets, and applications, as described by Jensen [17]. With these distinct responsibilities, the various parts of the data flow can be pieced together, starting from the bottom with the adapters. A set of input features was chosen based on the context feature analysis in Section 3.3. Each input feature is handled by a single adapter to divide responsibility. This project uses the Bluetooth adapter, the music adapter, the WLAN adapter, and the GPS adapter. These adapters publish their information to context widgets that aggregate the data. Part of designing the application is deciding the sampling rate for each of the adapters used in the Mobile Context Toolbox. Table 4.1 shows the time between the samplings triggered by the MCT. Other functionality can be added on top of the basic functionality of displaying context information. Such functionality requires more data processing, like collaborative filtering and other types of collective intelligence. The data for this analysis must therefore be available before these additional features can be added.

Sample type    Seconds between samples
Bluetooth      60
GPS            1200
WLAN           60
Music          60

Table 4.1: Length of intervals between context samples for different context features.

4.5 System Design

The design of the system needs to support the wanted end-user application, so the system will be divided into different components.
The context information could be gathered by an aggregating context widget. This widget would give the structured, higher-level context information to the module that the web service uses, making the near-current context available to the modules. It will gather the information from the four adapters: GPS, WLAN, Bluetooth, and the music player. This assumes that the music adapter is created in the same way as the other adapters. The local mobile phone does not have all information about the nearby users available, so a communication widget that can get the information from the remote users must be created. This widget is in charge of getting remote information from the users discovered in the collected local context information. As a starting point, the system will have a decentralized server structure based on the Nokia Mobile Web Server. Each mobile phone hosts its own data. When users identify that other users are around, they query each nearby user for data to display in the application or to store for further data analysis. The individual server has limited capabilities, so only a relatively small number of nearby users can feasibly be queried. For more computationally intensive work, a central server architecture must be considered, both for the scalability of server performance and for the processing power needed for data analysis.

Architecture

This section presents different architectural approaches for sharing music and context information in mobile social networks. There are two overall approaches: a system can be either centralized or decentralized. Each can exist in pure form or as a hybrid system that combines features of both. In any case, a centralized system is conceptually different from a decentralized one. The components of each system are described below. Both systems accomplish the same functionality, which is to share contextual information; they just take different approaches.

Decentralized System

In a decentralized system, the nodes are of the same type. They are able to contact each other directly and thus act as both client and server.
A node needs to know explicitly how to contact every other node in the system, because there is no third party to take care of forwarding messages correctly. The decentralized system also gives the individual node more importance for data storage. Each device holds its own data, since there is no central server to rely on. Even if the sharing system is decentralized, it does not have to consist of only one type of peer. Different peer types could be beneficial in cases where the behavior of the users differs.

A person might create many specialized recommendations, significantly more than average, with consistent and accurate ratings. This peer could then be given specialist status in the system. The specialist's ratings would be given more importance when particular items are recommended. Similarly, a user might be very social and meet many other users. His peer status could then also be changed so that his large social network connects sub-communities that have this user in the intersection of their user sets. Using the extra networking capability of this particular peer in this way benefits the rest of the users. These ideas about different types of peers are inspired by Gladwell [10], who describes how such characteristic personalities exist in society.

Centralized System

In a centralized system, the nodes are not of the same type: there is a central server and a set of client nodes. Communication in the system flows between the clients and the server, and the client nodes cannot contact each other directly. When a client wants to send information to another client, the information must first be transferred to the server and then from the server to the receiving client. In the centralized system, the data is stored primarily at the central server, which is ready to send it quickly to the recipient.

Push and Pull

The best direction for the flow of information between the peers depends on the structure of the system. In a centralized system, the server is usually a globally accessible node, whereas the clients are not. So in an inter-client transfer, the sender pushes the information to the server, and the recipient can then pull it. A user who knows how to contact the server is thereby able to reach all the clients of the system.
In a decentralized system, on the other hand, each node usually makes available the content it wishes to share, and the recipient pulls the information when needed. These general considerations have also shaped the data flow of the implemented systems. The recipient of the context information always pulls it: from the back-end server in the centralized system and from the mobile web server in the decentralized system. After gathering the context information, the sender either sends it to the back-end server or hosts it on the local mobile web server.
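The two directions of flow can be made concrete with a toy, in-memory sketch. No networking is involved, and the class and method names are hypothetical. In the centralized case, the sender pushes to the server and the recipient pulls from it. In the decentralized case, the node only publishes locally and recipients pull from that node directly.

```python
class CentralServer:
    """Centralized: senders push contexts to the server, recipients pull them."""

    def __init__(self):
        self._store = {}

    def push(self, user_id, context):
        self._store.setdefault(user_id, []).append(context)

    def pull(self, user_id, latest=1):
        # latest >= 1: return the most recent contexts for this user.
        return self._store.get(user_id, [])[-latest:]


class Peer:
    """Decentralized: a node only hosts its own contexts; others pull from it."""

    def __init__(self, user_id):
        self.user_id = user_id
        self._hosted = []

    def publish(self, context):
        self._hosted.append(context)   # made available locally, not sent anywhere

    def serve(self, latest=1):
        # latest >= 1: return the most recent hosted contexts.
        return self._hosted[-latest:]
```

In both cases the recipient pulls. The difference is only whether the sender has pushed the context to a globally reachable server first.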

4.6 Components

The modules that will be created work with low-level context data, as designed in Section 4.1, because at this stage it is more important to set up the infrastructure. Once the infrastructure is set up, it could send almost arbitrary context information. The components will have a layered architecture, following the component design described by Jensen [17] when introducing the Mobile Context Toolbox, with adapters, context widgets, and applications. The context gatherer module has a somewhat misleading name. It is more like a context manager that the applications can query at all times (see Gu et al. [11]). The local context widget will be responsible for collecting the data from a number of adapters. The remote context widget will take care of the communication with the other devices. It makes sure that context information is made available and that external information is fetched for other local components. The database application is not strictly an application, because applications in this framework by definition only use data and do not produce or calculate anything themselves. It is in any case the end of the event flow coming all the way up from the adapters. The data stored by the database application will be queried from the mobile database on demand.

Figure 4.4 shows the three component layers. At the bottom are the adapters, which collect the context data from the sensors. Above them are the context widgets, which take adapter output as input and turn the data into higher-level context information. The output from one context widget can be the input for another. The event data flow between the local context widget and the remote context widget illustrates this. Finally, the top layer contains the applications, which in this frame of reference are the endpoints of the event data flow. They delegate the data processing to context widgets, so the applications primarily deal with the presentation of the data.


Figure 4.4: Data event flow among the modules in the data collecting prototype. The data events flow upwards through the layers, from the adapter layer to the application layer.

4.6.1 Modified Mobile Context Toolbox

The Mobile Context Toolbox is a very versatile framework that offers a wide palette of features. During setup, the various adapters, context widgets, and applications are connected so that the context events flow as wanted. It is called the modified Mobile Context Toolbox because some features are going to be added. A music adapter must be created with the same functionality as the other adapters. First, the adapter should make it possible for other objects to register for notifications every time the adapter gets output from the sensor. Second, it should have daemon behavior, i.e. it should be an object that runs in the background without interfering with the user. The Mobile Context Toolbox is also general, because it defines an extensible structure in which custom adapters can be created. This project needs a new context widget and a new application, and this extensible structure made it easy to create components that fit those specific needs.

Mobile Web Server

The Mobile Web Server component enables the transfer of data in a decentralized way. It hosts some Python scripts that allow the data transfers. It is set up in two parts: context.py retrieves context information from the local database, and admin.py lets commands be given remotely in an HTTP GET request. The admin interface supports the following commands:

getcontexts
getconnections
emptydatabase

The commands are more or less self-descriptive. They get the contexts and the connections separately, and they let users empty the local database remotely. The last one in particular shows the power of the admin interface, and it is primarily used for test purposes. For obvious reasons, these commands should not be accessible to end users. Once these fundamental pages are in place, other modules in the system can use them. The idea is that the use of the Mobile Web Server should be hidden. Requesting a list of context elements from a remote user, identified only by an ID, returns exactly that list of context elements. This hides both the HTTP request to the mobile web server and the parsing of the retrieved XML document. The mobile web server is only used in the decentralized system, because the centralized system needs only its single central server.

Data Analysis Modules

The data analysis is mainly performed on the back-end server. It is split into different modules to separate the work that needs to be done. The clustering module consists of a list of functions that create visual output, either plots or graphs displaying the found clusters. The list consists of the graphs mentioned in Section. A concept of time windows is introduced to smooth the data. Instead of a continuous time scale, time is split into intervals of predefined length. Shorter or longer intervals can be chosen, depending on the wanted granularity. Using time windows means that all observations within a window are put together and treated as one observation.
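The time-window accumulation can be sketched as follows, with the 6-minute window length stated in the text as the default. This is a minimal sketch under stated assumptions. It assumes that observations are (Unix timestamp, set of IDs) pairs and that merging the observations in a window means taking the union of their IDs. It does not attempt the filtering of transition windows. The names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 6 * 60  # 6-minute windows, i.e. 10 windows per hour


def to_windows(observations, window=WINDOW_SECONDS):
    """Merge timestamped observations into fixed-length time windows.

    observations: iterable of (unix_timestamp, ids) pairs, e.g. the WLANs
    seen in one scan. Observations in the same window are merged into one
    (the union of their IDs). Returns {window_index: set_of_ids}.
    """
    windows = defaultdict(set)
    for timestamp, ids in observations:
        windows[int(timestamp // window)].update(ids)
    return dict(windows)
```

With WLAN scans every 60 seconds (Table 4.1), each 6-minute window merges six scans into one observation.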
This accumulation of observations is also useful for deciding whether some observations can be considered noise. When creating location clusters in particular, possible transition time windows should be filtered out. These are the unstable windows that lie between two locations with stable time windows, judged by the observed WLANs. The time window length will be set to 6 minutes, which gives 10 time windows per hour.

The music module is going to analyze the music that the users have played. First, it should get the average play count for all tracks, as mentioned in Section 3.4. It should then categorize all the plays collected from the users. Based on this categorization, a rating is assigned to each track that each user listened to. Normalizing these ratings so that they can be compared gives a table of preferences for all the users. From this table, music neighbors can be calculated with the approach described by Segaran [26], using the Pearson correlation coefficient as the similarity measure. Equation 4.1 shows how to calculate the Pearson correlation coefficient $r$ between two users with ratings $R_1(x)$ and $R_2(x)$. The sums run over the set $N$ of items rated by both users, and $n$ is the number of such items:

$$
r = \frac{\sum_{x \in N} R_1(x) R_2(x) - \frac{\sum_{x \in N} R_1(x) \sum_{x \in N} R_2(x)}{n}}{\sqrt{\left(\sum_{x \in N} R_1(x)^2 - \frac{\left(\sum_{x \in N} R_1(x)\right)^2}{n}\right)\left(\sum_{x \in N} R_2(x)^2 - \frac{\left(\sum_{x \in N} R_2(x)\right)^2}{n}\right)}} \tag{4.1}
$$

The collective knowledge in the set of user preferences can be used to create music recommendations. Each user's ratings are weighted in proportion to that user's similarity, and the weighted average recommendation is found over the entire user set. Equation 4.2 shows the normalization scheme that puts all the users' ratings on the same scale. Here $R_u(x)$ is the rating of item $x$ by user $u$, $\bar{R}_u$ is the average rating of user $u$, and $R_u^N(x)$ is the normalized rating of item $x$ by user $u$. The sum in the denominator runs over all items $y$ rated by user $u$:

$$
R_u^N(x) = \frac{R_u(x) - \bar{R}_u}{\sqrt{\sum_{y} \left(R_u(y) - \bar{R}_u\right)^2}} \tag{4.2}
$$
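Equations 4.1 and 4.2 and the similarity-weighted recommendation can be sketched in Python. This is a minimal sketch in the spirit of the approach Segaran describes, not the thesis code. Preferences are assumed to be a nested dictionary `{user: {track: rating}}`. As in Segaran's example, users with zero or negative similarity are skipped when recommending. The function names are hypothetical.

```python
import math


def pearson(prefs, u1, u2):
    """Pearson correlation (Equation 4.1) over the items both users rated."""
    shared = [x for x in prefs[u1] if x in prefs[u2]]
    n = len(shared)
    if n == 0:
        return 0.0
    s1 = sum(prefs[u1][x] for x in shared)
    s2 = sum(prefs[u2][x] for x in shared)
    sq1 = sum(prefs[u1][x] ** 2 for x in shared)
    sq2 = sum(prefs[u2][x] ** 2 for x in shared)
    sp = sum(prefs[u1][x] * prefs[u2][x] for x in shared)
    num = sp - s1 * s2 / n
    den = math.sqrt((sq1 - s1 ** 2 / n) * (sq2 - s2 ** 2 / n))
    return num / den if den else 0.0


def normalize(ratings):
    """Equation 4.2: subtract the user's mean, divide by the root sum of squares."""
    mean = sum(ratings.values()) / len(ratings)
    norm = math.sqrt(sum((r - mean) ** 2 for r in ratings.values()))
    if norm == 0:
        return {x: 0.0 for x in ratings}
    return {x: (r - mean) / norm for x, r in ratings.items()}


def recommend(prefs, user):
    """Similarity-weighted average of other users' ratings for unheard tracks."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = pearson(prefs, user, other)
        if sim <= 0:  # ignore dissimilar users
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + rating * sim
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    return sorted(((totals[i] / sim_sums[i], i) for i in totals), reverse=True)
```

Equation 4.2 subtracts a constant from each user's ratings and divides them by a positive constant. Pearson's $r$ is unchanged by such a transformation, so normalizing does not change who the music neighbors are. It does, however, put the ratings that are averaged in `recommend` on a common scale.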


Chapter 5 Implementation

This chapter describes how the decentralized sharing system can be converted to a centralized system. It also describes how the original Mobile Context Toolbox was modified to fit the needs of the music and context sharing system. The limitations are then set and the problems encountered during the project are identified. The use of different hardware, software, and external modules is described. The implemented GUI and the different data analysis modules are presented. Each module section shows in detail how the clustering is performed and how the personalized music recommendations are given.

5.1 Initial Component Tests

The implementation started with installing Python and the Mobile Web Server on the test devices. With the software installed, a number of different components were tested and later combined. These tests had to be set up, and a lot of extensions had to be signed and transferred. A data format for the context data corresponding to the input features was created, following the data design from Section 4.1 on page 39. An equivalent XML representation of the internal object representation was made for serializing an object and sending it to another place in the system. Again following the design, extra nested elements were added to hold elements that have a semantic relation. For example, track name and artist name were nested in a music element, using the hierarchical structure of the XML format. A web service for the Mobile Web Server was implemented that can load local context information and transform it to XML, so that it can be sent in response to the requestor. This web service consists of web pages that parse the HTTP GET request to extract its data, which is transformed to the internal representation format. Initially, a small application showing the gathered information was created to provide a minimal user experience. These building blocks were the first steps of the implementation.

5.2 Converting from Decentralized to Centralized System

The system was initially created as a decentralized system with a centralized back-end server that passively collected data for data analysis. Because the mobile web server components on the devices were later disabled, the system had to become centralized B. Converting a decentralized system into a centralized one with equivalent functionality requires a number of steps. Each device still has a working local mobile database. Even so, having the central server fetch each requested context from the mobile devices would insert two unnecessary messages into the data flow of a centralized system. This is because the sender of the context information has already sent it to the central back-end server for data analysis. Instead, some changes had to be made so that the mobile applications fetch the context information from the database of the central back-end server. The back-end server was already running Apache with mod_python and MySQL to receive the gathered context for later data analysis.
This setup had to be extended so that it could also return context information to the mobile applications. This was done by creating another Python page that takes requests and returns XML data.

5.3 Mobile Context Toolbox

The prototype of this project is based on previous work by Jensen [17]. When access to this elaborate Mobile Context Toolbox (MCT) was first given, small tests of the toolbox were performed to understand its functionality.

Mobile Context Toolbox Dependencies
During the setup of the Mobile Context Toolbox on the test phones, the following extensions were found to be dependencies of the toolbox. They were signed and installed using milab's online signing application.
Appswitch - To switch between applications.
Blues - To switch Bluetooth on/off.
Cenrep - To collect various information (call info, Bluetooth power state, etc.).
Elocation - To get GSM information.
Envy - To cancel the red-button exit and make the application invisible.
Lightblue - To access Bluetooth functionality from Python.
Locationrequestor - To get GPS information.
Miso - To manipulate the phone's system and get system information.
Music Player Remote - To get information from the music player.
As the list shows, the Mobile Context Toolbox relies on many extensions to reach the sensor input it needs. Once installed, the extensions are handled by the toolbox, which hides the complexity of each extension's specific API. Music Player Remote was the only new extension that had to be adapted to the architecture of the Mobile Context Toolbox.
Corrections/Modifications
During the revision of the Mobile Context Toolbox early in the project, some small one-line mistakes were found in the adapter modules; in the bluetooth module, parentheses were added to a function call. In the gps module, the movement-sensor activation of the GPS coordinate polling was removed. Instead, a simple delay-and-counter method was implemented: it tries to read the GPS position from the sensor a fixed number of times, with a delay between attempts, before aborting.
Additions
To add the wanted functionality to the system, additions were made to the original MCT.
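The delay-and-counter polling that replaced the movement-sensor trigger in the gps module can be sketched as follows. The sketch is an assumption-laden illustration. read_position is a hypothetical callable standing in for the Locationrequestor extension, whose API is not shown here. The values for attempts and delay are placeholders, since the thesis does not state the ones used. The sleep function is injected so that the logic can be exercised without waiting.

```python
import time


def poll_gps(read_position, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Try to obtain a GPS fix a fixed number of times before giving up.

    read_position: hypothetical callable returning (lat, lon), or None when no
    fix is available yet. attempts and delay_s are illustrative values; the
    thesis does not state the ones used. Returns the first fix, or None.
    """
    for attempt in range(attempts):
        position = read_position()
        if position is not None:
            return position
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next attempt, but not after the last
    return None
```

Bounding the number of attempts keeps a missing fix (e.g. indoors) from blocking the adapter indefinitely, at the cost of occasionally giving up just before a fix would have arrived.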

Adapters
music - An adapter that publishes music information into the system.
The music adapter was created in the same way as the other adapters. Like them, it contains a getpublisher() function so that it can be reached during the MCT startup. The music adapter contains a class Music that holds the current music information, together with auxiliary methods to load the Music Player Remote extension. From the Subject class, the Music class inherits the ability for other objects to register and be notified when an event occurs. From the Daemon class, it inherits the ability to run in the background and to be started at predefined intervals.
Context Widgets
localcontext - The localcontext widget was created to aggregate the context information arriving from the context adapters.
remotecontext - For each local context received from the localcontext module, this widget queries the discovered users for their data.
The context widgets are connected as described in the Design Chapter and illustrated in Figure 4.4 on page 49: the output from both context widgets is fed to both of the applications described below. In addition, the output of the local context widget is fed to the remote context widget, which exemplifies how context widgets can be chained to refine the context data.
Applications
contextapp - This application takes context elements as input and displays them in a GUI.
dbapp - This application takes context elements as input and stores them in the local database. The option of sending the context data to a central server was added later.
The applications are the end of the context event data flow that starts at the adapters; they do not produce output for any other application in the MCT framework.
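The event flow just described follows the observer pattern. The adapter feeds localcontext, localcontext feeds remotecontext and both applications, and remotecontext also feeds both applications. The Python sketch below shows this wiring in miniature. The class names mirror the roles in the text, but they are simplified stand-ins rather than the MCT code. The Daemon base class, getpublisher(), and the Music Player Remote calls are omitted, and query_remote is a hypothetical placeholder for the lookup of discovered users.

```python
class Subject:
    """Observers register here and are notified when an event occurs."""

    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        for observer in self._observers:
            observer.update(event)


class MusicAdapter(Subject):
    """Publishes the current track when it changes (polling is simulated)."""

    def __init__(self):
        super().__init__()
        self.current = None

    def poll(self, artist, track):
        # In the MCT a daemon would trigger this periodically in the background.
        if (artist, track) != self.current:
            self.current = (artist, track)
            self.notify({"music": self.current})


class LocalContext(Subject):
    """Aggregates adapter events into one context and re-publishes a copy."""

    def __init__(self, user):
        super().__init__()
        self.context = {"user": user}

    def update(self, event):
        self.context.update(event)
        self.notify(dict(self.context))


class RemoteContext(Subject):
    """For each local context, queries discovered users and publishes their contexts."""

    def __init__(self, query_remote):
        super().__init__()
        self.query_remote = query_remote  # hypothetical remote lookup function

    def update(self, local_context):
        for remote_context in self.query_remote(local_context):
            self.notify(remote_context)


class RecordingApp:
    """End of the chain (like contextapp or dbapp): consumes contexts, publishes nothing."""

    def __init__(self):
        self.received = []

    def update(self, context):
        self.received.append(context)


# Wiring: adapter -> localcontext -> (remotecontext, apps); remotecontext -> apps.
music = MusicAdapter()
local = LocalContext("milab106")
remote = RemoteContext(
    lambda ctx: [{"user": "milab182", "music": ("Other Artist", "Other Track")}])
contextapp, dbapp = RecordingApp(), RecordingApp()
music.register(local)
for observer in (remote, contextapp, dbapp):
    local.register(observer)
for observer in (contextapp, dbapp):
    remote.register(observer)
```

Calling music.poll("Some Artist", "Some Track") delivers both the local context and the remote context to each application. Polling the same track again publishes nothing, because the adapter only notifies on change.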

5.4 Mobile Database
For the project, a wrapper implementation around the e32db database on the phone was used. It repeatedly failed after relatively few queries; after some debugging, a fix made the implementation robust enough to run for a long time. A database module was implemented containing the functions used in the database-related activities, such as creating and dropping tables and inserting the different entries. The data types of e32db deviated somewhat from the equivalent MySQL types, but corresponding types were chosen without loss of data.
5.5 Limitations
Some modules could have been designed differently if more time had been available. In particular, the database schemes used on the phone could have been optimized. Entries have to be selected by the user column, i.e. the user who collected the particular context. This means that a separate lookup by context ID has to be made for the observed users and WLANs associated with each context. If the user had also been a key in those tables, all the observed users for a given user could have been fetched at once, which would make retrieval faster. Compared to ordinary database systems, however, the biggest limitation is that the JOIN statement is not supported by the mobile database management system (Python's e32db).
Limiting the context features to the ones described in Section 3.3 on page 13 of the Analysis Chapter made it impossible to use the method for finding significant locations described by Phung et al. [21], which uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN), because the signal strength data was stripped from the collected data in the WLAN adapter.
5.6 Problems
During the project, Nokia decided to discontinue the Nokia Mobile Web Server beta (see Appendix B). This meant that the gateway providing access to the phones from the internet was shut down.
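To illustrate the schema limitation described in Section 5.5, the sketch below uses Python's sqlite3 module as an in-memory stand-in for e32db, avoiding JOIN as e32db requires. The table and column names are hypothetical. The first function mimics the original situation: without JOIN and without the user key, it needs one query for the contexts plus one query per context. The second assumes the user column that the text proposes adding to the observation table, so a single query suffices.

```python
import sqlite3
from collections import defaultdict


def make_db():
    """In-memory stand-in for the phone database (sqlite3 instead of e32db)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE context (context_id INTEGER, user TEXT)")
    # 'user' in this table is the extra key proposed in the text;
    # the original schema lacked it.
    db.execute("CREATE TABLE observed (context_id INTEGER, user TEXT, observed_id TEXT)")
    return db


def lookup_per_context(db, user):
    """Without JOIN and without the user key: one query plus one per context."""
    context_ids = [row[0] for row in
                   db.execute("SELECT context_id FROM context WHERE user = ?", (user,))]
    result, queries = {}, 1
    for cid in context_ids:
        rows = db.execute("SELECT observed_id FROM observed WHERE context_id = ?", (cid,))
        result[cid] = sorted(row[0] for row in rows)
        queries += 1
    return result, queries


def lookup_by_user_key(db, user):
    """With the user as a key in the observation table: a single query."""
    grouped = defaultdict(list)
    for cid, oid in db.execute(
            "SELECT context_id, observed_id FROM observed WHERE user = ?", (user,)):
        grouped[cid].append(oid)
    return {cid: sorted(ids) for cid, ids in grouped.items()}, 1
```

Both functions return the same mapping from context ID to observed IDs (for contexts that have observations). The difference is the number of round trips to the database, which grows with the number of contexts in the first case.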
From that point on, development was only possible in local mode, i.e. with the phones reachable only from the same LAN. So even though the decentralized prototype was functional, it could not be used in a user test, and considerable effort had to be spent implementing an equivalent centralized system. This gave good insight into what differentiates a decentralized and a centralized system in the context of the prototype.
Initially, the intention was to create an identity-sharing protocol over Bluetooth by setting up a service that scanning devices could call to retrieve the global ID of the user. Due to an error in the Bluetooth module used for Python, this flexible approach was abandoned in favor of the alternative of fixing the Bluetooth ID, which sidesteps the ID-mapping problem entirely.
Another problem, realized early in the project, was that the Music Player Remote extension could not be installed on the Nokia 5800 XpressMusic, so it would not be possible to collect music information from this device. This led to the decision to discontinue development for this particular smartphone and focus on the Nokia N95 smartphones, on which the Music Player Remote extension works.
The last problem concerned the Apache web server, which crashed frequently because of MySQL exceptions about duplicate entries during the upload of context data from the users. These duplicate-insertion exceptions were ignored, which is safe and loses no data. Avoiding the extra insertion attempts altogether is difficult, because the mobile application does not know whether the back-end server has actually received the context information it sent.
5.7 Graphical User Interface
The graphical user interface was implemented in Python. Figure 5.1 shows two screenshots taken from the implemented mobile data collection application. The Feed pane contains two feeds. At the top of the white area of the application is the local context feed, which shows two contexts with three lines each.
These lines follow the syntax:
<Local User> with <Observed Users>
<Artist> - <Track>
(<GPS coordinates>) at <Hours>:<Minutes>
The number in the upper left corner of each feed is the number of contexts in that feed. The remote context feed uses a similar syntax but shows three contexts instead. This number was chosen because of space constraints, and because the contexts of others are more interesting to see than those of the local user. The Stats pane contains the elements described by the following syntax:
<Number of observed users> Users seen
<Observed users>
Most played artist
<Artist>
<Local top 5 tracks>

<Top 5 tracks matching the other users>
The number of observed users during the application session and the corresponding user list are shown first, followed by the most played artist. The rest of the displayed text is split into two lists: first, the five most played tracks in the current session; second, the tracks the local user has played that have been observed being played most by a remote user. The user ID of the matching user is shown next to each track. The last pane, Trace, has the following syntax:
User trace: <user>
<List of seven Contexts>
In the Trace pane, each Context is visualized using the following syntax:
<Time>: <Observed Users>
<WLANs>
This uses all 15 lines that fit on the display of the mobile device.
Figure 5.1: Screenshots of the Feed and Stats panes of the GUI.
5.8 External Python Modules
The following Python modules provide many of the functionalities needed in the project.
networkx - To create and visualize network graphs from the collected data.

matplotlib - For visualization of data.
numpy - For heavy numerical calculations in Python.
hcluster - For hierarchical clustering in Python.
pydot - For exporting networkx graphs to Graphviz using the Dot language.
elementtree - XML parser for Python.
XMLParser - XML parser for Python for S60.
These modules were mainly used during the data analysis, for network graph representations, visualization, clustering, or calculation. Two XML parsers were used to transform the data on the different platforms.
5.9 Software
Table 5.1 lists the software used for the implementation of the system. Two versions of Python were installed to make the different parts of the system work: mod_python required Python 2.5, but the networkx module needed Python 2.6. Putools was used to synchronize files and Nokia PC Suite to install extensions. Putools uses checksums to see whether files have been updated. A MySQL server was set up on the back-end server along with the Apache server. Graphviz was installed but not used through its own user interface, only through the pydot API from Python.
Name | Version | Description
Python | 2.5, 2.6 | For mod_python for Apache (2.5) and for networkx (2.6).
Apache | 2.2 | To host the back-end web services.
MySQL Server | 5.1 | To store the incoming information in the back-end server.
Putools | From 2006 | To synchronize files between PC and phones.
Graphviz | | For network graph layout algorithms.
MySQL-Python | | MySQL for Python.
Nokia PC Suite | | To install applications on devices.
Table 5.1: Software used during the project.
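The checksum-based change detection mentioned for Putools can be sketched generically as follows. This is not Putools' implementation. The choice of SHA-256 and the in-memory dictionaries standing in for the PC and phone file systems are assumptions. The idea is simply that a file is transferred only when it is missing on the phone or when its checksum differs from the one recorded for the phone copy.

```python
import hashlib


def checksum(data):
    """Hex digest of a file's contents (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


def files_to_sync(local_files, remote_checksums):
    """Names of local files that are missing remotely or whose contents changed.

    local_files: dict name -> bytes on the PC.
    remote_checksums: dict name -> checksum last recorded for the phone copy.
    """
    return [name for name, data in sorted(local_files.items())
            if remote_checksums.get(name) != checksum(data)]
```

Comparing short digests instead of whole files means that only the changed scripts of the toolbox need to be copied to each phone.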

5.10 Hardware
The hardware used during the project consists of the elements mentioned in the Design phase in Chapter 4. Table 5.2 lists the types of hardware with their respective models. The wireless router provided a WLAN used as a test environment for the prototype system, set up especially for the decentralized system. It was also used to provide easy internet access for the mobile devices when testing access to the mobile web servers from the internet.
Type | Model
Server | IBM ThinkPad T40
Phones | Nokia N95 8GB
Phones | Nokia 5800 XpressMusic
Wireless Router | Linksys WRT54GL
Table 5.2: Hardware used during the project.
5.11 Data Analysis
MySQLdb was used in all the data analyses to extract the data from the back-end database in Python. Once loaded, the data was ready to be manipulated and used in various calculations. The following sections describe how the data analysis modules were implemented. They were executed on the back-end server, not on the mobile devices.
Music Module
As described in the Design Chapter on page 50, the first implemented function obtains the average play count for each track. Given the average length of the tracks, each play of a particular track can be categorized. This categorization implements the analysis of listening behavior on page 20. After it, each track played by a user is represented by a list of four elements, [normal, first, skip, long], describing how the user has played the track. These lists are used to calculate a predicted rating of each track for each user by implementing Equation 3.1 on page 21. With a personal rating, the user can find out which tracks he likes most. The weights for the different play types may need adjustment, since they are a simple, logical initial set of weights. The resulting ratings can later be compared across all users using Equation 4.2 on page 51.
For this comparison, the ratings of all users must be on the same scale. The average rating of each user is therefore calculated, and all ratings were normalized to the interval [-1, 1]. Many systems in which users rate items explicitly use an ordinal scale, e.g. from 1 to 10. People can relate to this kind of scale, but the distinction between its points blurs when there are many of them: what separates a 7 from an 8 on a 1-10 scale? When the ratings are compared automatically, however, and serve only as a measure of how far apart the users' ratings are, an ordinal scale is not needed.
With a music preference table containing, for every user, the ratings of all the tracks they have rated, collaborative filtering can be performed. First, music neighbors are found using the Pearson correlation coefficient as a similarity measure. With this measure of how similar the users are, a user's rating of unheard tracks can be predicted by using the similarity as a weight on the neighbors' ratings: the weighted average rating of the neighbors is used as the prediction of the user's rating.
The music module also extracts the play sessions of each user by observing a pattern in the collected music plays. A new music session would normally start when the user starts playing music after a period without music. However, in the collected music plays a consistent single-play gap was found every time the music player changed tracks, so the pause period had to be longer; a gap of two plays was chosen. A longer gap, such as one hour, could have been required to separate two sessions, but the two-play gap was accurate enough to give a normal number of sessions per user.
Clustering Module
The matplotlib module was used to visualize the results of the analysis related to the discovered users and WLANs, e.g. plots of the daily counts of discovered elements over time and bar charts. The networkx module was used to represent and visualize graphs. The hcluster module was used to find the cluster labels for the clustered elements. It was also used to calculate the maximum distances between clusters, which in turn were used to determine how many clusters to use. To find an appropriate number of clusters for each user, two iterations of clustering are performed. In the first, all nodes are added, and when the clustering has been performed, a list of cluster distances is retrieved. Based on experimental trials, a threshold was set to 8% of the largest cluster distance; this threshold determines the number of clusters to use. The clustering is then performed again with the wanted number of clusters.
The pydot module was used to optimize the visual representation of the created clusters. The networkx module does not have advanced layout algorithms for drawing network graphs; the best one is the spring layout, which makes the edges of a graph act as springs. This is fine for simple networks, but the layout quality drops when more nodes are added and they are connected in complex ways. To improve the layout, the graph was exported to the Dot language, a plain-text graph description language, and the textual representation was then loaded in Graphviz, a graph visualization software. This was done directly from Python using pydot, where the wanted layout engine must be specified as an argument to the exporting function. The chosen layout engine was fdp, which is similar to neato but applies different physics rules between nodes that are not directly connected (the specific differences are described by Gansner [9]). By trial and error, fdp was also the layout engine that performed best on the specific network graphs of the project.
5.12 Device Setup for the User Test
The following steps were performed to set up each Nokia N95 8GB for the user test:
Set the Bluetooth ID to correspond to the phone ID.
Turn on Bluetooth on the phone.
Download and sign all the dependencies.
Use Nokia PC Suite to install all dependencies for the Mobile Context Toolbox on the phone.
After these steps, all the necessary script files (the modified Mobile Context Toolbox) are placed on the phones using Putools. Device-dependent files are copied separately; the only device-dependent setting is the Bluetooth ID of each device. To improve usability, standby shortcuts for Python, Bluetooth, and the music player were created to provide easy access on the device.
The source code of the implemented software and the scripts used for the data analysis can be found in Appendix C.
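The rating and collaborative filtering steps of the Music Module (Section 5.11) can be sketched in plain Python. This is a hedged illustration, not the project's code. Equations 3.1 and 4.2 are not reproduced in this section, so several parts of the sketch are assumptions: the weights, the weighted-average form of the implicit rating, the mean-centred scaling into [-1, 1], and the restriction of predictions to positively correlated neighbours.

```python
from math import sqrt

# Hypothetical weights for the play types, in the order [normal, first, skip, long].
WEIGHTS = (1.0, 0.5, -1.0, 1.5)


def implicit_rating(play_counts):
    """Weighted average over one track's play-type counts [normal, first, skip, long]."""
    total = sum(play_counts)
    if total == 0:
        return 0.0
    return sum(w * c for w, c in zip(WEIGHTS, play_counts)) / total


def normalize(ratings):
    """Centre one user's ratings on their mean and scale them into [-1, 1]."""
    mean = sum(ratings.values()) / len(ratings)
    spread = max(abs(r - mean) for r in ratings.values()) or 1.0
    return {track: (r - mean) / spread for track, r in ratings.items()}


def pearson(a, b):
    """Pearson correlation of two users' ratings over the tracks both have rated."""
    common = [t for t in a if t in b]
    if len(common) < 2:
        return 0.0
    xs = [a[t] for t in common]
    ys = [b[t] for t in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def predict(user, track, prefs):
    """Similarity-weighted average of the neighbours' ratings of an unheard track.

    prefs maps user -> {track: normalized rating}. Only positively correlated
    neighbours are used (an assumption). Returns None if no neighbour qualifies.
    """
    num = den = 0.0
    for other, ratings in prefs.items():
        if other == user or track not in ratings:
            continue
        sim = pearson(prefs[user], ratings)
        if sim > 0:
            num += sim * ratings[track]
            den += sim
    return num / den if den else None
```

Normalizing per user before computing similarities keeps heavy listeners from dominating the predictions purely through their larger number of plays.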

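The two-pass choice of the number of clusters in the Clustering Module (Section 5.11) can be illustrated with the sketch below. It substitutes a naive single-linkage agglomerative clustering written in plain Python for hcluster. It also interprets the 8% rule as follows: merges whose distance exceeds 8% of the largest merge distance are not performed, so the number of clusters is one plus the number of such merges. Both the linkage method and this interpretation are assumptions. In practice, the distance matrix would be derived from WLAN or user co-occurrence.

```python
def single_linkage(dist, n_clusters=1):
    """Naive agglomerative clustering with single linkage.

    dist: symmetric list-of-lists distance matrix.
    Merges the two closest clusters until n_clusters remain and returns
    (clusters as sets of indices, list of merge distances in merge order).
    """
    clusters = [{i} for i in range(len(dist))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, heights


def choose_n_clusters(heights, ratio=0.08):
    """One cluster plus one per merge whose distance exceeds ratio * largest distance."""
    if not heights:
        return 1
    threshold = ratio * max(heights)
    return 1 + sum(1 for h in heights if h > threshold)


def two_pass_clustering(dist, ratio=0.08):
    _, heights = single_linkage(dist)         # pass 1: full hierarchy
    k = choose_n_clusters(heights, ratio)     # number of clusters from the 8% rule
    clusters, _ = single_linkage(dist, k)     # pass 2: stop at k clusters
    return clusters
```

With six points at 0, 1, 2, 100, 101, and 102, the first pass yields merge distances 1, 1, 1, 1, and 98. Only the last merge exceeds 8% of 98, so the second pass stops at two clusters.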

Chapter 6 Test
This chapter describes how the user test was performed. It starts with considerations about the power consumption of the phones caused by the running application. Then the most important test requirements are presented together with the choices made for the user test. Finally, the information the test participants needed to receive before starting the test is listed.
6.1 Power Consumption
To find out how long the test phones would last with the application running and without charging, an application called Nokia Energy Profiler was installed on a phone. The Nokia Energy Profiler records the power consumption over time and predicts how long the battery is going to last. It can also be used to observe the power consumption profile of a running application, giving an understanding of which internal components consume power and how much. With no applications running, the profiler predicted a battery life of 11 hours and 42 minutes. Starting the music player brought the predicted battery life down to 5 hours, at a current of around 200 mA. Starting only the created Python application (without GPS) brought the current to around the 100 mA level (approx. 90 mA). Adding more adapters or playing music drained the battery faster. In a plot comparing the application's power consumption with and without a component, the curve is usually shifted vertically, unless some scanning is done periodically; e.g. the network connection made spikes to 500 mA from approx. 260 mA. In Figure 6.1, these spikes appear in the data series with WLAN enabled, while Bluetooth scanning shows up as longer periods of increased power consumption.
A good performance goal is for the battery to last at least an average working day, so that the users do not have to charge the phone during the day. With all adapters disabled, the application used approx. 90 mA (11 hours of battery time); with all the chosen sensors enabled except the GPS, it used 116 mA, giving 8 hours of battery time. With both the GPS sensor and the music adapter enabled, it used approx. 240 mA, equivalent to 4 hours on a particular phone. On the basis of these coarse findings, the GPS sensor was disabled so that the users could keep the application running the entire working day without worrying about charging.
Figure 6.1: Comparison of power consumption while running the application with or without WLAN enabled. High spikes are WLAN and low spikes are Bluetooth.
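As a rough consistency check on these figures, battery time can be approximated as an effective battery capacity C divided by the average current I. This linear model is a simplification, since real battery discharge is not perfectly linear in current. Still, the reported pairs of current and battery time imply a roughly constant effective capacity:

```latex
t \approx \frac{C}{I}
\quad\Longrightarrow\quad C \approx I\,t:\qquad
90\,\mathrm{mA}\times 11\,\mathrm{h} = 990\,\mathrm{mAh},\quad
116\,\mathrm{mA}\times 8\,\mathrm{h} = 928\,\mathrm{mAh},\quad
240\,\mathrm{mA}\times 4\,\mathrm{h} = 960\,\mathrm{mAh}
```

Taking C ≈ 960 mAh, the music player's level of about 200 mA gives t ≈ 960/200 ≈ 4.8 h, in line with the roughly 5 hours predicted by the profiler.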

6.2 Test Specifications
Test Size
To test the basic functionality of the user application, a few users in proximity of each other should be chosen, to observe that their respective applications show the names of the other users and what they are listening to. The very dynamic and local nature of the application makes it easier to test in small groups. A larger test could be run to check how scalable the system is. Furthermore, if collaborative filtering is used, a critical mass of users must be present; otherwise the similarities among the users risk being too low. There is also a time requirement, because a certain amount of time has to pass for the users to listen to the same tracks.
Test Duration
When selecting the time frame for the test, it is important to consider which temporal patterns should be observed. Some immediate time patterns are:
Hours
Day
Week (weekday/weekend)
The chosen type of time interval makes it possible to zoom in on particular aspects of the user behavior. A two-week test duration is about the minimum if the weekly pattern is to be observed.
Test Participants
Different considerations must be made when choosing the test participants. First, the end-user target group of the application should be estimated, to find out who is actually going to use the application. Knowing the distribution of the target users, test participants sampled from this distribution should then be selected for the test. There are several factors to consider if specific demographic usage trends are to be examined:

Age (if the test had been bigger, a more representative age distribution could be considered).
Geography (in a bigger test, the geographical distribution could also be considered to find specific geographical patterns in the collected data).
Educational background (with more participants, different educational backgrounds should be considered, because they change the user behavior).
6.3 User Test
For the user test, seven young people between 20 and 26 who usually listen to music on mobile devices (portable mp3 players etc.) were chosen. All the users live close to each other in the same geographical area near their place of study, and all are enrolled in university-level education. These participants were chosen to make sure that contexts containing played music could be gathered, and that the users would meet and thereby share their music contexts, which is one of the main purposes of the test (to see how this way of sharing context information works in practice). The selected users meet at least once a day. From a sharing point of view, the user group is similar to non-academic groups such as colleagues at the same workplace or neighbors.
6.4 Information for the Users
The following information was given to the users to instruct them in the use of the application. Each test participant was instructed personally.
How to start the application.
How to activate Bluetooth.
Which connection type should be used (WLAN/GPRS).
How to install and use Nokia PC Suite.
How to connect the PC and phone with Nokia PC Suite.
Where and how to play music on the phone.
What caused the test participants the most trouble was connecting with Nokia PC Suite in order to transfer personal music to the device.

Chapter 7 Experimental Results
This chapter presents user test statistics and the most relevant results from the data analysis, including location and user clustering, connection logging, the results of the music recommendations, and the categorization of the users' music sessions.
7.1 User Test Statistics
The user test was conducted over a total of 40 days, but the effective active period of the application was shorter, because the test participants did not always start the application or the phones ran out of battery. The average active period was 11 days, as shown in Table 7.1, which lists the individual user participation. Table 7.2 lists a number of facts about the user test. Comparing the numbers, more WLANs than Bluetooth IDs were collected by the application. For a user test with 7 participants, the total number of distinct tracks, 79, seems low. That aside, the users seem to listen to music a lot: in 39% of the time that the collecting application was running. Combined with the low activity rate of the users, this suggests that when the users turned on the collecting application, they also listened to music.

User | Active Days | Days | Activity | Start Date | End Date
milab | | | | |
milab | | | | |
milab | | | | |
milab | | | | |
milab | | | | |
milab | | | | |
milab | | | | |
Table 7.1: User participation during the user test. Activity is Active Days/Days.
Description | Value
Number of test participants | 7
Number of WLANs | 2441
Number of Bluetooth IDs | 1076
Number of distinct tracks | 79
Number of contexts collected |
Number of music plays collected |
Table 7.2: User test statistics. The contexts collected are the aggregated context observations. The music plays collected are contexts containing a track name.
7.2 Data Analysis Results
Figure 7.1 shows the number of plays for each test participant over the active days of that user. Note that the days can have different lengths, because they depend on how long the application was running. The figure shows that the users had an initial phase in which they did not play much music, possibly due to unfamiliarity with the application. Figure 7.2 compares the number of plays of each user: there are two main users, and the rest are below average. This shows that the ratings must be made comparable, so that users with more plays do not give higher ratings purely because of their number of plays. Figure 7.3 shows how the number of plays fluctuates over the days (only days with activity are shown). Some of the observed peaks are during weekends, although the highest one is on a Thursday. As mentioned earlier, the activity of the users was low in the beginning, but after a while many plays were registered, especially on some peak days.

Figure 7.1: For each user, the number of plays per active day.
Figure 7.2: Bar chart showing the total number of tracks played by each user, with two users playing more music than average.

Figure 7.3: The total number of played tracks per active day of the user test, showing when the users listened to music during the active parts of the test.
Figure 7.4: The total number of played tracks per hour of the day, showing when during the day the users listen to music.

Figure 7.4 shows when during the day the users listen to music. The time interval with the most plays is from 13 to 20. This is a period in which people are active and move around the most, especially students, who might not have classes after 15. The interval with the fewest plays is the early morning from 4 to 9, when people are sleeping or busy preparing for work, so they do not have time to listen to music.
Figure 7.5 shows the total number of WLANs collected over the test period, and Figure 7.6 shows the total number of users collected over the test period. The user counts follow more or less the same trends as the WLAN counts. There are three distinct peak days where the level really stands out; however, these could be the result of relatively few users attending big events with many people and access points (APs).
Figure 7.7 displays some user characteristics. The period from 4 to 9 has the lowest user activity. A peak from 12 to 13 could indicate a lunch break, and overall the period from 11 to 17 reflects school/work. Figure 7.8 shows that the levels of WLAN observations are somewhat flatter than those of the users, but the night minimum from 4 to 9 can still be seen, and the interval from 18 to 20 is where the most APs are observed.
Figure 7.9 and Figure 7.10 display the distinct users and distinct WLANs over time, respectively, expressing how the number of users or WLANs fluctuates during the course of a day. Many of the subplots have spikes where the number of WLANs or users changes rapidly. These spikes usually indicate a change of context, e.g. of location, and similar spikes are seen for the users. The collecting application keeps a short memory of the observed users and WLANs, so when a user moves quickly past many users and WLANs, many different IDs are stored in the same context observation.
When finding location clusters, stable time windows are wanted, so unstable time windows with spikes should be removed. This also removes some of the noise from time windows in which WLANs far from each other are observed together because of the application's memory and rapid movement. The threshold values for determining the stable windows were found by looking at the height of the spikes.
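The removal of unstable time windows can be sketched as below. The thesis does not give the threshold values, so max_wlans and max_users are hypothetical parameters. Treating a window as unstable when either of its counts exceeds the corresponding threshold is one simple reading of choosing thresholds from the spike heights, not necessarily the exact rule used.

```python
def stable_windows(windows, max_wlans, max_users):
    """Drop time windows whose WLAN or user count spikes above a threshold.

    windows: list of dicts {"wlans": set of WLAN IDs, "users": set of user IDs},
    one per time window. max_wlans and max_users are hypothetical thresholds
    that would be read off the spike heights in plots like Figures 7.9 and 7.10.
    """
    return [w for w in windows
            if len(w["wlans"]) <= max_wlans and len(w["users"]) <= max_users]
```

The surviving windows keep their WLAN sets intact, so they can be passed directly to the co-occurrence counting used for location clustering.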

Figure 7.5: The total number of WLANs discovered per active day of the user test.
Figure 7.6: The total number of users discovered per active day of the user test. The number of observed users fluctuates a lot, due to the different activities of the users.

Figure 7.7: The total number of discovered users per hour during the course of a day. The time intervals with peaks indicate high social interaction.
Figure 7.8: The total number of discovered WLANs per hour during the course of a day. The time intervals with peaks indicate high user activity, because the users collect distinct WLANs even if they stay at a location only for a short period.

Figure 7.9: The number of users discovered on each day that the user milab182 was active. The number of time windows per day varies because the application ran for different amounts of time each day.

Figure 7.10: The number of WLANs discovered on each day that the user milab182 was active. The number of time windows per day varies because the application ran for different amounts of time each day.

Location Clustering
Figure 7.11, Figure 7.12, and Figure 7.13 display three networks illustrating the WLAN clusters found for different test participants during the data analysis. Depending on how they are used, these clusters can help in understanding various characteristics of the user; the same holds for the user clusters in the following section. One way to use the clustering algorithm is to find significant clusters consisting of the most frequently observed nodes, which indicate the user's most frequent locations. Another way is to lower both the threshold on the observation frequency that nodes must exceed and the threshold on the co-occurrence frequency of the edges. Doing this discovers more clusters with different characteristics, and also captures one-time events where, e.g., many nodes are discovered together. The differences in node size within a cluster add information to the profile of the location the cluster represents. If the differences in node size are small, the cluster is homogeneous and the location covers a small area, so none of the WLANs goes out of range at the location. If the cluster is heterogeneous, the location covers a larger area. In the three figures, the nodes are WLANs clustered according to their observed co-occurrence frequencies.
The diagrams of the WLAN clusters show different patterns. The sum of the node frequencies in each cluster indicates where the user stays most, i.e. which cluster is the home cluster. Since these are stable clusters, the number of connected WLANs also says something about the relative location each cluster represents. For example, Figure 7.12 shows a home cluster with large nodes and a second cluster with many medium-sized nodes. These relative locations differ: the first has four really dominant WLANs and some smaller peripheral ones.
In the second relative location, on the other hand, there is a larger number of more or less homogeneous nodes. One interpretation is that the home location covers a larger area in which a small set of APs is seen often, while a weak signal from a nearby AP is picked up once in a while. All test participants live in a dormitory on a university campus located in a suburban area. Living on campus means that they are close to many people and to a recognizable WLAN infrastructure. However, from some places within the relative location the phone cannot reach the weak APs, because the clusters also depend on geography. For the homogeneous relative location, the area defining the cluster is smaller, so the set of observed APs is stable. The actual AP density is higher in the second relative location, which could be a populated place closer to the city center, whereas the home location could be in a less populated area; many APs were in fact observed at home, but they were more spread out.
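The two thresholds described above (node observation frequency and edge co-occurrence frequency) can be illustrated with a short Python sketch. This is a minimal sketch, not the thesis's implementation: it assumes each time window yields a set of WLAN IDs, and `cluster_wlans` is a hypothetical name. It relies on the fact that cutting a single-linkage hierarchy at a given co-occurrence level yields the connected components of the graph whose edges meet that level; the linkage criterion actually used in the thesis is not stated in this section.

```python
from collections import Counter
from itertools import combinations


def cluster_wlans(windows, min_node_freq, min_edge_freq):
    """Group WLAN IDs into location clusters from per-time-window scans.

    windows: list of sets, each holding the WLAN IDs seen in one time window.
    min_node_freq: keep a WLAN only if it was seen in at least this many windows.
    min_edge_freq: link two WLANs only if they were seen together this often.
    Returns clusters (sets of WLAN IDs), largest first; isolated WLANs that
    pass the node threshold come out as singleton clusters.
    """
    node_freq = Counter(w for window in windows for w in window)
    kept = {w for w, n in node_freq.items() if n >= min_node_freq}

    # Co-occurrence counts among the WLANs that passed the node threshold.
    edge_freq = Counter()
    for window in windows:
        for a, b in combinations(sorted(window & kept), 2):
            edge_freq[(a, b)] += 1

    # Adjacency lists restricted to edges meeting the co-occurrence threshold.
    neighbours = {w: set() for w in kept}
    for (a, b), n in edge_freq.items():
        if n >= min_edge_freq:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components via depth-first search; these equal the clusters
    # of single-linkage clustering cut at min_edge_freq.
    clusters, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


scans = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y"}, {"z"}]
print(cluster_wlans(scans, min_node_freq=2, min_edge_freq=2))
```

Raising `min_edge_freq` from 2 to 3 in this toy data splits the first cluster, which mirrors how the number of clusters changes as the thresholds are varied.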


Figure 7.11: The graph shows the WLAN clusters for the user milab182. Each cluster of nodes represents a specific location for the user. Node size indicates the frequency with which the WLAN has been observed. The cluster with the largest nodes is the home location. The edge color indicates co-occurrence frequency.


Figure 7.12: The graph shows the WLAN clusters for the user milab106. Each cluster of nodes represents a specific location for the user. Node size indicates the frequency with which the WLAN has been observed. The cluster with the largest nodes is the home location. The edge color indicates co-occurrence frequency.


Figure 7.13: The graph shows the WLAN clusters for the user milab135. Each cluster of nodes represents a specific location for the user. Node size indicates the frequency with which the WLAN has been observed. The cluster with the largest nodes is the home location. The edge color indicates co-occurrence frequency.
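Two rules are used when reading these figures: the cluster with the largest nodes is the home location, and similar node sizes indicate a compact (homogeneous) location. Both can be made concrete in a few lines. This is a minimal sketch with hypothetical function names. "Largest nodes" is interpreted here as the largest summed observation count, following the statement that the sum of node frequencies in a cluster shows where the user stays most. The coefficient-of-variation test and its 0.5 cut-off are assumptions, not the thesis's measure.

```python
from statistics import mean, pstdev


def home_cluster(clusters, node_freq):
    """Pick the cluster whose WLANs were observed most often in total."""
    return max(clusters, key=lambda cluster: sum(node_freq[w] for w in cluster))


def is_homogeneous(cluster, node_freq, max_cv=0.5):
    """True if node sizes in the cluster are similar.

    Similarity is measured by the coefficient of variation (population
    standard deviation divided by the mean) of the observation counts;
    max_cv=0.5 is an arbitrary, illustrative cut-off.
    """
    sizes = [node_freq[w] for w in cluster]
    return pstdev(sizes) / mean(sizes) <= max_cv


# Illustrative counts: a few dominant WLANs plus weak peripheral ones at home.
freq = {"a": 50, "b": 45, "c": 3, "d": 2, "x": 10, "y": 11, "z": 9}
clusters = [{"a", "b", "c", "d"}, {"x", "y", "z"}]
home = home_cluster(clusters, freq)
print(sorted(home), is_homogeneous(home, freq))  # → ['a', 'b', 'c', 'd'] False
```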

User Clustering
Figures 7.14 and 7.15 show the social clusters formed for two different test participants. The user clusters show characteristics similar to the WLAN clusters but generally have fewer nodes. A user cluster represents a social context or group that the user belongs to. Together, the WLAN clusters and the user clusters represent the spatial and social contexts that are recognizable for a particular user, so they can be used to categorize the music being played. The social clusters of the two users in Figures 7.14 and 7.15 have different characteristics. The clusters vary in size because they are situation dependent. Some users, however, have larger nodes: node size is directly proportional to how often each user has been seen, so the node sizes in the graphs show with whom the user spends the most time. The edges between users in a cluster are colored according to how often the local user of the application has seen the two users together. To highlight the tightest pairs in the user set, the co-occurrence frequency was squared and used as the edge weight. Node color is determined by node degree, i.e., by how many edges a node has. One of the more active users (milab182) has large clusters, as can be observed in Figure 7.14; the big clusters could indicate large events where many people were seen together at almost the same time. Figure 7.15 is another example of the clusters found for a user. Clusters of different sizes are found, but none as large as those in Figure 7.14. This shows that the social networks of the test participants differ and are probably also time dependent. If an event had to recur many times before being grouped as a cluster, the clusters generated by big one-time events would be filtered out. Some information is lost in this cluster representation.
The clusters are disjoint sets of users, so if a friend is present with the user in several different social gatherings, this is not shown, in order to keep the clusters non-overlapping.
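The graph attributes described above (node size from sighting frequency, edge weight as the squared co-occurrence count, node color from degree) can be computed as in the following sketch. It is a minimal illustration that assumes each time window yields the set of users seen together. The name `social_graph` and the optional `min_edge_freq` filter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def social_graph(sightings, min_edge_freq=1):
    """Node sizes, edge weights and node degrees for a user co-occurrence graph.

    sightings: list of sets, each holding the user IDs seen in one time window.
    Edge weight is the squared co-occurrence count, which widens the gap
    between tight pairs and casual ones. Node degree drives node colour.
    """
    node_freq = Counter(u for seen in sightings for u in seen)
    co_occurrence = Counter(
        pair for seen in sightings for pair in combinations(sorted(seen), 2)
    )
    edge_weight = {
        pair: n ** 2 for pair, n in co_occurrence.items() if n >= min_edge_freq
    }
    degree = Counter()
    for a, b in edge_weight:
        degree[a] += 1
        degree[b] += 1
    return node_freq, edge_weight, degree


sightings = [{"u1", "u2"}, {"u1", "u2", "u3"}, {"u1", "u2"}, {"u3", "u4"}]
freq, weight, degree = social_graph(sightings)
print(weight[("u1", "u2")], degree["u3"])  # → 9 3
```

Squaring turns co-occurrence counts of 3 and 1 into weights of 9 and 1, so a pair seen together three times stands out far more than under linear weighting.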

Figure 7.14: The graph shows the user clusters for the user milab182. Each cluster of nodes is a known company of users. Node size indicates the frequency with which the user has been observed. The edge color indicates user co-occurrence frequency.

Figure 7.15: The graph shows the user clusters for the user milab135. Each cluster of nodes is a known company of users. Node size indicates the frequency with which the user has been observed. The edge color indicates user co-occurrence frequency.

Cluster Changes
Figure 7.16 shows how a particular user changes location and user group over the course of each day during the active days of the user test. In general, cluster changes are not very frequent for any of the users, especially for the relative location clusters; the social clusters change somewhat more often during the day. The reason for this higher frequency of social changes can be found in the users' location, a crowded university campus. This shows that the social context can change even though a user stays in the same location. The social and relative location clusters rarely change simultaneously. The observed behavior should be seen in light of the users' demographics: they are students with a reasonably regular daily schedule, and if they have forgotten or left their phones at home on a given day, the clusters will also remain unchanged.
Figure 7.16: The plots display the changes of both location and social groups over the course of the active days of the user survey for a single user. The clusters are given arbitrary labels, so it is the points where the clusters change that should be noted. Finding the home location is also possible.
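Counting cluster changes, and checking whether location and social changes coincide, can be sketched as follows. The sketch assumes that each day has already been reduced to one location-cluster label and one social-cluster label per time window. Both function names are hypothetical.

```python
def change_points(labels):
    """Indices of time windows whose cluster label differs from the previous window."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def simultaneous_changes(location_labels, social_labels):
    """Windows in which the location and the social cluster change together."""
    both = set(change_points(location_labels)) & set(change_points(social_labels))
    return sorted(both)


location = ["L1", "L1", "L1", "L2", "L2", "L1"]
social = ["S1", "S2", "S2", "S3", "S1", "S1"]
print(change_points(location), change_points(social), simultaneous_changes(location, social))
# → [3, 5] [1, 3, 4] [3]
```

In this toy day, the social label changes more often than the location label and the two change together only once, which is the kind of pattern described for the campus-based test users.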

Music Results
With the collaborative recommendation algorithms implemented, it was possible to make personalized recommendations based on what other users have listened to. Music neighbors were also found in the process of finding recommended items for each user. Table 7.3 shows the result of the collaborative filtering for two users: first the similar users with their similarity measures, and second the music recommendations with their predicted ratings. The two users have each other as music neighbors, and their mutual similarity is much higher than their similarity to the other users. With only seven test users, the number of music neighbors with high similarity is low, but if more users entered the system, the accuracy of the predicted ratings would likely increase. The recommender system only suggests tracks that the user has not listened to before. This can be a problem in small user tests where the same short list of tracks is distributed to all test participants: if the users do not transfer any of their own music to their mobile devices, there are probably no new tracks to share, assuming that all users listen to all the tracks handed out. Table 7.4 contains a list of music sessions for the user milab182. Several attributes have been derived for each session: the time of day (T.o.D), whether the session occurred during the weekend, and whether the user was moving from one location to another. The remaining attributes relate to the location and social clusters. The first three of these concern location, which has been categorized into three groups: Home, Other Known Location (O.K.L), and Unknown Location (U.L). Home is the user's most observed location cluster, other known locations are the remaining significant clusters, and unknown locations are everything else. The last two attributes, Known Company (K.Comp) and Unknown Company (U.Comp), indicate whether the social group of the session is known.
The user was stationary during all of the play sessions.

Connection Logging
During the user test, it was logged who requested information from whom. These data were used to create a directed graph showing how the participants of the user test connected to each other. The directed graph in Figure 7.17 reveals that not all of the test participants met each other during the test. Note that an arrow means that information is sent from the tail of the arrow to its tip. When users met, their data exchange was not necessarily mutual. Seen from the perspective of the application, some users were more social than others.
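The connection graph can be derived from a request log as sketched below. The sketch assumes, hypothetically, that each log entry is a (requester, provider) pair. Because an arrow points in the direction the context travels, each request adds an edge from the provider to the requester. One-way pairs correspond to the non-mutual exchanges described above. The function names are illustrative.

```python
from collections import defaultdict


def connection_graph(requests):
    """Directed graph of context exchanges from a request log.

    requests: iterable of (requester, provider) pairs. Each request adds the
    edge provider -> requester, since that is the direction context flows.
    """
    graph = defaultdict(set)
    for requester, provider in requests:
        graph[provider].add(requester)
    return graph


def one_way_pairs(graph):
    """Pairs (a, b) where a sent context to b but b never sent any to a."""
    return sorted(
        (a, b) for a in graph for b in graph[a] if a not in graph.get(b, set())
    )


log = [("u1", "u2"), ("u2", "u1"), ("u3", "u1")]
graph = connection_graph(log)
print(sorted(graph["u1"]), one_way_pairs(graph))  # → ['u2', 'u3'] [('u1', 'u3')]
```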

User: milab153
Similar users (similarity in [-1, 1]): milab, milab, milab, milab182
Recommended tracks (rating in [0, 2]), Artist - Track:
AMPER - COMO ME DUELE
Black Eyed Peas - Meet Me Halfway
Emilie Autumn - Castle Down
Jay-Z - Empire State of Mind (feat. Alicia Keys)

User: milab135
Similar users (similarity in [-1, 1]): milab, milab, milab, milab102
Recommended tracks (rating in [0, 2]), Artist - Track:
Tenacious D - Tribute
Green Day - Homecoming
Green Day - Jesus of Suburbia
Green Day - Give Me Novacaine

Table 7.3: Comparing music neighbors and recommendations for two users. Figures containing the data of the other users are presented in Appendix A.
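The neighbour search and prediction step can be sketched as user-based collaborative filtering. This is a minimal sketch under stated assumptions, not the thesis's code. Implicit ratings on a [0, 2] scale are assumed to be precomputed; the scheme that derives them from listening data is not reproduced here. Similarity is the Pearson correlation, which ranges over [-1, 1], and only positively correlated users act as neighbours. Predictions use the common mean-centred weighted average, clamped to the rating scale. Tracks the user has already played are never recommended, matching the behaviour described for the recommender system. The function names and the toy ratings are hypothetical.

```python
from math import sqrt


def pearson(r_u, r_v):
    """Pearson correlation over co-rated tracks (range [-1, 1]); 0.0 if undefined."""
    common = r_u.keys() & r_v.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(r_u[t] for t in common) / len(common)
    mv = sum(r_v[t] for t in common) / len(common)
    num = sum((r_u[t] - mu) * (r_v[t] - mv) for t in common)
    den = sqrt(sum((r_u[t] - mu) ** 2 for t in common)) * sqrt(
        sum((r_v[t] - mv) ** 2 for t in common)
    )
    return num / den if den else 0.0


def recommend(ratings, user, k=4, lo=0.0, hi=2.0):
    """User-based CF: predict ratings only for tracks `user` has not played.

    ratings: {user_id: {track: implicit rating in [lo, hi]}}.
    Returns (similarities to every other user, top-k (track, prediction) pairs).
    """
    r_u = ratings[user]
    mean_u = sum(r_u.values()) / len(r_u)
    sims = {v: pearson(r_u, r_v) for v, r_v in ratings.items() if v != user}
    neighbours = [v for v, s in sims.items() if s > 0]  # positively correlated only
    means = {v: sum(ratings[v].values()) / len(ratings[v]) for v in neighbours}
    unseen = {t for v in neighbours for t in ratings[v] if t not in r_u}
    predictions = {}
    for track in unseen:
        raters = [v for v in neighbours if track in ratings[v]]
        total = sum(sims[v] for v in raters)
        offset = sum(sims[v] * (ratings[v][track] - means[v]) for v in raters) / total
        predictions[track] = min(hi, max(lo, mean_u + offset))  # clamp to scale
    top = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return sims, top


ratings = {
    "alice": {"t1": 2, "t2": 1, "t3": 0},
    "bob": {"t1": 2, "t2": 1, "t3": 0, "t4": 2},
    "carol": {"t1": 0, "t2": 1, "t3": 2, "t5": 2},
}
sims, top = recommend(ratings, "alice")
print(sims, top)
```

Here `bob` becomes the only music neighbour of `alice`, so only his unheard track `t4` is recommended. With very few users, as in the seven-person test, a single neighbour can dominate every prediction.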

S.ID  T.o.D      Weekend  Moving  Home   O.K.L  U.L    K.Comp  U.Comp
0     Afternoon  False    False   False  False  True   True    False
1     Evening    False    False   False  False  True   True    False
2     Afternoon  True     False   True   False  False  False   True
3     Afternoon  True     False   True   False  False  False   True
4     Morning    False    False   True   False  False  False   True
5     Morning    False    False   True   False  False  False   True
6     Afternoon  False    False   True   False  False  False   True
7     Afternoon  False    False   True   False  False  False   True
8     Morning    False    False   True   False  False  False   True
9     Afternoon  False    False   True   False  False  False   True
10    Afternoon  False    False   False  False  True   True    False
Table 7.4: The music sessions of the user milab182. S.ID is the session ID, T.o.D is the time of day, and Weekend indicates whether the session took place on a weekend. Moving indicates whether the session took place in an unstable time window. Home is the most frequently observed of the known locations. O.K.L is Other Known Location and U.L is Unknown Location. K.Comp and U.Comp are Known Company and Unknown Company, respectively.

Figure 7.17: A directed graph showing the connections that have been made among the test participants. An arrow indicates that context has been sent from one user to another in the direction of the arrow.
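Deriving one row of a session table like Table 7.4 can be sketched as below. The sketch assumes that each session has already been matched to a location-cluster label and a social-cluster label, using `None` when no cluster matched, and that the Moving flag comes from the time-window stability check. The hour boundaries for Morning, Afternoon, and Evening are hypothetical, because the thesis does not state them here. The function names are also illustrative.

```python
from datetime import datetime


def time_of_day(ts):
    """Hypothetical hour boundaries; the thesis does not state its own."""
    if 5 <= ts.hour < 12:
        return "Morning"
    if 12 <= ts.hour < 18:
        return "Afternoon"
    if ts.hour >= 18:
        return "Evening"
    return "Night"


def categorize_session(ts, moving, location, home, significant, company, known_companies):
    """Build one Table 7.4-style row for a music session.

    location/company: cluster labels for the session's time window (None if
    the window matched no cluster). significant: all significant location
    clusters, including home. known_companies: significant social clusters.
    """
    return {
        "T.o.D": time_of_day(ts),
        "Weekend": ts.weekday() >= 5,  # Saturday=5, Sunday=6
        "Moving": moving,
        "Home": location == home,
        "O.K.L": location in significant and location != home,
        "U.L": location not in significant,
        "K.Comp": company in known_companies,
        "U.Comp": company not in known_companies,
    }


# 2 January 2010 was a Saturday: a weekend afternoon at home in unknown company.
row = categorize_session(
    datetime(2010, 1, 2, 15, 0), False, "L1", "L1", {"L1", "L2"}, "S9", {"S1", "S2"}
)
print(row)
```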

Chapter 8 Discussion
In this project, two different architectural approaches were implemented to set up a system that allows its users to share music and context information. The decentralized approach was developed first. It needed a method to map Bluetooth IDs to global user IDs, since the global ID is used to access the context information. This access was previously provided by Nokia's gateway: a user trying to reach another user's Mobile Web Server could enter a URL, which was rerouted to the current IP address of that Mobile Web Server if it was online. However, when Nokia decided to discontinue the Mobile Web Server beta (see Appendix B on page 123), global lookup of each Mobile Web Server was no longer possible. It was still possible to test the decentralized concept on a local area network, but only by assuming that all users were connected to the same WLAN. One way around this could be to discover and connect to WLANs automatically so that the Mobile Web Server could be online more often; the application could log on to unknown WLANs and restart the Mobile Web Server. The WLAN restriction can thus be enforced in a controlled test. This is primarily useful for functional testing, however, because in an actual user test the users must be mobile; otherwise the entire spatial dimension of the collected context data is lost.

Conceptually, the Nokia Mobile Web Server worked well as the context sharing component, because the application programmer does not have to think about sending the appropriate HTTP requests to retrieve and parse the context information; these tasks are hidden by the implemented component. To make the mobile nodes work in a centralized system, the mobile system only had to stop providing the sharing interface and simply use it.

The context collecting prototype inherited its layered architecture for collecting the desired context features from the MCT made by Jensen [17]. The implementation of the Music adapter was problem-free, and the architecture was easy to set up and intuitive to extend. Of the many context features the MCT provides, some were deliberately left unused to limit the number of context dimensions. Even for some of the chosen context adapters, information was stripped: the WLAN adapter only returned a list of WLAN IDs even though more information was available. This limited the clustering possibilities, but instead of using signal strengths as distance measures, the co-occurrence of the WLANs was used, which also worked well once thresholds were used to filter out the least frequently occurring WLANs.

The Mobile Web Server was shown to be a viable sharing component in a decentralized context sharing system. The implemented system served as a proof of concept that it can be used as an embedded sharing component rather than for sharing at the end-user level. The advantages of the decentralized approach were lessened, however, because the back-end server was still passively part of the system. If the back-end server handles the sharing instead of only receiving context uploads, the Mobile Web Server components can be left out.

From the results, a number of observations about user behavior were made. Figure 7.4 indicates the best time to make sharing available: based on this user test, the period from 13 to 20, because this is when the users played the most music. Comparing this with when other users are in the vicinity shows that the period in which many users are seen falls within this play interval.
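Choosing a sharing window such as 13 to 20 from play logs could be automated as in the sketch below. It assumes that play events have been reduced to hour-of-day values and that the window width is chosen by hand. The thesis does not state how the interval was read from Figure 7.4, and `busiest_window` is a hypothetical helper.

```python
from collections import Counter


def busiest_window(hours, width):
    """(start, end) of the `width`-hour window with the most play events.

    hours: iterable of play-event hours (0-23). Windows do not wrap past
    midnight; on ties, the earliest start wins because max() keeps the first.
    """
    per_hour = Counter(hours)
    best_start = max(
        range(0, 25 - width),
        key=lambda s: sum(per_hour[h] for h in range(s, s + width)),
    )
    return best_start, best_start + width


hours = [13, 14, 14, 15, 16, 17, 18, 19, 19, 20, 9, 22]
print(busiest_window(hours, 7))  # → (13, 20)
```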
The user observations peaked from 12 to 13, with an overall high level from 11 to 17, as seen in Figure 7.7, and Figure 7.8 indicated that the users were moving the most in the evening, from 18 to 20. So when services in this domain are designed, it is worth considering services that are relevant in this period of the day.

During the project, the clustering of WLANs was performed many times with different threshold values, giving different numbers of clusters. If the threshold was set high, only the most significant clusters were shown. These clusters identify the known locations of the user, and the largest cluster in terms of WLAN observations is the home location. If the threshold is lowered, locations that are only seen a few times appear. The appearance of the clusters reveals information about the locations they represent: clusters have been observed to be either homogeneous or heterogeneous, indicating the size of the area the location covers. The size of a cluster indicates WLAN density and thus whether the location is in a city or a suburban area, although exceptions exist. Similar reasoning applies to the user clusters, which indicate the social groups that the user participates in, i.e., the different users the user keeps company with. The temporal observations and the results about both known locations and company are used to categorize the users' music play sessions. These give a good overview of the state of the user while listening to music. Collaborative filtering was performed by creating an implicit rating scheme applied to the tracks that the users have listened to. Using the generated user preferences, personalized music recommendations were given. The inferred temporal, location, and user group information could potentially make the predicted ratings even more precise, assuming that the users actively choose which tracks to listen to and that the music they listen to depends on the context they are in.


Chapter 9 Conclusion
In this project, a music and context sharing prototype has been implemented and tested in a user survey. A decentralized system was implemented first, but due to unfortunate circumstances regarding its essential sharing component, the Nokia Mobile Web Server, it was not possible to test the decentralized system in the user test. A centralized system with equivalent functionality was therefore implemented, and a user test with seven test participants was run. Relative locations and social groups have been found for the users, showing that hierarchical clustering algorithms can reveal valuable knowledge about the data. Collaborative filtering was performed on the collected data and was able to give personalized recommendations. The implicit rating scheme provided the collaborative filtering algorithm with ratings, but the number of tracks played was small, so high-precision recommendations were not guaranteed. The music sessions of the users were categorized to show, for each user, when music was played. Together, this inferred information opens up a range of different mobile end-user applications. Finally, the contributions of the author are summarized in the following points.
- Implementation of two architectures with equivalent functionality: a decentralized and a centralized architecture. Prototypes of a music and context information sharing system have been completed.

- Proof of techniques for transforming a decentralized context sharing application into a centralized one.
- Discovered patterns in the collected context information, which give a clearer view of the state of the user in a given context.
- Personalized music recommendations and categorization of user play sessions.
- Demonstration of the versatility of Python, which has been used in almost all programming aspects of the project except the SQL.

Chapter 10 Future Work
With the large amount of data collected, more extensive data analysis could be performed. Further analysis of the relative locations could reveal whether equivalent locations can be found across users. To match the relative clusters, a network of locations that multiple users can relate to could be formed. In these matching locations, the users' listening patterns could be reviewed to see whether location-specific music recommendations could be made. The same applies to the music played within each social group of the test participants. Further, the user recommendations could be made using a weighted combination of the collected features based on the observed patterns. This could be done by defining a similarity measure for each of the context features. A Latent Semantic Analysis (LSA) module was started during the project period, but since the social tag data extracted from Last.fm's API contained too few mood-related words, the module was abandoned. Other data extraction methods must be found. Interesting findings could in fact have been obtained with LSA, similar to what was presented by Phung et al. [21], who found rhythms in user behavior by applying LSA techniques to words created by concatenating time stamps and the found location labels. The entropy calculation made by Song et al. [28] could also have been carried out in a similar way for the data in this project.
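As a starting point for this future work, the sketch below computes two simple entropies over a user's sequence of location-cluster labels. It also builds time-and-location "words" of the kind described for Phung et al. [21]. Assumptions: the random entropy (log2 of the number of distinct locations) and the order-independent Shannon entropy are taken as representative of the calculation in Song et al. [28], but which variants [28] uses is not stated here, and an order-aware estimate is not shown. The word format `d<weekday>h<hour><label>` and all function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from math import log2


def random_entropy(labels):
    """log2 of the number of distinct locations: every location equally likely."""
    return log2(len(set(labels)))


def uncorrelated_entropy(labels):
    """Shannon entropy of visit frequencies, ignoring the order of visits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())


def time_location_words(events):
    """Hypothetical LSA 'words' joining weekday, hour and location label."""
    return [f"d{ts.weekday()}h{ts.hour:02d}{label}" for ts, label in events]


visits = ["Home"] * 4 + ["Uni"] * 2 + ["Gym"] * 2
print(random_entropy(visits), uncorrelated_entropy(visits))
print(time_location_words([(datetime(2010, 1, 4, 13), "Home")]))  # → ['d0h13Home']
```

A frequency-based entropy below the random entropy, as in this toy sequence, indicates that the user's time is concentrated on a few locations.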


Appendix A Results
The following figures are appended here for completeness, even though the most representative examples have been displayed in the main chapters.

User      Distinct users       Distinct WLANs
milab093  Fig. A.1 on p. 98    Fig. A.2 on p. 99
milab102  Fig. A.4 on p. 100   Fig. A.5 on p. 100
milab104  Fig. A.8 on p. 103   Fig. A.9 on p. 103
milab106  Fig. A.12 on p. 106  Fig. A.13 on p. 107
milab135  Fig. A.15 on p. 109  Fig. A.16 on p. 109
milab153  Fig. A.17 on p. 110  Fig. A.18 on p. 110
milab182  Fig. 7.9 on p. 76    Fig. on p. 77
Table A.1: Figures related to user and WLAN counts for each user.

User      User clusters        WLAN clusters        Cluster changes
milab093  Too few users        Fig. A.3 on p. 99    Fig. A.21 on p. 112
milab102  Fig. A.6 on p. 101   Fig. A.7 on p. 102   Fig. A.22 on p. 113
milab104  Fig. A.10 on p. 104  Fig. A.11 on p. 105  Fig. A.23 on p. 113
milab106  Fig. A.14 on p. 108  Fig. on p. 80        Fig. A.24 on p. 114
milab135  Fig. on p. 84        Fig. on p. 81        Fig. A.25 on p. 114
milab153  Fig. A.19 on p. 111  Fig. A.20 on p. 112  Fig. A.26 on p. 115
milab182  Fig. on p. 83        Fig. on p. 79        Fig. on p. 85
Table A.2: Figures related to clusters for each user.

Figure A.1: Distinct users discovered per time window for user milab093.

Figure A.2: Distinct WLAN IDs discovered per time window for user milab093.
Figure A.3: Clusters of WLANs for user milab093.

Figure A.4: Distinct users discovered per time window for user milab102.
Figure A.5: Distinct WLAN IDs discovered per time window for user milab102.

Figure A.6: Cluster of users for user milab

Figure A.7: Cluster of WLANs for user milab102.

Figure A.8: Distinct users discovered per time window for user milab104.
Figure A.9: Distinct WLAN IDs discovered per time window for user milab104.

Figure A.10: Clusters of users for user milab104.

Figure A.11: Clusters of WLANs for user milab

Figure A.12: Distinct users discovered per time window for user milab106.

Figure A.13: Distinct WLAN IDs discovered per time window for milab

Figure A.14: Clusters of users for user milab106.

Figure A.15: Distinct users discovered per time window for user milab135.
Figure A.16: Distinct WLAN IDs discovered per time window for user milab135.

Figure A.17: Distinct users discovered per time window for user milab153.
Figure A.18: Distinct WLAN IDs discovered per time window for user milab153.

Figure A.19: Clusters of users for user milab

Figure A.20: Cluster of WLANs for user milab153.
Figure A.21: Clusters changing per day for user milab093.
