VRIJE UNIVERSITEIT BRUSSEL
FACULTEIT WETENSCHAPPEN
VAKGROEP INFORMATICA EN TOEGEPASTE INFORMATICA
SYSTEMS TECHNOLOGY AND APPLICATIONS RESEARCH LAB

STAR Lab Technical Report

Benefits of explicit profiling in the IMAGEN project
Jan De Bo & Luk Vervenne

affiliation:
keywords: profiling, ontology
number: STAR-2003-03
date: 21/01/2003
corresponding author: Jan De Bo
status: accepted
reference: Proceedings of E-society 2003 (IADIS International Conference)

Pleinlaan 2, gebouw G-10, B-1050 Brussel
Phone: +32-2-629.3308
Fax: +32-2-629.3525
BENEFITS OF EXPLICIT PROFILING IN THE IMAGEN PROJECT

Jan De Bo
STAR Lab - Vrije Universiteit Brussel
Pleinlaan 2, Building G-10, B-1050 Brussel, Belgium
jdebo@vub.ac.be

Luk Vervenne
STAR Lab - Vrije Universiteit Brussel
Pleinlaan 2, Building G-10, B-1050 Brussel, Belgium
Luk.Vervenne@vub.ac.be

ABSTRACT

The goal of the Imagen Project (Intelligent Multimedia Application Generator) is to develop an innovative publishing system to produce and deliver personalized multimedia products on demand. Imagen fully automates the complex and time-consuming tasks of content selection (Content Manager), layout design (Layout Manager) and rights management (Transaction Manager). In this project STAR Lab has investigated how user profiles, built from the users' own feedback, can be applied in collaborative filtering mechanisms. We believe that search engines based upon collaborative filtering techniques will improve the precision of the retrieved information.

KEYWORDS

Multimedia, explicit user profiles, collaborative filtering, ontology

1. INTRODUCTION

By way of introduction we give a high-level description of the Imagen project, followed by a more detailed presentation of the innovative aspects of the work performed by STAR Lab. By combining complementary technologies, Imagen (http://www.imagenweb.org) aims to develop an innovative publishing system that automates the complex tasks of content selection (Content Manager), layout design (Layout Manager) and rights management (Transaction Manager).

Figure 1 below gives a high-level description of the Imagen system. Authors create multimedia content (1), which is classified by the Content Manager (2). From each content unit we extract metadata that is stored in a central RDBMS (3). An end user can either download the module implemented by Bar Ilan University, which creates his profile (4, implicit request), or access the system and explicitly choose the characteristics of the content he wants to have delivered (5, explicit request).
The latter is covered by the STAR Lab module. In the implicit request approach, the search for multimedia content is based on an implicit profile, created by observing the user's surfing behaviour on the Internet. Our module, by contrast, concentrates on building explicit profiles of the end user. These profiles are based on the feedback the user provides to the system, which is why they are called explicit profiles. In the next section we describe in more detail how these profiles are built and how they are used to personalise the user's search results.

When the end user makes a request, Imagen selects content from the authors' repository and from the Internet (6) when applying the implicit request strategy, whereas our explicit request strategy selects content from its own repository. The search results are then packed in an IMS-compliant package (7), processed and transformed in order to obtain a consistent and uniform layout. The Layout Manager aims at an optimal compromise between the publisher's layout specifications and the constraints imposed by the user's layout preferences and presentation platform (Kröner et al, 2002). The output of the Layout Manager is a multimedia product that is ready to be rendered on the target platform after being delivered by the
Transaction Manager (8). The tasks of the Transaction Manager module are recording transactions and watermarking multimedia content. Whenever a multimedia package is received from the Layout Manager, the Transaction Manager processes the order and authorises online payment. It then watermarks the multimedia content to meet copyright and licensing requirements, and records all relevant information to ensure that royalties are paid by the customer to the respective IP rights owners and to protect the multimedia content from illicit use.

Figure 1. High-level description of the Imagen system

2. CONTENT SELECTION: EXPLICIT PROFILING

2.1 Fuzzy search and content selection

The Content Manager module performs the selection of personalised content units. To develop profiles that capture and characterize their specific interests and preferences, the Content Manager lets users choose between giving feedback themselves (STAR Lab's explicit module) and installing 'looking over the shoulder' software on their computer (implicit module). A list of selected content is then submitted as a recommendation to the Layout Manager module.

In our system people can perform searches according to a number of predefined categories. This implies that all content units in the repository have to be categorized accordingly. Both the content provider of the first pilot application and the user consortium decided that it would be useful to categorize each content unit according to six predefined scales, each running between two extremes. For the MyArt application, situated in the domain of Renaissance Art, the following categories were considered appropriate:

<real - imaginary>
<erotic - non-erotic>
<secular - religious>
<joyous - melancholic>
<spiritual - temporal>
<passionate - platonic>

Each content unit is then given weights on those six categories, based upon the extracted metadata. When a huge number of content units has to be managed, only a small training set of content is rated
manually. All content items of the training set are read by domain experts and marked according to the predefined categories. In order to automate the categorization of the remaining content units we used algorithms based on Bayesian inference techniques provided by APR (Applied Psychology Research, http://www.youmeus.com). These algorithms were trained very carefully on the manually rated training set by extracting metadata from these content units and establishing links between this metadata and the marks the units were given on the respective scales. Based upon this posterior knowledge the system is able to compute predictive categorizations of the remaining content units.

Our software allows users to search for media content. They specify their interests by moving slider bars in the appropriate direction between the two extreme points of each category. This way users are no longer obliged to choose between two extremes on a scale (erotic vs. non-erotic) but can indicate any position in between. This results in a search mechanism that is more fine-grained than regular search engines: because every value between the two ends of a category is allowed, we call this a fuzzy search.

2.2 Building profiles using explicit user feedback

Once the user has performed a search he is presented with results which he may or may not like. He can then specify whether he likes the returned results, dislikes them, loves them, hates them, etc. These ratings are used to set up the user's profile. Because the profile is built by explicitly asking the user to judge the returned search results, we call it an explicit profile, unlike the implicit profiles of Bar Ilan University, which are developed by observing the user's Internet behaviour (Halamish & Kraus, 2002).

2.3 Collaborative filtering

The system also provides a feedback mechanism for the logged-in user, based on the profiles of all registered users in the system.
The feedback mechanism makes use of collaborative filtering techniques and can thus produce personal recommendations by computing the similarity between your preferences and those of other people (Resnick et al, 1994). The main idea of collaborative filtering is to automate the process by which people recommend products or services to one another. If you need to choose between a variety of options with which you have no experience, you will often rely on the opinions of others who do have such experience. However, when there are thousands or millions of options, as on the Web, it becomes practically impossible for an individual to locate reliable experts who can give advice about each of the options. By shifting from an individual to a collective method of recommendation, the problem becomes more manageable. Instead of asking each individual for an opinion, you might try to determine an average opinion for the group. This, however, ignores your particular interests, which may differ from those of the average person. You would rather hear the opinions of those people whose interests are similar to your own.

The mechanism behind our collaborative filtering system is the following:

- the preferences of a large group of people are registered;
- a subgroup of people (neighbours) is selected whose preferences are similar to the preferences of the person who seeks advice;
- a weighted average of the preferences of that subgroup is calculated;
- the preferences resulting from the previous calculation are used to recommend options on which the advice-seeker has not yet expressed a personal opinion.

Because people with similar tastes are gathered, it is most probable that options highly valued by that peer group will also be appreciated by the advice-seeker. Typical applications of collaborative filtering systems are the recommendation of books, music CDs, movies or, in our case, content units concerning Renaissance Art.
The main bottleneck of existing collaborative filtering systems is the collection of preferences (Shardanand & Maes, 1995). To be reliable, the system needs a very large number of people (typically thousands) to express their preferences about a relatively large number of options (typically dozens). This requires quite a lot of effort from a lot of people. Since the system only becomes useful after a huge number of opinions has been collected, people will initially not be very motivated to express detailed preferences while the system cannot yet help them. One way to avoid this start-up problem, although it is not applicable to our situation because we only deal with preferences given explicitly by users, is to collect preferences
that are implicit in people's actions (Nichols, 1998). For example, people who order books from an Internet bookshop implicitly express their preference for the books they buy over the books they do not buy. Customers who have bought the same book are likely to have similar preferences for other books as well. This principle is applied by the Amazon online bookshop, which for each book offers a list of related books that were bought by the same people.

2.4 Linking to ontologies

Although this is not yet applicable, it will be interesting for STAR Lab at a later stage to build an ontology from the domain knowledge of the first pilot application, MyArt, namely Renaissance Art. Our definition of an ontology is: an ontology is a shared and agreed conceptualisation of domain knowledge (Guarino et al., 1999). The ontology can then be created by annotating the different pages of the repository and mining concepts and relations from them. The concepts and their mutual relations form the ontology base. Search requests, comparable to the ones we perform now in Imagen, could then be seen as queries on the ontology. We carried out a similar exercise last year in the EU project Namic (IST-1999-12392), where people were able to set up user profiles by defining queries on the ontology (De Bo et al., 2002) and to perform searches against these profiles. It would be very interesting for us to compare the precision and recall of the results obtained with both kinds of search methods.

3. CONCLUSION

We believe that recommendation systems based upon explicit feedback mechanisms will be better appreciated by users, because it remains an open question whether people will like the idea of having a plug-in in their browser that monitors their behaviour while they are surfing the Internet. On the other hand, our software asks for more interaction effort from the user, which might be a bit unpleasant in the beginning because users will not immediately enjoy the results of their efforts.
Nevertheless, we are convinced that once the system has gathered enough user preferences our module will return more precise results than the implicit module does.

ACKNOWLEDGEMENT

First of all I would like to thank Dr. Peter Spyns for carefully reading this paper and providing me with valuable feedback, as well as the consortium partners for their supportive discussions while setting up the architecture of the project. Parts of this research have been funded by the Imagen Project (IST-1999-13123).

REFERENCES

De Bo J. et al, 2002, Ontology-based author profiling of documents, Proceedings of the Workshop on Event Modelling for Multilingual Document Linking (LREC 2002), Gran Canaria, Canary Islands, pp. 23-28.
Guarino N. et al, 1999, OntoSeek: Content-Based Access to the Web, IEEE Intelligent Systems, pp. 70-80.
Halamish M. and Kraus S., 2002, Learning Users' Interests for Providing Relevant Information, Proceedings of ABIS, Hanover, Germany, pp. 59-66.
Kröner A. et al, 2002, Managing Layout Constraints in a Platform for Customized Multimedia Content Packaging, Proceedings of the Working Conference on Advanced Visual Interfaces AVI, Trento, Italy, pp. 89-93.
Nichols D.M., 1998, Implicit Rating and Filtering, Proc. Fifth DELOS Workshop on Filtering and Collaborative Filtering, Budapest, Hungary, pp. 10-12.
Resnick P. et al, 1994, GroupLens: An open architecture for collaborative filtering of netnews, Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, NC: ACM, pp. 175-186.
Shardanand U. and Maes P., 1995, Social Information filtering: Algorithms for automating word of mouth, Proceedings of CHI'95 Human Factors in Computing Systems, Denver, Colorado, U.S.A., pp. 210-217.