International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN


DIVERSIFIED DATASET EXPLORATION BASED ON USEFULNESS SCORE

Geetanjali Mohite 1, Prof. Gauri Rao 2
1 Student, Department of Computer Engineering, B.V.D.U.C.O.E, Pune, Maharashtra, India
2 Associate Professor, Department of Computer Engineering, B.V.D.U.C.O.E, Pune, Maharashtra, India

ABSTRACT: Data sharing is a key driver of the surge in web usage today. Users have become large producers of diverse data, which is stored in data spaces distributed across different systems. Sharing data, and then searching for it among such diversified data distributed across many systems, has therefore become difficult. Many approaches have been proposed for this setting, in which both users and systems are widely distributed. In this paper we discuss the gossip-based recommendation approach for finding data that is useful to the user. We further suggest an enhancement to the approach that aims at more relevant search results and better performance, with the efficiency of the approach measured against the usefulness of the results.

Keywords: Usefulness Score, Space Partitioning and Probing, Diversification, Distributed Environment

[1] INTRODUCTION

Data diversification has recently attracted substantial attention as a way of increasing user satisfaction: it raises user confidence in Recommender Systems (RS) and improves results in online and backend search. The huge amount of information available on the web creates the need for methods that select specific subsets of it and present them to the user. Data diversification takes different forms, which consist of selecting items so that their novelty, coverage, or content dissimilarity is maximized [6]. Existing approaches to data diversification fall into two categories:
1. Greedy heuristics
2. Interchange heuristics

Greedy heuristics (e.g., [11]) build a diverse dataset incrementally, considering one item at a time so that some distance function is maximized, whereas interchange heuristics (e.g., [10]) start from a random initial dataset and try to improve it. Applying indexing to diversification has also been proposed by many researchers. One such approach is a Dewey-based tree for structure-based diversity, which uses attribute priorities; a second approach is spatial indexing, which exploits the locations of the nearest neighbors of an item that are farthest from each other.

In spite of the immense interest in diversification in recent years, most previous research addresses the static version of the problem, in which the set of available items from which a diverse subset is selected does not change over time. Distributed search and recommendation offers a simple solution to data sharing under these challenges. Specifically, in gossip-based search and recommendation every user constructs a cluster of "relevant" data that is employed in the processing of queries. However, considering only usefulness introduces a significant amount of data duplication among users. When a query is submitted to the system, the user profiles in each user's cluster are quite similar, so the probability of retrieving the same set of relevant items increases and recall is limited. This paper therefore discusses a modified version of gossip-based recommendation in which a "usefulness" score is introduced to enhance the relevance of the retrieved data, and the space partitioning and probing algorithm is used in conjunction with it for performance efficiency.

[2] EXISTING SYSTEM OVERVIEW

Recommender Systems (RS) guide users in quickly browsing and exploring a large product space, helping them identify interesting products efficiently. However, common RS usually do not provide diverse results, although diversity is considered a required feature. The study of diversity-aware RS has become an important research area in recent years, inspired by diversification solutions for Information Retrieval (IR).
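
To make the two heuristic families from the introduction concrete, the following is a minimal sketch, not the method of any cited paper. It assumes that diversity is measured as the smallest pairwise distance in the selected set and that the caller supplies a distance function `dist`; the function names and the choice of seed item are hypothetical.

```python
from itertools import combinations
import random


def min_pairwise_distance(items, dist):
    """Diversity of a set (needs at least two items): smallest pairwise distance."""
    return min(dist(a, b) for a, b in combinations(items, 2))


def greedy_diverse_subset(items, k, dist):
    """Greedy heuristic: grow the subset one item at a time, always adding
    the item whose nearest selected neighbour is farthest away."""
    selected = [items[0]]  # arbitrary seed item (assumption of this sketch)
    while len(selected) < k:
        remaining = [x for x in items if x not in selected]
        best = max(remaining, key=lambda x: min(dist(x, s) for s in selected))
        selected.append(best)
    return selected


def interchange_diverse_subset(items, k, dist, seed=0):
    """Interchange heuristic: start from a random subset and keep swapping
    members for outside items while the diversity strictly improves."""
    rng = random.Random(seed)
    selected = rng.sample(items, k)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in items:
                if candidate in selected:
                    continue
                trial = selected[:i] + [candidate] + selected[i + 1:]
                if min_pairwise_distance(trial, dist) > min_pairwise_distance(selected, dist):
                    selected = trial
                    improved = True
    return selected
```

Both functions expect distinct items and k of at least 2. The greedy version never revisits a choice, while the interchange version only reaches a local optimum that depends on the random starting subset.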
Diversity is a concept that has been applied in many fields, mostly with the goal of obtaining a set of objects that are highly dissimilar to one another and that, as a group, maximize a quality criterion. However, there is usually a trade-off between diversity and quality. Hence, the diversification problem is how to choose k elements from a set so that diversity is maximized at a low sacrifice in quality.

Diversification approaches for both RS and IR can be classified as implicit or explicit. In IR, implicit approaches assume that selecting dissimilar documents will indirectly cover the diverse query aspects. MMR (Carbonell et al. 1998) is a classic example: it aims to maximize "relevant novelty", a weighted linear combination of relevance and novelty, where novelty is defined as dissimilarity from previously selected documents. In contrast, explicit approaches directly attempt to cover different query aspects or sub-topics. IA-Select (Agrawal et al. 2009) and xQuAD (Santos et al. 2010) are examples of explicit approaches. In addition, Zheng et al. (2012) proposed strategies to specify coverage functions of query sub-topics that serve as a basis for their diversification solution.

[3] COMPARATIVE ANALYSIS

In this paper the different approaches are compared against the following parameters.
(a) Greedy optimization: the query should be result-hungry and an extensive search should be performed for the expected result.
(b) Explicit approach: the proposed solution directly attempts to cover the diverse aspects of the query/user profile.
(c) Implicit approach: the proposed solution explicitly prevents redundancy within the results.
(d) Control of diversity vs. relevance trade-off: is there a control parameter that can tune the diversity vs. relevance trade-off?
(e) Encourages discovery: does the proposed approach avoid penalizing novel/serendipitous items?
(f) Control of exploitation vs. exploration trade-off: is there a control parameter that can tune the exploitation vs. exploration trade-off?

The table below compares the previously proposed approaches:

Parameter                                            1  2  3  4  5  6  7  8
Greedy Optimization                                  Y  Y  Y  Y  Y  Y  Y  N
Explicit Approach                                    N  Y  Y  Y  N  N  Y  N
Implicit Approach                                    Y  N  N  N  Y  Y  N  N
Control of diversity vs. relevance trade-off         Y  N  Y  Y  Y  Y  -  -
Encourages Discovery                                 -  N  N  N  -  -  N  -
Control of exploitation vs. exploration trade-off    N  N  N  N  N  N  N  N

1 - (Carbonell et al. 1998)
2 - (Agrawal et al. 2009)
3 - (Santos et al. 2010)
4 - (Zheng et al. 2012)
5 - (Smyth et al. 2001)
6 - (Ziegler et al. 2005)
7 - (Vargas 2012)
8 - (Adomavicius et al. 2009)

Table 1 Comparison of different approaches

[4] OUR APPROACH - USEFULNESS SCORE BASED EXPLORATION OF DIVERSIFIED DATASET

On today's Web there are numerous systems in which users from diverse locations and with diverse interests can share data, and users have become heavily dependent on the Web when searching for relevant information. The introduction of the cloud has added further distribution and diversification to the datasets that search engines have to navigate to extract data relevant to the user's search query. To discuss the methodology of our modified approach we consider a real estate dataset, in which the diversification parameters are Cost, Area, Location and Property type.
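
As an illustration of how MMR-style diversification might look on such a real estate dataset, here is a minimal sketch. It assumes the standard greedy form of MMR, in which each step maximizes a weighted combination of relevance and dissimilarity to the items already selected. The `Listing` fields, the `similarity` function and the weight `lam` are hypothetical choices for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    cost: float        # assumed normalized to [0, 1]
    area: float        # assumed normalized to [0, 1]
    location: str
    property_type: str
    relevance: float   # relevance to the query, assumed in [0, 1]


def similarity(a, b):
    """Hypothetical similarity in [0, 1], averaged over the four parameters."""
    parts = [
        1.0 - abs(a.cost - b.cost),
        1.0 - abs(a.area - b.area),
        1.0 if a.location == b.location else 0.0,
        1.0 if a.property_type == b.property_type else 0.0,
    ]
    return sum(parts) / len(parts)


def mmr(candidates, k, lam=0.5):
    """Greedy MMR: repeatedly pick the item that maximizes
    lam * relevance - (1 - lam) * (max similarity to selected items)."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * item.relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the ranking is by relevance alone; lowering `lam` penalizes listings that resemble those already chosen, which is the control parameter referred to in row (d) of Table 1.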
Let Q be the set of all possible queries (all combinations of terms), and consider the probability that a user v can return at least one relevant item given a random query q out of Q. In the following, we first define the coverage with respect to the user set. Then, based on coverage, we express the usefulness of a user v with respect to the other users in the user set.
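
The two quantities can be estimated empirically over a finite query set. The sketch below is one possible reading, under several assumptions that are not in the paper: a user profile is a dictionary mapping item names to term sets, an item is relevant to a query when it contains every query term, queries are drawn uniformly, and Q is restricted to combinations of at most `max_terms` terms. All function names are hypothetical.

```python
from itertools import combinations


def all_queries(vocabulary, max_terms=2):
    """Finite stand-in for Q: every combination of up to max_terms terms."""
    return [frozenset(c)
            for n in range(1, max_terms + 1)
            for c in combinations(sorted(vocabulary), n)]


def relevant_items(profile, query):
    """Names of the profile's items that contain every query term
    (hypothetical relevance rule)."""
    return {name for name, terms in profile.items() if query <= terms}


def coverage(profile, queries):
    """Fraction of queries for which the profile returns at least one relevant item."""
    return sum(1 for q in queries if relevant_items(profile, q)) / len(queries)


def usefulness(profile, others, queries):
    """Fraction of queries for which the profile returns a relevant item
    that none of the other profiles returns."""
    hits = 0
    for q in queries:
        seen = set()
        for other in others:
            seen |= relevant_items(other, q)
        if relevant_items(profile, q) - seen:
            hits += 1
    return hits / len(queries)
```

Under this reading, a profile that only duplicates items held by the other users has usefulness 0 even when its coverage is high, which is the duplication effect the usefulness score is meant to expose.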

For the gossip-based recommendation approach we need a set of registered users, say U-Set. The user profiles should be such that the coverage probability is maximized, so a strategy for maximizing the coverage probability will be devised. The usefulness score is defined as follows: given a user u from U-Set, the usefulness of a user profile v is the probability that v can return relevant items for a random query q that could not be returned by the other users in u's U-Net. The usefulness score should also consider relevance. The usefulness score will be provided by the user.

Useful U-Set clustering then takes place, for which we use the Space Partitioning and Probing mechanism, where bounded diversification with sorted access methods was first introduced and formally defined. The Pull/Bound Maximum Marginal Relevance (PBMMR) family of algorithms will be used, which exploits spatial probing locations and the adaptive alternation of usefulness-score-based and distance-based access to reduce the number of fetched objects. An instance of PBMMR, called Space Partitioning and Probing (SPP), is presented, whose pulling strategy uses a tight upper bound. SPP is shown to attain the same diversification quality and exactly the same output as MMR, the most popular result diversification algorithm, while accessing only a fraction of the objects.

Figure 1 System architecture (labels recoverable from the figure: Data Owner, Diversified Dataset, Web Server, Search Engine, User; flows: data submitted to be available for search, search query formed and pushed, search results provided, modified usefulness score sent)

The architecture used to demonstrate the working of our system is distributed and web based, i.e. the diversified dataset resides on the centralized web server, and the web-based application is also hosted on the cloud web server. The user can access the hosted site from any cloud-enabled environment.
The search query is submitted by the user and interpreted by the query engine. Based on the logged-in user's group and on parameters such as income and area, the usefulness score is derived and the search results are provided to the user. For the search results, the user can specify the usefulness score; the usefulness score is then recomputed and persisted in the database for the next search query by other users in the same group.

REFERENCES

[1] Adomavicius, G., & Kwon, Y. Toward more diverse recommendations: Item re-ranking methods for recommender systems. In Workshop on Information Technologies and Systems. (2009, December).
[2] Agrawal, R., Gollapudi, S., Halverson, A., & Ieong, S. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (pp. 5-14). ACM. (2009, February).
[3] Carbonell, J., & Goldstein, J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 335-336). ACM. (1998, August).
[4] Drosou, M., & Pitoura, E. Search result diversification. ACM SIGMOD Record, 39(1), 41-47. (2010).
[5] Haritsa, J. R. The KNDN Problem: A Quest for Unity in Diversity. IEEE Data Eng. Bull., 32(4), 15-22. (2009).
[6] Santos, R. L., Macdonald, C., & Ounis, I. Exploiting query reformulations for web search result diversification. In Proceedings of the 19th international conference on World wide web (pp. 881-890). ACM. (2010, April).
[7] Smyth, B., & McClave, P. Similarity vs. diversity. In Case-Based Reasoning Research and Development (pp. 347-361). Springer Berlin Heidelberg. (2001).
[8] Vargas, S. Novelty and Diversity Enhancement and Evaluation in Recommender Systems. MSc. diss., Department of Ingeniería Informática, Universidad Autónoma de Madrid, Spain. (2012).
[9] Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., & Yahia, S. A. Efficient computation of diverse query results. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on (pp. 228-236). IEEE. (2008, April).
[10] Yu, C., Lakshmanan, L., & Amer-Yahia, S. It takes variety to make a world: diversification in recommender systems. In Proceedings of the 12th international conference on extending database technology: Advances in database technology (pp. 368-378). ACM. (2009, March).
[11] Ziegler, C. N., McNee, S. M., Konstan, J. A., & Lausen, G. Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web (pp. 22-32). ACM. (2005, May).
[12] Zheng, W., Wang, X., Fang, H., & Cheng, H. Coverage-based search result diversification. Information Retrieval, 15(5), 433-457. (2012).