User accesses business site. Recommendations Engine. Recommendations to user 3 Data Mining for Personalization

Similar documents
Spatial Processing using Oracle Table Functions

Efficient Processing of Large Spatial Queries Using Interior Approximations

Development of Efficient & Optimized Algorithm for Knowledge Discovery in Spatial Database Systems

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t

Analysis and Extensions of Popular Clustering Algorithms

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Clustering for Mining in Large Spatial Databases

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm

Chapter 1, Introduction

An Efficient Approach for Color Pattern Matching Using Image Mining

Best Keyword Cover Search

Handling Multiple Instances of Symbols in. Pictorial Queries by Image Similarity

Using Natural Clusters Information to Build Fuzzy Indexing Structure

Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming

Video Representation. Video Analysis

C-NBC: Neighborhood-Based Clustering with Constraints

A PROPOSED HYBRID BOOK RECOMMENDER SYSTEM

Web page recommendation using a stochastic process model

CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Research on outlier intrusion detection technologybased on data mining

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS

Clustering Algorithms for Data Stream

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center

Closest Keywords Search on Spatial Databases

An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches

Indexing Non-uniform Spatial Data

Ecient Parallel Data Mining for Association Rules. Jong Soo Park, Ming-Syan Chen and Philip S. Yu. IBM Thomas J. Watson Research Center

Data warehouse and Data Mining

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Clustering Large Dynamic Datasets Using Exemplar Points

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

Pictorial Queries by Image Similarity

CONTENT BASED IMAGE RETRIEVAL SYSTEM USING IMAGE CLASSIFICATION

rule mining can be used to analyze the share price R 1 : When the prices of IBM and SUN go up, at 80% same day.


Jarek Szlichta

Mining Quantitative Association Rules on Overlapped Intervals

A Super-Peer Selection Strategy for Peer-to-Peer Systems

CS412 Homework #3 Answer Set

DESIGNING AN INTEREST SEARCH MODEL USING THE KEYWORD FROM THE CLUSTERED DATASETS

Association Rules Mining using BOINC based Enterprise Desktop Grid

Structure-Based Similarity Search with Graph Histograms

PATENT DATA CLUSTERING: A MEASURING UNIT FOR INNOVATORS

Income < 50k. low. high

Algorithm for Efficient Multilevel Association Rule Mining

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

OPTICS-OF: Identifying Local Outliers

Data Mining Algorithms

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu,

(a) DM method. (b) FX method (d) GFIB method. (c) HCAM method. Symbol

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Proc. of the 1st Int. Wkshp on Image Databases and Multi Media Search, pp. 51{58, Handling Multiple Instances of Symbols in

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

Association Rule Mining. Introduction 46. Study core 46

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

An Improved Apriori Algorithm for Association Rules

Clustering in Big Data Using K-Means Algorithm

Fast Similarity Search for High-Dimensional Dataset

The Fuzzy Search for Association Rules with Interestingness Measure

Visualizing and Animating Search Operations on Quadtrees on the Worldwide Web

Clustering part II 1

Holistic Correlation of Color Models, Color Features and Distance Metrics on Content-Based Image Retrieval

A Framework for Clustering Massive Text and Categorical Data Streams

Distributed k-nn Query Processing for Location Services

An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment

Mining Web Data. Lijun Zhang

DMQL: A Data Mining Query Language for Relational Databases. Jiawei Han Yongjian Fu Wei Wang Krzysztof Koperski Osmar Zaiane

Spatial Index Keyword Search in Multi- Dimensional Database

Mining Web Data. Lijun Zhang

Data Clustering With Leaders and Subleaders Algorithm

An Enhanced Density Clustering Algorithm for Datasets with Complex Structures

MOBILE VIDEO COMMUNICATIONS IN WIRELESS ENVIRONMENTS. Jozsef Vass Shelley Zhuang Jia Yao Xinhua Zhuang. University of Missouri-Columbia

Knowledge Discovery in Spatial Databases

SPATIAL RANGE QUERY. Rooma Rathore Graduate Student University of Minnesota

D B M G Data Base and Data Mining Group of Politecnico di Torino

Benchmarks Prove the Value of an Analytical Database for Big Data

Balanced COD-CLARANS: A Constrained Clustering Algorithm to Optimize Logistics Distribution Network

Survey: Efficent tree based structure for mining frequent pattern from transactional databases

Temporal Weighted Association Rule Mining for Classification

Data Mining Concepts & Tasks

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

Similarity Search in Time Series Databases

Classifier C-Net. 2D Projected Images of 3D Objects. 2D Projected Images of 3D Objects. Model I. Model II

International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

CS570: Introduction to Data Mining

Efficient Spatial Query Processing in Geographic Database Systems

X-tree. Daniel Keim a, Benjamin Bustos b, Stefan Berchtold c, and Hans-Peter Kriegel d. SYNONYMS Extended node tree

Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method

Non-Hierarchical Clustering with Rival. Retrieval? Irwin King and Tak-Kan Lau. The Chinese University of Hong Kong. Shatin, New Territories, Hong Kong

9. Conclusions. 9.1 Definition KDD

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

Data Mining Course Overview

Maximum Average Minimum. Distance between Points Dimensionality. Nearest Neighbor. Query Point

Recommendation on the Web Search by Using Co-Occurrence

Knowledge Discovery and Data Mining

Transcription:

Personalization and Location-based Technologies for E-Commerce Applications K. V. Ravi Kanth, and Siva Ravada Spatial Technologies, NEDC, Oracle Corporation, Nashua NH 03062. fravi.kothuri, Siva.Ravadag@oracle.com Abstract Tailoring web-pages to dierent user characteristics such as location, preferences and previous history (page-hits, products bought) have been shown to be eective tools for personalizing webcontent. In this paper, we briey summarize the techniques in these state-of-the-art personalization technologies. We rst describe personalization using user preferences or history and then describe personalization based on user's current location. Whereas the former is applicable for deployment in web-sites, the latter is useful in providing location-based content to mobile users and wireless applications. 1 Introduction In the traditional business industry, database marketing enables the identication of prospective customers who are likely to respond to mass product mailings. Instead of randomly mailing product listings to prospective customers, database marketing uses parameters such as demographics, recent purchase history, and customer proles. In the case of e-businesses, similar techniques can be applied using personalization techniques. Purchase history is kept track by the business web-sites, products that could be interesting to the user are kept track using the page-hit history, along with user-prole characteristics such as location, and income. Several e-businesses employ such personalization techniques for successful marketing: Amazon.com points a customer to related items whenever the customer searches for a specic item and accesses the corresponding web-page. Likewise demographics can be used to successfully infer user's characteristics. For example, Carrier Corp. of Connecticut uses customer's zip-code to market specic products to customers. The zip-code and other address data are used to infer whether the customer comes from an auent neighborhood or not and product recommendations are then inferred. In some cases, the address of the customer may not be of importance. Instead, the current location of the user may be used for personalization. This is especially true in the case of mobile users. The wireless service provider automatically identies the location of the user and could provide dynamically rendered web content showing places and features of interest at that specic location. For example, ads on movie theaters, restaurants and other common places of interest may be displayed whenever the customer uses the mobile device to retrieve stock quotes or other information. In what follows, we rst summarize the techniques behind personalization technology and then describe how location can be used both for mobile and non-mobile e-customers. 1

User Profiles Content Profiles User accesses business site Recommendations Engine (Mining for Personalization) Recommendations to user Figure 1: Personalization Components for E-commerce. 2 Personalization for E-commerce In this section, we present a general overview of the e-business architecture to enable personalized marketing. Detailed explanations and models are presented in [4]. Most E-businesses maintain three components: (1) User proles, (2) Content Categories, and (3) Business rules as depicted in Figure 1. We describe each of these in more detail. User Proles User proles capture the characteristics/interests of the customer. These typically include demographics such ascity, address of residence, job prole, specic interests etc. This information could be leveraged to categorize a new customer and lead him to a tailored-website for that userprole. In addition, user proles could be implicitly updated based on the user behavior e.g., pages/products accessed by the user etc. In some cases, the users may not register with the business and hence there may not be have any previous history. Such users are referred to as visitors (whereas others are referred to as customers). The recommendations for visitors are made using the information from the current session and any demographic data, if available. Content Prole Content proles keep all data about the content: this includes the information associated with the content (e.g., the composer, singer for a song item) as well as the business value for the content such as the number of items in stock, number of times the product is considered (website accessed) and the number of items for this product sold. These proles enable analysis by the business and allow them to identify whether a product needs to be discounted, or if the market has caught up for the item (or if the item is under-stock) if it is be removed from the discounted list. Business Recommendations/Rules Engine Using the user proles and content proles, the businesses apply data mining techniques [10, 11] to identify appropriate business rules. These rules could involve a simple classication of the users using their proles and the website click-streams, association between content proles and user behavior, or association between dierent products (e.g., customer buying \Concrete Mathematics" is highly likely to also buy \The Art of Computer Programming"). In what follows, we describe these data mining techniques in more detail. 3 Data Mining for Personalization Classical data mining techniques include classication of users, nding associations between dierent product items or customer behavior, and clustering of users. Implicit concept generalization techniques 2

such as updating the content proles for a brand name, when a customer buys a product from that brand name also need to be applied. Details on this can be found in [] and will not be dealt with here. Next, we describe the main data mining techniques for use in personalization. 3.1 Clustering Given a set of users (or proles), and their buying patterns, clustering combines the users into clusters which can later be used for prediction and personalization for prospective customers. Each product represents a separate dimension and the preferences of each user (for all the products) constitute a pointvector in a multi-dimensional product space. Identifying groups of users that select similar products then transforms to nding closely-knit clusters in the multi-dimensional product space. Several algorithms such as k-mean, k-medoids, CLARANS, and BIRCH have been proposed for such clustering. If the number of products is high, the corresponding high-dimensional clustering techniques could be ineective. For such cases, dimensionality-reduction techniques [13, 9, 3] could be employed. 3.2 Classication Classication categorizes the customers based on their user proles and their product-access behavior. A set of pre-classied customers are used to train the classication network. Classication is performed using one of the following popular techniques: decision trees, k-nearest neighbor, or neural networks. Most commercial data mining engine vendors such as Oracle and IBM provide on or more of these techniques. 3.3 Association Rules Association rules identify the interest (purchase of access) in more than one item in the same transaction. For example, students buying one book are also highly likely to buy another since both the books are more complimentary in most graduate course studies. An association rule is a condition of the form X implies Y where X and Y are two product items. The support for the rule is the fraction of transactions that contain both X and Y and the condence for the rule is the fraction of transactions containing X, which also contain Y. The business sets the support and condence and the underlying mining engine identies all interesting associations that satisfy the specied support and condence measures. Such identication is performed using several well-known algorithms such asa priori [18, 17] etc. The identied associations can be then used to market item Y whenever a customer buys item X. 4 Location in E-commerce Location plays a vital role in both personalization (as described above) as well as an enabling component. As an enabling component, location is used to identify interesting stores in his location using a mapquestlike locator service. For example, a customer could identify the nearest store sites within a specied distance from his current location. More details on this aspect could be found in recent spatial database research [1]. Next, we examine the role of location in personalization and wireless e-commerce. For most businesses, customer location is a frequently used attribute in personalization of the content. For example, customers from the Southern states may have special preferences such as country music CDs. To detect such hidden relationships between location and customer behavior, location could be explicitly requested from the customer at the time of registration (or implicitly inferred from the customer's behavior). Given a training sample of customers, location categorized to zip-code, city, county, and state levels can be used to classify customers [7, 6, 5]. Dierent types of classication with and without priority to location can be performed. Likewise, location could also be used in clustering when there are no pre-classied training samples [8, 20]. Since customers who are likely to be close to each 3

User s location User s business of interest (within specified distance) Database of Businesses Data Spatial Indexes Business Links Business map it links Finally,user makes online reservations or transactions Figure 2: Using Location to Identify Businesses of interest In Mobile User's Region: Wireless service provider uses the customer's current location along with business type to identify interesting businesses (e.g. restaurants, movie theaters of interest) within a specied radius using its business database. User makes online reservations or transactions by going to the identied links. other may exert inuence on one another, [16, 2] describe new regression models that incorporate spatial autocorrelation in prediction. Wireless service providers can provide location-based services such as nearest malls, theaters, restaurants, from the location of a mobile user. As illustrated in Figure 2, location-based services can be combined with e-business applications to serve mobile users. As part of the location-based services, the wireless provider maintains the database of e-businesses along with their locations. The data is indexed using spatial indexes such as Quadtrees or R-trees [14, 15, 19] to eciently retrieve businesses within a specied distance from user-specied location. Partitioned local indexes could be constructed to restrict the spatial searches to businesses of the user-specied type. For example, all restaurants could be stored in a separate partition and all theaters as a separate partition. Having local-partitioned indexes for such partitions [12] would ensure only restaurants are searched when the user species the business typetobe \restaurant". This approach ensures fast searching for business that satisfy user-constraints within the vicinity of a mobile user. 5 Conclusions In this paper, we summarized the techniques for personalization of e-business sites. We described several approaches for such personalization using user's location as an input parameter. Such location-based personalization is quite useful in wireless applications where the location is the current mobile location of the user or in conventional static applications where the location refers to the address of the user. References [1] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R* tree: An ecient and robust access method for points and rectangles. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 322{331, 1990. [2] S. Chawla, S. Shekhar, W. Wu, and U. Ozesmi. Extending data mining for spatial applications: A case study in predicting nest locations. In ACM SIGMOD Workshop in Research issues in Data Mining and Knowledge Discovery, Dallas, TX, 2000. [3] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz. Ecient and eective querying by image content. In Journal of Intelligent Information Systems, volume 3, pages 231{ 262, 1994. [4] K. Instone. Information architecture and personalization. In White paper: Argus Associates, 2000. [5] K. Koperski, J. Adhikary, and J. Han. Geominer: A system prototype for spatial data mining. In Proc. ACM SIGMOD Int. Conf. on Management of Data, Tucson, AZ, 1997. 4

[6] K. Koperski, J. Han, and J. Adhikary. Mining knowledge in geographical data. In Commun. ACM. [7] K. Koperski, J. Han, and N. Stefanovic. An ecient two-step method for classication of spatial data. In Symposium on Spatial Data Handling, Vancouver, 1998. [8] R. Ng and J. Han. Ecient and eective clustering methods for spatial data. In Proc_of the Int. Conf. on Very Large Data Bases, Santiago, Chile, 1994. [9] R. Ng and A. Sedighian. Evaluating multi-dimensional indexing structures for images transformed by principle component analysis. Proc. of the SPIE, 2670:50{61, 1994. [10] I. Press. Ibm intelligent miner. In IBM Documentation, 2001. [11] O. Press. Oracle personalization. In Oracle Documentation, 2001. [12] O. Press. Partitioning in oracle. In Oracle Documentation, 2001. [13] K. V. Ravi Kanth, D. Agrawal, Amr El Abbadi, and Ambuj K. Singh. Dimensionality reduction for similarity searching in dynamic databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1998. [14] K. V. Ravi Kanth, Siva Ravada, J. Sharma, and J. Banerjee. Indexing medium-dimensionality data in oracle. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1999. [15] H. Samet. Recent developments in linear quadtree-based geographic information systems. Image and Vision Computing, 5(3):187{197, Aug. 1987. [16] S. Sekhar and Y. Huang. Discovering spatial co-location patterns: A summary of results. In Symposium on Advanced Spatial and Temporal Databases, Redondo Beach, CA, 2001. [17] R. Shrikant and R. Agarwal. Mining generalized association rules. In Proc_of the Int. Conf. on Very Large Data Bases, 1995. [18] R. Shrikant and R. Agarwal. Mining quantitative association rules in large relational tables. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1996. [19] F. Wang. Relational-linear quadtree approach for two-dimensional spatial representation and manipulation. IEEE Trans. on Knowledge and Data Engineering, 3(1):118{122, Mar. 1991. [20] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: A new data clustering algorithm and its applications. Journal of Data Mining and Knowledge Discovery, 1(2):141{182, 1995. 5