VisoLink: A User-Centric Social Relationship Mining
Lisa Fan and Botang Li
Department of Computer Science, University of Regina
Regina, Saskatchewan S4S 0A2 Canada
{fan, li269}@cs.uregina.ca

Abstract. With the popularity of Web 2.0 websites, online social networking has thrived over the last few years. Much research attention has been drawn to large-scale social network extraction and analysis. However, these studies mostly benefit sociologists and researchers in the area of social community studies, and are rarely useful to individual users. In this paper, we present a friends ranking system, visolink, which is a personal social network analysis service based on a user's reading and writing interests. To give users a better understanding of their personal networks, a weighted personal social network representation and visualization are proposed. Our system prototype shows a much more user-friendly design for personal networks than the classical network visualization based on node-edge distance.

Key Words: Web mining, Social network, User centric

1 Introduction

Writing blogs and sharing photos and videos are the most popular user behaviors on the Web. In the past two years, Web 2.0 has brought a great deal of user participation to the Internet, especially in the area of social networking. Millions of users contribute content, including text, pictures and videos, to social network sites. This huge amount of content and the user activity patterns on the Web have become a great source for social network analysis and Web data mining. Recently, researchers from computer science and sociology have been attracted to the study of computational social networking [2] [4] [5]. As the number of participants in online social networks increases dramatically, a common feature that current online social networking sites provide for managing social relationships is a linear Friend List.
The problem with this list is that as the number of contacts increases, users can hardly pick out their most important friends in the list. One solution, proposed by Anthony Dekker, is to define a distance function between network entities based on the frequency of the user's communications with other friends [1]. However, traditional daily communication is hard to capture and record without a dedicated mechanism.
Blog-based social networking sites are content intensive. Most of the content reflects the author's opinions and interests. From the computer science perspective, it contains much less noisy data when mining a user's interests. Our research motivation is to employ the latest Web mining techniques to give users a better way to manage their online social relationships. The proposed framework ranks a user's friends based on their online reading and writing interests. In our system prototype, visolink also provides a user-friendly graphical interface to present the personal network.

2 Related Work

Social network analysis mainly analyzes the relationships between people or groups of people within social networks. Generally, a social network is computationally represented by an undirected node-edge graph. Most studies in social network analysis use a binary relationship representation. In [1], conceptual distance is considered in social network analysis. The edge distance between every two entities in the social network represents the closeness between them. The link value is simply obtained from how often the two entities communicate in daily life. For example, the value is set to 1.0 if communication occurs every day, and to 0.6 if it occurs once per week. It is easy to see that the frequency of daily-life communication is hard to capture without a mechanism.

Because of the popularity of blogs, measuring interest similarity between bloggers has attracted researchers' attention. [6] proposed an author-topic model to compute the similarity between authors over the topics distributed across the documents they wrote. Most recent research focuses only on this kind of Web content analysis using content mining techniques, and not on users' online activity patterns. Web mining technology opens the opportunity to mine relationships among users on the Web [7]. The number of online communications can be found directly from the server log file.
[2] evaluated the author-topic model and proposed a two-step method that combines probabilistic topic similarity in the first step with a finer content similarity measure in the second step. The second step takes into account the temporal factor of published post entries, since people's interests may change as time passes. It demonstrates an improvement by considering the time intervals related to the author's interests. However, all of these methods are based only on the author's writing interests. Many users still surf the Web only as readers rather than writers. How can a user's reading interests be analyzed? Web usage mining provides a possible solution.

Web usage mining techniques are used to analyze a user's behavior on a website [7] [8] [14]. The study in [8] proposes an approach that combines content and usage to measure the similarity of behavior between two visitors. In [10], the authors introduce a model to find patterns between visitors in order to build an effective recommender system. Nevertheless, those studies only classify users based on their behaviors, not their real interests.
3 The Proposed User-centric Personal Network

To start our social network analysis, the proposed personal network is defined as follows. Each actor has his or her own network, which is represented as a weighted graph $G = (V, E, W)$. In this network, the centric user is the root node of the graph. The vertices $V$ represent the friends of the centric user in the social network. The interest of each centric user is reflected by all the related content, including his or her own blog entries and the other blog entries he or she has browsed or read. The edges $E$ represent the relationships between different users in the network. $W$ denotes the weights of the relationships: $Rel(i, j) = W_{ij}$, where $Rel(i, j)$ denotes the relationship between user $i$ and user $j$, and $W_{ij}$ indicates the closeness between the two users.

According to our review, almost no previous research provides a mechanism to weight users' social relationships. As a result, our study focuses only on the personal network. First, a personal network is much less complex than the entire network. Second, personal network analysis is designed to be more user-oriented. Additionally, our proposed network design considers that one relationship can have different values depending on the centric user. In other words, $Rel(i, j) \neq Rel(j, i)$: the importance of a relationship differs for each actor in the network.

4 User Interest Mining

To weight the different relationships of a centric user, two basic principles for interest mining are needed. The first is that if two users share more similar interests, they should be considered to have a closer relationship. The second is that the more time one user spends on, or the more frequently one user visits, another user's website, the more interesting and important that site's owner or content is to the visitor. Based on these two principles, our task becomes one of measuring user interest similarity.
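The personal network defined in Section 3 can be pictured as a small data structure. The following is only a minimal sketch: the class name `PersonalNetwork`, its methods, the friend names and the weight values are all hypothetical and are not taken from the visolink prototype. It stores one weight $W_{ij}$ per friend of a centric user and shows that two users may weight the same relationship differently.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalNetwork:
    """Weighted personal network G = (V, E, W) rooted at one centric user."""

    centric_user: str
    # Maps each friend j to W_ij, the closeness as seen by the centric user i.
    weights: dict = field(default_factory=dict)

    def set_relationship(self, friend, weight):
        """Record Rel(i, j) = W_ij for the centric user i and friend j."""
        self.weights[friend] = weight

    def ranked_friends(self):
        """Return friends ordered from the closest to the least close."""
        return sorted(self.weights, key=lambda f: self.weights[f], reverse=True)


# Hypothetical users and weights, chosen only for illustration.
anson = PersonalNetwork("anson")
anson.set_relationship("bella", 0.8)
anson.set_relationship("carl", 0.3)

bella = PersonalNetwork("bella")
bella.set_relationship("anson", 0.5)

print(anson.ranked_friends())
# Rel(anson, bella) and Rel(bella, anson) are stored separately and may differ.
print(anson.weights["bella"], bella.weights["anson"])
```

Keeping one such structure per centric user, rather than one global graph, mirrors the point that each actor has his or her own network.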
4.1 Writing Content Analysis

Writing content analysis concentrates on mining the centric user's self-generated content. Blog content mining has been studied in several recent works [2] [3] [4] [5]. One of the two main approaches in previous work uses a topic distribution model based on probability theory. The other uses the statistical term-frequency, content-based approach that is mainly used in information retrieval.

Each blog entry from a blog website may contain several topics. The whole text corpus of each user is viewed as a combination of different topics. Each topic occurring in a content corpus has a probability value. With the help of an entropy-based measure such as KL-divergence, the probabilities of the topics shared by two writers can be compared. A topic model for learning the interests of authors from a text corpus was introduced in [6] [8], and Rosen-Zvi et al. proposed the Author-Topic model to extend the basic LDA model [6]. Both methods need to learn their parameters by estimation. In our study, the topic probability distributions are obtained directly from the distribution of tags (keywords), since tags are inserted by the authors themselves. Similar to the approach in [6], the similarity measure between users $i$ and $j$ is shown in Equation 1,

$$D(i, j) = \sum_{t=1}^{T} \left[ \theta_{it} \log \frac{\theta_{it}}{\theta_{jt}} + \theta_{jt} \log \frac{\theta_{jt}}{\theta_{it}} \right] \quad (1)$$

where $T$ denotes the number of topics and $\theta_{it}$ denotes the probability of topic $t$ for user $i$. This method applies KL-divergence to compute the similarity between users $i$ and $j$; since $D(i, j)$ is a divergence, a smaller value indicates more similar interests.

The term-frequency model is well studied in text document classification. After removing stop words, spam and low-frequency terms, the terms that occur more frequently in the text contribute more to the whole document. According to [2], in the second stage of its similarity computation, temporal factors are considered to affect the similarity. For example, even if the topics of two pieces of content are very similar, the interest similarity value is still low if the time interval between the two publication dates is large. Following [2], the similarity function is defined in Equation 2, where $entry_k$ denotes a blog entry from the entry set $E_i$ of user $i$, and $|m(k) - m(l)|$ denotes the difference in months between the publication dates of entry $k$ and entry $l$. Additionally, in Equation 2, $\lambda$ takes the value 1 if the time difference is to be considered; otherwise, it takes 0.
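Before turning to Equation 2, the tag-based measure of Equation 1 can be made concrete with a short sketch. It is an illustration under stated assumptions, not the prototype's code: the function names, the example tags and the small additive smoothing constant are all hypothetical. The smoothing is added here only so that every topic probability stays positive and the logarithms are defined.

```python
import math
from collections import Counter


def tag_distribution(tags, vocabulary, smoothing=1e-6):
    """Turn a user's tag list into a probability distribution over the vocabulary.

    The additive smoothing is an assumption of this sketch: it keeps every
    probability positive so that the logarithms in Equation 1 are defined.
    """
    counts = Counter(tags)
    total = sum(counts[t] + smoothing for t in vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}


def symmetric_kl(theta_i, theta_j):
    """Equation 1: sum over topics of both directed KL-divergence terms."""
    return sum(
        theta_i[t] * math.log(theta_i[t] / theta_j[t])
        + theta_j[t] * math.log(theta_j[t] / theta_i[t])
        for t in theta_i
    )


# Hypothetical tag data for three users.
tags_a = ["python", "python", "travel", "music"]
tags_b = ["python", "travel", "travel", "music"]
tags_c = ["cooking", "cooking", "cooking", "music"]
vocab = sorted(set(tags_a) | set(tags_b) | set(tags_c))

theta_a = tag_distribution(tags_a, vocab)
theta_b = tag_distribution(tags_b, vocab)
theta_c = tag_distribution(tags_c, vocab)

d_ab = symmetric_kl(theta_a, theta_b)  # users with overlapping tags
d_ac = symmetric_kl(theta_a, theta_c)  # users with mostly different tags
print(d_ab < d_ac)
```

Because the measure is a divergence, the pair with overlapping tags yields the smaller value; a ranking built on it would therefore sort in ascending order or convert the value to a similarity first.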
To average the similarity value over all entry content, the sum of the similarity values is divided by the total numbers of entries from users $i$ and $j$, denoted $n_i$ and $n_j$:

$$Sim(i, j) = \frac{\sum_{k \in E_i} \sum_{l \in E_j} S(entry_k, entry_l) \, e^{-\lambda |m(k) - m(l)|}}{n_i n_j} \quad (2)$$

4.2 Reading Interest Analysis

Measuring user interest from blog entry content, however, considers only the user's writing on the Web. Although a large number of Web users contribute content, the majority of Web users are still readers. Given this reality, detecting users' reading interests is highly necessary. Web log analysis studies the access patterns of a user's online activities. In the context of social networking, the browsing history of user $i$ on $j$'s website indicates that user $j$'s content is of interest to user $i$. Therefore, if user $i$ stays on a page $p$ longer than a threshold time length $l$, where $p$ is not in $E_i$ (the set of pages of user $i$'s personal website), it can be concluded that user $i$ is interested in the content of page $p$.

In the first stage of Web usage analysis, the raw data is extracted from the Web server log files. Since Web server log files contain no user identities and record only the IP address as client identification, a problem arises when multiple users log on from the same machine. Fortunately, on social networking websites, users log in and start their online social life with their own accounts. In our project, the logging history is extracted at the application level, from HTTP sessions. Once a user logs in, the application creates a session for that user. A privacy issue may arise if users do not want their browsing history to be processed. To handle this situation, our proposed framework allows for browsing history that users have denied permission to process. The set of visited pages from the browsing history of user $i$ is denoted $R_i$. $R_i$ may be an empty set if permission to process the history data is denied.

4.3 Our Proposed Framework Combining Reading and Writing Interest

Two sets of pages are defined in our proposed framework. The first is the set of pages containing content generated by the centric user. The second is the set of pages containing content that the centric user has read. Based on these two sets, the system analyzes not only what users write but also what users read. It attempts to address the problem that some users prefer reading others' content to writing their own blog content, which is a very common phenomenon on the Web. The main task is to measure the similarity between the centric user $i$ and a friend $j$. Because the privacy issue needs to be considered, the whole measuring process is divided into five stages as follows. First, the similarity $S_1$ between users $i$ and $j$ based on their writings is computed using Equation 3. The content data in this stage comes from the blog entries of users $i$ and $j$.
The result is multiplied by the weight factor $\beta_0$. Second, since log data is collected from both $i$ and $j$, the similarity $S_2$ between the content of $i$'s writing and $j$'s reading can be computed. The result is multiplied by a weight factor $\beta_1$. Third, as in the second stage, the similarity $S_3$ between the content of $i$'s reading and $j$'s writing is computed. The result is multiplied by the weight factor $\beta_1$. Fourth, the similarity $S_4$ between the content of $i$'s reading and $j$'s reading is computed in the same way. The result is multiplied by a weight factor $\beta_2$. Finally, we sum $S_1$, $S_2$, $S_3$ and $S_4$ and multiply the sum by another weight factor $\alpha$, which reflects how often user $i$ visits $j$'s website: the more often $i$ visits $j$'s website, the more important user $j$ is to user $i$.

$$S_1 = Sim(W_i, W_j) \cdot \beta_0 \quad (3)$$

$$S_2 = Sim(W_i, R_j) \cdot \beta_1 \quad (4)$$

$$S_3 = Sim(R_i, W_j) \cdot \beta_1 \quad (5)$$

$$S_4 = Sim(R_i, R_j) \cdot \beta_2 \quad (6)$$

$$Similarity(i, j) = (S_1 + S_2 + S_3 + S_4) \cdot \alpha \quad (7)$$

where the $Sim()$ function is the content similarity measure from Equation 2, $W_i$ denotes the writing content of user $i$, $R_i$ denotes the reading content of user $i$, and $W_j$ does not belong to $R_i$. If user $i$ denies the application permission to process log data, $S_3$ takes the value 0. Similarly, if user $j$ denies it, $S_2$ takes the value 0. The weight factors satisfy $\beta_0 > \beta_1 > \beta_2$, because writing interest reflects personal interest more strongly than reading, which can occur arbitrarily. $\alpha$ is the weight factor that indicates how often user $i$ visits $j$'s website.

Section 4.1 introduced the content analysis model in Equation 1. By replacing $Sim(i, j)$ in Equation 3 with Equation 1, the similarity value between two users $i$ and $j$ can be obtained. After applying Equation 3 to the relationship between each friend and the centric user, the ranking criteria values for the friend list are generated. As a result, the system is able to rank the friend list based on commonly shared interests.

Fig. 1: A screenshot of a user's blog-based personal website in the system prototype visolink

5 System Prototype Implementation

To evaluate our ranking method, the system prototype, named visolink, is under development. The prototype provides services similar to those of current online social networking sites, such as a blog service, photo sharing and friendship management. Experimental data is collected while users are using the site. For example, topic probabilities are extracted from the tags users attach to their blog posts. Users' reading behaviors are extracted from the server Web logs. As shown in Figure 1, a user's personal interests are mainly represented by the written content of his or her blog-based personal website, such as blog posts, photo titles, descriptions and comments on others' websites.

The final goal of the system is to present the ranking of social relationships. Showing the order of the ranking is more important than the actual ranking scores. Therefore, following our principal system design concept, the prototype provides an enhanced view of friends ranking that shows the order of online social relationships instead of meaningless individual ranking scores. As shown in Figure 2, the personal social network of the centric user Anson is generated by an automatic graph drawing algorithm. The main contact, Anson, is placed at the center of the graph. Unlike classical graph drawing, which uses the length of edges to represent the distance between two entities, visolink visualizes the network using a vector-based graphical technique that makes less important nodes smaller and more transparent. Representing the network by clearness and node size makes it much easier for users to judge which nodes are more important than asking them to gauge the distance between nodes by eye. We designed our visualization component to give users a better understanding of their own personal networks. The most important contacts should be emphasized, and those with low similarity values should be ignored. A pseudo-3D view of the personal network is presented to the end user, as shown in Figure 2.
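The idea of conveying rank through node size and transparency rather than edge length can be sketched as a simple mapping. This is not the prototype's drawing algorithm, which the text does not specify; the function name `node_styles`, the radius and opacity ranges, and the friend names and scores are hypothetical choices made for illustration. Only the rank order is used, in line with the point that the order matters more than the raw scores.

```python
def node_styles(similarities, min_radius=8.0, max_radius=30.0, min_opacity=0.2):
    """Map each friend's similarity score to a node radius and opacity.

    Only the rank order is used: the top-ranked friend gets the largest,
    fully opaque node, and the last-ranked friend gets the smallest,
    most transparent one.
    """
    ranked = sorted(similarities, key=lambda f: similarities[f], reverse=True)
    steps = max(len(ranked) - 1, 1)  # avoids division by zero for one friend
    styles = {}
    for rank, friend in enumerate(ranked):
        emphasis = 1.0 - rank / steps  # 1.0 for the top friend, 0.0 for the last
        styles[friend] = {
            "rank": rank + 1,
            "radius": min_radius + emphasis * (max_radius - min_radius),
            "opacity": min_opacity + emphasis * (1.0 - min_opacity),
        }
    return styles


# Hypothetical similarity scores for the friends of one centric user.
styles = node_styles({"bella": 0.8, "carl": 0.3, "dana": 0.5})
for friend, style in styles.items():
    print(friend, style)
```

A renderer could then draw each friend around the centric user with the given radius and opacity, so that low-ranked contacts fade into the background.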
visolink includes personal network friends ranking and recommendation. In the current phase, we have proposed a framework to generate the ranking automatically. The prototype website has started to collect experimental user data.

Fig. 2: A screenshot of our proposed visualization of the personal network ranking result
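The ranking framework of Section 4.3 can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the visolink implementation: each entry is represented as a pair of a term set and a month index, the entry-level similarity $S$ is replaced by a simple Jaccard overlap as a stand-in, the $\beta$ values and the $\alpha$ values in `visit_weight` are arbitrary, and an empty reading set (for example when a user denies log processing) makes every term that depends on it zero.

```python
import math


def entry_similarity(terms_k, terms_l):
    """Stand-in for S(entry_k, entry_l): Jaccard overlap of two term sets."""
    union = terms_k | terms_l
    return len(terms_k & terms_l) / len(union) if union else 0.0


def sim(entries_a, entries_b, lam=1.0):
    """Equation 2: time-decayed entry similarity averaged over n_a * n_b pairs."""
    if not entries_a or not entries_b:
        return 0.0  # an empty set (e.g. denied browsing history) contributes 0
    total = sum(
        entry_similarity(terms_k, terms_l) * math.exp(-lam * abs(m_k - m_l))
        for terms_k, m_k in entries_a
        for terms_l, m_l in entries_b
    )
    return total / (len(entries_a) * len(entries_b))


def combined_similarity(user_i, user_j, alpha, betas=(0.5, 0.3, 0.2)):
    """Equations 3 to 7 for centric user i and friend j."""
    b0, b1, b2 = betas
    if not b0 > b1 > b2:
        raise ValueError("expected beta_0 > beta_1 > beta_2")
    s1 = sim(user_i["writes"], user_j["writes"]) * b0  # Equation 3
    s2 = sim(user_i["writes"], user_j["reads"]) * b1   # Equation 4
    s3 = sim(user_i["reads"], user_j["writes"]) * b1   # Equation 5
    s4 = sim(user_i["reads"], user_j["reads"]) * b2    # Equation 6
    return (s1 + s2 + s3 + s4) * alpha                 # Equation 7


def rank_friends(centric, friends, visit_weight):
    """Score every friend and return (names by descending score, scores)."""
    scores = {
        name: combined_similarity(centric, friend, visit_weight[name])
        for name, friend in friends.items()
    }
    ranking = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranking, scores


# Hypothetical data: each entry is (set of terms, month index).
centric = {"writes": [({"python", "travel"}, 10)], "reads": [({"music"}, 10)]}
friends = {
    # carl wrote the same terms six months earlier and denied log processing.
    "carl": {"writes": [({"python", "travel"}, 4)], "reads": []},
    "bella": {"writes": [({"python", "travel"}, 10)], "reads": [({"music"}, 10)]},
}
visit_weight = {"bella": 1.0, "carl": 1.0}  # alpha values, assumed here

ranking, scores = rank_friends(centric, friends, visit_weight)
print(ranking)
```

In this example the time decay pushes carl below bella even though both wrote about the same terms, which is the effect that the exponential factor in Equation 2 is meant to capture.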
6 Conclusions and Future Work

In this paper, an approach combining content and usage analysis for mining user interests in online social networks has been proposed. It measures users' interests based on both their writing and their reading. This similarity measure between online users provides fundamental support for personal social network visualization and personalized recommendation. An existing online dataset on which our system could run experiments is hard to find, because both blog content and application logging data are needed. In the next phase of the project, we will perform evaluation experiments to examine the accuracy and effect of the ranking method using our own site, visolink.com. A recommendation system based on online social relationship ranking will be explored in the future.

References

1. Dekker, A.: Conceptual Distance in Social Network Analysis. Journal of Social Structure 6(3) (2005)
2. Shen, D., Sun, J., Yang, Q., Chen, Z.: Latent Friend Mining from Blog Data. In: 6th International Conference on Data Mining. Hong Kong, China (2006)
3. Takama, Y., Matsumura, A., Kajinami, T.: Interactive Visualization of News Distribution in Blog Space. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. IEEE Press, Hong Kong, China (2006)
4. Markrehchi, M., Kamel, M.S.: Learning Social Networks from Web Documents Using Support Vector Classifier. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Press, Hong Kong, China (2006)
5. Spertus, E., Sahami, M., Buyukkokten, O.: Evaluating Similarity Measures: A Large-Scale Study in the Orkut Social Network. In: 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. Chicago, U.S.A. (2005)
6. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The Author-Topic Model for Authors and Documents. In: 20th Conference on Uncertainty in Artificial Intelligence. Arlington, Virginia, U.S.A. (2004)
7. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer (2006)
8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (2003)
9. Murata, T., Saito, K.: Extracting User's Interests from Web Log Data. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence. Hong Kong, China (2006)
10. Mobasher, B., Dai, H., Luo, T., Sun, Y., Zhu, J.: Integrating Web Usage and Content Mining for More Effective Personalization. In: Int'l Conf. on E-Commerce and Web Technologies, ECWeb2000. UK (2000)
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,
More informationLink Recommendation Method Based on Web Content and Usage Mining
Link Recommendation Method Based on Web Content and Usage Mining Przemys law Kazienko and Maciej Kiewra Wroc law University of Technology, Wyb. Wyspiańskiego 27, Wroc law, Poland, kazienko@pwr.wroc.pl,
More informationOntology based Model and Procedure Creation for Topic Analysis in Chinese Language
Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,
More informationMubug: a mobile service for rapid bug tracking
. MOO PAPER. SCIENCE CHINA Information Sciences January 2016, Vol. 59 013101:1 013101:5 doi: 10.1007/s11432-015-5506-4 Mubug: a mobile service for rapid bug tracking Yang FENG, Qin LIU *,MengyuDOU,JiaLIU&ZhenyuCHEN
More informationComment Extraction from Blog Posts and Its Applications to Opinion Mining
Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan
More informationMobile Web User Behavior Modeling
Mobile Web User Behavior Modeling Bozhi Yuan 1,2,BinXu 1,2,ChaoWu 1,2, and Yuanchao Ma 1,2 1 Department of Computer Science and Technology, Tsinghua University, China 2 Tsinghua National Laboratory for
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationImplementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky
Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky The Chinese University of Hong Kong Abstract Husky is a distributed computing system, achieving outstanding
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationSEQUENTIAL PATTERN MINING FROM WEB LOG DATA
SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationAn improved PageRank algorithm for Social Network User s Influence research Peng Wang, Xue Bo*, Huamin Yang, Shuangzi Sun, Songjiang Li
3rd International Conference on Mechatronics and Industrial Informatics (ICMII 2015) An improved PageRank algorithm for Social Network User s Influence research Peng Wang, Xue Bo*, Huamin Yang, Shuangzi
More informationThe influence of caching on web usage mining
The influence of caching on web usage mining J. Huysmans 1, B. Baesens 1,2 & J. Vanthienen 1 1 Department of Applied Economic Sciences, K.U.Leuven, Belgium 2 School of Management, University of Southampton,
More informationAn Empirical Study of Lazy Multilabel Classification Algorithms
An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
More informationjldadmm: A Java package for the LDA and DMM topic models
jldadmm: A Java package for the LDA and DMM topic models Dat Quoc Nguyen School of Computing and Information Systems The University of Melbourne, Australia dqnguyen@unimelb.edu.au Abstract: In this technical
More informationSQTime: Time-enhanced Social Search Querying
SQTime: Time-enhanced Social Search Querying Panagiotis Lionakis 1, Kostas Stefanidis 2, and Georgia Koloniari 3 1 Department of Computer Science, University of Crete, Heraklion, Greece lionakis@csd.uoc.gr
More informationCLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES
CLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES K. R. Suneetha, R. Krishnamoorthi Bharathidasan Institute of Technology, Anna University krs_mangalore@hotmail.com rkrish_26@hotmail.com
More informationLinking Entities in Chinese Queries to Knowledge Graph
Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn
More informationRSDC 09: Tag Recommendation Using Keywords and Association Rules
RSDC 09: Tag Recommendation Using Keywords and Association Rules Jian Wang, Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem, PA 18015 USA
More informationResearch on Design and Application of Computer Database Quality Evaluation Model
Research on Design and Application of Computer Database Quality Evaluation Model Abstract Hong Li, Hui Ge Shihezi Radio and TV University, Shihezi 832000, China Computer data quality evaluation is the
More informationMining for User Navigation Patterns Based on Page Contents
WSS03 Applications, Products and Services of Web-based Support Systems 27 Mining for User Navigation Patterns Based on Page Contents Yue Xu School of Software Engineering and Data Communications Queensland
More informationMultimodal Medical Image Retrieval based on Latent Topic Modeling
Multimodal Medical Image Retrieval based on Latent Topic Modeling Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath
More informationText Document Clustering Using DPM with Concept and Feature Analysis
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,
More informationMining User - Aware Rare Sequential Topic Pattern in Document Streams
Mining User - Aware Rare Sequential Topic Pattern in Document Streams A.Mary Assistant Professor, Department of Computer Science And Engineering Alpha College Of Engineering, Thirumazhisai, Tamil Nadu,
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationRanking models in Information Retrieval: A Survey
Ranking models in Information Retrieval: A Survey R.Suganya Devi Research Scholar Department of Computer Science and Engineering College of Engineering, Guindy, Chennai, Tamilnadu, India Dr D Manjula Professor
More informationWEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE
WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,
More informationA Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo
A Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo Olatz Arbelaitz, Ibai Gurrutxaga, Aizea Lojo, Javier Muguerza, Jesús M. Pérez and Iñigo
More informationAutomated Online News Classification with Personalization
Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798
More informationBUAA AUDR at ImageCLEF 2012 Photo Annotation Task
BUAA AUDR at ImageCLEF 2012 Photo Annotation Task Lei Huang, Yang Liu State Key Laboratory of Software Development Enviroment, Beihang University, 100191 Beijing, China huanglei@nlsde.buaa.edu.cn liuyang@nlsde.buaa.edu.cn
More informationEvaluating the suitability of Web 2.0 technologies for online atlas access interfaces
Evaluating the suitability of Web 2.0 technologies for online atlas access interfaces Ender ÖZERDEM, Georg GARTNER, Felix ORTAG Department of Geoinformation and Cartography, Vienna University of Technology
More informationInferring User Search for Feedback Sessions
Inferring User Search for Feedback Sessions Sharayu Kakade 1, Prof. Ranjana Barde 2 PG Student, Department of Computer Science, MIT Academy of Engineering, Pune, MH, India 1 Assistant Professor, Department
More informationAn Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database
Algorithm Based on Decomposition of the Transaction Database 1 School of Management Science and Engineering, Shandong Normal University,Jinan, 250014,China E-mail:459132653@qq.com Fei Wei 2 School of Management
More informationBehaviour Recovery and Complicated Pattern Definition in Web Usage Mining
Behaviour Recovery and Complicated Pattern Definition in Web Usage Mining Long Wang and Christoph Meinel Computer Department, Trier University, 54286 Trier, Germany {wang, meinel@}ti.uni-trier.de Abstract.