Detecting Blog Spam Hashtags Using Topic Modeling


Yoonjin Hyun
Ph.D. Candidate, Graduate School of Business Information Technology, Kookmin University
77 Jeongneung-ro, Seongbuk-gu, Seoul, 02707, Korea

Namgyu Kim
Associate Professor, School of Management Information Systems, Kookmin University
77 Jeongneung-ro, Seongbuk-gu, Seoul, 02707, Korea

ABSTRACT
Tremendous amounts of data are generated daily. Accordingly, unstructured text data distributed through news, blogs, and social media has gained much attention from researchers because it contains abundant information about consumers' opinions. However, as the usefulness of text data increases, attempts to gain profit by distorting text data, maliciously or non-maliciously, are also increasing. Various spam detection techniques have therefore been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. Spam is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user is recognized as spam; (2) if certain content is exposed to other users regardless of those users' intention, then that content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have addressed type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. For type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a given word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
CCS Concepts
Information Systems → World Wide Web → Web searching and information discovery → Web search engines → Spam detection

Keywords
Text Mining; Topic Modeling; Hash Tag Spam; Spam Detection

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. ICEC '16, August 17-19, 2016, Suwon, Republic of Korea. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM /16/08 $15.00 DOI:

1. INTRODUCTION
With the growth of the Internet and the popularization of smart devices, tremendous amounts of data are generated daily. The generation of big data and its importance have been addressed in various publications, such as The Economist (2011) [1], McKinsey (2011) [2], and Gartner (2011) [3]. Demand for and interest in big data analysis therefore remain high. In particular, unstructured text data generated through news, blogs, and social media has gained the attention of many researchers because of the trove of information it contains about real-time consumer opinions and behavior. However, as the use of text data becomes popular and its applicability is extended, attempts to achieve specific effects by distorting text data, maliciously or non-maliciously, are increasing. The growing volume of spam text not only troubles users who want useful information but also reduces data reliability. It is therefore important to conduct research on solving the spam problem.
Spam can be classified into three types: e-mail, social network service (SNS), and blog spam. E-mail spam is also called junk mail or bulk mail. These unwanted commercial e-mails are sent to random addresses that have been collected from community sites or bulletin boards, or created by randomly combining words or numbers. Such spam not only interrupts the process of searching e-mails but can also overload the receiver's server(s).

SNS spam is usually distributed using mentions or hashtags. However, spammers have become smarter, and their strategies have evolved. Spam is now distributed through coordinated posting among spammers, through mass generation of varied posts from finite-state-machine-based spam templates, or passively, by being exposed when a user searches for a specific keyword [4]. SNS spam creates a lot of trouble for online users who want a safe and free environment for communication without unwanted advertising posts. In addition, many users are exposed to large numbers of unwanted advertising posts after unintentionally following spammers. To address these problems, Twitter, a popular SNS, introduced a system whereby users can directly report spam activity, such as tweeting malicious links, sending unsolicited messages to legitimate users, and hijacking trending topics. When Twitter receives a spam report, the offending account is temporarily suspended. Recently, Facebook changed its newsfeed ranking algorithm to screen spam articles by measuring the time users spend reading an article, since some article titles do not reflect the actual content or contain phishing attempts. These cases are indicative of the damage caused by spam.

Finally, blog spam is created by posting fake articles about products and services or by using unrelated hashtags to intentionally expose an article to random users. In this case, not only do users have difficulty finding the information they want, but the reliability of the relevant blog is also reduced.

Spam detection has been studied for a long time in an effort to prevent the various side effects of spamming. The most representative studies cover opinion spamming and opinion spam detection, e-mail spam detection, and web spam detection [5]. However, the criterion that defines spam differs from study to study. There are three criteria: (1) if a certain user is recognized as a spammer through his/her identity or usage pattern, then all content created or posted by that user is recognized as spam; (2) if certain content is exposed to other users regardless of those users' intention, then the content is recognized as spam (e.g., e-mail spam and fake links); and (3) any content that contains malicious or non-malicious false information and is written differently from fact is recognized as spam.

For case (1), studies using SNS data have been actively conducted using users' access patterns and demographic information [6, 7, 8, 9, 10], and several applications to block spam have been developed and are commercially available. Case (2) covers the classic field of spam detection for e-mail spam transmitted to unspecified individuals and for SNS spam spread through embedded URLs [11]. For cases (1) and (2), many studies detect spam by taking advantage of metadata apart from the content. Case (3), however, has been studied insufficiently because it requires going through the content to determine whether or not it is spam. In hashtag spamming, for example, a hashtag that is irrelevant to the content is used to attract other users. Suppose that a non-relevant hashtag is attached to a particular post; then, even though neither the hashtag nor the post is spam on its own, the hashtag is likely to be spam when attached to that post. The details are illustrated in Figure 1.

Figure 1. Example of Hashtag Spam

Figure 1 shows the posts relevant to (a) Health and (b) Movie and their hashtags. Post (a) is attached to the Exercise and Diet hashtags, whereas post (b) is attached to the Avengers, Black Widow, and Scarlett Johansson hashtags. None of these posts and hashtags are recognized as spam because the hashtags are properly attached to the posts. However, if the Diet hashtag of post (a) is attached to post (d) Movie, which is totally unrelated, then the whole post can be considered spam even though the content of the post is not. Likewise, if the Scarlett Johansson hashtag of post (b) is attached to the unrelated post (c) Health, then the whole health-related post can be recognized as spam. For this type of spamming, detection cannot rely on the content of the post alone; it must also consider some or all of the hashtags. Consequently, the problem cannot be solved using the methods for cases (1) and (2).

Therefore, this study conducts a content analysis to detect abnormal connections between posts and hashtags. The topics of the content are identified through topic modeling; then, the set of hashtags used above a certain level for each topic is derived. If a hashtag does not belong to the derived set of hashtags, then this hashtag is likely to be spam.

The remainder of this paper is organized as follows. The next section introduces related work on topic modeling and spam detection. Section 3 describes the proposed methodology in detail. Section 4 presents the concluding remarks, including the plan for future experiments, contributions, and limitations to overcome.

2. Related Works
2.1 Topic Modeling
Traditionally, data mining has been used to extract new knowledge through structured data analysis [12]. Recently, large volumes of unstructured text data have been distributed through channels such as news articles, blogs, and social media.
Thus, text mining has been studied as a way to discover new knowledge and useful patterns from this unstructured text data. Text mining plays an important role in many fields that use text data. It is a comprehensive technique used in information extraction, information retrieval, natural language processing, topic tracking, text summarization, and categorization [13, 14]. Thus, in addition to the traditional topics [15, 16, 17], there is further scope for applying it to more diverse topics.

Among the various contemporary text mining applications, topic modeling is the most actively utilized and shows tangible results in many areas. In topic modeling, a document is the minimum unit of analysis, and its title, abstract/summary, content, and comments can be analyzed. Topic modeling groups a large number of documents on the basis of their similarity and describes each group through representative keywords. The keywords of each document are selected according to term frequency. Depending on the purpose, term frequency can be measured using a binary model, a three-value model, or TF-IDF (Term Frequency-Inverse Document Frequency) [18]. A document can belong to multiple topic groups simultaneously, which differentiates this method from traditional cluster analysis.

Recently, various attempts have been made to solve problems in different areas using topic modeling. After extracting a large number of issues, Kim et al. (2014) [19] and Hyun et al. (2015) [20] derived the main issues using a clustering method; however, the results vary in accordance with different perspectives. Choi et al. (2015) [21] showed that a recommendation system can be improved by using topic modeling to analyze users' interests.
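As a side illustration of the term-weighting options named in Section 2.1, the following minimal sketch computes binary and TF-IDF weights for a toy corpus. The corpus, the function name `term_weights`, and the choice of idf = log(N / df) are assumptions made for illustration; the paper does not state which TF-IDF variant is used, and other variants (for example, smoothed idf) are common.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return (binary, tfidf) weights, one dict per tokenized document.

    Assumption: tf is the raw count and idf = log(N / df); this is only
    one of several common TF-IDF definitions.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    binary, tfidf = [], []
    for doc in docs:
        tf = Counter(doc)
        # Binary model: 1 if the term occurs in the document at all.
        binary.append({term: 1 for term in tf})
        # TF-IDF model: frequent in this document, rare in the corpus.
        tfidf.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return binary, tfidf


# Hypothetical toy corpus (already tokenized).
docs = [
    ["travel", "beach", "hotel", "travel"],
    ["movie", "actor", "travel"],
    ["movie", "actor", "director"],
]
binary, tfidf = term_weights(docs)
```

Under this definition, a term that occurs in every document receives a weight of zero, which is why TF-IDF tends to favor keywords that distinguish one group of documents from another.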
Other related studies include customer segmentation based on users' issues of interest through topic modeling [22], public opinion analysis of science and technology issues [23], and analysis of the dynamic mutation process of issues [24]. In this study, topic modeling is utilized in content analysis for content-based spam detection.

2.2 Spam Detection
To overcome the risk of spam in the unprotected Internet world, spam detection has been a topic of study for a long time. The three types of spam most studied are e-mail, web, and opinion spam.

Because e-mail spam varies little in structure, many e-mail spam studies have reported relatively good results. Different kinds of methods and applications to block e-mail spam have been commercialized. The most representative methodology is spam filtering using the Bayesian approach [26, 27]. Recently, many e-mail spam studies have utilized the e-mail dataset released by Enron, a U.S. company found guilty of massive accounting fraud in 2001 [27, 28].

Web spam can be classified into two types: link spam and content spam [5]. Many attempts have been made to prevent web spam. For example, the TrustRank algorithm was proposed to compute trust scores, where good pages are given higher scores. Based on the calculated trust scores, spam pages can be filtered out during the search engine process [29]. Moreover, spam mass estimation was proposed to identify link spamming based on web link structures [30]. Web spam detection through content analysis of web pages has also been introduced [31]. However, due to the fluid nature and massive volume of web spam, its detection remains problematic.

Unlike e-mail and web spam, opinion spam has been studied relatively little. Opinion spam pursues a specific purpose by posting fake opinions about social and political issues or fake reviews about certain products or services. Opinion spam detection is far more difficult because it requires manually reading the content to make a final determination of spam. In the case of fake product reviews, it is very difficult to detect opinion spam because it is impossible to confirm whether the user actually used the product. Thus, opinion spam detection still faces numerous challenges [5].

In recent years, SNS and blog spam have also been studied.
The most representative studies identify spam based on social attributes [7], detect abnormal behaviors [6], utilize users' account or message network structures [8, 9, 10], and classify tweet-embedded URLs [11]. In particular, hashtag spam detection draws the most attention from researchers and practitioners. Hashtags are used to share content with unspecified individuals and to expose the content to groups of interested people. However, because it is easy to share information through hashtags, they are also likely to be used as spam vectors. Therefore, attempts are being made to prevent hashtag spam and hashtag hijacking. For example, studies have analyzed the types of hashtag hijacking attacks [32] and investigated the effect of spam on hashtag recommendations for tweets [33]. Nevertheless, research on hashtag spam detection is insufficient, and most studies focus on Twitter to take advantage of its metadata, such as user accounts, follow relationships, content, and network structure. Twitter data also has many disadvantages; foremost, it is very simple and short because the content is limited to 140 characters. Furthermore, Twitter data contains significant noise, i.e., garbage data that makes content analysis quite difficult. Thus, methodologies that rely on Twitter metadata can hardly be applied to other content sources. In this study, a methodology is introduced to detect hashtag spam based on blog content analysis without metadata.

3. Contents-Based Blog Hashtag Spam Detection
3.1 Research Overview
In this section, the methodology of content-based blog hashtag spam detection is explained in detail. The term hashtag spam refers to a hashtag that does not fit the topic of the document to which it has been attached. Simply put, the hashtag does not fit the theme of the content. The blog data used in this study is assumed not to contain spam.
Figure 2 illustrates an overview of the entire process. In Figure 2, the cylinders represent the data sources (blog data, document text data, and hashtag information), the rectangles represent the main analysis processes, and the dotted-line boxes represent the outputs. There are six main processes: (1) blog data is separated into content (document text) and hashtag information; (2) topic modeling is performed on the document text to analyze the content and assign each document to topic clusters; (3) spam hashtags are added randomly to the existing hashtag information, and the hashtag table for each document is derived; (4) and (5) hashtag frequency is analyzed using the results of steps (2) and (3) to derive the valid hashtag lists for each topic and for each document, respectively; and (6) spam hashtags are detected by comparing the results of steps (3) and (5), with verification conducted using the F-score. The whole process is explained further through examples in the following subsections.

Figure 2. Research Overview

3.2 Contents Analysis using Topic Modeling
In this subsection, processes (1) and (2) in Figure 2 are explained in detail. First, the blog data was refined to make it appropriate for analysis, and databases of (i) document text data and (ii) hashtag information were constructed. The fields of database (i) are document number, date, title, and content, whereas the fields of database (ii) are document number and hashtag. After topic modeling was performed on database (i), the document clusters for each topic were derived. Because the topic modeling process follows the general methodology, it is not described here. Figure 3 shows a hypothetical example of topic modeling for five documents, d1-d5. In Figure 3, two topics were derived: travel and movie. Documents d1-d3 were grouped under travel, and documents d2-d5 were grouped under movie. Thus, d2 and d3 belong to both travel and movie, because in topic modeling, unlike in clustering, one document can belong to multiple topic groups.

3.3 Extracting Valid Hashtag list
This subsection explains how the valid hashtag lists for each topic and document are extracted using the outcomes of the previous subsection. It covers processes (3)-(5) in Figure 2. Prior to extracting the valid hashtags, spam hashtags are added to the existing hashtag information. When collecting blog data, it is very difficult to distinguish spam from non-spam, and it is possible that no spam is collected at all. Therefore, the blog data was assumed to contain no spam, and the hashtag list for each document was derived by inserting spam hashtags randomly.

Figure 3. Example of Inserting Spam Hashtag

Figure 3 is a diagrammatic representation of the random insertion of spam hashtags. Spam 1 was added to the hashtag list of d1, and spam 2 was added to the hashtag list of d5. After the spam hashtags were inserted, the valid hashtags for each topic were derived through a frequency analysis of the hashtag list for each document. If the frequency of a hashtag is above a certain threshold in the document cluster for a specific topic, then this hashtag is selected as a valid hashtag for that topic. Figure 4 shows the hypothetical valid hashtag lists for each topic.

Figure 4. The process of Extracting Valid Hashtag for each Topic

Results (b) and (c) in Figure 4 contain the hashtags that are used in at least half of the documents in the document cluster for each topic. Specifically, the hashtags [...], [...], [...], and [...] were derived as valid hashtags of T1 because they were used in two or more of the three documents in document cluster T1. Likewise, [...], [...], Mission Impossible, and [...] were derived as valid hashtags of T2 because they were used in at least two documents in document cluster T2, which has four documents. After the valid hashtag list for each topic is derived, it is used as the criterion for deriving the valid hashtag list of each document. In other words, the valid hashtag list of a topic serves as the reference for the valid hashtag lists of all the documents in that topic. Table 1 shows the hypothetical result of extracting valid hashtag lists for each document.

Table 1. The Virtual Result of Extracting Valid Hashtag list for each Doc.
Doc. No   V_tag
d1        {[...], [...], [...]}
d2        {[...], [...], [...]}
d3        {[...], [...], [...]}
d4        {[...], [...]}
d5        {[...], [...]}

3.4 Detecting Hashtag Spam
Finally, in process (6) of Figure 2, spam hashtags are identified using the outcomes of the previous subsection. Detection is achieved by comparing, for each document, the hashtag list that includes the inserted spam hashtags with the valid hashtag list. Table 2 shows the hypothetical result of detecting spam hashtags.
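The steps of Sections 3.3 and 3.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hashtag names are invented, the spam hashtags are inserted at fixed positions rather than randomly so that the run is reproducible, the threshold `min_ratio=0.5` (used in at least half of a topic's documents) is inferred from the two-of-three and two-of-four examples, and a hashtag is flagged when it is absent from the valid lists of every topic its document belongs to.

```python
from collections import Counter

# Hypothetical topic assignment mirroring the d1-d5 example.
doc_topics = {
    "d1": {"travel"}, "d2": {"travel", "movie"}, "d3": {"travel", "movie"},
    "d4": {"movie"}, "d5": {"movie"},
}
# All hashtags per document (A_tag), with spam1 and spam2 already inserted.
doc_tags = {
    "d1": {"trip", "beach", "hotel", "spam1"},
    "d2": {"trip", "beach", "film"},
    "d3": {"trip", "hotel", "film", "actor"},
    "d4": {"film", "actor", "Movie"},
    "d5": {"film", "actor", "spam2"},
}
inserted_spam = {("d1", "spam1"), ("d5", "spam2")}


def valid_tags_per_topic(doc_topics, doc_tags, min_ratio=0.5):
    """Process (4): keep hashtags used in >= min_ratio of a topic's documents."""
    clusters = {}
    for doc, topics in doc_topics.items():
        for topic in topics:
            clusters.setdefault(topic, []).append(doc)
    valid = {}
    for topic, docs in clusters.items():
        counts = Counter(tag for doc in docs for tag in doc_tags[doc])
        valid[topic] = {t for t, c in counts.items() if c >= min_ratio * len(docs)}
    return valid


def detect_spam(doc_topics, doc_tags, valid):
    """Processes (5)-(6): flag tags outside the valid lists of the doc's topics."""
    detected = set()
    for doc, tags in doc_tags.items():
        allowed = set().union(*(valid[t] for t in doc_topics[doc]))
        detected |= {(doc, tag) for tag in tags - allowed}
    return detected


def f_score(detected, actual):
    """Harmonic mean of precision and recall over (document, hashtag) pairs."""
    true_pos = len(detected & actual)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


valid = valid_tags_per_topic(doc_topics, doc_tags)
detected = detect_spam(doc_topics, doc_tags, valid)
score = f_score(detected, inserted_spam)
```

With these invented inputs, the sketch flags spam1 in d1 and spam2 in d5, and it also flags the legitimate but rare hashtag Movie in d4. That false positive lowers precision to 2/3 while recall stays at 1, giving an F-score of about 0.8, which illustrates how the frequency threshold trades off missed spam against wrongly flagged hashtags.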

Table 2. The Virtual Result of Detecting Spam Hashtag
Doc. No   V_tag   A_tag          Detected
d1        [...]   [...], spam1   spam1
d2        [...]   [...]          X
d3        [...]   [...]          X
d4        [...]   [...], Movie   Movie
d5        [...]   [...], spam2   spam2

In Table 2, spam1 and spam2 were detected as spam in d1 and d5, respectively, and no spam was detected in d2 and d3. In d4, however, Movie was detected as spam even though it is not a spam hashtag. In the actual experiment, spam hashtags will be detected in the same way as in the proposed methodology. Verification will be performed using the F-score, the harmonic mean of precision and recall.

4. Concluding Remarks
This study remains in progress, and the actual experiment using the proposed methodology is yet to be conducted. To enable further progress, the target experiment data must be collected beforehand. The most important data for this study is blog data that contains hashtag information. Using a customized crawler, we collected about 14,000 random blog samples from the most popular website in Korea. This data comprises the document number, date, title, hashtag, and content. The experiment will use only the document number, hashtag, and content. We will refine and analyze the collected data using SAS Enterprise Guide and Excel VBA. In addition, we will analyze the contents through topic modeling using SAS Enterprise Miner.

REFERENCES
[1] Economist Intelligence Unit. Big Data: Harnessing a Game-Changing Asset. The Economist.
[2] McKinsey Global Institute. Big Data: The next Frontier for Innovation, Competition, and Productivity. McKinsey and Company.
[3] Gartner Inc. Hype Cycle for Emerging Technologies. Gartner Inc.
[4] Chen, C., Zhang, J., Xiang, Y., and Zhou, W. Spammers Are Becoming Smarter on Twitter. Browse Journals & Magazines. 18, 2. DOI=
[5] Liu, B. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies #16, Morgan & Claypool Publishers.
[6] Egele, M., Stringhini, G., Kruegel, C., and Vigna, G. Compa: Detecting Compromised Accounts on Social Networks. Proc. Ann. Network and Distributed System Security Symp. ompa.pdf.
[7] Song, J., Lee, S., and Kim, J. Spam Filtering in Twitter Using Sender-Receiver Relationship. Recent Advances in Intrusion Detection. Volume 6961 of the series Lecture Notes in Computer Science. DOI=
[8] Yardi, S., Romero, D., Schoenebeck, G., and Boyd, D. Detecting Spam in a Twitter Network. Peer-reviewed journal on the Internet. 15, 1 (January 2010). DOI=
[9] Wang, A. H. Don't Follow Me: Spam Detection in Twitter. Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on, (July ).
[10] Ma, Y., Niu, Y., Ren, Y., and Xue, Y. Detecting Spam on Sina Weibo. International Workshop on Cloud Computing and Information Security (CCIS), (October 2013). DOI=
[11] Lee, S. and Kim, J. Warningbird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream. IEEE Transactions on Dependable and Secure Computing. 10, 3 (April 2013), DOI=
[12] Han, J., Kamber, M., and Pei, J. Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann Publishers.
[13] Mooney, R. J. and Bunescu, R. Mining Knowledge from Text using Information Extraction. ACM SIGKDD Explorations Newsletter - Natural language processing and text mining, 7, 1 (June 2006), DOI=
[14] Rijsbergen, C. J. V. Information Retrieval, 2nd edition, Butterworth, London.
[15] Kim, K. and Ahn, H. Development of Web-based Intelligent Recommender Systems using Advanced Data Mining Techniques. Journal of Information Technology Applications and Management. 12, 3 (September 2005).
[16] Hur, J. and Kim, J. W. Characteristics on Inconsistency Pattern Modeling as Hybrid Data Mining Techniques. Journal of Information Technology Applications and Management, 15, 1 (March 2008).
[17] Hwang, I. A Study on Dynamic Query Expansion Using Web Mining in Information Retrieval. Journal of Information Technology Applications and Management. 11, 2 (June 2004).
[18] Weiss, S. M., Indurkhya, N., and Zhang, T. Fundamentals of Predictive Text Mining, Springer.
[19] Kim, J., Kim, N., and Cho, Y. User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis. Journal of Intelligence and Information Systems. 20, 2 (June 2014), DOI=
[20] Hyun, Y., Kim, N., and Cho, Y. A Multi-Dimensional Issue Clustering from the Perspective Consumers' Interests and R&D. Journal of Information Technology Services. 14, 1 (March 2015), DOI=
[21] Choi, S., Hyun, Y., and Kim, N. Improving Performance of Recommendation Systems Using Topic Modeling. Journal of Intelligence and Information Systems. 21, 3 (September 2015), DOI=
[22] Hyun, Y., Kim, N., and Cho, Y. Interest-based Customer Segmentation Methodology Using Topic Modeling. Journal of Information Technology Applications & Management. 22, 1 (March 2015).
[23] Kim, D., Wong, W. X. S., Lim, M., Liu, C., Kim, N., Park, J., Kil, W., and Yoon, H. A Methodology for Analyzing Public Opinion about Science and Technology Issues Using Text Analysis. Journal of Information Technology Services, 14, 3 (September 2015), DOI=
[24] Lim, M. and Kim, N. Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis. Journal of Intelligence and Information Systems. 22, 1 (March 2016), DOI=
[25] Grier, C., Thomas, K., Paxson, V., and Zhang, M. The Underground on 140 Characters or Less. Proceedings of the 17th ACM conference on Computer and communications security. DOI=
[26] Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. A bayesian approach to filtering junk e-mail. In AAAI Workshop on Learning for Text Categorization.
[27] Jia, X., Zheng, K., Li, W., Liu, T., and Shang, L. Three-Way Decisions Solution to Filter Spam Email: An Empirical Study. Rough Sets and Current Trends in Computing. Volume 7413 of the series Lecture Notes in Computer Science, DOI=
[28] Klimt, B. and Yang, Y. Introducing the Enron corpus. In CEAS: The Conference on Email and Anti-Spam.
[29] Gyongyi, Z., Garcia-Molina, H., and Pedersen, J. Combating web spam with trustrank. In VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases. VLDB Endowment. DOI=
[30] Gyongyi, Z., Berkhin, P., Garcia-Molina, H., and Pedersen, J. Link spam detection based on mass estimation. In VLDB '06: Proceedings of the 32nd international conference on Very large data bases. VLDB Endowment.
[31] Ntoulas, A., Najork, M., Manasse, M., and Fetterly, D. Detecting spam web pages through content analysis. Proceedings of the 15th international conference on World Wide Web. (May 2006), DOI=
[32] Xanthopoulos, P., Panagopoulos, O. P., Bakamitsos, G. A., and Freudmann, E. Hashtag Hijacking: What it is, why it happens and how to avoid it. Journal of Digital & Social Media Marketing, 3, 4 (February 2016).
[33] Sedhai, S. and Sun, A. Effect of Spam on Hashtag Recommendation for Tweets. Proceedings of the 25th International Conference Companion on World Wide Web. (April 2016), DOI=


More information

Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog

Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog Chengfeng Lin alex_lin@sjtu.edu.cn Yi Zhou zy_21th@sjtu.edu.cn Kai Chen kchen@sjtu.edu.cn Jianhua He Aston University j.he7@aston.ac.uk

More information

Mining Social Media Users Interest

Mining Social Media Users Interest Mining Social Media Users Interest Presenters: Heng Wang,Man Yuan April, 4 th, 2016 Agenda Introduction to Text Mining Tool & Dataset Data Pre-processing Text Mining on Twitter Summary & Future Improvement

More information

Collaborative Spam Mail Filtering Model Design

Collaborative Spam Mail Filtering Model Design I.J. Education and Management Engineering, 2013, 2, 66-71 Published Online February 2013 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijeme.2013.02.11 Available online at http://www.mecs-press.net/ijeme

More information

Mining User - Aware Rare Sequential Topic Pattern in Document Streams

Mining User - Aware Rare Sequential Topic Pattern in Document Streams Mining User - Aware Rare Sequential Topic Pattern in Document Streams A.Mary Assistant Professor, Department of Computer Science And Engineering Alpha College Of Engineering, Thirumazhisai, Tamil Nadu,

More information
