ELM-based spammer detection in social networks


J Supercomput DOI.7/s

Xianghan Zheng (1,2), Xueying Zhang (1,2), Yuanlong Yu (1,2), Tahar Kechadi (3), Chunming Rong (4)

Springer Science+Business Media New York 25

Abstract Online social networks such as Facebook, Twitter, and Weibo play an important role in people's daily lives. Most existing social network platforms, however, face the challenge of dealing with undesirable users and their malicious spam activities, which disseminate unwanted content, malware, viruses, etc. to the legitimate users of the service. The spreading of spam degrades user experience and also negatively impacts server-side functions such as data mining, user behavior analysis, and resource recommendation. In this paper, an extreme learning machine (ELM)-based supervised machine learning approach is proposed for effective spammer detection. The work first constructs a labeled dataset by crawling Sina Weibo data and manually classifying the corresponding users into spammer and non-spammer categories. A set of features is then extracted from message content and user behavior and applied to the ELM-based spammer classification algorithm. The experiment and evaluation show that the proposed solution provides excellent performance with a true positive rate of spammers and non-spammers reaching 99 and %, respectively. As the results suggest, the proposed solution could achieve better reliability and feasibility compared with existing SVM-based approaches.

Corresponding author: Yuanlong Yu, yu.yuanlong@fzu.edu.cn
1 College of Mathematics and Computer Science, Fuzhou University, Fuzhou 358, China
2 Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 358, China
3 School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
4 Department of Electrical Engineering and Computer Science, University of Stavanger, 436 Stavanger, Norway

Keywords Social network · Spammer · Machine learning · Extreme learning machine

1 Introduction

With the development of science and technology, social networking sites such as Facebook, Twitter, and Weibo (previously Sina Weibo) have become important platforms for users to interact with their friends, post messages, discuss hot topics, and share views. According to a Statista report [], the number of social network users has reached 2.75 billion until June 24, and is estimated to remain around 2.33 billion users globally until the end of 27. However, online social platforms also attract huge interest from spammers, who use them to spread advertisements, disseminate pornography and viruses, carry out phishing, and so on. The spreading of spam degrades the user experience and also negatively impacts server-side functions such as data mining, user behavior analysis, and resource recommendation [2,3]. According to Nexgate's report [4], during the first half of 23, the growth of social spam has been 355 %, much faster than the growth rate of authentic accounts and messages on most branded social networks.

Since spammers typically behave like legitimate users, detecting and discriminating spam is difficult. It is therefore highly desirable to develop techniques and methods for identifying spammers and their behavior in online social networks. Currently, there are a few proposals from industry and academia that discuss possible solutions for spammer detection and analysis. These solutions, however, are either ineffective or depend on too many conditions (large numbers of content and behavior features, etc.).

This paper investigates social spammer content and behavior issues and proposes an effective extreme learning machine (ELM)-based machine learning model for spammer detection. In summary, the paper makes the following main contributions:

1. The paper uses spammer features to detect spammers and tests the results on Sina Weibo, the biggest social network site in China. Using the Weibo API, a dedicated dataset crawler is developed to extract the public messages of any unauthorized user inside the Weibo platform. This is the first step of the data analysis.
2. The major novelty of the paper is to study a set of the most important features related to message content and user behavior and then apply them to the ELM-based classification algorithm for spammer detection. The experiment and comparison work shows that the proposed solution provides higher spammer detection accuracy.
3. The paper compares several aspects of the ELM-based approach with the SVM-based approach, including training and testing time and the sensitivity to parameters. The performance comparison further validates the better feasibility, stability, and strong generalization ability of the ELM algorithm.

It is worth mentioning that although the proposed approach is currently tested specifically on the Sina Weibo social network, it is applicable to other existing social sites with minor revisions.

The rest of the paper is organized as follows: Sect. 2 introduces background information related to social networks and social networking platforms, and surveys existing work on social spammer detection. Section 3 illustrates

the dataset collection and feature extraction related to content and behavior. Section 4 describes the ELM-based spammer detection model, the experiments, and the corresponding evaluation. Finally, the conclusion is given in Sect. 5.

2 Background and existing works

2.1 The social network

Sina Weibo is one of the largest social networks in China and attracts millions of users online every day. Weibo is a platform based on user relationships and on instantly sharing information through short posts of no more than 140 characters via computer or mobile phone [5]. Specifically, Weibo provides the following functions:

Follower and Following: each user can choose to start following another user to receive the latest messages and statuses of his/her friends. The user who is followed can either accept or reject the request to follow back.
Post and Repost: short messages of no more than 140 characters, including punctuation. Posted messages are delivered to followers immediately and are public for anyone to read.
Mention: represented by the @ symbol, meaning that the message sender is willing to share something with the user mentioned. When a mention is used, a notification automatically informs the mentioned user that a message has been sent and is available on his/her homepage.
Label: users can post messages containing labels (# #) to identify a specific topic. If enough users pick up this topic, it appears in the list of trending topics.

2.2 Machine learning techniques

In the field of machine learning, a series of traditional machine learning algorithms have been improved to satisfy higher data processing needs. For instance, recently proposed models such as SVM, Least Squares SVM, and Limited Newton LSVM [6,7] reduce the difficulty of solving the optimization problem to a certain extent and improve the solution speed. However, two problems remain: (1) the solution speed cannot satisfy the processing needs of large data; (2) SVM-related models require frequent manual adjustment of the parameters (C, γ) and repeated training to obtain the optimal solution, which is a tedious, time-consuming process with poor generalization ability. Under these circumstances, the extreme learning machine provides a new way to solve these problems.

Extreme learning machine (ELM) is a novel machine learning model proposed by Huang [8] as a least squares-based learning algorithm for single hidden layer feedforward neural networks (SLFNs). In comparison with traditional neural networks, which usually employ the back propagation (BP) algorithm [9] to train the connection weights, the tedious process of iterative parameter tuning is eliminated and the slow convergence and local minimum problems are avoided. ELM is currently an important research topic due to its high efficiency, ease of implementation, and unification of classification and regression, and it might therefore be well suited to the social spammer detection field [].

2.3 Existing works

In the past ten years, spam detection and filtering mechanisms have been widely implemented. The main work can be summarized into two categories: a content-based model and an identity-based model. In the content-based model, a series of machine learning approaches [,2] are implemented that parse content according to keywords and patterns that potentially indicate spam. In the identity-based model, the most commonly used approach is that each user maintains a whitelist and a blacklist of addresses of people whose e-mails should and should not be blocked by the anti-spam mechanism [3,4]. More recent work leverages the social network for spam identification according to the Bayesian probability [5]. The concept is to use the social relationship between a sender and a receiver to decide the closeness and trust level in a given relationship, and then increase or decrease the Bayesian probability according to these values.

With the rapid development of social networks, social spam has attracted a lot of attention from both industry and academia. In industry, Facebook proposes the EdgeRank algorithm [6], which assigns each post a score generated from a few features (e.g., number of likes, number of comments, number of reposts, etc.). The higher the EdgeRank score, the less likely the poster is to be a spammer. The disadvantage of this solution is that spammers could join each other's networks and continuously like and comment on each other's posts to achieve a high EdgeRank score.

In academia, Wang [7] proposes a naïve Bayesian-based spammer classification algorithm to distinguish suspicious behaviors from normal ones on Twitter, with a precision result (F-measure) of 89 %. Yardi et al. [8] study the behavior features of a small sample of spammers on Twitter and find that the behavior of spammers differs from that of legitimate users in regard to posting tweets, followers, friends, and so on. Stringhini et al. [9] further investigate spammer features by creating a number of honey-profiles in three large social network sites (Facebook, Twitter, and Myspace) and identify five common features (followee-to-follower ratio, URL ratio, message similarity, messages sent, and friend number) that may help detect potential spammer activity. Gao et al. [2] adopt a set of novel features for effectively reconstructing spam messages into campaigns rather than examining them individually (with precision value over 8 %). Benevenuto et al. [2] collect a large dataset from Twitter and identify 62 features related to tweet content and user social behavior. These characteristics are used as attributes in a machine learning process for classifying users as either spammers or non-spammers. Zheng et al. [22] apply a set of features to an SVM classifier to detect spammers and obtain a better classification result; however, this approach leads to higher training time and requires manual adjustment to select optimized parameters. In addition, several researchers have deployed active honeypots that run without human inspection and log information about their fans [23,24]; they analyze the features of the spammers collected in this way and compare these features. Furthermore, Miller et al. [25] propose two stream clustering algorithms, StreamKM++ and DenStream, which are modified to facilitate spam identification.

In summary, the concept of existing social spam detection work is to extract a set of features that distinguish normal users from spammers and apply that information

to different classifier models to detect suspicious behavior. Due to the differences in the considered data sources and features, different classifiers might achieve different performance. Generally, this paper follows similar concepts, however, with two distinct points:

1. Our proposed ELM-based classification model considers only 18 feature items and achieves the best performance result, with the F-measure value reaching over 99 %. This is the best result we are aware of (although different approaches might not be directly comparable due to differences in the collected datasets).
2. As verified by the experiment results, ELM-based classification tends to achieve better generalization performance than SVM-based solutions. The proposed solution is also less sensitive to user-specified parameters and can be easily implemented.

3 Dataset and feature analysis

3.1 Dataset collection

While Sina Weibo provides a relatively complete API for developers, there are still many constraints in the data collection process. Accordingly, a dedicated data crawler and feature collection mechanism are developed to solve this problem. Figure 1 describes the basic framework of the data collection and feature extraction.

Firstly, we randomly selected normal users from the Weibo social network. Because most normal users are unlikely to follow spammers in reality, we can crawl the list of users who are followed by other legitimate users. Similarly, those who follow spam accounts are probably also spammers, since spammers follow each other to increase their degree of mutual following. Therefore, the sample set of spammers could be obtained from 5 original spammers. For each user, we crawl the corresponding information inside 5 recent messages (although the real number of microblogs returned is less than 5). The Weibo API converts each message ID into the detailed message.

3.2 Feature analysis

Spammers usually have a commercial intent such as spreading advertisements. In the paper, we randomly select 5 spam messages and 5 normal messages respectively from collected dataset, and assign each message with a random integer value ranged from to 5. We also set the maximum number of reposts, comments and likes to.

Figure 2a shows the difference in the proportion of original messages posted by normal users and spammers. Most legitimate users post messages to share personal knowledge and feelings with their friends. On the other hand, most spammers repost messages from others, which causes their proportion of original messages to be less than %.

Figure 2b indicates the proportion of messages containing URLs (the number of messages containing a URL divided by the total number of messages). This figure shows that most spammers have at least one URL in each message.

Fig. 1 Dataset crawling and feature extraction (components shown: data source, followee crawler, message crawler, Weibo API, message ID converter, user data, feature extraction, ELM-based feature learning; outputs: non-spammers and spammers)

Figure 2c displays the difference in the average number of friends mentioned. Considering that most spammers focus on advertising and spend little time interacting with friends, their message content is mostly advertising words and pictures. Legitimate users, however, frequently mention their friends and share funny things.

To offer a more specific description, this paper also introduces the cumulative distribution function (CDF) to illustrate the distribution of users' behavioral characteristics. The cumulative distribution function (Eq. 1) describes the probability that a sample of a random variable X will be less than or equal to a value x, where x is a real value. If X is a continuous random variable then F is a continuous function, and conversely.

$$F(x) = P(X \le x) \qquad (1)$$
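As an illustration of Eq. (1), the following minimal sketch estimates the CDF of one behavioral feature from observed samples. The function name `empirical_cdf` and the followee counts are hypothetical and are not taken from the collected dataset.

```python
def empirical_cdf(samples, x):
    """Estimate F(x) = P(X <= x) as the fraction of samples not exceeding x."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s <= x) / len(samples)


# Hypothetical followee counts for five users (illustrative values only).
followees = [12, 40, 40, 150, 900]
print(empirical_cdf(followees, 40))  # 3 of the 5 samples are <= 40, so 0.6
```

Evaluating this function over a grid of x values for spammers and non-spammers separately gives curves of the kind plotted in Fig. 2d to 2f.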

Fig. 2 Distribution and cumulative distribution function of features (NonSpammer vs. Spammer): a the proportion of original messages, b the fraction of messages containing URL, c the average number of friends mentioned, d the number of followees (CDF), e the fraction of followees per followers (CDF), f the number of created days (CDF)

Figure 2d analyzes the number of followees of each user, that is, the number of people the user follows. Normally, spammers try to follow a multitude of legitimate users so as to be followed back. However, this does not work most of the time. This behavior makes the fraction of followees per followers very large in comparison to non-spammers, as illustrated in Fig. 2e.
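The content and behavior features discussed for Fig. 2a to 2e can be computed from crawled user data along the following lines. This is only a sketch under stated assumptions: the record layout (`followees`, `followers`, `messages`, `text`, `is_repost`) and the regular expressions are hypothetical and do not reflect the actual Weibo API response format or the authors' crawler.

```python
import re

# Simplified patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def extract_features(user):
    """Compute a few per-user features from a hypothetical crawled record."""
    messages = user["messages"]
    n = len(messages) or 1  # avoid division by zero for users without messages
    originals = sum(1 for m in messages if not m["is_repost"])
    with_url = sum(1 for m in messages if URL_RE.search(m["text"]))
    mentions = sum(len(MENTION_RE.findall(m["text"])) for m in messages)
    return {
        # Fraction of followees per followers (Fig. 2e); guard against 0 followers.
        "followees_per_follower": user["followees"] / max(user["followers"], 1),
        "fraction_original": originals / n,   # Fig. 2a
        "fraction_with_url": with_url / n,    # Fig. 2b
        "avg_mentions": mentions / n,         # Fig. 2c
    }


# Hypothetical user record (illustrative values only).
example_user = {
    "followees": 500,
    "followers": 25,
    "messages": [
        {"text": "Buy now http://t.cn/abc", "is_repost": True},
        {"text": "Nice day @alice @bob", "is_repost": False},
    ],
}
features = extract_features(example_user)
print(features)
```

Each user then becomes one feature vector; the remaining features listed in Sect. 4.2 would be added to the dictionary in the same way.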

Figure 2f reveals the feature difference in the number of created days (the age of the account). Compared with normal users, most spammers have fewer created days, because the anti-spam mechanism eventually detects and automatically cleans spammer accounts.

4 Spammer detection

Based on the dataset and feature collection described in the previous section, a supervised machine learning model is introduced for spammer identification. Supervised learning is the machine learning task of inferring a function from labeled training data that consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). Through analysis of the training data, supervised learning produces a classification model for predicting new examples.

4.1 Extreme learning machine

Extreme learning machine (ELM) [26] is based on empirical risk minimization theory and is designed for the training of single hidden layer feedforward neural networks (SLFNs) (as illustrated in Fig. 3). The learning process needs only a single pass and avoids multiple iterations and local minima. Compared with conventional neural network algorithms, ELMs are capable of achieving faster training speeds and can overcome the problem of overfitting.

Fig. 3 Feedforward neural network with single hidden layer (hidden nodes $f(w_1, x), \ldots, f(w_N, x)$, output weights $h_1, \ldots, h_N$, output $y$)

Let $D = \{(x_i, o_i)\}$, $i = 1, \ldots, P$, be a set of $P$ different samples, where $x_i \in \mathbb{R}^m$ and $o_i \in \mathbb{R}^n$. The goal is to find a relationship between $\{x_i\}$ and $\{o_i\}$. Standard single hidden layer feedforward networks (SLFNs) with $N$ hidden nodes can be mathematically modeled by:

$$y_j = \sum_{k=1}^{N} h_k f(w_k, x_j) \qquad (2)$$

where $1 \le j \le P$, $w_k$ stands for the parameters of the $k$-th element of the hidden layer, $h_k$ refers to the weight that connects the $k$-th hidden element with the output layer, and $f$

represents the function that gives the output of the hidden layer. Equation (2) can be expressed in matrix notation as $y = G h$, where $h$ is the vector of weights of the output layer and $G$ is given by:

$$G = \begin{pmatrix} f(w_1, x_1) & \cdots & f(w_N, x_1) \\ \vdots & \ddots & \vdots \\ f(w_1, x_P) & \cdots & f(w_N, x_P) \end{pmatrix} \qquad (3)$$

where $N$ is the number of hidden nodes. As mentioned above, ELM initializes the hidden layer parameters $w_k$ randomly, and the weights of the output layer are then obtained with the Moore-Penrose generalized inverse [27] according to the expression $h = G^{+} o$, where $G^{+} = (G^{T} G)^{-1} G^{T}$ is the pseudo-inverse matrix (superscript $T$ means matrix transposition).

4.2 ELM-based spammer detection model

Figure 4 illustrates the basic concept of the proposed spammer detection model. In this solution, training data are converted into a series of feature vectors that consist of a set of formulated attribute values. These vectors form the input of a supervised machine learning algorithm. After training, the classification model is applied to decide whether a specific user is a normal user or a spammer. Because spammers and non-spammers have different social behaviors, it is possible to distinguish abnormal behaviors from legitimate ones.

Fig. 4 Spammer detection model (components shown: social network, web crawler, feature extraction, feature vectors, data standardization, extreme learning machine, classifier model, detection results)

In this paper, we used a model based on 18 features, which were the following: the number of followees, the number of followers, the number of messages, the number of friends following each other, the number of favorites, the number of created days, fraction of followees per followers, fraction of original messages, number of messages per day, the average number of reposts, the average number of comments, the average number of likes, the average number of URLs, the average number of pictures, the average number of hashtags, the average number of user mentions, fraction of messages containing URLs, and fraction of messages containing pictures.

To evaluate the effectiveness of the experiment results, we consider the confusion matrix illustrated in Table 1, where true positive (TP) represents the number of spammers correctly classified, false negative (FN) refers to the number of spammers misclassified as non-spammers, false positive (FP) expresses the number of non-spammers misclassified as spammers, and true negative (TN) is the number of non-spammers classified correctly.

Table 1 Example of confusion matrix

                        Predicted
                        Spammer    Non-spammer
True    Spammer         TP         FN
        Non-spammer     FP         TN

According to the confusion matrix, a set of metrics commonly used in the machine learning field is introduced, including precision (P), recall (R), and F-measure (F). P is the ratio of the number of instances correctly classified as a class to the total number of instances predicted as that class, and is expressed by the formula:

$$P = \frac{TP}{TP + FP} \qquad (4)$$

R is the ratio of the number of instances correctly classified as a class to the total number of instances that actually belong to that class, and is expressed by the formula:

$$R = \frac{TP}{TP + FN} \qquad (5)$$

F-measure is the harmonic mean of precision and recall, and is defined as:

$$F = \frac{2RP}{R + P} \qquad (6)$$

For an evaluation of classifier performance, the F-measure is more informative because it summarizes both the precision and the recall in a single value.

4.3 Classification result and comparison

The simulation for ELM algorithms is carried out in a MATLAB environment running on a Core i5-347, 3.2 GHz CPU. Table 2 shows the confusion matrix obtained by the ELM classifier. It shows that our proposed solution is quite effective, with 99. % spammers and 99.9 % non-spammers classified correctly, leaving only a small fraction of spammers and non-spammers misclassified. Table 3 shows the values of the evaluation metrics, in which precision, recall, and F-measure are calculated for spammers and non-spammers, respectively.
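Equations (4) to (6) translate directly into code. The sketch below computes the three metrics for one class from confusion-matrix counts; the function name and the counts in the example are hypothetical and are not the values of Table 2.

```python
def precision_recall_f(tp, fp, fn):
    """Return (P, R, F) for one class from confusion-matrix counts.

    P = TP / (TP + FP), R = TP / (TP + FN), F = 2RP / (R + P).
    A metric whose denominator is zero is reported as 0.0 (a convention
    assumed here, not one stated in the paper).
    """
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * r * p / (r + p) if (r + p) else 0.0
    return p, r, f


# Hypothetical counts for the spammer class (illustrative values only).
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

To obtain the non-spammer metrics, the same function can be called with the roles swapped, that is, with TN in place of TP, FN in place of FP, and FP in place of FN.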

Table 2 Confusion matrix (rows: true class, Spammer and Non-spammer; columns: predicted class, Spammer (%) and Non-spammer (%))

Table 3 Classification evaluation (rows: Spammer, Non-spammer; columns: Precision, Recall, F-measure)

Table 4 Comparison between ELM and other classifiers (rows: ELM, SVM, Decision tree, Naïve Bayes, Bayes network; columns: Precision, Recall, and F-measure, each for Spammer and Non-spammer)

Table 5 Comparison between ELM and SVM (rows: ELM, SVM; columns: Training time (s), Testing time (s))

We also compare the proposed solution with other classifiers, including Decision Tree, Naïve Bayes, and Bayes Network, with implementations provided by Weka, a Java data mining software package. For each classifier, the same evaluation metrics (precision, recall, and F-measure) are calculated for both spammers and non-spammers.

With the results illustrated in Table 4, it is clear that both the ELM and SVM classifiers are capable of achieving high accuracy. This observation indicates that ELM- and SVM-based approaches can clearly separate the training data into two parts with maximum margin. In addition, the three other classifiers also achieve good accuracy. This is because suitable features (including content and user behavior) are selected that are capable of effectively distinguishing spammers from non-spammers.

Furthermore, we compare training and testing time between the SVM-based and ELM-based solutions; the experiment results are illustrated in Table 5. The results indicate that the ELM-based solution is much faster than the SVM-based solution, and is therefore more efficient.
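The speed of ELM training follows from the procedure of Sect. 4.1: the hidden-layer parameters are drawn once at random and only the output weights are solved for, in a single least-squares step. The sketch below illustrates this with NumPy on toy data. It is not the authors' MATLAB implementation; the sigmoid activation with an additive random bias, the +1/-1 label encoding, and all names and data are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, o, n_hidden, rng):
    """Train an ELM. X: (P, m) inputs, o: (P,) targets. Returns (W, b, h)."""
    m = X.shape[1]
    W = rng.standard_normal((m, n_hidden))  # random hidden weights w_k, never tuned
    b = rng.standard_normal(n_hidden)       # random hidden biases (assumed)
    G = sigmoid(X @ W + b)                  # (P, N) hidden-layer output matrix, Eq. (3)
    h = np.linalg.pinv(G) @ o               # output weights h = G^+ o
    return W, b, h


def elm_predict(X, W, b, h):
    """Network output y = G h for new inputs, Eq. (2)."""
    return sigmoid(X @ W + b) @ h


rng = np.random.default_rng(0)
# Toy two-cluster data standing in for standardized feature vectors:
# label +1 for "spammer", -1 for "non-spammer" (encoding assumed here).
X = np.vstack([rng.normal(2.0, 0.3, (20, 2)), rng.normal(-2.0, 0.3, (20, 2))])
o = np.concatenate([np.ones(20), -np.ones(20)])

W, b, h = elm_train(X, o, n_hidden=20, rng=rng)
y = elm_predict(X, W, b, h)
accuracy = float(np.mean(np.sign(y) == o))
print(f"training accuracy: {accuracy:.2f}")
```

`np.linalg.pinv` computes the Moore-Penrose pseudo-inverse via a singular value decomposition, which also covers the case where $G^{T} G$ is not invertible and the closed form $(G^{T} G)^{-1} G^{T}$ cannot be used directly.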

Finally, to further demonstrate the effectiveness of the proposed spammer detection model, we consider two use scenarios: standardized data and non-standardized data. The paper compares the training time together with the testing accuracy under different activation functions (Sin, Sig, and Hardlim) and different numbers of hidden nodes (L). The evaluation is illustrated in Figs. 5, 6, and 7.

Fig. 5 Comparison of training time and testing accuracy on Sig function (curves: sig-zscore, sig-non-zscore; x-axis: number of hidden neurons), a training time on Sig function with different number of hidden nodes, b testing accuracy on Sig function with different number of hidden nodes

Fig. 6 Comparison of training time and testing accuracy under Sin function (curves: sin-zscore, sin-non-zscore; x-axis: number of hidden neurons), a training time on Sin function with different number of hidden nodes, b testing accuracy on Sin function with different number of hidden nodes

Fig. 7 Comparison of training time and testing accuracy under Hardlim function (curves: hardlim-zscore, hardlim-non-zscore; x-axis: number of hidden neurons), a training time on Hardlim function with different number of hidden nodes, b testing accuracy on Hardlim function with different number of hidden nodes
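The "zscore" curves in Figs. 5 to 7 refer to standardized data. A minimal sketch of z-score standardization and of the three activation functions follows. The use of the population standard deviation is an assumption (MATLAB's `zscore` uses the sample standard deviation by default), and the example column is hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore(column):
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed choice)
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(v - mu) / sigma for v in column]


def sig(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hardlim(z):
    """Hard-limit activation: 1 for z >= 0, otherwise 0."""
    return 1.0 if z >= 0 else 0.0


ACTIVATIONS = {"sig": sig, "sin": math.sin, "hardlim": hardlim}

# Hypothetical raw feature column (illustrative values only).
raw = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = zscore(raw)
print(standardized)
```

One plausible reading of why standardization matters: raw count features can be several orders of magnitude larger than ratio features, so a bounded activation such as the sigmoid saturates (for instance, `sig(2500.0)` evaluates to exactly 1.0 in floating point) and the hidden layer then carries little information.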

Figures 5a, 6a, and 7a show that training time is not significantly influenced by the activation function, whether the data are standardized or non-standardized. Testing accuracy on standardized data, however, is greatly improved in the case of the sin activation function (as shown in Fig. 6b). Therefore, we suggest that the formulated dataset be standardized before classification.

4.4 Stability enhancement

To achieve good generalization performance, the cost parameter C and the kernel parameter γ of SVM [28,29] need to be chosen appropriately. Similarly, ELM has one adjustable parameter, the number of hidden nodes L. Accordingly, Fig. 8 compares the classification performance of the ELM and SVM solutions under different parameters for further stability evaluation. We have used 9 different values of C and 9 different values of γ, resulting in a total of 81 pairs of results.

The result in Fig. 8a shows that the generalization performance of SVM depends greatly on the combination of (C, γ). Therefore, the SVM-based approach might require tedious and time-consuming parameter tuning in a real implementation. On the other hand, the generalization performance of ELM tends to increase monotonically with the number of hidden nodes L, and remains stable when L is larger than 5 (see Fig. 8b). Therefore, from the implementation point of view, another advantage of the ELM-based approach is its enhanced stability.

Fig. 8 The stability performance under different parameters (y-axis: testing accuracy (%)), a the performance of SVM is sensitive to the parameters (C, γ), b the performance of ELM is not sensitive to the parameter (L)

5 Conclusions and future works

The paper presents an ELM-based spammer detection method for social network platforms. Using data crawled from Sina Weibo, a set of content and behavior features are extracted and applied to an ELM-based classification algorithm. Through a set

of experiments and evaluation work, our proposed solution is shown to be feasible, efficient, and significantly more stable than existing SVM-based models.

However, any amount of labeled data might not be enough in a social network environment with a huge quantity of highly diverse characteristics. Therefore, further work on the subject might include the investigation of a collaborative training-based semi-supervised learning model that is capable of training itself automatically based on a small amount of labeled data.

On the other hand, the features extracted in our proposed solution (and in other existing approaches) are based on statistical analysis and manual selection. In the era of big data, with huge data volumes and convenient access, the feature extraction mechanisms in our solution might be low in adaptability and somewhat costly. Therefore, how to bring machine learning technology (e.g., deep learning algorithms [3 33]) into automatic feature learning and extraction has become an important question.

Acknowledgments This paper is supported by the National Natural Science Foundation of China under Grant No and No.272, the Key Project of Chinese Ministry of Education under Grant No.2286; the Technology Innovation Platform Project of Fujian Province under Grant No. 29J7, No. 23H6 and 23J228; the Key Project Development Foundation of Education Committee of Fujian province under Grant No. JA and JA26.

References

. Nexgate (23) State of social media spam. Nexgate-23-State-of-Social-Media-Spam-Research-Report.pdf
2. Bhat SY, Abulaish M (23) Community-based features for identifying spammers in online social networks. In: Proceedings of the 23 IEEE/ACM international conference on advances in social networks analysis and mining. ACM, pp 7
3. Grier C, Thomas K, Paxson V et al (2) @spam: the underground on 140 characters or less. In: Proceedings of the 7th ACM conference on computer and communications security. ACM, pp
Liu Y, Wu B, Wang B et al (24) SDHM: a hybrid model for spammer detection in Weibo. In: Advances in social networks analysis and mining (ASONAM), 24 IEEE/ACM international conference on. IEEE, pp
Rong HJ, Ong YS, Tan AH et al (28) A fast pruned-extreme learning machine for classification problem. Neurocomputing 72():
Hsu C-W, Lin C-J (22) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 3(2):
Huang GB, Zhu QY, Siew CK (24) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Neural Networks 24, Proceedings 24 IEEE international joint conference on. IEEE, vol 2, pp
Hirose Y, Yamashita K, Hijiya S (99) Back-propagation algorithm which varies the number of hidden units. Neural Netw 4():6 66
Shen H, Li Z (24) Leveraging social networks for effective spam filtering. IEEE Trans Comput :
Uemura M, Tabata T (28) Design and evaluation of a Bayesian-filter-based image spam filtering method. In: International conference on information security and assurance (ISA). IEEE, pp
Zhou B, Yao Y, Luo J (23) Cost-sensitive three-way spam filtering. J Intell Inf Syst 42():
Jung J, Sit E (24) An empirical study of spam traffic and the use of DNS black lists. In: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM, pp
4. Antonakakis M, Perdisci R, Dagon D, Lee W, Feamster N (2) Building a dynamic reputation system for DNS. In: Proceedings of the third USENIX workshop on large-scale exploits and emergent threats (LEET)
5. Xu L, Zheng X, Rong C (23) Trust evaluation based content filtering in social interactive data. In: Cloud computing and big data (CloudCom-Asia), 23 international conference on. IEEE, pp
Kincaid J (2) EdgeRank: the secret sauce that makes Facebook's news feed tick. TechCrunch
7. Wang AH (2) Don't follow me: spam detection in Twitter. In: Security and cryptography (SECRYPT), Proceedings of the 2 international conference on. IEEE, pp
8. Yardi S, Romero D, Schoenebeck G (29) Detecting spam in a Twitter network. First Monday 5()
9. Stringhini G, Kruegel C, Vigna G (2) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference. ACM, pp 9
2. Gao H, Chen Y, Lee K et al (22) Towards online spam filtering in social networks. NDSS
2. Benevenuto F, Magno G, Rodrigues T et al (2) Detecting spammers on Twitter. Collab, Elect Messag Anti Abuse Spam Conf (CEAS), 6:2
22. Zheng X, Zeng Z, Chen Z et al (25) Detecting spammers on social networks. Neurocomputing 59:
Lee K, Caverlee J, Webb S (2) The social honeypot project: protecting online communities from spammers. In: Proceedings of the 9th international conference on World wide web. ACM, pp
Zhou Y, Chen K, Song L et al (22) Feature analysis of spammers in social networks with active honeypots: a case study of Chinese microblogging networks. In: Proceedings of the 22 international conference on advances in social networks analysis and mining (ASONAM 22). IEEE Computer Society, pp
Miller Z, Dickinson B, Deitrick W et al (24) Twitter spammer detection using data stream clustering. Inf Sci 26:
Huang GB, Zhu QY, Siew CK (26) Extreme learning machine: theory and applications. Neurocomputing 7():
Rao CR, Mitra SK (97) Generalized inverse of matrices and its applications. Wiley, New York
28. Ghanty P, Paul S, Pal NR (29) NEUROSVM: an architecture to reduce the effect of the choice of kernel on the performance of SVM. J Mach Learn Res :
Huang GB, Ding X, Zhou H (2) Optimization method based extreme learning machine for classification. Neurocomputing 74():
Zheng XH, Chen N, Chen Z et al (24) Mobile cloud based framework for remote-resident multimedia discovery and access. J Intern Technol 5(6):
Hinton GE (27) Learning multiple layers of representation. Trends Cogn Sci ():
Bengio Y (24) Scaling up deep learning. In: Proceedings of the 2th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, p
Zhou S, Chen Q, Wang X (23) Active deep learning method for semi-supervised sentiment classification. Neurocomputing 2:


The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Whitepaper Spain SEO Ranking Factors 2012

Whitepaper Spain SEO Ranking Factors 2012 Whitepaper Spain SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations Title A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations Author(s) Wang, C; Hou, Y Citation The IEEE Power and Energy Society (PES) General Meeting, Washington, USA, 27-31

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Link Analysis in Weibo

Link Analysis in Weibo Link Analysis in Weibo Liwen Sun AMPLab, EECS liwen@cs.berkeley.edu Di Wang Theory Group, EECS wangd@eecs.berkeley.edu Abstract With the widespread use of social network applications, online user behaviors,

More information

A study of classification algorithms using Rapidminer

A study of classification algorithms using Rapidminer Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja

More information

Robust Steganography Using Texture Synthesis

Robust Steganography Using Texture Synthesis Robust Steganography Using Texture Synthesis Zhenxing Qian 1, Hang Zhou 2, Weiming Zhang 2, Xinpeng Zhang 1 1. School of Communication and Information Engineering, Shanghai University, Shanghai, 200444,

More information

Sentiment Classification of Food Reviews

Sentiment Classification of Food Reviews Sentiment Classification of Food Reviews Hua Feng Department of Electrical Engineering Stanford University Stanford, CA 94305 fengh15@stanford.edu Ruixi Lin Department of Electrical Engineering Stanford

More information

Nelder-Mead Enhanced Extreme Learning Machine

Nelder-Mead Enhanced Extreme Learning Machine Philip Reiner, Bogdan M. Wilamowski, "Nelder-Mead Enhanced Extreme Learning Machine", 7-th IEEE Intelligent Engineering Systems Conference, INES 23, Costa Rica, June 9-2., 29, pp. 225-23 Nelder-Mead Enhanced

More information

Content Based Spam Filtering

Content Based Spam  Filtering 2016 International Conference on Collaboration Technologies and Systems Content Based Spam E-mail Filtering 2nd Author Pingchuan Liu and Teng-Sheng Moh Department of Computer Science San Jose State University

More information