Generalization Algorithm For Prevent Inference Attacks In Social Network Data

Chethana Nair, Neethu Krishna, Siby Abraham
1 Dept. of Computer Science and Engg., Christ Knowledge City, Mannoor
2,3 Dept. of Computer Science and Engg., Musaliar College of Engg. & Technology, Pathanamthitta, Kerala
cnchethananair@gmail.com

Abstract - Online social networking has become one of the most popular activities on the web. Online social networks (OSNs), such as Facebook, are used by a growing number of people. OSNs allow users to control and customize what personal information is available to other users. These networks allow users to publish details about themselves and to connect to their friends. Some of the information revealed inside these networks is meant to be private. A privacy breach occurs when sensitive information about a user, that is, information that the individual wants to keep from the public, is disclosed to an adversary. Yet learning algorithms can be applied to released data to predict such private information, and this leakage can be a serious issue in some cases. We explore how inference attacks can be launched using released social network data to predict private information. The tension between the desired use of the data and individual privacy presents an opportunity for privacy-preserving social network data mining. We then devise three possible sanitization techniques that could be used in various situations, and we study the effect of removing details and links in preventing sensitive information leakage. Removing details and friendship links together is the most effective way to reduce classifier accuracy, but this is probably infeasible if the social network is to remain useful. We explore the effectiveness of these techniques, attempt to use methods of collective inference to discover sensitive attributes of the data set, and use the sanitization methods to decrease the effectiveness of both local and relational classification algorithms.

I. INTRODUCTION
The rapid growth and ubiquity of online social media services have changed the way people interact with each other. Online social networking has become one of the most popular activities on the web, and social network analysis has been a key technique in modern sociology, geography, economics, and information science. The data generated by social media services is often referred to as social network data, and in many situations these data need to be published and shared with others. Social networks are online applications that allow their users to connect by means of various link types. On professional networks, for example, users specify details related to their professional life. Because these sites gather extensive personal information, social network application providers have a rare opportunity: direct use of this information could be useful to advertisers for direct marketing. Data owners therefore face a choice. They can publish data for others to analyze, even though doing so may create severe privacy threats, or they can withhold data because of privacy concerns, even though that makes the analysis impossible. A privacy breach occurs when sensitive information about a user, that is, information that the individual wants to keep from the public, is disclosed to an adversary. At the same time, the data are valuable: for example, businesses analyze the social connections in social network data to uncover customer relationships that can benefit their services and product sales. The results of analyzing social network data are believed to provide an alternative view of real-world phenomena, owing to the strong connection between the actors behind the network data and real-world entities. Social network data makes commerce much more profitable. Requests to use the data can also come from third-party applications embedded in the social media application itself. For instance, Facebook has thousands of third-party applications, and the number is growing exponentially.
Even though the process of data sharing in this case is implicit, the data are indeed passed from the data owner (the service provider) to a different party (the application).
Published by IJRCCT (Page 60)

The data given to these applications is usually not sanitized to protect users' privacy. The tension between the desired use of the data and individual privacy presents an opportunity for privacy-preserving social network data mining, that is, the discovery of information and relationships from social network data without violating privacy.

II. RELATED WORK
The area of privacy inside a social network encompasses a large breadth, depending on how privacy is defined. The authors of [2] consider an attack against an anonymized social network. In their model, the network consists of only nodes and edges; detail values are not included, and the goal of the attacker is simply to identify people. Their problem is therefore very different from the one considered here, because they ignore details and do not consider the effect of the existence of details on privacy.

Other papers have tried to infer private information inside social networks. The authors of "Inference Attacks by Third-Party Extensions to Social Network Systems" [1] identify the threat of social network site API inference attacks, provide a taxonomy of these attacks, and propose a risk assessment scheme to help users understand the risk of subscribing to a third-party application in an extensible SNS. Proposed extensions of that work include adapting the metric to account for the uneven popularity of authentication questions, designing a secure API for extensible SNSs, and creating a benchmark, formulating feasibility predicates, and empirically assessing the inference accuracy of the inference algorithms in the benchmark, which would allow the effectiveness of the risk assessment scheme to be evaluated empirically. One limitation of the risk assessment scheme is that it assumes all authentication questions in the benchmark are equally popular; an improvement is to reformulate the metric so that it takes this uneven popularity into account.
An interesting research question would be to determine which version of the risk metric is actually more effective in steering users' privacy expectations. The authors of "Inferring Privacy Information from Social Networks" [3] consider ways to infer private information via friendship links by creating a Bayesian network from the links inside a social network. While they crawl a real social network, LiveJournal, they use hypothetical attributes to analyze their learning algorithm. In contrast, this work proposes techniques that help choose the most effective details or links to remove for protecting privacy, and it examines the effect of collective inference techniques in possible inference attacks. The authors of "Preserving the privacy of sensitive relationships in graph data" [4] study link re-identification: they assume that the social network has various link types embedded, and that some of these link types are sensitive. Several methods of social graph anonymization have also been proposed, focusing mainly on the idea that anonymizing both the nodes in the group and the link structure anonymizes the graph as a whole. However, these methods all focus on anonymity in the structure itself. For example, through the use of k-anonymity or t-closeness, depending on the quasi-identifiers that are chosen, much of the uniqueness in the data may be lost. Through our method of anonymity preservation, we maintain the full uniqueness of each node, which allows more information to remain in the data after release. The general method by which [4] hides links is either random elimination or link aggregation. Instead of attempting to identify sensitive links between individuals, we attempt to identify sensitive traits of individuals by using a graph that initially has a full listing of friendship links. Also, instead of randomly eliminating links between nodes, we develop a heuristic for removing those links between individuals that will reduce the accuracy of our classifiers the most.
The work in [5] uses automatic crawlers to gather users' profile information for the purpose of launching profile cloning attacks. Once enough personal information is harvested, the attacker can clone the victim's profile, either within the same SNS or in an SNS in which the victim is not registered. Armed with the cloned profile, the attacker then attempts to befriend the victim's friends. Empirical data suggest that the victim's friends are likely to accept such forged friendship requests, thus granting the attacker access to their personal information. Both our work and theirs attempt to gauge the impact of inference attacks through some form of operationalization: gaining the trust of the victim's friends in theirs, and subverting authentication in ours. However, the settings differ in that third-party applications are governed by a different access control model than a user-mimicking crawler, and the data made available in the two contexts are also different.

III. EXISTING SYSTEM
Existing work considers only ways to infer private information via friendship links, by creating a Bayesian network from the links inside a social network. While that work crawls a real social network, LiveJournal, it uses hypothetical attributes to analyze its learning algorithm. Other work identifies the threat of social network site API inference attacks, provides a taxonomy of these attacks, and proposes a risk assessment scheme to help users understand the risk of subscribing to a third-party application in an extensible SNS; that scheme assumes all authentication questions in its benchmark are equally popular. In the anonymized-network setting, the network consists of only nodes and edges, detail values are not included, and the goal of the attacker is simply to identify people; this problem is very different from the one considered here, because it ignores details and does not consider the effect of the existence of details on privacy. Still other work uses automatic crawlers to gather users' profile information for the purpose of launching profile cloning attacks, as described in Section II. Existing work can also model and analyze access control requirements with respect to collaborative authorization management of shared data in OSNs. The need for joint management of shared data, especially photos, in OSNs has been recognized by recent work that provides a solution for collective privacy management in OSNs. That work considers access control policies for content co-owned by multiple users in an OSN, such that each co-owner may separately specify her or his own privacy preference for the shared content.

Disadvantages of Existing System: The problem of private information leakage, which could be an important issue in some cases, is not addressed, and in the anonymized-network model the goal of the attacker is simply to identify people.

In contrast, this work addresses the problem of inferring private traits using real-life social network data, together with possible sanitization approaches to prevent such inference. It uses a modification of naive Bayes classification that is suitable for classifying large amounts of social network data and that predicts privacy-sensitive trait information using both node traits and link structure, and it compares the accuracy of the learning method based on link structure against the accuracy of the learning method based on node traits.

IV. SOCIAL NETWORK ARCHITECTURE
A high-level view of the components of a social network system is shown in Figure 1. The architecture contains users, social media services, a data owner, and third-party data recipients. Online social media services have been provided in many forms. Generally, there are six different forms of social media: collaborative projects, blogs, content communities, social networking sites, virtual game worlds, and virtual communities. Social media service users can be any real-world entity that uses the service, such as an individual or an organization. When a user uses an online social media service, they are usually asked to create a profile and to give information about themselves. This information includes personally identifiable information, such as social security number, name, and phone number, which uniquely identifies a person. Sensitive information can include religion, political view, type of disease (as in a healthcare network), or generated income (as in a financial network). There are also data generated from social activity within the services. In many situations, the data need to be published and shared with others, and the data usually contain valuable information that can enable better social targeting of advertisements. Social networking sites, the most famous form of social media, are applications that enable participants to connect by creating personal information profiles, inviting friends and colleagues to have access to those profiles, and sending e-mails and instant messages to each other. These personal profiles can include any type of information, including photos, video, audio files, and blogs; indeed, this form mixes several social media types into one package. Facebook (facebook.com) is the most popular application of this kind: it currently has more than 500 million active users, who spend over 700 billion minutes per month using the application. Privacy concerns of individuals in a social network can be classified into two categories: privacy after data release, and private information leakage. Instances of privacy after data release involve the identification of specific individuals in a data set after its release to the general public or to paying customers for a specific usage.
Private information leakage, conversely, is related to details about an individual that are not explicitly stated but are instead inferred through other released details and/or relationships to individuals who may express that detail. We explore how online social network data could be used to predict some individual private detail that a user is not willing to disclose (e.g., political or religious affiliation, sexual orientation), and we explore the effect of possible data sanitization approaches on preventing such private information leakage, while allowing the recipient of the sanitized data to do inference on non-private details.

4.1 Learning Methods On Social Networks
Social network data could be used to predict some individual private detail that a user is not willing to disclose. We consider the problem of private information leakage for individuals as a direct result of their actions as part of an online social network: even when a user does not disclose a sensitive detail, learning algorithms applied to released data can predict it. We model an attack scenario as follows. Suppose Facebook wishes to release data to Electronic Arts for their use in advertising games to interested people. However, once Electronic Arts has this data, they want to identify the political affiliation of users in their data for lobbying efforts. Because they would not only use the names of those individuals who explicitly list their affiliation, but could also determine through inference the affiliation of other users in their data, this would obviously be a privacy violation of hidden details. We explore the effectiveness of sanitization techniques against such attacks, attempt to use methods of collective inference to discover sensitive attributes of the data set, and use the sanitization methods to decrease the effectiveness of both local and relational classification algorithms.

We consider the problem of sanitizing a social network to prevent inference of private details and then examine the effectiveness of those approaches on a real-world data set. In order to protect privacy, we sanitize both details and the underlying link structure of the graph. That is, we delete some information from a user's profile and remove some links between friends. We also examine the effects of generalizing detail values to more generic values. Figure 2 illustrates an example of a social network as a graph. The vertices usually represent real-world actors or entities such as individuals or organizations. Each vertex has a profile that usually contains personal attributes, such as name, gender, birth date, political view, and religion. These individuals are usually connected by edges that represent some sort of social tie or link between them. For example, in social networking sites, these edges represent the friend connections each member has. Edges can therefore also have attributes that describe the properties of the connection.

Definition 1: A social network is represented as a graph $G = \{V, E, D\}$, where $V$ is the set of nodes in the graph and each node $n_i \in V$ represents a unique user of the social network. $E$ is the set of edges in the graph, which are the links defined in the social network. For any friendship link between user $n_i$ and user $n_j$, we assume that both $(n_i, n_j) \in E$ and $(n_j, n_i) \in E$. $D$ is the set of details from the social network.

Definition 2: A detail type is a string defined over an alphabet $\Sigma$ that represents a specific category name within the social network details set. The set of all detail types is represented by $H$. A detail value is a string defined over $\Sigma$ that represents a user's input for a detail type. A detail is a (detail type, detail value) pair, represented uniquely by an identifier. $D_{i,j}$ is the $j$th (detail type, detail value) pair specified by user $n_i$, $D_i$ is the set of all $D_{i,j}$ for a node $n_i$, and $D$ is the set of $D_i$ for all $i$.

To evaluate the effect that changing a person's details has on their privacy, we first create a learning method that can predict a person's private details (for the sake of example, assume that political affiliation is unspecified for some subset of our population). To understand the feasibility of possible inference attacks and the effectiveness of various sanitization techniques in combating those attacks, we initially used a simple naive Bayes classifier. Using naive Bayes as our learning algorithm allowed us to easily scale our implementation to the large size and diversity of the Facebook data set. It also has the added advantage of allowing simple selection techniques to remove detail and link information when trying to hide the class of a network node. Finally, it has shown itself to be extremely effective in these classification tasks.

Naïve Bayes Classification: Determining an individual's political affiliation is an exercise in graph classification. Given a node $n_i$ with $m$ details and $p$ potential classification labels $C_1$ to $C_p$, the probability of $n_i$ being in class $C_x$ is given by the equation
$$P(C_x \mid D_{i,1}, \dots, D_{i,m}) = \frac{P(C_x) \prod_{j=1}^{m} P(D_{i,j} \mid C_x)}{P(D_{i,1}, \dots, D_{i,m})},$$
and $n_i$ is assigned the class $C_x$ that maximizes this probability.

Naïve Bayes on Friendship Links: We next consider the problem of determining the class detail value of person $n_i$ given their friendship links using a naive Bayes model, that is, of calculating the probability that $n_i$ belongs to class $C_x$ given the friendship links from $n_i$ to each of its friends $n_j$.
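The naive Bayes classification of a node from its details can be illustrated with a short Python sketch. It assumes a detail is a (detail type, detail value) pair as in Definition 2, estimates $P(D_{i,j} \mid C_x)$ as the fraction of class-$C_x$ training nodes that list the detail, and adds Laplace smoothing (an assumption, not stated in the text) so that an unseen detail does not zero out a class. The function names and the toy profiles are hypothetical.

```python
import math
from collections import Counter, defaultdict

Detail = tuple[str, str]  # (detail type, detail value), as in Definition 2


def train(nodes: dict[str, set[Detail]], labels: dict[str, str]):
    """Count, over the labeled training nodes, how many belong to each class
    and how many nodes of each class list each detail."""
    class_counts = Counter(labels.values())
    with_detail: dict[str, Counter] = defaultdict(Counter)
    for node, label in labels.items():
        for detail in nodes[node]:
            with_detail[label][detail] += 1
    return class_counts, with_detail


def classify(details: set[Detail], model, alpha: float = 1.0) -> str:
    """Return the class C_x maximizing P(C_x) * prod_j P(D_ij | C_x), in log space.

    P(D_ij | C_x) is the Laplace-smoothed fraction of class-C_x training nodes
    that list D_ij. The evidence P(D_i1, ..., D_im) is identical for every
    class, so it is omitted from the argmax.
    """
    class_counts, with_detail = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for d in details:
            score += math.log((with_detail[c][d] + alpha) / (n_c + 2 * alpha))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy profiles.
nodes = {
    "n1": {("group", "young republicans"), ("activity", "hunting")},
    "n2": {("group", "young republicans"), ("activity", "golf")},
    "n3": {("group", "green party"), ("activity", "hiking")},
    "n4": {("group", "green party"), ("activity", "golf")},
}
labels = {"n1": "conservative", "n2": "conservative", "n3": "liberal", "n4": "liberal"}
model = train(nodes, labels)
print(classify({("group", "green party")}, model))  # liberal
```

Working in log space avoids floating-point underflow when a profile lists many details.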

4.1.3 Weighing Friendships
There are many ways to weigh friendship links. The method used here is easy to calculate and is based on the assumption that the more public details two people share, the more private details they are likely to share. The weight of a friendship link from node $n_i$ to node $n_j$ is denoted $W_{i,j}$.

4.2 Network Classification
Collective inference is a method of classifying social network data using a combination of node details and connecting links in the social graph. Each such method consists of three components: a local classifier, a relational classifier, and a collective inference algorithm.

Local Classifiers: Local classifiers are a type of learning method applied in the initial step of collective inference. Typically, a local classifier examines the details of a node and constructs a classification scheme based on the details it finds there. The naive Bayes classifier, for example, builds a model based on the details of nodes in the training set and then applies this model to nodes in the testing set to classify them.

Relational Classifiers: The relational classifier is a separate type of learning algorithm that looks at the link structure of the graph and uses the labels of nodes in the training set to develop a model, which it uses to classify the nodes in the test set.

Collective Inference Methods: Collective inference attempts to make up for the deficiencies of using either type of classifier alone by combining local and relational classifiers in a precise manner to increase the classification accuracy of nodes in the network. By using a local classifier in the first iteration, collective inference ensures that every node will have an initial probabilistic classification, referred to as a prior. The algorithm then uses a relational classifier to reclassify nodes. At each step $i \geq 2$, the relational classifier uses the fully labeled graph from step $i - 1$ to classify each node in the graph.
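A minimal Python sketch of the local-then-relational procedure described here, under assumptions that are not taken from the text: the link weight $W_{i,j}$ is taken to be the fraction of $n_i$'s details that $n_j$ also lists (one simple choice consistent with the stated assumption about shared details, not necessarily the authors' formula); the relational classifier is a weighted vote over neighbours; and successive estimates are blended in relaxation-labeling style, with the weight of each new estimate decaying by a factor $\alpha$ (this section later sets $\alpha = 0.99$). All function names are hypothetical.

```python
from collections import defaultdict


def link_weight(details, i, j):
    """Assumed W_ij: the fraction of n_i's details that n_j also lists."""
    if not details[i]:
        return 0.0
    return len(details[i] & details[j]) / len(details[i])


def relational_step(graph, details, probs, classes):
    """Weighted-vote relational classifier: each node's new class distribution
    is the W_ij-weighted average of its neighbours' distributions from the
    previous step."""
    new_probs = {}
    for i, neighbours in graph.items():
        totals = defaultdict(float)
        weight_sum = 0.0
        for j in neighbours:
            w = link_weight(details, i, j)
            weight_sum += w
            for c in classes:
                totals[c] += w * probs[j][c]
        if weight_sum == 0.0:
            new_probs[i] = dict(probs[i])  # no informative links: keep the estimate
        else:
            new_probs[i] = {c: totals[c] / weight_sum for c in classes}
    return new_probs


def collective_inference(graph, details, priors, known, classes,
                         iterations=10, alpha=0.99):
    """priors: local-classifier distributions for every node (step 1).
    known: training nodes whose labels stay fixed. Each later step reclassifies
    the other nodes with the relational classifier and blends the result with
    the previous estimate; the weight beta of the new estimate decays by alpha."""
    probs = {i: dict(p) for i, p in priors.items()}
    for i, label in known.items():
        probs[i] = {c: 1.0 if c == label else 0.0 for c in classes}
    beta = 1.0
    for _ in range(iterations):
        beta *= alpha
        # Computed from the step i - 1 estimates before any node is updated.
        relational = relational_step(graph, details, probs, classes)
        for i in graph:
            if i not in known:
                probs[i] = {c: beta * relational[i][c] + (1 - beta) * probs[i][c]
                            for c in classes}
    return {i: max(p, key=p.get) for i, p in probs.items()}


# Hypothetical example: u shares two of its three details with a, one with b.
graph = {"a": {"u"}, "b": {"u"}, "u": {"a", "b"}}
details = {"a": {"d1", "d2"}, "b": {"d3"}, "u": {"d1", "d2", "d3"}}
classes = ["conservative", "liberal"]
priors = {n: {"conservative": 0.5, "liberal": 0.5} for n in graph}
result = collective_inference(graph, details, priors,
                              {"a": "conservative", "b": "liberal"}, classes)
print(result["u"])  # conservative
```

Because all new estimates are computed before any node is updated, every step uses only the estimates from the previous step.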
The collective inference method also controls the length of time the algorithm runs. Some algorithms specify a number of iterations to run, while others run until they converge. At each step $i$, the algorithm uses the probability estimates from step $i - 1$, not a single classified label, to calculate new probability estimates. Further, to account for the possibility that the process may not converge, a decay rate $\alpha$, set to 0.99, discounts the weight of each subsequent iteration relative to the previous iterations.

4.3 Hiding Private Information
The output of a differentially private algorithm is very similar with or without the data of any single user: differential privacy guarantees that a change in one record does not change the result too much. However, this definition does not protect against the building of an accurate data mining model that can predict sensitive information. In fact, many differentially private data mining algorithms have been developed that have accuracy similar to their non-private versions. Since our goal is to release a rich social network data set while preventing sensitive detail disclosure through data mining techniques, the differential privacy definition is not directly applicable in our scenario. Two issues must therefore be addressed. The first is understanding what sensitive information the adversary can use to launch an inference attack; since it is impossible to provide absolute privacy guarantees with respect to all background knowledge, the goal is to limit the success of an adversary with respect to a given set of classifiers. The second is analyzing the potential success of an inference attack.

Formal Privacy Definition

The privacy definition focuses on preventing inference attacks. Background knowledge, $K$, is some data that is not necessarily directly related to the social network, but that can be obtained by an attacker through various means. The additional accuracy gained by the attacker is represented by
$$\Delta = \max_{c \in \mathcal{C}} \left[ P_c(G, K) - P_c(K) \right],$$
where $\mathcal{C}$ is the set of given classifiers, $c$ is a classifier in $\mathcal{C}$, $P_c(G, K)$ is the prediction accuracy of $c$ on the sensitive hidden data when it is given the released graph $G$ and the background knowledge $K$, and $P_c(K)$ is its prediction accuracy when it is given $K$ alone. If $\Delta = 0$, the attacker gains no additional accuracy in predicting the sensitive hidden data.

Manipulating Details: Details can be manipulated in three ways: (1) adding details to nodes, (2) modifying existing details, and (3) removing details from nodes. We classify these into two categories: perturbation and anonymization.

Choosing Details: We must choose which details to remove. We globally remove the most representative details, i.e., those whose probability at the network level has the highest correlation with a protected class label and which are therefore most highly indicative of a class.

Manipulating Link Information: Another option for anonymizing social networks is altering links. Unlike details, there are only two methods of altering the link structure: adding or removing links. We evaluate the effect on privacy of removing friendship links instead of adding fake links. When determining a detail type using friendship links, only the links that remain in $E$ are used.

4.4 Detail Generalization
To combat inference attacks on privacy, we provide detail anonymization for social networks. By doing this, we reduce the value of $\Delta$ to an acceptable threshold that matches the desired utility/privacy tradeoff for a release of data. A detail generalization hierarchy (DGH) is an anonymization technique that generates a hierarchical ordering of the details expressed within a given category. The resulting hierarchy is structured as a tree, and the generalization scheme guarantees that every substituted value will be an ancestor of the original value, so a substituted value is at most as specific as the detail the user originally defined.
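A detail generalization hierarchy and the greedy Generalization Algorithm of this section can be sketched in Python as follows. Assumptions: the DGH is stored as a child-to-parent map in which roots have no entry; `accuracy` is a caller-supplied function returning the attacker's classification accuracy on a candidate release (in practice, a classifier retrained on that release); and the detail type chosen each round is the one whose one-level generalization lowers accuracy the most, standing in for the information-gain selection in the pseudocode. The hierarchy, the toy accuracy function, and all names are hypothetical.

```python
def generalize_type(details, detail_type, parent):
    """Replace every value of `detail_type` with its parent in the DGH
    (one level up). Values without a parent (roots) are left unchanged."""
    return {
        node: {(t, parent.get(v, v)) if t == detail_type else (t, v) for t, v in ds}
        for node, ds in details.items()
    }


def generalizable_types(details, parent):
    """Detail types that still have at least one value with an ancestor in the DGH."""
    return {t for ds in details.values() for t, v in ds if v in parent}


def generalize(delta, details, parent, accuracy):
    """Greedy loop modelled on the Generalization Algorithm: while the attacker's
    accuracy has dropped by no more than delta, trial-generalize every candidate
    detail type and keep the one that lowers accuracy the most."""
    base = accuracy(details)
    current = details
    while base - accuracy(current) <= delta:
        candidates = sorted(generalizable_types(current, parent))
        if not candidates:
            break  # nothing left to generalize without removing details entirely
        best = min(candidates,
                   key=lambda t: accuracy(generalize_type(current, t, parent)))
        current = generalize_type(current, best, parent)
    return current


# Hypothetical DGH for activities: child -> parent ("sports" is the root).
parent = {"hunting": "outdoor sports", "hiking": "outdoor sports",
          "outdoor sports": "sports", "golf": "sports"}
details = {"n1": {("activity", "hunting")}, "n2": {("activity", "hiking")},
           "n3": {("activity", "golf")}}


def toy_accuracy(ds):
    """Toy stand-in for a retrained classifier: fewer distinct values, lower accuracy."""
    return len({d for s in ds.values() for d in s}) / 10


released = generalize(0.15, details, parent, toy_accuracy)
print(released)  # every node ends up with ("activity", "sports")
```

Each round moves at least one value one level up a finite tree, so the loop terminates even when the privacy target cannot be met.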
Detail value decomposition (DVD) is a process by which an attribute is divided into a series of representative tags. These tags do not necessarily reassemble into a unique match to the original attribute.

Generalization Algorithm
Generalize(Δ, G)
    G' ← G
    while Classify(G) − Classify(G') ≤ Δ do
        S ← all details that can be further generalized
        s ← getHighestInfoGainAttrib(S)
        G' ← Gen(s, G')
    end while
    return G'

The generalization algorithm determines which attributes can be further generalized without complete removal and keeps a list of the accuracy of each such generalization. At the end of each round, we permanently store the individual detail type that provides the greatest privacy. This repeats until the changed graph, G', meets the chosen privacy requirement.

V. EXPERIMENTS
5.1 Data Gathering
We wrote a program to crawl the Facebook network to gather data for the experiments. Written in Java 1.6, the crawler loaded a profile, parsed the details out of the HTML, and stored the details inside a MySQL database. Then, the crawler loaded all friends of the current profile and stored the friends inside the database both as friendship links and as possible profiles to crawl later. Because of the sheer size of Facebook's social network, the crawler was limited to a small network. This means that if two people share a common friend who is outside the network, this is not reflected inside the database. Also, some people have enabled privacy restrictions on their profile, which prevented the crawler from seeing their profile details. The total time for the crawl was seven days. Because the data inside a Facebook profile is free-form text, it is critical that the input be normalized. The normalization method we use is based upon a Porter stemmer: to normalize a detail, it was broken into words, each word was stemmed with a Porter stemmer, and the words were then recombined. Two details that normalized to the same value were considered the same for the purposes of the learning algorithm. The total crawl resulted in over 167,000 profiles, almost 4.5 million profile details, and over 3 million friendship links. In the graph representation, there was one large central group of connected nodes with a maximum path length of 16; only 22 of the collected users were not inside this group. Common knowledge leads us to expect a small diameter in social networks. Note, however, that not every person in society has a Facebook account, and even those who do still do not have friendship links to every person they know. Additionally, given the limited scope of the crawl, it is possible that some connecting individuals may be outside the network. These considerations allow us to reconcile the observed network diameter with that expectation.
(Table: General statistics of the Facebook data set, including the diameter mentioned above.)
(Table: Original class likelihoods of the details used as experimental class values.)

5.2 Experimental Setup
Detail Removal: As can be seen from the results, the methods are generally successful at reducing the accuracy of classification tasks. Removing the details most highly connected with a class is consistently effective across the Details and Average classifiers. Perhaps counterintuitively, the accuracy of our Links classifier also decreases as we remove details. This is because the details of two nodes are compared to find their similarity; as we remove details from the network, the set of similar nodes for any given node also changes. This can account for the decrease in accuracy of the Links classifier. Additionally, there is a severe drop in classification accuracy after the removal of a single detail. When looking at the data, however, this can be explained by the removal of a detail that is very indicative of the conservative class value. When we remove this detail, the probability of being conservative drastically decreases, which leads to a higher number of incorrect classifications. When we remove the second detail, which has a similar likelihood for the liberal classification, the class value probabilities begin to trend downward at a much smoother rate. Some results show much more volatile classification accuracy, which appears to be a result of the wider class size disparity in the underlying data. For instance, when we remove five details, we lower the classification accuracy, but for the sixth and seventh details, we see an increase in classification accuracy. Then we again see a decrease in accuracy when we remove the eighth detail. Link removal generally shows a more stable downward trend, with only a few exceptions.

Combined Removal: While each measure provides a decrease in classification accuracy, we also test what happens in the data set if we remove both details and links. To do this, we conduct further experiments in which we test classification accuracy after removing 0 details and 0 links (the baseline accuracy), 0 details and 10 links, 10 details and 0 links, and 10 details and 10 links.
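The detail- and link-removal sanitization used in these experiments can be sketched in Python. Assumptions not fixed by the text: a detail's indicativeness is scored as $\max_c P(c \mid d)$ over labeled nodes, ignoring details listed by fewer than `min_support` nodes; and, for link removal, each node drops up to `k` links to the friends sharing the most details with it, since under a similarity-weighted Links classifier those links carry the most evidence. Function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def most_indicative_details(details, labels, k, min_support=2):
    """Score each detail by max_c P(c | d) over labeled nodes and return the top k."""
    support = Counter()
    by_class = defaultdict(Counter)
    for node, label in labels.items():
        for d in details[node]:
            support[d] += 1
            by_class[d][label] += 1
    scored = [(max(by_class[d].values()) / n, d)
              for d, n in support.items() if n >= min_support]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by detail
    return [d for _, d in scored[:k]]


def remove_details(details, to_remove):
    """Globally delete the chosen details from every profile."""
    blocked = set(to_remove)
    return {node: ds - blocked for node, ds in details.items()}


def remove_links(graph, details, k):
    """For each node, drop up to k links to the friends sharing the most details
    with it, removing both directions so the graph stays symmetric."""
    kept = {node: set(friends) for node, friends in graph.items()}
    for node in sorted(graph):
        ranked = sorted(kept[node], key=lambda f: (-len(details[node] & details[f]), f))
        for friend in ranked[:k]:
            kept[node].discard(friend)
            kept[friend].discard(node)
    return kept


details = {"n1": {("group", "A"), ("group", "X")}, "n2": {("group", "A")},
           "n3": {("group", "B"), ("group", "X")}, "n4": {("group", "B")}}
labels = {"n1": "conservative", "n2": "conservative", "n3": "liberal", "n4": "liberal"}
top = most_indicative_details(details, labels, k=2)
print(top)  # [('group', 'A'), ('group', 'B')]
sanitized = remove_details(details, top)
graph = {"n1": {"n2", "n3"}, "n2": {"n1"}, "n3": {"n1"}, "n4": set()}
```

Running the two steps with k = 0 or k = 10 reproduces the four combined-removal configurations in miniature.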

We refer to these sets as (0 details, 0 links), (10 details, 0 links), (0 details, 10 links), and (10 details, 10 links) removed, respectively. Following this, we want to gauge the accuracy of the classifiers for various ratios of labeled versus unlabeled nodes. To do this, we collect a list of all of the available nodes, as discussed above. We then obtain a random permutation of this list using the built-in shuffle function of Java's Collections class. Next, we divide the list into a test set and a training set, based on the desired ratio. The Average Only algorithm substantially outperformed traditional naive Bayes and the Links Only algorithm. Additionally, the Average Only algorithm generally performed better than the Details Only algorithm, with the exception of the (0 details, 10 links) experiments. Also, as a verification of expected results, the Details Only classification accuracy decreased significantly only when we removed details from nodes, while the (0 details, *) accuracies are approximately equivalent. Similarly, the Links Only accuracies were mostly affected by the removal of links between nodes, while the (*, 0 links) points of interest are approximately equal. The difference in accuracy between (0 details, 0 links) and (10 details, 0 links) can be accounted for by the weighting portion of the Links Only calculations, which depends on the similarity between two nodes. These results indicate that the Average and Details classifiers generally perform at approximately the same accuracy level. The Links Only classifier, however, generally performs significantly worse except in the case where 10 details and no links are removed. In this situation, all three classifiers perform similarly. The greatest variance occurs when links are removed, because after removing 12 links we create a number of isolated groups of a few nodes or single, disconnected nodes. Additionally, when we removed 13 details, … details alone.
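The labeled/unlabeled split described here can be mirrored in a minimal Python sketch; `random.Random(seed).shuffle` plays the role of Java's `Collections.shuffle`, and the function name and `seed` parameter are hypothetical.

```python
import random


def split_labeled(nodes, labeled_ratio, seed=None):
    """Randomly permute the available nodes, then split them into a labeled
    training set and an unlabeled test set according to labeled_ratio."""
    if not 0.0 <= labeled_ratio <= 1.0:
        raise ValueError("labeled_ratio must be between 0 and 1")
    shuffled = list(nodes)
    random.Random(seed).shuffle(shuffled)  # plays the role of Java's Collections.shuffle
    cut = round(len(shuffled) * labeled_ratio)
    return shuffled[:cut], shuffled[cut:]


# Repeat the split for several labeled ratios, as in the experiments.
for ratio in (0.1, 0.5, 0.9):
    train_set, test_set = split_labeled(range(100), ratio, seed=7)
    print(ratio, len(train_set), len(test_set))
```

Fixing the seed makes a given ratio's split reproducible across classifier runs, so every classifier is compared on the same test set.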
It may be unexpected that the Links Only classifier has such varied accuracies as a result of removing details, but since our calculation of probabilities for that classifier uses a measure of similarity between people, the removal of details may affect that classifier.

To generate the DGH for each activity, book, and show/movie, we used Google directories. To generate the DGH for Music, we used the Last.fm tagging system. To generate the hierarchy for Groups, we used the classification criteria from the Facebook page of that group. To account for the freeform tagging that Last.fm allows, we also store the popularity of each tag that a particular detail has. Last.fm indicates this through the presentation of tags on the page: the font size of a tag reflects how many users across the system have applied that particular tag to the music type. We then keep, for each user, a list of tag recurrences weighted by strength. For Music anonymization, we eliminate the lowest-scoring tags.

We evaluate using a naive Bayes classifier and the SVM implementation from Weka. We now turn to the findings from domain generalization. We compare simply using K to guess the most populated class from background knowledge, generalizing all trait types, generalizing no trait types, and generalizing only the best-performing single trait type (Activities). Our method of generalization (seen in the All and Activities lines) does indeed decrease classification accuracy on the data set. Interestingly, while previous work indicates that group membership is the dominant detail in classification, we see the most benefit here from generalizing only the Activities detail. This is because Activities generally have a far larger range of generalization values: the hierarchy trees for this detail type are taller than those for Groups.

Next, we show that, given a desired increase in privacy, we are able to determine the level to which the data set should be anonymized. As we require less privacy from the anonymized graph, fewer categories are generalized to any degree. Groups is the category most consistently anonymized completely, until the required privacy allowance reaches 20 percent. This may be because the nature of the Music detail allows us to more easily include or remove details to fit a required privacy value. Unlike the Activities detail type, which has a fixed hierarchy, Music has a loosely collected group of tags.

Collective Inference Results

In the Facebook data, there are a limited number of groups that are highly indicative of an individual's political affiliation. When removing details, these are the first to be removed. We assume that running the collective inference classifiers after removing only one detail may generate results that are specific to the particular detail we classify for. For that reason, we consider only the removal of 0 details and of 10 details, the other lowest point in classification accuracy.
For each resulting data set, we store the predictions made by the Details Only, Links Only, and Average classifiers and use them as the priors for the NetKit toolkit. For each of those priors, we test the final accuracy of the cdRN, wvRN, nLB, and nBC classifiers on each of the five sets generated for each of the four points of interest, and then take the average of their accuracies as the final accuracy.

We now turn to the results of our experiments using relaxation labeling. Experiments that distinguish the local classifier and iterative classification steps indicate that Relaxation Labeling almost always performs better than merely predicting the most frequent class: in previously reported data sets it generally reaches near 80 percent accuracy, an increase of approximately 30 percent. In our experiments, however, Relaxation Labeling typically performed no more than approximately 5 percent better than predicting the majority class for political affiliation, and it was also substantially less accurate than using only the local classifier. This performance is at least partially explained by our data set not being densely connected. There is very little significant difference among the collective inference classifiers except for cdRN, which performs significantly worse on data sets with a small training set. These results also indicate that our Average classifier consistently outperforms relaxation labeling on both the pre- and post-anonymized data sets. Additionally, while the local classifier's accuracy is directly affected by the removal of details and/or links, this relationship does not appear when relaxation labeling uses the local classifiers as priors. For each pair of the figures mentioned, the relational classifier portion of the graph remains constant; only the local classifier accuracy changes. From these, the most anonymous graph, meaning the graph structure with the lowest predictive accuracy, is achieved when we remove both details and links from the graph.
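To make the prior-plus-relational pipeline concrete, here is a simplified Python sketch of a weighted-vote relational neighbor (wvRN) model applied with relaxation labeling, seeded with local-classifier priors. It is written in the spirit of the annealed relaxation labeling used with NetKit, but it is not NetKit's code: unit edge weights, the `iterations` and `decay` values, and keeping isolated nodes at their prior are assumptions, and `relaxation_labeling` is a hypothetical name.

```python
def relaxation_labeling(graph, priors, known, classes, iterations=100, decay=0.99):
    """Collective inference with a wvRN model applied by relaxation labeling.

    graph:  dict node -> list of neighbors; every node is a key and every
            edge has weight 1
    priors: dict node -> {class: probability} from a local classifier
            (e.g. Details Only, Links Only, or Average) for unlabeled nodes
    known:  dict node -> class for training nodes, which are never updated
    Each round, an unlabeled node's estimate moves toward the mean of its
    neighbors' current estimates by a step size beta that shrinks each round.
    Returns the most probable class for every node.
    """
    est = {
        v: ({c: float(c == known[v]) for c in classes} if v in known else dict(priors[v]))
        for v in graph
    }
    beta = 1.0
    for _ in range(iterations):
        beta *= decay
        updated = {}
        for v, neighbors in graph.items():
            if v in known or not neighbors:
                updated[v] = est[v]  # fixed label, or no relational evidence
                continue
            vote = {c: sum(est[u][c] for u in neighbors) / len(neighbors) for c in classes}
            updated[v] = {c: beta * vote[c] + (1 - beta) * est[v][c] for c in classes}
        est = updated  # synchronous update: all nodes use last round's estimates
    return {v: max(classes, key=lambda c: est[v][c]) for v in graph}


classes = ["conservative", "liberal"]
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
known = {"a": "liberal", "c": "liberal"}
priors = {
    "b": {"conservative": 0.6, "liberal": 0.4},
    "d": {"conservative": 0.7, "liberal": 0.3},
}
pred = relaxation_labeling(graph, priors, known, classes)
```

Node `b` is pulled to Liberal by its two labeled neighbors despite a Conservative-leaning prior, while the isolated node `d` simply keeps its local-classifier prediction. In a sparsely connected graph many nodes resemble `d`, which is one way limited connectivity can cap what collective inference adds over the local classifier.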
Effect of Sanitization on Other Attack Techniques

We further test the removal of details as an anonymization technique by using a variety of classification algorithms to measure the effectiveness of our method. For each number of details removed, we first remove the indicated number of details in accordance with the method described earlier, and then run tenfold cross validation on this set 100 times. We repeat this for 0 to 20 details removed. Detail removal proves effective at reducing classification accuracy for those details that we have classified as sensitive. While the specific accuracy reduction varies with the number of details removed and with the algorithm used for classification, removing details does reduce accuracy across a broad range of classifiers. Decision trees are affected the most, with a roughly 35 percent reduction in classification accuracy. This indicates that by using a Bayesian classifier to perform sanitization, which makes it easier to identify the individual details that make a class label more likely, we can decrease the accuracy of a far larger set of classifiers. We see similar results with our generalization method: while the specific value of privacy defined for naive Bayes does not hold exactly, generalization still decreases classification accuracy across multiple types of classifiers.

VI. CONCLUSION

The tension between the desired use of data and individual privacy presents an opportunity for privacy-preserving social network data mining, that is, the discovery of information and relationships from social network data without violating privacy. We devised three possible sanitization techniques that could be used in various situations. Using both friendship links and details together gives better predictability than using details alone. In addition, we explored the effect of removing details and links in preventing sensitive information leakage. In the process, we discovered situations in which collective inference does not improve on using a simple local classification method to identify nodes. Combining the collective inference results with the individual results, we find that removing details and friendship links together is the best way to reduce classifier accuracy. This, however, is probably infeasible if the social network is to remain useful. Removing only details greatly reduces the accuracy of the local classifiers, which gave us the maximum accuracy that we were able to achieve through any combination of classifiers. We assumed full use of the graph information when deciding which details to hide; useful research could be done on how individuals with limited access to the network could pick which details to hide. This paper thus addressed the problem of sanitizing a social network to prevent inference of social network data, and examined the effectiveness of these approaches on a real-world data set.
To protect privacy, we sanitize both the details and the underlying link structure of the graph; that is, we delete some information from a user's profile and remove some links between friends.

VII. FUTURE ENHANCEMENT

Future work could be conducted in identifying key nodes of the graph structure to see whether removing or altering these nodes can decrease information leakage.

VIII. REFERENCES

[1] Seyed Hossein Ahmadinejad, Mohd Anwar, and Philip W. L. Fong (2010), Inference Attacks by Third-Party Extensions to Social Network Systems.
[2] Backstrom, C. Dwork, and J. Kleinberg (2010), Wherefore Art Thou R3579X?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography, Proc. 16th Int'l Conf. World Wide Web (WWW '07).
[3] J. He, W. Chu, and V. Liu (2006), Inferring Privacy Information from Social Networks, Proc. Intelligence and Security Informatics.
[4] E. Zheleva and L. Getoor (2008), Preserving the Privacy of Sensitive Relationships in Graph Data, Proc. First ACM SIGKDD Int'l Conf. Privacy, Security, and Trust in KDD.
[5] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda (2009), All Your Contacts Are Belong to Us, in Proceedings of WWW '09, Madrid, Spain.
[6] Ratan Dey, Cong Tang, Keith Ross, and Nitesh Saxena (2009), Estimating Age Privacy Leakage in Online Social Networks.
[7] L. Sweeney (2002), k-Anonymity: A Model for Protecting Privacy, Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems.
[8] A. Friedman and A. Schuster (2010), Data Mining with Differential Privacy, Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining.
[9] C. Clifton, Using Sample Size to Limit Exposure to Data Mining, J. Computer Security, vol. 8, Dec.
[10] K. Tumer and J. Ghosh, Bayes Error Rate Estimation Using Classifier Ensembles, Int'l J. Smart Eng. System Design, vol. 5, no. 2.
[11] C. van Rijsbergen, S. Robertson, and M. Porter, New Models in Probabilistic Information Retrieval, Technical Report 5587, British Library.
[12] D. J. Watts and S. H. Strogatz, Collective Dynamics of Small-World Networks, Nature, vol. 393, no. 6684, June.
E. Steel and G. A. Fowler, Facebook in Privacy Breach, The Wall Street Journal, Oct.
[13] J. He, W. W. Chu, and Z. V. Liu, Inferring Privacy Information from Social Network, in Proceedings of ISI '06, ser. LNCS, San Diego, CA, USA: Springer, May 2006.
[14] W. Xu, X. Zhou, and L. Li, Inferring Privacy Information via Social Relations, in Proceedings of the 24th IEEE ICDE Workshop, Cancun, Mexico, Apr.
[15] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda, All Your Contacts Are Belong to Us, in Proceedings of WWW '09, Madrid, Spain, Apr. 2009.
[16] E. Zheleva and L. Getoor, To Join or Not to Join, in Proc. WWW '09, Madrid, Spain, Apr. 2009.


More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1105

More information

Secure Image Sharing on Shared Web Sites Using Adaptive Privacy Policy Prediction System

Secure Image Sharing on Shared Web Sites Using Adaptive Privacy Policy Prediction System Secure Image Sharing on Shared Web Sites Using Adaptive Privacy Policy Prediction System Miss Akanksha R.Watane Prof.P.B.Sambhare Abstract: - Many social media sites like Facebook, flicker are performing

More information

Using Text Learning to help Web browsing

Using Text Learning to help Web browsing Using Text Learning to help Web browsing Dunja Mladenić J.Stefan Institute, Ljubljana, Slovenia Carnegie Mellon University, Pittsburgh, PA, USA Dunja.Mladenic@{ijs.si, cs.cmu.edu} Abstract Web browsing

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Opinion 02/2012 on facial recognition in online and mobile services

Opinion 02/2012 on facial recognition in online and mobile services ARTICLE 29 DATA PROTECTION WORKING PARTY 00727/12/EN WP 192 Opinion 02/2012 on facial recognition in online and mobile services Adopted on 22 March 2012 This Working Party was set up under Article 29 of

More information

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

CS 224W Final Report Group 37

CS 224W Final Report Group 37 1 Introduction CS 224W Final Report Group 37 Aaron B. Adcock Milinda Lakkam Justin Meyer Much of the current research is being done on social networks, where the cost of an edge is almost nothing; the

More information

Recommender Systems using Graph Theory

Recommender Systems using Graph Theory Recommender Systems using Graph Theory Vishal Venkatraman * School of Computing Science and Engineering vishal2010@vit.ac.in Swapnil Vijay School of Computing Science and Engineering swapnil2010@vit.ac.in

More information

Deduplication of Hospital Data using Genetic Programming

Deduplication of Hospital Data using Genetic Programming Deduplication of Hospital Data using Genetic Programming P. Gujar Department of computer engineering Thakur college of engineering and Technology, Kandiwali, Maharashtra, India Priyanka Desai Department

More information

Mapping Internet Sensors with Probe Response Attacks

Mapping Internet Sensors with Probe Response Attacks Mapping Internet Sensors with Probe Response Attacks John Bethencourt, Jason Franklin, and Mary Vernon {bethenco, jfrankli, vernon}@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison

More information

Graph Structure Over Time

Graph Structure Over Time Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines

More information

How App Ratings and Reviews Impact Rank on Google Play and the App Store

How App Ratings and Reviews Impact Rank on Google Play and the App Store APP STORE OPTIMIZATION MASTERCLASS How App Ratings and Reviews Impact Rank on Google Play and the App Store BIG APPS GET BIG RATINGS 13,927 AVERAGE NUMBER OF RATINGS FOR TOP-RATED IOS APPS 196,833 AVERAGE

More information

Community-Based Recommendations: a Solution to the Cold Start Problem

Community-Based Recommendations: a Solution to the Cold Start Problem Community-Based Recommendations: a Solution to the Cold Start Problem Shaghayegh Sahebi Intelligent Systems Program University of Pittsburgh sahebi@cs.pitt.edu William W. Cohen Machine Learning Department

More information

Odyssey Entertainment Marketing, LLC Privacy Policy

Odyssey Entertainment Marketing, LLC Privacy Policy Odyssey Entertainment Marketing, LLC Privacy Policy We collect the following types of information about you: Information you provide us directly: We ask for certain information such as your username, real

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE Sinu T S 1, Mr.Joseph George 1,2 Computer Science and Engineering, Adi Shankara Institute of Engineering

More information

Chapter 4. Fundamental Concepts and Models

Chapter 4. Fundamental Concepts and Models Chapter 4. Fundamental Concepts and Models 4.1 Roles and Boundaries 4.2 Cloud Characteristics 4.3 Cloud Delivery Models 4.4 Cloud Deployment Models The upcoming sections cover introductory topic areas

More information

Emerging Measures in Preserving Privacy for Publishing The Data

Emerging Measures in Preserving Privacy for Publishing The Data Emerging Measures in Preserving Privacy for Publishing The Data K.SIVARAMAN 1 Assistant Professor, Dept. of Computer Science, BIST, Bharath University, Chennai -600073 1 ABSTRACT: The information in the

More information

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Tiancheng Li Ninghui Li CERIAS and Department of Computer Science, Purdue University 250 N. University Street, West

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

Mining User - Aware Rare Sequential Topic Pattern in Document Streams

Mining User - Aware Rare Sequential Topic Pattern in Document Streams Mining User - Aware Rare Sequential Topic Pattern in Document Streams A.Mary Assistant Professor, Department of Computer Science And Engineering Alpha College Of Engineering, Thirumazhisai, Tamil Nadu,

More information

PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION

PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION Justin Z. Zhan, LiWu Chang, Stan Matwin Abstract We propose a new scheme for multiple parties to conduct data mining computations without disclosing

More information

More Efficient Classification of Web Content Using Graph Sampling

More Efficient Classification of Web Content Using Graph Sampling More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page International Journal of Soft Computing and Engineering (IJSCE) ISSN: 31-307, Volume-, Issue-3, July 01 Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page Neelam Tyagi, Simple

More information

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP 324 Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP Shivaji Yadav(131322) Assistant Professor, CSE Dept. CSE, IIMT College of Engineering, Greater Noida,

More information

Multilevel Data Aggregated Using Privacy Preserving Data mining

Multilevel Data Aggregated Using Privacy Preserving Data mining Multilevel Data Aggregated Using Privacy Preserving Data mining V.Nirupa Department of Computer Science and Engineering Madanapalle, Andhra Pradesh, India M.V.Jaganadha Reddy Department of Computer Science

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many

More information

Fault Identification from Web Log Files by Pattern Discovery

Fault Identification from Web Log Files by Pattern Discovery ABSTRACT International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 2 ISSN : 2456-3307 Fault Identification from Web Log Files

More information

Mapping Internet Sensors with Probe Response Attacks

Mapping Internet Sensors with Probe Response Attacks Mapping Internet Sensors with Probe Response Attacks Computer Sciences Department University of Wisconsin, Madison Introduction Outline Background Example Attack Introduction to the Attack Basic Probe

More information

Keywords: geolocation, recommender system, machine learning, Haversine formula, recommendations

Keywords: geolocation, recommender system, machine learning, Haversine formula, recommendations Volume 6, Issue 4, April 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Geolocation Based

More information