Network Devices Data Visualization Using Weka


Volume 118 No. 17 2018, 599-608 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu

B. Krishna Sagar 1, E. Madhusudhana Reddy 2, S. Ramakrishna 3

1 Research Scholar, Computer Science and Engineering, J.N.T.U.K, Kakinada, India.
2 Professor, Computer Science & Engineering, SRM DRK College of Engineering & Technology, Hyderabad, India.
3 Professor, Computer Science, Sri Venkateswara University, Tirupati, India.
1 krishna.sagar521@gmail.com, 2 e mreddy@yahoo.com, 3 drsramakrishna@yahoo.com

January 29, 2018

Abstract

Big Data opens a new era for research scholars. Different tools are used to process, analyze, and visualize big data, and these help scholars to find problems and study them theoretically. Many articles have been published across the various areas of big data technology, so we studied published research articles to find a better approach to processing and analyzing big data. In particular, we propose processing large XML data and visualizing it with the Weka tool. Weka accepts CSV or ARFF files for analysis and visualization. We propose an algorithm that converts router-generated XML data to CSV format so that the data can be processed and analyzed in Weka.

Key Words: XML Data, Weka Tool, ARFF, CSV

1 Introduction

In recent years, processing and analyzing customer data has become crucial for organizations and institutions, but the data have grown too large to handle and analyze easily. Examples include social media, e-commerce, logistics, and institutional data. To process such data, tools such as Hadoop, Hive, and Spark are used, and for visualization there are Tableau, QlikView, Datawrapper, Microsoft Power BI, and Oracle Visual Analyzer. The acquired big data are large-scale, heterogeneous, and generated at high speed, and are thus potentially complex to cope with [1]. One example is analyzing the data generated by the different sensors connected to a patient in a hospital in order to provide better treatment.

For big data analysis, some data mining algorithms provide better solutions. In recent years scholars have been interested in supervised and unsupervised algorithms, for example classification algorithms [2] and clustering algorithms [3]. For instance, Hadoop is used for processing data, and Mahout and MLlib for analyzing it. Distributed systems provide an environment that allows big data processing. Such systems, made up of organized collections of commodity hardware, process big data in a distributed manner. The main difficulty in big data mining is to design an algorithm in such a way that it can be included in big data mining toolkits. Organizations analyze big data to extract knowledge and to understand customer needs better. Recent big data technologies provide a user-friendly environment for adding an algorithm for big data analysis. Such developments require a basic understanding of distributed big data technologies in order to implement the algorithms.

Weka is a data mining tool that contains different algorithms for analyzing data. In recent years Weka has been used to analyze big data in distributed environments. For example, DistributedWekaSpark [2], the Distributed Weka for Spark package, has been available in Weka for several years [3]. Weka owes its success to its wide range of data analysis algorithms, model evaluation procedures, and metrics. Weka also provides an API (application programming interface) for integrating its algorithms and procedures into visual environments such as Tableau and QlikView. DistributedWekaSpark processes Big Data in memory. With this Spark integration, DistributedWekaSpark contains various machine learning and data analysis algorithms. The toolkit includes packages that provide map-reduce procedures, data processing, and classification, and that execute tasks on Big Data. For example, classification contains SVM (Support Vector Machine), Naive Bayes, MapReduce, and Decision Tree. These algorithms are implemented in the DistributedWekaSpark toolkit to reduce processing time and to provide effective visualization and load balancing.

In a distributed computing system, one process is used to coordinate many tasks. It does not matter which systems do the work, but there must be a coordinator that is available at all times. Electing a coordinator, or leader, is therefore an essential problem in a distributed environment, and several algorithms are used for it, among them the Leader, Ring, and Bully algorithms. In a group of communicating processes, electing a new coordinator is essential when the leader crashes or leaves the group, so in a distributed environment elections are conducted in such situations. Election algorithms have a variety of applications, such as key distribution, routing coordination, sensor coordination, and general control. When nodes are mobile, topologies can change and nodes may dynamically join or leave a network. In such networks elections are needed frequently, which makes the election algorithm a particularly critical component of system operation. In this paper, we use this novel election algorithm to deliver messages in the absence of the destination system by choosing a nearby system or friend system through its past communication history. The traditional statement of the election problem is to eventually elect a distinct coordinator from a set of nodes drawn from many sources.

2 Implementation of Python Program to convert xml data to csv

2.1 Python Program to Convert the XML Data to CSV Data File

    import xml.etree.ElementTree as etree

    # Parse the router-generated XML file.
    tree = etree.parse("/root/omi12.xml")


    root = tree.getroot()

    def normalize(name):
        # Strip the "{namespace-uri}" prefix that ElementTree puts on tags.
        if name[0] == "{":
            uri, tag = name[1:].split("}")
            return tag
        else:
            return name

    def finds_child_names_attrib(i):
        # Walk three levels below root[i], yielding (tag, attributes, text).
        for c1 in root[i]:
            yield (normalize(c1.tag), c1.attrib, c1.text)
            for c2 in c1:
                yield (normalize(c2.tag), c2.attrib, c2.text)
                for c3 in c2:
                    yield (normalize(c3.tag), c3.attrib, c3.text)

    # Collect every tuple found under the second child of the root.
    i = finds_child_names_attrib(1)
    l = []
    for j in i:
        l.append(j)

2.2 Python Program to Convert the XML Data to CSV Data File

This XML router data was provided by HPE, a software company, for research.
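The listing in Section 2.1, as reproduced here, collects (tag, attributes, text) tuples in a list but does not show the step that writes them to a CSV file. The following is a minimal sketch of how that step might look, not the authors' implementation. The helper names `element_rows` and `write_csv`, the three column names, the `key=value` encoding of attributes, and the sample XML are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as etree


def normalize(name):
    """Strip a '{namespace-uri}' prefix from an ElementTree tag name."""
    if name.startswith("{"):
        return name[1:].split("}", 1)[1]
    return name


def element_rows(parent, max_depth=3):
    """Yield (tag, attributes, text) for descendants of parent, up to max_depth levels."""
    if max_depth == 0:
        return
    for child in parent:
        # Flatten the attribute dict into one cell (assumed encoding: key=value;...).
        attrs = ";".join(f"{k}={v}" for k, v in sorted(child.attrib.items()))
        text = (child.text or "").strip()
        yield normalize(child.tag), attrs, text
        yield from element_rows(child, max_depth - 1)


def write_csv(rows, out):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(out)
    writer.writerow(["tag", "attributes", "text"])  # hypothetical column names
    writer.writerows(rows)


# Hypothetical sample standing in for the router XML, which is not reproduced in the paper.
SAMPLE = """<devices xmlns="urn:example">
  <device id="r1"><name>core-router</name><lastModified>2018-01-29</lastModified></device>
</devices>"""

root = etree.fromstring(SAMPLE)
buffer = io.StringIO()
write_csv(element_rows(root), buffer)
print(buffer.getvalue())
```

For a real file, the root returned by `etree.parse(...).getroot()` would take the place of the sample, and a file opened with `open("output.csv", "w", newline="")` would take the place of the in-memory buffer.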


3 Data Transformation

This module consists of three phases: data selection, data collection, and data normalization. It is important for improving the quality of the data for mining. The data selection phase is handled using Python programming statements.

4 Architecture for XML Data Visualization

Fig. 1. Architecture for XML Data Visualization

5 Data Visualization

This module consists of feeding the data to the Weka tool and visualizing it. WEKA has four main menus: Explorer, Experimenter, KnowledgeFlow, and Simple CLI. To start data exploration for mining, the input data in comma-separated values (CSV) format are loaded into the Explorer menu of the main WEKA GUI. Fig. 2 shows the WEKA main GUI [4].
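Because Weka accepts ARFF as well as CSV, the converted data could also be written as an ARFF file. The sketch below shows one way to do this under simplifying assumptions: every column is declared as a `string` attribute, empty cells are written as the ARFF missing-value marker `?`, and attribute names contain no spaces. The function names `csv_to_arff` and `quote`, the relation name, and the sample data are hypothetical.

```python
import csv
import io


def quote(value):
    """Quote one ARFF string value; an empty cell becomes the missing marker '?'."""
    if value == "":
        return "?"
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text with string attributes."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        lines.append(f"@attribute {name} string")
    lines += ["", "@data"]
    for row in data:
        lines.append(",".join(quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data in the three-column layout assumed for illustration.
sample_csv = "tag,attributes,text\nname,,core-router\n"
print(csv_to_arff(sample_csv, "routers"))
```

Declaring columns as `string` keeps the sketch generic; most Weka classifiers and visualizations work with nominal or numeric attributes, so a real conversion would likely need per-column types.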


Fig. 2. Entering the Weka GUI

Inside the Explorer menu, six panels appear as tabs at the top, corresponding to the available DM tasks: preprocess, classify, cluster, associate, select attributes, and visualize. To begin the data analysis, the CSV file is opened with the Open file button in the preprocess panel. For this research, the file named output.csv was imported in the preprocess panel, and 154 instances and 3 attributes were displayed [4], as shown in Fig. 3, Fig. 4, and Fig. 5.

Fig. 3. Weka Explorer - Preprocess tab
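The instance and attribute counts reported by the Preprocess panel can be cross-checked in Python before the file is opened in Weka. This is a small sketch that assumes the CSV has a single header row. The function name `csv_shape` and the sample data are hypothetical.

```python
import csv
import io


def csv_shape(handle):
    """Return (instances, attributes) for a CSV stream with one header row."""
    reader = csv.reader(handle)
    header = next(reader)
    # Count non-empty data rows; each one becomes a Weka instance.
    instances = sum(1 for row in reader if row)
    return instances, len(header)


# Hypothetical sample; for a real file use open("output.csv", newline="").
sample = io.StringIO("tag,attributes,text\r\ndevice,id=r1,\r\nname,,core-router\r\n")
print(csv_shape(sample))  # (2, 3)
```

Run against the output.csv described above, the same check would be expected to agree with the 154 instances and 3 attributes that Weka displays.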


Fig. 4. Choose the CSV File Which is Generated by Python Program

Fig. 5. Weka Explorer - Preprocess view for Network Routers Data

In Fig. 5, we can see the complete XML data. Here we can delete attributes and visualize a particular attribute. In Fig. 6, we visualize a particular attribute and its instances. For example, the attribute device ID:828baa70-1e8e-71e7-1d25-0fda has a Last Modified Time, as shown in Fig. 6 and Fig. 7.

Fig. 6. Weka Visualization for Network Routers Data

Fig. 7. Weka Visualization for Instances

6 CONCLUSION

This paper presents the visualization of complex XML data generated by network devices. We used Python libraries to convert the XML data to CSV and fed the result into the Weka GUI tool for visualization. Future work will reduce the noisy data in the CSV and provide a distributed environment for analyzing network device data.

Acknowledgment

The XML data used in this paper were collected from HPE for research.

References

[1] D. Agrawal, S. Das, and A. El Abbadi, Big data and cloud computing: Current state and future opportunities, in Proceedings of the 14th International Conference on Extending Database Technology, ser. EDBT/ICDT '11. New York, NY, USA: ACM, 2011.
[2] Aris-Kyriakos Koliopoulos, Paraskevas Yiapanis, Firat Tekiner, Goran Nenadic, John Keane, A Parallel DistributedWeka Framework for Big Data Mining using Spark, BigDataCongress, 2015, pp. 9-16.
[3]
[4] Nur Hafieza Ismail, Fadhilah Ahmad, and Azwa Abdul Aziz, Implementing WEKA as a Data Mining Tool to Analyze Students Academic Performances Using Naive Bayes Classifier.
