Online Bill Processing System for Public Sectors in Big Data

Similar documents
Embedded Technosolutions

Hadoop An Overview. - Socrates CCDH

A Review Paper on Big data & Hadoop

Big Data Architect.

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

A Review Approach for Big Data and Hadoop Technology

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

Department of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India

HADOOP FRAMEWORK FOR BIG DATA

Introduction to Big-Data

Big Data A Growing Technology

Certified Big Data and Hadoop Course Curriculum

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

This brief tutorial provides a quick introduction to Big Data, MapReduce algorithm, and Hadoop Distributed File System.

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

A Survey on Big Data

Certified Big Data Hadoop and Spark Scala Course Curriculum

Innovatus Technologies

Hadoop, Yarn and Beyond

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY

A BigData Tour HDFS, Ceph and MapReduce

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Top 25 Big Data Interview Questions And Answers

Big Data Hadoop Stack

Chase Wu New Jersey Institute of Technology

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

Apache Spark and Hadoop Based Big Data Processing System for Clinical Research

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Databases 2 (VU) ( / )

Hadoop. Introduction / Overview

Stages of Data Processing

Transaction Analysis using Big-Data Analytics

<Insert Picture Here> Introduction to Big Data Technology

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data with Hadoop Ecosystem

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka

A Review on Hive and Pig

BIG DATA & HADOOP: A Survey

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life

CISC 7610 Lecture 2b The beginnings of NoSQL

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

A REVIEW: MAPREDUCE AND SPARK FOR BIG DATA ANALYTICS

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Webinar Series TMIP VISION

Introduction to Hadoop and MapReduce

MapReduce, Hadoop and Spark. Bompotas Agorakis

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

DATA SCIENCE USING SPARK: AN INTRODUCTION

Hadoop. copyright 2011 Trainologic LTD

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Hadoop Course Content

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

A Survey on Comparative Analysis of Big Data Tools

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG

High Performance Computing on MapReduce Programming Framework

Challenges for Data Driven Systems

Hadoop Online Training

Hadoop & Big Data Analytics Complete Practical & Real-time Training

LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS

Juxtaposition of Apache Tez and Hadoop MapReduce on Hadoop Cluster - Applying Compression Algorithms

Twitter data Analytics using Distributed Computing

Microsoft Big Data and Hadoop

Hadoop Development Introduction

Cloud Computing Techniques for Big Data and Hadoop Implementation

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Introduction to BigData, Hadoop:-

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Data Informatics. Seon Ho Kim, Ph.D.

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data Analytics. Description:

Hortonworks and The Internet of Things

Big Data Analytics. Rasoul Karimi

NOSQL Databases: The Need of Enterprises

Cisco Tetration Analytics Platform: A Dive into Blazing Fast Deep Storage

International Journal of Computer Engineering and Applications, BIG DATA ANALYTICS USING APACHE PIG Prabhjot Kaur

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Column Stores and HBase. Rui LIU, Maksim Hrytsenia

Data Platforms and Pattern Mining

A Survey on Big Data, Hadoop and it s Ecosystem

STATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns

Big Data Issues and Challenges in 21 st Century

Global Journal of Engineering Science and Research Management

Automatic Voting Machine using Hadoop

Flash Storage Complementing a Data Lake for Real-Time Insight

2013 AWS Worldwide Public Sector Summit Washington, D.C.

BIG DATA TESTING: A UNIFIED VIEW

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

BIG DATA ANALYTICS IN HEALTHCARE: CHALLENGES, TOOLS AND TECHNIQUES : A SURVEY

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Study of NoSQL Database Along With Security Comparison

Transcription:

IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer Basha A. Angayarkanni Associate Professor Assistant Professor Raak Arts& Science College, Perambai-Puducherry India Raak Arts& Science College, Perambai-Puducherry India Jones. J Student Raak Arts& Science College, Perambai-Puducherry India Mohammadibrahim. U Student Raak Arts& Science College, Perambai-Puducherry India Abstract In this generation we have focused for make an improvement and maintain to the government sectors like electric bill, water bill and commercial tax departments. About their bill payment system for each and every house holder in the both urban and rural area. Now-a-days the bill payment system is becoming a huge problem and their records are not easily identifies if there is any emergency needed due to their different area location allocated. So to overcome this kind of problems we can collect and store all necessary information and their payment structure in a single database server using online big data methods. Keywords: ZooKeeper, Hadoop, MapReduce, HDFS I. INTRODUCTION Big data is an electric blanket term for the non-traditional strategies and technologies needed to collect the large amount of information in the different fields. While the problem of working with data that exceeds the computing power or storage of a single computer is not new, the pervasiveness, scale, and value of this type of computing has greatly expanded in recent years. Bigdata V S: Volume It consists of terabytes, records, transactions, tables, files Velocity It consists of batches, near time, real time, streams Variety It consists of structured, unstructured, semi structured Structured data consist of table format data Unstructured data is audio, video, log(recent files), multiple format files(.ppt) etc.. Semi-structured data is input the file name extensions like text and the output is html format open in browser are under the semi-structured data categories Value It consists of valuable and invaluable data Veracity It likes uncertainty of data All rights reserved by www.ijirst.org 68

Fig 1: 5 V s of Big Data Why Is Big Data Important? The great importance of big data doesn t revolve around how much data you have, but what you do with it. You can take data from any source and scrutinize it to find answers that enable Cost reductions Time reductions New product development and optimized offerings, and Smart decision making. II. BIG POTENTIAL OF BIG DATA The volume of data that s being created and stored on a universal level is almost inconceivable, and it just keeps growing. That means there s even more potential to glean key insights from business information yet only a small percentage of data is actually analyzed. What Comes Under Big Data? Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data. Black Box Data: It is a component of airplanes, and jets, helicopter, etc. It captures voices of the flight crew, recordings of microphones and earphones, and the performance information of the aircraft. Social Media Data: Social media such as Whatsapp, Facebook, Hike and Twitter hold information and the views posted by millions of people across the globe. Stock Exchange Data: Buy and Sell are information of the stock exchange data, decisions made on a share of different companies made by the customers. Power Grid Data: Information consumed by a particular node with respect to a base station is maintained by the power grid data. Transport Data: Model, capacity, distance and availability of a vehicle are maintained in the Transport data. Search Engine Data: It retrieve lots of data from different databases. Applications of Big Data Banking Sectors Security purpose Communications Media Entertainment Healthcare Providers Education Manufacturing and Natural Resources Government Insurance Retail and Whole sale trade Transportation All rights reserved by www.ijirst.org 69

Energy and Utilities Fig. 2: Fields that comes under the umbrella of Big Data. III. HADOOP IN GENERAL Due to emerging new technologies, devices, and communication means like social networking sites, the amount of data produced by mankind is growing fast every year. From the beginning of time till 2003, the amount of data produced by us was 5 billion gigabytes. If you stack up the data in the form of disks it may fill an entire football field. The same amount was formed in every two days in 2011, and in every ten minutes in 2013. This rate is still growing enormously. Though all this information produced is meaningful and can be useful when processed, it is being neglected. Hadoop is a Java-based programming framework and it is an open source, which supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Architecture of Hadoop The following are the four modules of Hadoop framework: Hadoop Common: It contains the Java libraries and utilities required by other Hadoop modules. These libraries provide file system and OS level abstractions and contains the necessary Java files and scripts required to start Hadoop. Hadoop YARN: Scheduling and cluster resource management is followed here. Hadoop Distributed File System (HDFS): It provides high-throughput access to application data. HadoopMapReduce: This is YARN-based system for parallel processing of large data sets. Fig. 3: Hadoop Architecture All rights reserved by www.ijirst.org 70

Hadoop Distributed File System (HDFS): It provides high-throughput access to application data. IV. HADOOP FRAMEWORK The term "Hadoop" often refers not just to the base modules mentioned above but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark etc. Map Stage: Fig. 4: Hadoop Framework V. MAP/REDUCE PROCESS: The map or mapper s job is to process the input data. Generally the input data is in the form of file or directory and is stored in the hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data. Educe Stage: This stage is the combination of the shuffle stage and the reduce stage. The reducer s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS All rights reserved by www.ijirst.org 71

Fig. 5: Map Reduce VI. HBASE: Applications such as hbase, cassandra, couchdb, dynamo, and mongodb are some of the databases that store huge amounts of data and access the data in a random manner. Hbase is a distributed column-oriented database built on top of the hadoop file system. It is an open-source project and is horizontally scalable. Hbase is a data model that is similar to google s big table designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the hadoop file system (HDFS). It is a part of the hadoop ecosystem that provides random real-time read/write access to data in the hadoop file system. One can store the data in hdfs either directly or through hbase. Data consumer reads/accesses the data in HDFS randomly using hbase. Hbase sits on top of the hadoop file system and provides read and write access. VII. HBASE ARCHITECTURE: Fig. 6: Hbase Architecture The Master Server: Assigns regions to the region servers and takes the help of apache zookeeper for this task. Handles load balancing of the regions across region servers. It unloads the busy servers and shifts the regions to less occupied servers. Maintains the state of the cluster by negotiating the load balancing. Is responsible for schema changes and other metadata operations such as creation of tables and column families. The Region Servers Have Regions That: Communicate with the client and handle data-related Operations. Handle read and write requests for all the regions under it. Decide the size of the region by following the region Size thresholds. All rights reserved by www.ijirst.org 72

Zookeeper: Zookeeper is an open-source project that provides services like maintaining configuration information, naming, providing distributed synchronization, etc. Zookeeper has ephemeral nodes representing different region servers. Master servers use these nodes to discover available servers. In addition to availability, the nodes are also used to track server failures or network partitions. Clients communicate with region servers via zookeeper. In pseudo and standalone modes, hbase itself will take care of zookeeper. Features of HBase: Hbase is linearly scalable. It has automatic failure support. It provides consistent read and writes. It integrates with hadoop, both as a source and a destination. It has easy java api for client. It provides data replication across clusters. Hbase Tool will be Solving My Problem: Hbase tool will be solve my problem because to storage the database from government records are in table formats and it will be huge form a single district So Hbase tool handle to manage the record set related such as electric bill, water tax, property tax records are store in the database It will easily access the particular holder data and it will be secure. This tools are process like in a centralized to access the data from anywhere in the government sectors not only in the particular district Feasibility of Implementation: Now a days to remit the bill to different government sector with own workplace only such as e-bill, water tax, and property tax. It takes more time to spend. Such a problem to overcome, ONLINE BILL PROCESSING SYSTEM FOR PUBLIC SECTORS helps all in one place to pay the billing system in single apps and retrieve the data of particular holder. This implementation of this concept is help to maintain and store the data on huge and it will be secure. VIII. CONCLUSION This project based on the concept of bigdata and using this we have to retrieve the data and transfer the data in fast. Large amount of data will be stored in this tool. This application is used to access the process of billing system in single place. It contains less amount of time to spend it. ACKNOWLEGDEMENTS This work was supported by Mr. H. AnwerBasha our guide and Associate Prof and Head of the department in Raak Arts & Science College. He has support us a lot to complete this paper. REFERENCE [1] A Survey on Prediction Using Big Data Analytics M. Supriya and A.J. Deepa (2017). International Journal of Big Data and Analytics in Healthcare (pp. 1-15). [2] Intelligent big data analysis: a review Chun-Wei Tsai; Ya-Lan Yang; Ming-Chao Chiang; Chu-SingYang [3] Optimising virtual machine allocation in MapReduce cloud for improved data locality, T.P. Shabeera; S.D. Madhu Kumar, Int. J. of Big Data Intelligence, 2015 Vol.2, No.1, pp.2-8 All rights reserved by www.ijirst.org 73