EFFICIENT DATA MINING ALGORITHMS FOR AGRICULTURE DATA


Anusha A. Shettar 1, Shanmukhappa A. Angadi 2

1 Department of Computer Science and Engineering, Visvesvaraya Technological University, PG Centre, Belagavi, Karnataka, India
2 Department of Computer Science and Engineering, Visvesvaraya Technological University, PG Centre, Belagavi, Karnataka, India

Abstract

Agriculture plays a crucial role in the Indian economy. It meets a basic human need, and everyone depends on agricultural productivity. Productivity differs from state to state; by studying past data sets, one can predict future results using information technology tools. Data mining is an important tool for extracting hidden information from large and varied data. In this connection, the paper explores the possibility of applying different types of mining algorithms to extract important information. This work experiments with different data mining tasks, namely classification, clustering and association, to study the given data using the weka package. Classification based on yield is studied for two cases, namely crop with season and crop with yield. Clustering of the data based on yield is experimented with using the EM and K-Means algorithms. It is found that the LMT algorithm gives better results than the others for classification, and that the K-Means algorithm provides better performance for clustering of agricultural data.

Key words: J48, LMT, LADTree, ID3, EM and K-Means.

I. INTRODUCTION

Agriculture is one of the backbones of the Indian economy. The basic needs of human beings and animals are fulfilled by the agriculture sector. The productivity of agriculture depends on geographical conditions and season. Agricultural yield is one of the measurable parameters that contribute to the real income of the country. This yield is calculated from the area and production of each crop in the country. To get a better economic result, we must study the agriculture database, which holds information related to crop, season, yield, area, etc. Analysis of a huge agriculture database is done through data mining techniques. Data mining is the process of extracting information from a large data set and transforming it into meaningful, usable information. Data mining algorithms are used to extract this information from the data set. A large amount of agricultural information is made available by various government organizations for agricultural planning. Data mining tools can be employed to predict trends in agricultural output and yield.

A. APPLICATIONS

Data mining in agricultural organizations is capable of producing descriptive and predictive information as a support for decision making. It is planned to apply data mining to the agriculture database as follows. Using a particular time marker gives a geographical analysis of the crops. Within a particular geographical area, disease-affected crops can be detected. Knowing the geographical conditions makes it possible to assess the quality of the crops. Geographic analysis also gives information about the treatment and nourishment of the plants.

All Rights Reserved 142

B.

Several papers in the literature describe data mining techniques for classifying and predicting future weather, agricultural crop classification, modeling and prediction of rainfall, soil classification, etc. This work describes efficient data mining algorithms for Assam state agriculture data. Different classification and clustering algorithms are used to predict and to group the data respectively. For classification-based prediction, the LMT (Logistic Model Tree) algorithm is more efficient than the other algorithms. For clustering, K-Mean gives better performance than the other clustering algorithm.

The first section gives a brief outline of the definition of data mining, its applications and a review. The second section describes the agriculture data and its preprocessing. The third section presents the data mining methodology. The fourth section presents the proposed methodology, with a brief description of classification for prediction and clustering of the data. The fifth section describes the testing and analysis of the given agriculture data set using different data mining algorithms.

II. AGRICULTURAL DATA DESCRIPTION

This paper uses raw agriculture data collected from various agriculture statistical departments and from various Indian agriculture web sites. This raw data set is then preprocessed and analyzed using the weka tool. In this work, raw agriculture data for the state of Assam are considered for prototype analysis of the tools. The agriculture database has attributes such as crop, district, season and yield; these attributes and their values are shown in figure 1.

Fig 1: Raw data collection from the Indian agriculture website.

One more attribute, yield_class, is added based on the given attributes yield and season. The calculation shown in figure 2 is carried out in Microsoft Excel to add values for the yield_class attribute to the data set.

Fig 2: Preprocessed dataset using Microsoft calculation
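The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet formula used in the paper: yield is taken as production divided by area, the class cut-offs in ASSUMED_THRESHOLDS are hypothetical (the paper does not state the ranges), the sketch uses the yield value only, and the sample rows are made up for demonstration.

```python
# Hypothetical cut-offs (upper bound, label); the paper does not give the real ranges.
ASSUMED_THRESHOLDS = [(1.0, "Poor"), (2.0, "Good"), (4.0, "Very_Good")]


def compute_yield(production: float, area: float) -> float:
    """Return yield as production per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return production / area


def yield_class(yield_value: float) -> str:
    """Map a yield value to a class label using the assumed cut-offs."""
    for upper_bound, label in ASSUMED_THRESHOLDS:
        if yield_value < upper_bound:
            return label
    return "Excellent"


# Illustrative rows only; these are not taken from the Assam data set.
records = [
    {"crop": "Rice", "season": "Kharif", "area": 1000.0, "production": 1500.0},
    {"crop": "Wheat", "season": "Rabi", "area": 400.0, "production": 300.0},
]

for row in records:
    row["yield"] = compute_yield(row["production"], row["area"])
    row["yield_class"] = yield_class(row["yield"])
    print(row["crop"], row["season"], row["yield"], row["yield_class"])
```

Keeping the cut-offs in one table makes it easy to replace them with the actual ranges once they are known.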

III. THE DATA MINING MECHANISM

Data mining uses different methods to obtain the necessary information. Different technologies are used for different purposes, and every method has its advantages and disadvantages. Data mining tasks can be divided into 1. Descriptive and 2. Predictive.

Descriptive: The goal of this task is to find human-interpretable patterns and associations in the data. This task does not require a special response variable; it summarizes or groups the data in a certain manner.

Predictive: The goal of this task is to predict some response of interest, so predictive tasks require a special response variable. The response may be numerical or categorical, which divides predictive data mining into regression and classification respectively. Predictive tasks estimate the values of a variable based on existing information.

General algorithms and different data mining tools are explored, and the selected algorithms are applied to the agriculture data set to extract information or to process the required data.

IV. PROPOSED METHODOLOGY

The proposed methodology describes the process of retrieving the required data from the given agriculture database. The agriculture database contains details of yield, which depends on area and production, for particular seasons, districts and states. The given data is pre-processed by adding a new attribute named yield class. This attribute takes the value Poor, Good, Very_Good or Excellent depending on the yield value. Algorithms are then used to classify and cluster the given data with the following steps:

1. Agriculture data is used as the input data for the system.
2. Pre-processing removes unwanted data or adds required information to make data mining easier.
3. The pre-processed data is used as input for further implementation.
4. Two data sets are created from the available data, one for training and the other for testing.
5. Classification algorithms are applied to the train and test files to predict results, and clustering algorithms are used to retrieve the required data.

There are two types of system design: 1. Classification for prediction. 2. Clustering.

A. Classification for prediction

Consider some of the attributes, such as state_name, district_name, year, season and yield, from the data set for classification based on crop with season and crop with yield. The data mining algorithms used for prediction are ID3, J48, LMT, KNN, etc. Fig 3 shows the block diagram of classification for prediction.

Fig 3: Classification for Prediction
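Steps 4 and 5 above can be sketched as a seeded percentage split followed by an accuracy check. This is a hedged illustration only: the function names are hypothetical, the 70 percent training share is an assumed value, and the shuffle is Python's own, so it will not reproduce the exact partitions produced by weka for the same seed.

```python
import random


def percentage_split(instances, train_percent, seed):
    """Shuffle with a fixed seed, then cut into disjoint train and test lists."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true class labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Row indices stand in for the 1937 instances; 70 is an assumed training share.
rows = list(range(1937))
train, test = percentage_split(rows, train_percent=70, seed=50)
print(len(train), len(test))
```

Because the split depends only on the percentage and the seed, repeating the experiment with several seeds shows how sensitive the reported accuracy is to the particular partition.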

B. Clustering

Consider some of the attributes, such as state_name, district_name, year, season and yield, from the data set for clustering based on yield. The data mining algorithms used for clustering are EM and K-Mean.

Fig 4: Clustering of dataset

V. TESTING AND ANALYSIS

This section presents the performance analysis of the classification algorithms J48, ID3, LMT and LADTree, applied to the large agriculture data set using the weka package, and of the KNN algorithm, applied to the data set without the weka package. The algorithms give different accuracy results. The performance analysis of the LMT algorithm is given below; the same analysis is applied to all the other classification algorithms. The clustering algorithms EM and K-Mean are applied to the agriculture data to cluster it based on yield.

A. Crop with Season Based classification

The agriculture data set contains attributes such as crop name, district name, year, production, area and yield. Classification is based mainly on crop name with season and on crop with yield. The data set is divided into two sets, namely a train set and a test set. The instances of the train set are distinct from those of the test set; this is ensured by a technique called cross validation, which here divides the data set into two sets based on a percentage and a random seed.

Table 1: Accuracy results using LMT algorithm crop with season based.

By analyzing the above results for the LMT algorithm, a random seed value of 50, together with the percentage split of training and test data, gives the highest accuracy compared to the others. The performance graph is shown below.

Fig 5: Accuracy results with respect to random seed using LMT algorithm crop with season based.

B. Crop with yield Based classification

Table 2: Accuracy results using LMT algorithm crop with yield based.

By analyzing the above results for the LMT algorithm, a random seed value of 70, together with the percentage split of training and test data, gives the highest accuracy compared to the others.

Fig 6: LMT algorithm crop with season based.

C. KNN algorithm on Crop and Season based data

Table 3: Accuracy results for different K values

Fig 7: Accuracy results with different K values

K = 2 gives the highest accuracy among the K values tested. That is, the prediction for each test query is made from its 2 nearest neighbours.

D. Clustering of the data

Clustering is nothing but grouping data with the same features. This work groups the instances based on yield. When a yield value greater than 8 is given, an error is shown, because no value greater than 8 is present in the data set. The following results are obtained when the yield value is chosen as 1.5.

E. EM Algorithm

The EM algorithm is run on the Assam agriculture data set. Fig 8 shows the output of the EM algorithm. The data set has 7 attributes, including state_name, Crop_name, District_name, Season, Yield and Yield_class, and 1937 instances. With yield = 1.5, the output includes the log likelihood, which indicates how well the resulting clusters fit the data elements; a higher value means a better fit.
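The role of the log likelihood can be illustrated with a small Python sketch. It assumes, as a simplification, that each cluster is a single Gaussian over the yield value and that the score is averaged per instance; the function names, the yield values and both sets of cluster parameters are made up for illustration, and this is not the weka implementation.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def average_log_likelihood(values, weights, means, stds):
    """Mean over instances of log(sum_k weight_k * N(x | mean_k, std_k))."""
    total = 0.0
    for x in values:
        density = sum(
            w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)
        )
        total += math.log(density)
    return total / len(values)


# Illustrative yield values and two candidate two-cluster models (all assumed).
yields = [1.0, 1.2, 3.0, 3.2]
close_fit = average_log_likelihood(yields, [0.5, 0.5], [1.1, 3.1], [0.2, 0.2])
poor_fit = average_log_likelihood(yields, [0.5, 0.5], [5.0, 6.0], [0.2, 0.2])
print(close_fit > poor_fit)  # clusters centred near the data score higher
```

EM iteratively adjusts the cluster weights, means and spreads so that this quantity increases, which is why it serves as a measure of how well the clusters explain the data.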

Fig 8: Result with EM algorithm with yield=1.5.

F. K-Mean Algorithm

The K-Mean algorithm is run on the Assam agriculture data set. Figure 9 shows the output of the K-Mean algorithm. The data set contains 1937 instances and yield = 1.5. The K-Mean algorithm divides the whole data set into 2 clusters. Cluster 0 contains 887 instances, corresponding to the Very_Good class, and cluster 1 contains 1050 instances, corresponding to the Good class.

Fig 9: Result with K-Mean algorithm with yield=1.5.

VI. CONCLUSION

This paper uses the weka tool to apply data mining algorithms to the given large agriculture data set. Classification algorithms such as J48, LMT, LADTree and ID3 give good prediction results for the yield_class attribute, whose values are Poor, Good, Very_Good and Excellent. Different classification algorithms give different accuracy results. According to the comparative analysis, LMT gives more accurate predictions than the other classification algorithms. However, although LMT is the most accurate, it takes more execution time. J48 can therefore be used instead of LMT to obtain quick predictions with good accuracy for both crop with season based and crop with yield based classification. One more classification algorithm, KNN, is used to predict results without the WEKA packages; K = 2 gives the best accuracy among the K values tested. This work also deals with yield based clustering of the data using the EM and K-Mean algorithms. The performance of the EM algorithm is low compared to that of the K-Mean algorithm. Overall, the exploratory study reported in this paper identifies the data mining algorithms that are suitable for agricultural data. The study can be extended to other types of data.
