International Journal of Advances in Engineering & Scientific Research

Print ISSN : 2349-4824

Online ISSN : 2349-3607

Frequency : Continuous

Current Issue : Volume 2 , Issue 9
2015

AN EFFICIENT DATA ANALYSIS FRAMEWORK IN BIG DATA

*Lilly Raffy Cheerotha, **V.N.Anushya

*Assistant Professor, Department of CSE, I.E.S. College of Engineering, Thrissur; **Assistant Professor, Department of CSE, M.A.M. College of Engineering & Technology, Trichy

DOI : Page No : 30-42

Published Online : 2015-12-30


Abstract:

 

Big data refers to huge volumes of data and to the process of handling such large datasets. Data is now growing faster than ever, which is why the concept of Big data has emerged. Big data platforms perform data storage, data analysis, data processing, and data management in parallel. They can process several petabytes (10^15 bytes) of data in seconds and can handle structured and unstructured data at the same time. The aim of this project is to apply a classification technique before mapping the tasks to the resources. The tasks are mapped with the MapReduce programming model, which reduces the workload on the resources. On its own, however, MapReduce takes considerable time to decide which resource each task should be allocated to. Parallel Database technology is used to increase the performance of Big data because it allocates the tasks to the resources in parallel.
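
As an illustration of the MapReduce programming model named above, the following is a minimal single-process sketch in Python. It is not the authors' implementation and does not use Hadoop: the function names (map_phase, shuffle, reduce_phase), the task records, and the class labels are all hypothetical, and the example only counts tasks per class to show the map, shuffle, and reduce phases.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, 1) pair per record; the key is the record's task class."""
    for _task_id, task_class in records:
        yield task_class, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_tasks_per_class(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical input: (task id, class label) pairs.
TASKS = [("t1", "cpu"), ("t2", "io"), ("t3", "cpu"), ("t4", "memory")]

if __name__ == "__main__":
    print(count_tasks_per_class(TASKS))
```

In a real deployment the map and reduce functions would run on different nodes and the framework would perform the shuffle; here everything runs in one process so the data flow is easy to follow.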

In this model, an Ensemble Classifier is used to classify the tasks. An Ensemble Classifier is a group of different classifiers that run in parallel and share the knowledge of the fastest classifier with the others. The Support Vector Machine, Decision Tree, and K-Nearest Neighbor classifiers are combined to form the Ensemble Classifier. As a result, the data is processed with minimal scheduling time, because the map class no longer spends time deciding which resource a task should be allocated to. Together, the Ensemble Classifier, the MapReduce model, and Parallel Database technology increase the efficiency and throughput of Big data by reducing the scheduling time.
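
The idea of running several classifiers in parallel and combining their outputs can be sketched as follows. This is an assumption-laden illustration, not the authors' method: the combination rule is plain majority voting, the three classifiers are small stand-ins for the Support Vector Machine, Decision Tree, and K-Nearest Neighbor (only the last one is an actual nearest-neighbor vote), and the features (cpu, memory), labels ("high", "low"), and training points are hypothetical. The knowledge sharing between classifiers described in the abstract is not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labelled tasks: ((cpu, memory), resource class).
TRAIN = [((2, 2), "low"), ((3, 1), "low"), ((1, 4), "low"),
         ((8, 9), "high"), ((9, 6), "high"), ((7, 8), "high")]


def svm_like(task):
    """Stand-in for an SVM: a fixed linear decision boundary."""
    return "high" if task["cpu"] + task["memory"] > 10 else "low"


def tree_like(task):
    """Stand-in for a decision tree: two nested threshold tests."""
    if task["cpu"] > 6:
        return "high"
    return "high" if task["memory"] > 8 else "low"


def knn_like(task, k=3):
    """k-nearest-neighbor vote over TRAIN using squared Euclidean distance."""
    point = (task["cpu"], task["memory"])
    nearest = sorted(
        TRAIN,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


CLASSIFIERS = [svm_like, tree_like, knn_like]


def ensemble_predict(task):
    """Run all classifiers concurrently and return the majority label."""
    with ThreadPoolExecutor(max_workers=len(CLASSIFIERS)) as pool:
        votes = list(pool.map(lambda clf: clf(task), CLASSIFIERS))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    print(ensemble_predict({"cpu": 8, "memory": 7}))
```

With three voters and two labels a tie cannot occur, which is why the sketch needs no tie-breaking rule; a different number of classifiers or labels would require one.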

Keywords: MapReduce, Hadoop, Ensemble Classifier, Parallel Database