
DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI
Department of Information Technology
IT6701 - INFORMATION MANAGEMENT
Anna University 2 & 16 Mark Questions & Answers
Year / Semester: IV / VII
Regulation: 2013
Academic Year: 2017-2018

UNIT I

PART A

1. What is a data model? List the types of data model used.
A database model is the theoretical foundation of a database and fundamentally determines the manner in which data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system. The most popular example of a database model is the relational model.
Types of data model used:
- Hierarchical model
- Network model
- Relational model
- Entity-relationship model
- Object-relational model
- Object model

2. Define database management system. List some applications of DBMS.
A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access those data. Applications include:
- Banking
- Airlines
- Universities
- Credit card transactions
- Telecommunication
- Finance
- Sales
- Manufacturing
- Human resources

3. Give the levels of data abstraction.
- Physical level
- Logical level
- View level

4. Define data model.
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints.

5. What is an entity relationship model?
The entity relationship model is a collection of basic objects called entities and relationships among those objects. An entity is a thing or object in the real world that is distinguishable from other objects.

6. What are attributes and relationships? Give examples.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set.
Example: possible attributes of a customer entity are customer name, customer id, customer street and customer city.
A relationship is an association among several entities.
Example: a depositor relationship associates a customer with each account that he/she has.

7. Define single valued and multivalued attributes.
Single valued attributes: attributes with a single value for a particular entity are called single valued attributes.
Multivalued attributes: attributes with a set of values for a particular entity are called multivalued attributes.

8. What is meant by normalization of data?
It is the process of analyzing the given relation schemas based on their Functional Dependencies (FDs) and primary keys to achieve the following properties:
- Minimizing redundancy
- Minimizing insertion, deletion and updating anomalies
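As a worked illustration of question 8 (the schema here is hypothetical, not from the syllabus): consider a relation Student(student_id, name, dept_name, dept_head) with the functional dependency dept_name -> dept_head. The department head is stored redundantly for every student in the same department, so changing a head requires updating many rows, and deleting the last student of a department loses the head entirely. Normalization decomposes the relation into Student(student_id, name, dept_name) and Department(dept_name, dept_head), removing the redundancy and the anomalies.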

9. Define entity set and relationship set.
Entity set: the set of all entities of the same type is termed an entity set.
Relationship set: the set of all relationships of the same type is termed a relationship set.

10. What are stored, derived and composite attributes?
Stored attributes: the attributes stored in a database are called stored attributes.
Derived attributes: the attributes that are derived from the stored attributes are called derived attributes. For example, the Age attribute is derived from the DOB attribute.
Composite attributes: attributes that can be divided into smaller subparts are called composite attributes. For example, a name attribute can be divided into first name and last name.

11. Define null values.
In some cases a particular entity may not have an applicable value for an attribute, or we may not know the value of an attribute for a particular entity. In these cases a null value is used.

12. What is meant by the degree of a relationship set?
The degree of a relationship type is the number of participating entity types.

13. Define weak and strong entity sets.
Weak entity set: entity sets that do not have a key attribute of their own are called weak entity sets.
Strong entity set: an entity set that has a primary key is termed a strong entity set.

14. What does the cardinality ratio specify?
Mapping cardinalities or cardinality ratios express the number of entities to which another entity can be associated. Mapping cardinalities must be one of the following:
- One to one
- One to many
- Many to one
- Many to many

15. What are the two types of participation constraint?
Total: the participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R.
Partial: if only some entities in E participate in relationships in R, the participation of entity set E in relationship set R is said to be partial.

16. What are a candidate key and a primary key?
Minimal super keys are called candidate keys. The primary key is chosen by the database designer as the principal means of identifying an entity in the entity set.

17. Define business rules.
Business rules are excellent tools for documenting the various aspects of a business domain. For example: a student is evaluated for a course through a combination of theory and practical examinations.

18. What is JDBC? List the types of JDBC drivers.
Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language which defines how a client may access a database. It is part of the Java Standard Edition platform, from Oracle Corporation.
- Type 1 - JDBC-ODBC Bridge Driver
- Type 2 - Java Native Driver
- Type 3 - Java Network Protocol Driver
- Type 4 - Pure Java Driver

19. What are the steps involved in accessing a database using JDBC?
- Register the JDBC driver
- Create the database connection
- Execute queries
- Process the results
- Close the database connection

20. What are the three classes of statement used to execute queries in Java?
- Statement
- PreparedStatement
- CallableStatement

21. What is a stored procedure?
In a database management system (DBMS), a stored procedure is a set of Structured Query Language (SQL) statements with an assigned name that is stored in the database in compiled form so that it can be shared by a number of programs. The use of stored procedures can be helpful in controlling access to data, preserving data integrity and improving productivity.
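The five steps of question 19 map directly onto JDBC API calls. The sketch below is a minimal illustration, assuming a MySQL database named college with a student table; the driver class name, URL, credentials, table and column names are placeholders, not part of the syllabus. A stored procedure (question 21) would be invoked through the third statement class of question 20, for example conn.prepareCall("{call procedure_name(?)}").

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Step 1: register the JDBC driver (class name assumes MySQL Connector/J)
        Class.forName("com.mysql.cj.jdbc.Driver");

        // Step 2: create the database connection (URL and credentials are placeholders)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/college", "user", "password")) {

            // Step 3: execute a query; a PreparedStatement precompiles the SQL
            // and binds parameters safely (one of the classes in question 20)
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT name FROM student WHERE id = ?");
            ps.setInt(1, 101);

            // Step 4: process the results row by row
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        } // Step 5: try-with-resources closes the connection automatically
    }
}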

22. What do the four V's of Big Data denote?
IBM has a simple explanation for the four critical features of big data:
- Volume: scale of data
- Velocity: analysis of streaming data
- Variety: different forms of data
- Veracity: uncertainty of data

23. List some companies that use Hadoop.
- Yahoo (one of the biggest users, contributing more than 80% of the Hadoop code)
- Facebook
- Netflix
- Amazon
- Adobe
- eBay
- Twitter

24. Distinguish between structured and unstructured data.
Data which can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, is referred to as structured data. Data which can be stored only partially in traditional database systems, for example data in XML records, is referred to as semi-structured data. Unorganized and raw data that cannot be categorized as semi-structured or structured is referred to as unstructured data. Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.

25. On what concepts does the Hadoop framework work?
The Hadoop framework works on the following two core components:
- HDFS (Hadoop Distributed File System): a Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks, and it operates on a master-slave architecture.
- Hadoop MapReduce: a Java-based programming paradigm of the Hadoop framework that provides scalability across various Hadoop clusters.
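The interplay of the two components in question 25 is easiest to see in a complete MapReduce job. The classic word-count program below is a minimal sketch: the mapper emits (word, 1) pairs from text stored in HDFS blocks, and the reducer sums the counts it receives after the shuffle and sort described in question 28; the input and output HDFS paths are supplied on the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits (word, 1) for every word in the HDFS block it is given
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: receives each word with all its counts after shuffle and sort
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}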

26. What are the main components of a Hadoop application?
Hadoop applications draw on a wide range of technologies that provide great advantages in solving complex business problems. The core components of a Hadoop application are:
- Hadoop Common
- HDFS
- Hadoop MapReduce
- YARN
Data access components are Pig and Hive.
The data storage component is HBase.
Data integration components are Apache Flume and Sqoop.
Data management and monitoring components are Ambari, Oozie and ZooKeeper.
Data serialization components are Thrift and Avro.
Data intelligence components are Apache Mahout and Drill.

27. Whenever a client submits a Hadoop job, who receives it?
The NameNode receives the Hadoop job; it then looks for the data requested by the client and provides the block information. The JobTracker takes care of resource allocation for the Hadoop job to ensure timely completion.

28. What are the partitioning, shuffle and sort phases?
Shuffle phase: once the first map tasks are completed, the nodes continue to perform several other map tasks and also exchange the intermediate outputs with the reducers as required. This process of moving the intermediate outputs of map tasks to the reducers is referred to as shuffling.
Sort phase: Hadoop MapReduce automatically sorts the set of intermediate keys on a single node before they are given as input to the reducer.
Partitioning phase: the process that determines which intermediate keys and values will be received by each reducer instance is referred to as partitioning. The destination partition is the same for any key irrespective of the mapper instance that generated it.
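To make the partitioning phase concrete, the sketch below shows a custom partitioner; the first-letter scheme and class name are hypothetical examples (by default Hadoop uses a hash partitioner), but they illustrate why the destination partition depends only on the key.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes every intermediate (key, value) pair to a reducer based on the
// key's first character, so all keys with the same first letter land in
// the same partition regardless of which mapper produced them.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        if (k.isEmpty()) {
            return 0;
        }
        char first = Character.toLowerCase(k.charAt(0));
        // Mask the sign bit so the result is non-negative, then fold the
        // letter into the number of reducers configured for the job.
        return (first & Integer.MAX_VALUE) % numPartitions;
    }
}
// Registered on a job with: job.setPartitionerClass(FirstLetterPartitioner.class);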

29. Distinguish between HBase and Hive.
HBase and Hive are completely different Hadoop-based technologies:
- Hive is a data warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop.
- Hive helps SQL-savvy people run MapReduce jobs, whereas HBase supports four primary operations: put, get, scan and delete.
- HBase is ideal for real-time querying of big data, whereas Hive is an ideal choice for analytical querying of data collected over a period of time.

30. Distinguish between Hadoop 1.x and Hadoop 2.x.
- In Hadoop 1.x, MapReduce is responsible for both processing and cluster management, whereas in Hadoop 2.x processing is taken care of by other processing models and YARN is responsible for cluster management.
- Hadoop 2.x scales better than Hadoop 1.x, supporting close to 10,000 nodes per cluster.
- Hadoop 1.x has a single point of failure: whenever the NameNode fails, it has to be recovered manually. In Hadoop 2.x the Standby NameNode overcomes this problem, and whenever the active NameNode fails it can be configured to recover automatically.

PART B
1. Describe database design and database modelling.
2. Explain normalization in detail with suitable examples.
3. Explain JDBC drivers and how to access a database using them.
4. Explain the Hadoop ecosystem.
5. Write short notes on the following:
- YARN
- NoSQL
- Hive