CS6703 GRID & CLOUD COMPUTING UNIT I INTRODUCTION PART-A


KCG College Of Technology, Information Technology Question Bank, IV Year, Sem-7

1. What are the computing paradigm distinctions?
   Centralized computing
   Parallel computing
   Distributed computing
   Cloud computing

2. What is meant by centralized computing?
   This is a computing paradigm in which all computer resources are centralized in one physical system. All resources (processors, memory, and storage) are fully shared and tightly coupled within one integrated OS.

3. What is meant by parallel computing?
   In parallel computing, all processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory. Interprocessor communication is accomplished through shared memory or via message passing. A computer system capable of parallel computing is commonly known as a parallel computer. Programs running on a parallel computer are called parallel programs. The process of writing parallel programs is often referred to as parallel programming.

4. What is meant by distributed computing?
   A distributed system is a network of autonomous computers that communicate with each other in order to achieve a goal. The computers in a distributed system are independent and do not physically share memory or processors. They communicate with each other using messages, which are pieces of information transferred from one computer to another over a network.

5. What is meant by cloud computing?
   An Internet cloud of resources can be either a centralized or a distributed computing system. The cloud applies parallel or distributed computing, or both. Clouds can be built with physical or virtualized resources over large data centers that are centralized or distributed.

6. What are the various degrees of parallelism?
   Bit-level parallelism (BLP)
   Instruction-level parallelism (ILP)
   Data-level parallelism (DLP)
   Task-level parallelism (TLP)
   Job-level parallelism (JLP)

7. What are the applications of High Performance and High-Throughput Systems?

8. What is meant by High Throughput Computing (HTC)?
   High-throughput computing (HTC) describes the use of many computing resources over long periods of time to accomplish a computational task. Distributed computing lets many jobs be scheduled onto available resources so that they complete as quickly as possible.
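The idea in question 8, many independent jobs scheduled onto whatever resources are free, can be sketched in a few lines. This is only an illustration under stated assumptions: threads in one process stand in for separate machines, and `simulate_job` and `run_batch` are hypothetical names, not part of any grid middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_job(job_id: int) -> int:
    """Stand-in for one independent job; here it just sums a small range."""
    return sum(range(job_id * 1000))


def run_batch(job_ids, workers=4):
    """Hand each job to whichever worker is free; results come back in job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_job, job_ids))


results = run_batch(range(8))
print(len(results), "jobs completed")
```

The point for HTC is throughput: the jobs do not communicate with each other, so adding workers increases the number of jobs finished per unit time without changing any single job.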

9. What is meant by High Performance Computing (HPC)?
   High-performance computing (HPC) is the use of supercomputers and parallel processing techniques for solving complex computational problems. HPC technology focuses on developing parallel processing algorithms and systems by incorporating both administration and parallel computational techniques.

10. What is Internet of Things (IoT)?
   IoT refers to the networked interconnection of everyday objects, tools, devices or computers. Each object is assigned an IP address (IPv6) to distinguish it from the others.

11. What are the 3Cs of Cyber Physical System?
   A CPS integrates cyber (heterogeneous, asynchronous) objects with physical (concurrent and information-dense) objects. A CPS merges the 3C technologies of Computation, Communication and Control.

12. What is meant by multicore processor?
   A multi-core processor is an integrated circuit (IC) to which two or more processor cores have been attached for enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks.

13. What is Graphics Processing Unit (GPU)?
   A Graphics Processing Unit (GPU) is a single-chip processor primarily used to manage and boost the performance of video and graphics. GPU features include:
   2-D or 3-D graphics
   Digital output to flat panel display monitors
   Texture mapping
   Application support for high-intensity graphics, etc.
   These features are designed to lessen the work of the CPU and produce faster video and graphics. A GPU is not only used in a PC on a video card or motherboard; it is also used in mobile phones, display adapters, workstations and game consoles.

14. What are the different forms of multithreaded processor?
   4-issue superscalar processor
   Fine-grained multithreaded processor
   Coarse-grained multithreaded processor
   Simultaneous multithreaded (SMT) processor

15. What is the difference between Storage Area Network (SAN) and Network Attached Storage (NAS)?
   A SAN connects servers to network storage such as disk arrays, whereas NAS connects client hosts directly to the disk arrays.

16. What is a hypervisor?
   A hypervisor, also called a virtual machine manager, is a program that allows multiple operating systems to share a single hardware host. Each operating system appears to have the host's processor, memory, and other resources all to itself.

17. What are the types of hypervisor?
   Type 1 hypervisors run directly on the system hardware. They are often referred to as "native", "bare metal" or "embedded" hypervisors in vendor literature.
   Type 2 hypervisors run on a host operating system.

18. What is a Virtual Machine (VM)?
   A virtual machine (VM) is a software program or operating system that not only exhibits the behavior of a separate computer, but is also capable of performing tasks such as running applications and programs like a separate computer. A virtual machine, usually known as a guest, is created within another computing environment referred to as a "host". Multiple virtual machines can exist within a single host at one time.

19. What is a grid system?
   A grid system is a set of interconnected computer systems in which the machines utilize the same resources collectively. Grid computing usually consists of one main computer that distributes information and tasks to a group of networked computers to accomplish a common goal. Grid computing is often used to complete complicated or tedious mathematical or scientific calculations.

20. How are grid systems classified?
   Grid systems are classified into two categories: computational or data grids, and P2P grids.
   Data grid: A data grid is a set of structured services that provides multiple services, such as the ability to access, alter and transfer very large amounts of geographically separated data, especially for research and collaboration purposes. Data from different regions are pulled from administrative domains, which filter data for security purposes, and are presented to the user upon request by means of a middleware application.
   P2P systems: Every node acts as both a client and a server, providing part of the system resources. Peer machines are simply client computers connected to the Internet. All client machines act autonomously to join or leave the system freely.

21. List out P2P application families.

22. What are the differences between grid computing and cloud computing?

   What?
      Grid computing: Grids enable access to shared computing power and storage capacity from your desktop.
      Cloud computing: Clouds enable access to leased computing power and storage capacity from your desktop.

   Who provides the service?
      Grid computing: Research institutes and universities federate their services around the world.
      Cloud computing: Large individual companies, e.g. Amazon and Microsoft.

   Who uses the service?
      Grid computing: Research collaborations, called "Virtual Organizations", which bring together researchers around the world working in the same field.
      Cloud computing: Small to medium commercial businesses or researchers with generic IT needs.

   Who pays for the service?
      Grid computing: Governments; providers and users are usually publicly funded research organizations.
      Cloud computing: The cloud provider pays for the computing resources; the user pays to use them.

23. What is Service Oriented Architecture (SOA)?
   A service-oriented architecture is essentially a collection of services. These services communicate with each other. The communication can involve either simple data passing or two or more services coordinating some activity.

24. List out the access models for organizing a data grid.
   Monadic model
   Hierarchical model
   Federation model
   Hybrid model

25. List out the resources needed to perform grid computing.
   Compute resources
   Storage resources
   Network resources
   Code repositories
   Service catalogs

PART B

1. Explain about grid computing infrastructures.
2. Explain in detail about technologies for network-based systems.
3. Explain in detail about SOA.
4. Explain in detail grid architecture.

UNIT II GRID SERVICES PART-A

1. What is OGSA in grid computing?
   Open Grid Services Architecture (OGSA) is a set of standards that extends Web services and service-oriented architecture to the grid computing environment. OGSA definitions and criteria describe how information is shared and distributed among the components of large, heterogeneous grid systems; they apply to hardware, platforms and software. It was developed within the Open Grid Forum, which was then called the Global Grid Forum (GGF).

2. What is the intention/motivation to go for OGSA?
   1. Facilitate use and management of resources across distributed, heterogeneous environments
   2. Deliver seamless QoS
   3. Define open, published interfaces in order to provide interoperability of diverse resources
   4. Exploit industry-standard integration technologies
   5. Develop standards that achieve interoperability
   6. Integrate, virtualize, and manage services and resources in a distributed, heterogeneous environment
   7. Deliver functionality as loosely coupled, interacting services aligned with industry-accepted web service standards

3. What are the layers of OGSA architecture?
   1. Infrastructure Services
   2. Execution Management Services
   3. Data Management Services
   4. Resource Management Services
   5. Security Services
   6. Information Services
   7. Self-Management Services

4. What is OGSI?
   The Open Grid Services Infrastructure (OGSI) describes the procedure for creating, managing and exchanging data among entities known as grid services.

5. What is a grid service?
   A grid service is a web service that conforms to a set of interfaces and behaviours which define how a client interacts with the grid service.

6. What are the functional requirements on OGSA in grid computing?
   1. Basic functions: discovery and brokering, virtual organizations, data sharing, monitoring and policy
   2. Security functions: multiple security infrastructures, authentication, authorization, accounting and instantiating new services
   3. Resource management functions: advance reservation, notification/messaging, scheduling, load balancing, logging, disaster recovery, workflow management, fault tolerance and self-healing capabilities

7. What are the two fundamental requirements for describing Web services based on the OGSI?
   1. The ability to describe interface inheritance, a basic concept in most distributed object systems
   2. The ability to describe additional information elements with the interface definition

8. What is a grid service instance?
   A grid service instance is a (potentially transient) service that conforms to a set of conventions, expressed as WSDL interfaces, extensions, and behaviors, for such purposes as lifetime management, discovery of characteristics, and notification.

9. What are OGSA services?
   OGSA services build on OGSI mechanisms to define interfaces and associated behaviors for various functions not supported directly within OGSI, such as service discovery, data access, data integration, messaging, and monitoring.

10. What is a core service?
   Core services are implementations of functions that are generally used by a wide variety of higher-level services and that implement broadly useful capabilities.

11. What are the core services?
   Service Interaction
   Service Management
   Service Communication
   Security

12. What is program execution?
   Program-execution services enable applications to have coordinated access to underlying VO resources, regardless of their physical location or access mechanisms.

13. What are the services provided by program execution?
   Agreement Factory Service
   Job Agreement Service
   Reservation Agreement Service
   Data Access Agreement Service
   Queuing Service
   Index Service

14. What is service composition?
   A service composition is a grid service that provides a new set of functions that are derived from, built on, extended from, and/or implemented using functions exposed by other grid services.

15. What is service orchestration?
   Service orchestration interfaces provide ways to describe and manage the choreography of a set of interacting services.

16. List the types of relationships.
   OGSA services can be related via the uses relationship and the extends relationship.

17. What is the uses relationship?
   A first service accesses the interface of a second service to use the functionality provided by this second service.

18. What is the extends relationship?
   A first service extends the functionality provided by a second service by using portType extensibility.

19. What is a platform service?
   A platform service is a service that provides basic functionalities. Platform services:
   i. provide underlying functionalities on which other services build
   ii. provide functionalities that are common to (and used by) several high-level services
   iii. provide functionalities that are designed to be used primarily through the extends relationship

20. What are billing and payment services?
   These refer to the financial service that actually carries out the transfer of money, for example a credit card authorization service.

PART-B

1. Draw the architecture of OGSA and explain its components.
2. Functionality requirements
3. What is OGSA/OGSI? A more detailed view
4. List OGSA services.
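As a study aid for questions 17 and 18 above, the two relationships can be mimicked with ordinary classes. This is a loose analogy, not OGSA itself: Python inheritance stands in for portType extensibility, and every class and method name below is hypothetical.

```python
class StorageService:
    """Hypothetical base service exposing put/get operations."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class VersionedStorageService(StorageService):
    """'Extends' relationship: keeps the base interface and adds to it."""

    def __init__(self):
        super().__init__()
        self._history = {}

    def put(self, key, value):
        # Record every value written, then defer to the base behaviour.
        self._history.setdefault(key, []).append(value)
        super().put(key, value)

    def versions(self, key):
        return list(self._history.get(key, []))


class ReportService:
    """'Uses' relationship: calls another service only through its interface."""

    def __init__(self, storage):
        self._storage = storage

    def summary(self, key):
        return f"{key}={self._storage.get(key)}"
```

Because `ReportService` depends only on `get`, it works unchanged with either storage service, which is the practical benefit of keeping the extended service compatible with the interface it extends.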

UNIT III VIRTUALIZATION PART-A

1. What are deployment models?
   a) Private
   b) Public
   c) Hybrid
   d) Community

2. What is public deployment model?
   A public cloud is a huge data centre that offers the same services to all its users. The services are accessible to everyone and are used for the consumer segment, e.g. Facebook, Google, LinkedIn.

3. What is private deployment model?
   A private cloud is built within the domain of an intranet owned by a single organization. It is client owned and managed, and its access is limited to the owning clients and their partners.

4. What is hybrid deployment model?
   A hybrid cloud is built with both public and private clouds. The Research Compute Cloud (RC2) is a private cloud, built by IBM, that interconnects the computing and IT resources at eight IBM Research Centers scattered throughout the United States, Europe, and Asia.

5. What is community deployment model?
   More than one group with common and specific needs shares the cloud infrastructure. This can include environments such as a U.S. federal agency cloud with stringent security requirements, or a health and medical cloud with regulatory and policy requirements for privacy matters.

6. List the categories (three layers) of cloud computing.
   IaaS: Infrastructure as a Service
   PaaS: Platform as a Service
   SaaS: Software as a Service

7. Define IaaS.
   IaaS allows users to rent the infrastructure itself: servers, data center space, software, and network equipment such as routers and switches.

8. Define PaaS.
   PaaS is a category of cloud computing that provides a platform and environment to allow developers to build applications and services over the Internet. PaaS services are hosted in the cloud and accessed by users simply via their web browser.

9. Define SaaS.
   SaaS is a software distribution model in which applications are hosted by a vendor or service provider and made available to customers over a network, typically the Internet.

10. List some public cloud offerings as IaaS.
   Amazon EC2, GoGrid, Rackspace Cloud, FlexiScale in the UK, Joyent Cloud

11. List some public cloud offerings as PaaS.
   Google App Engine, Salesforce.com's Force.com, Microsoft Azure, Amazon Elastic MapReduce, Aneka

12. What are the benefits of deployment model?

13. List the design objectives of cloud computing.
   1. Shifting computing from desktops to data centers
   2. Service provisioning and cloud economics
   3. Scalability in performance
   4. Data privacy protection
   5. High quality of cloud services
   6. New standards and interfaces

14. What is cloud ecosystem?
   Cloud ecosystem is a term used to describe the complex system of interdependent components that work together to enable cloud services.

15. Advantages and disadvantages of cloud computing
   Advantages:
      Easy implementation
      Accessibility
      No hardware required
      Cost per head
      Flexibility for growth
      Efficient recovery
   Disadvantages:
      No longer in control
      May not get all the features
      Doesn't mean you should do away with servers
      No redundancy
      Bandwidth issues

16. What is virtualization in cloud computing?
   Virtualization is software that creates a virtual (rather than actual) version of something, such as an operating system, a server, a storage device or a network resource. It is the fundamental technology that powers cloud computing.

17. Difference between virtualization and cloud computing
   Virtualization differs from cloud computing because virtualization is software that manipulates hardware, while cloud computing refers to a service that results from that manipulation.

18. Define virtual machine manager.
   A virtual machine manager, also called a virtual machine monitor (VMM), separates compute environments from the actual physical infrastructure. Alternatively, it is described as the link between the gateway and resources.

19. What is a virtual machine template?
   A Virtual Machine Manager template provides a standardized group of hardware and software settings that can be used repeatedly to create new virtual machines configured with those settings.
   Or: A VM template is analogous to a computer's configuration and contains a description for a VM.

20. List the implementation levels of virtualization.
   1. Instruction set architecture (ISA) level
   2. Hardware abstraction layer (HAL) level
   3. Operating system level
   4. Library (user-level API) level
   5. Application level

21. Merits of virtualization at various levels
   (More X's means higher merit, with a maximum of 5 X's)

22. What are the three structures of virtualization?
   1. Hypervisor architecture / VMM (Virtual Machine Monitor)
   2. Para-virtualization
   3. Host-based virtualization

23. Define hypervisor architecture.
   A hypervisor or virtual machine monitor (VMM) is a piece of computer software, firmware or hardware that creates and runs virtual machines.
   Or: It is a program that allows multiple operating systems to share a single hardware host.

24. Define para-virtualization.
   Para-virtualization is a virtualization technique that presents a software interface to virtual machines that is similar, but not identical, to that of the underlying hardware.

25. What are the two types of hypervisor?
   1. Micro-kernel architecture
   2. Monolithic hypervisor architecture

26. What is hardware support for virtualization / hardware-assisted virtualization?
   Hardware virtualization refers to the creation of virtual (as opposed to concrete) versions of computers and operating systems. This technology was developed by Intel and AMD for their server platforms.

27. What is CPU virtualization?
   CPU virtualization involves a single CPU acting as if it were two separate CPUs.

28. What is a physical cluster?
   A physical cluster is a collection of servers (physical machines) interconnected by a physical network such as a LAN.

29. What are the design issues in virtual cluster?
   1. Live migration of VMs
   2. Memory and file migrations
   3. Dynamic deployment

PART-B

1. List deployment models with examples.
2. What are the services provided by cloud? Explain in detail.
3. Explain implementation levels of virtualization.
4. Explain virtualization structure with diagram.
5. Explain virtualization of CPU, memory and I/O devices.
6. What are the types of cluster? Explain about virtual clusters and resource management.
7. Explain in detail about virtualization for data center automation.

UNIT IV PROGRAMMING MODEL PART-A

1. What are the grid middleware packages?
   Package: Description
   BOINC: Berkeley Open Infrastructure for Network Computing
   UNICORE: Middleware developed by the German grid computing community
   Globus (GT4): A middleware library developed by Argonne National Lab., Univ. of Chicago and USC Information Sciences Institute
   CGSP in ChinaGrid: A middleware library developed by 20 top universities in China as part of the ChinaGrid Project
   Condor-G: Developed at the Univ. of Wisconsin
   Sun Grid Engine (SGE): Developed by Sun Microsystems for business grid applications

2. What is Globus Toolkit?
   The open source Globus Toolkit is a fundamental enabling technology for the "Grid", which allows sharing computing power, databases, and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing local autonomy. The toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management.

3. What are the functional modules in Globus GT4 library?
   Service functionality / Module name / Functional description
   Global Resource Allocation Manager / GRAM / Grid Resource Access and Management (HTTP-based)
   Communication / Nexus / Unicast and multicast communication
   Grid Security Infrastructure / GSI / Authentication and related security services
   Monitoring and Discovery Service / MDS / Distributed access to structure and state information
   Health and Status / HBM / Heartbeat monitoring of system components
   Global Access of Secondary Storage / GASS / Grid access of data in remote secondary storage
   Grid File Transfer / GridFTP / Inter-node fast file transfer

4. List out GT4 tools.
   GridFTP
   RFT (Reliable File Transfer)
   RLS (Replica Location Service)
   OGSA-DAI (Open Grid Services Architecture Data Access and Integration)

5. Write down the key terms used in Globus.
   Endpoint: a logical address for a GridFTP server, similar to a domain name for a web server. Data is transferred between Globus endpoints.
   Globus Connect Personal: for individual users; a client for communicating with other GridFTP servers from your local computer using Globus.
   Globus Connect Server: for multiuser environments; a Linux package that sets up a GridFTP server for use with Globus.

6. What is HADOOP?
   Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

7. What are the base nodes for a Hadoop cluster?
   NameNode
   DataNode

8. What is a NameNode?
   NameNode is the centerpiece of HDFS. It is also known as the Master.
   NameNode only stores the metadata of HDFS: the directory tree of all files in the file system. It also tracks the files across the cluster.
   NameNode does not store the actual data or the dataset. The data itself is stored in the DataNodes.
   NameNode knows the list of the blocks and their locations for any given file in HDFS. With this information NameNode knows how to construct the file from blocks.
   NameNode is critical to HDFS: when the NameNode is down, the HDFS/Hadoop cluster is inaccessible and considered down.
   NameNode is a single point of failure in a Hadoop cluster.
   NameNode is usually configured with a lot of memory (RAM), because the block locations are held in main memory.

9. What is a DataNode?
   DataNode is responsible for storing the actual data in HDFS. It is also known as the Slave.
   NameNode and DataNode are in constant communication.
   When a DataNode starts up, it announces itself to the NameNode along with the list of blocks it is responsible for.
   When a DataNode is down, it does not affect the availability of data or the cluster. NameNode will arrange for replication of the blocks managed by the DataNode that is not available.
   DataNode is usually configured with a lot of hard disk space, because the actual data is stored in the DataNode.
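The division of labour described in questions 8 and 9 can be sketched as a toy model. This is not Hadoop's API: the `NameNode` class below, its method names, and the replication factor of 2 are all assumptions made for illustration. It keeps only metadata (which DataNodes hold which block) and re-replicates blocks when a DataNode is reported down.

```python
class NameNode:
    """Toy metadata server: tracks block locations, stores no file data."""

    def __init__(self, replication=2):
        self.replication = replication   # assumed target replica count
        self.file_blocks = {}            # file name -> ordered list of block ids
        self.block_locations = {}        # block id -> set of DataNode names
        self.live_nodes = set()

    def register(self, node, blocks=()):
        """A DataNode announces itself along with the blocks it holds."""
        self.live_nodes.add(node)
        for block in blocks:
            self.block_locations.setdefault(block, set()).add(node)

    def add_file(self, name, blocks):
        self.file_blocks[name] = list(blocks)

    def locate(self, name):
        """Return, per block of the file, the sorted DataNodes holding a replica."""
        return [(b, sorted(self.block_locations.get(b, ())))
                for b in self.file_blocks[name]]

    def node_failed(self, node):
        """Drop a dead DataNode and re-replicate its blocks onto live nodes."""
        self.live_nodes.discard(node)
        for holders in self.block_locations.values():
            holders.discard(node)
            if not holders:
                continue  # no surviving replica to copy from
            candidates = sorted(self.live_nodes - holders)
            while len(holders) < self.replication and candidates:
                holders.add(candidates.pop(0))


nn = NameNode(replication=2)
nn.register("dn1", ["b1", "b2"])
nn.register("dn2", ["b1"])
nn.register("dn3", ["b2"])
nn.add_file("/logs/a.txt", ["b1", "b2"])
nn.node_failed("dn1")
print(nn.locate("/logs/a.txt"))
```

After `dn1` fails, both blocks are still readable and each is brought back to two replicas, which mirrors the statement that a DataNode failure does not affect data availability. The model also makes the single point of failure visible: all of this state lives in one `NameNode` object.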

10. What is MapReduce?
   MapReduce is a processing technique and a programming model for distributed computing, based on Java.

11. How does MapReduce work?
   The MapReduce algorithm contains two important tasks, namely Map and Reduce.
   Map function: Written by the client, it takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
   Reduce function: Also written by the client, it accepts an intermediate key I and a set of values for that key. It merges these values to form a possibly smaller set of values. The intermediate values are supplied to the user's reduce function via an iterator.

12. What is HDFS?
   HDFS is a distributed file system implemented on Hadoop's framework, designed to store vast amounts of data on low-cost commodity hardware while ensuring high-speed processing of the data. Its notion is "Write Once, Read Multiple times".

13. What are the goals of HDFS?
   Hardware failure: detection of faults and quick, automatic recovery
   Streaming data access: high throughput of data access
   Large data sets: gigabytes to terabytes in size
   Simple coherency model: write-once, read-many access model for files
   Moving computation is cheaper than moving data
   Portability over heterogeneous hardware and software platforms

14. What are the important components in HDFS architecture?
   Blocks
   Name Node
   Data Node
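The Map and Reduce functions from question 11 can be tried out with a word count, the usual introductory example. The sketch below runs in a single process and is not the Hadoop API: `run_mapreduce` is a hypothetical stand-in for the library's grouping step, and the input lines are made up.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all values for one intermediate key into a single total."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the library: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)
    # Values reach the reducer through an iterator, as in question 11.
    return dict(reducer(k, iter(v)) for k, v in sorted(groups.items()))


lines = [(0, "the grid and the cloud"), (1, "the cloud")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)  # {'and': 1, 'cloud': 2, 'grid': 1, 'the': 3}
```

In a real cluster the map calls run on different nodes and the grouping step moves data across the network, but the contract of the two user-written functions is the same.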

15. What are HDFS blocks?
HDFS is a block-structured file system. Each HDFS file is broken into blocks of a fixed size, usually 128 MB, which are stored across various DataNodes in the cluster. Each of these blocks is stored as a separate file on the local file system of a DataNode. Thus, to access a file on HDFS, multiple DataNodes need to be referenced, and the list of DataNodes to be accessed is determined by the file system metadata stored on the NameNode.

16. What is input splitting?
An input split is the part of the input processed by a single map. In other words, an InputSplit represents the data to be processed by an individual Mapper. Each split is divided into records, and the map processes each record, which is a key/value pair. For example, a split can correspond to a range of rows, and a record to one row in that range.

17. What are the limitations of HDFS?
HDFS is designed on the notion of "write once, read many times": once a file is written to HDFS, it cannot be updated in place. Delete, append, and read operations can still be performed on HDFS files. HDFS is not suitable for a large number of small files; it is best suited to large files.

18. What are the characteristics that distinguish HDFS from other file systems?
- High fault tolerance
- High-throughput access to large data sets (files)
- HDFS operations

19. What are the modes in which Hadoop can run?
- Standalone mode
- Pseudo-distributed mode (or single-node cluster)
- Fully distributed mode (or multi-node cluster)

20. What are the interfaces in the Hadoop file system?
- File system shell
- Java API
- WebHDFS
- libhdfs

21. List the categories of command-line tools in Hadoop HDFS.
- File system shell interface
- Admin tool for HDFS

PART B
1. Explain the Globus Toolkit.
2. Explain the Hadoop framework.
3. Write about MapReduce, input splitting, and the map and reduce functions.
4. Explain the data flow of file read and file write.
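The block layout from question 15 and the small-file limitation from question 17 come down to simple arithmetic, sketched below. The function `split_into_blocks` is a hypothetical helper, not part of any Hadoop API, and it assumes that the last block of a file holds only the remaining bytes rather than being padded to the full block size.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # the usual block size quoted in question 15


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    blocks = []
    offset = 0
    while offset < file_size:
        # Assumption: the final block is shorter when the size is not a multiple.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MB file needs two full blocks plus one 44 MB block.
for offset, length in split_into_blocks(300 * MB):
    print(offset // MB, length // MB)

# Same data volume, very different metadata load on the NameNode:
one_large_file = len(split_into_blocks(1000 * MB))
many_small_files = sum(len(split_into_blocks(1 * MB)) for _ in range(1000))
print(one_large_file, many_small_files)
```

Every block is one more entry the NameNode must keep in memory, which is why a thousand 1 MB files cost far more metadata than a single 1000 MB file.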

UNIT V SECURITY
PART-A

1. What is grid security?
Grid security is the security architecture that enables dynamic, scalable, and distributed virtual organizations (VOs). It protects resources for resource providers, computing entities for VOs, and end-processing for end users through authentication, delegation, authorization, confidentiality, privacy, etc.

2. What are the three challenges outlined to establish trust among grid sites?
1) Integration with existing systems and technologies.
2) Interoperability with different hosting environments.
3) Construction of trust relationships among interacting hosting environments.

3. List the trust models.
- Generalized trust model
- PKI-based model
- Reputation-based model
- Fuzzy-trust model

4. What is authentication?
Authentication is a mechanism for verifying the identity of a user or entity, so that data is kept secure from unauthorized access.

5. What is grid authentication?
Grid authentication is a method of securing user logins by requiring the user to enter values from specific cells in a grid whose content should be accessible only to the user and the service provider. Because the grid consists of letters and numbers in rows and columns, the method is sometimes referred to as bingo card authentication.

6. What are the authentication methods?
Authentication methods in the grid include passwords, PKI, and Kerberos.

7. Define PKI.
To implement a public key infrastructure (PKI), we use a trusted third party called the certificate authority (CA). Each user holds a unique pair of public and private keys. The CA certifies a public key by issuing a certificate after recognizing a legitimate user. The private key is exclusive to each user and is unknown to any other user. A digital certificate in ITU-T X.509 format contains the user name, the user's public key, the CA name, and a digital signature from the CA.

8. What is authorization?
Authorization is a process to exercise access control over shared resources.

9. What is a central authority?
A central authority is a special entity that is capable of issuing and revoking policies of access rights granted to remote accesses.

10. List the classification of authorities.
- Attribute authorities issue attribute assertions.
- Policy authorities issue authorization policies.
- Identity authorities issue certificates.

11. What are the authorization models?
- Subject-push model
- Resource-pulling model
- Authorization agent model

12. What is the subject-push model?
The user conducts a handshake with the authority first and then with the resource site, in sequence.
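The certificate idea from question 7 can be made concrete with a toy example. The sketch below uses textbook RSA over two small Mersenne primes with no padding, so it is insecure and for illustration only; the names `ToyCA`, `issue_certificate`, and `verify_certificate` are hypothetical, the user's public key is a placeholder string, and the dictionary is not a real X.509 structure. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy CA key pair: textbook RSA, insecure, for illustration only.
P = 2**61 - 1              # Mersenne prime
Q = 2**89 - 1              # Mersenne prime
N = P * Q                  # CA modulus (public)
E = 65537                  # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # CA private exponent (Python 3.8+)


def digest(user_name, user_public_key):
    """Hash the certified fields down to an integer smaller than N."""
    data = f"{user_name}|{user_public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def issue_certificate(user_name, user_public_key):
    """The CA signs the user's name and public key with its private key."""
    signature = pow(digest(user_name, user_public_key), D, N)
    return {"user": user_name, "public_key": user_public_key,
            "ca": "ToyCA", "signature": signature}


def verify_certificate(cert):
    """Anyone holding the CA public key (N, E) can check the signature."""
    expected = digest(cert["user"], cert["public_key"])
    return pow(cert["signature"], E, N) == expected


cert = issue_certificate("alice", "alice-public-key-placeholder")
print(verify_certificate(cert))

forged = dict(cert, public_key="mallory-public-key-placeholder")
print(verify_certificate(forged))
```

The point of the design is that verification needs only the CA's public values, so any grid site that trusts the CA can check a user's certificate without contacting the CA.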

13. What is the resource-pulling model?
The user checks the resource first. Then the resource contacts its authority to verify the request, and the authority authorizes at

14. What is GSI?
The Grid Security Infrastructure (GSI) is a portion of the Globus Toolkit and provides the fundamental security services needed to support grids, including support for message protection, authentication and delegation, and authorization. GSI enables secure authentication and communication over an open network, and permits mutual authentication across and among distributed sites with single sign-on capability.

15. What are the functions of GSI?
Message protection, authentication, delegation, and authorization.

16. What are the levels of GSI functional layers?
- Transport-level security
- Message-level security

17. What are the layers of cloud security?
1. Host security
2. Network security
3. Application security

18. What is network-level security?
With private clouds, there are no new attacks, vulnerabilities, or changes in risk specific to this topology that information security personnel need to consider. If public cloud services are chosen, changing security requirements will require changes to the network topology, and the manner in which the existing network topology interacts with the cloud provider's network topology should be taken into account.
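The difference between the subject-push and resource-pulling authorization models lies in who talks to the authority. The sketch below is one possible reading of the two models, not a Globus implementation: the classes `Authority` and `Resource`, the shared demo key, and the HMAC-based token format are all assumptions introduced for illustration.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key"   # hypothetical secret shared by authority and resource


def _mac(user, resource_name):
    msg = f"{user}|{resource_name}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()


class Authority:
    def __init__(self, policy):
        self.policy = set(policy)   # allowed (user, resource name) pairs

    def authorize(self, user, resource_name):
        return (user, resource_name) in self.policy

    def issue_token(self, user, resource_name):
        """Subject-push, step 1: the user obtains a token from the authority."""
        if not self.authorize(user, resource_name):
            return None
        return _mac(user, resource_name)


class Resource:
    def __init__(self, name, authority):
        self.name = name
        self.authority = authority

    def access_with_token(self, user, token):
        """Subject-push, step 2: the user presents the token to the resource."""
        return token is not None and hmac.compare_digest(token, _mac(user, self.name))

    def access(self, user):
        """Resource-pulling: the resource asks its authority about the request."""
        return self.authority.authorize(user, self.name)


authority = Authority({("alice", "cluster-a")})
cluster = Resource("cluster-a", authority)

token = authority.issue_token("alice", "cluster-a")
print(cluster.access_with_token("alice", token))   # subject-push
print(cluster.access("alice"))                     # resource-pulling
print(cluster.access("bob"))                       # not in the policy
```

In the push path the resource never contacts the authority at request time; in the pull path the user never contacts the authority at all.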

19. Explain host security.
The host security responsibilities in SaaS and PaaS services are transferred to the provider of the cloud services. IaaS customers are primarily responsible for securing the hosts provisioned in the cloud.

20. Explain application-level security.
This level is responsible for managing:
- Application-level security threats
- End-user security
- SaaS application security
- PaaS application security
- Customer-deployed application security
- IaaS application security
- Public cloud security limitations

21. What is Identity and Access Management (IAM)?
Identity and Access Management (IAM) is a framework consisting of technical, policy, and governance components that allow an organization to:
- identify individuals
- link identities with roles, responsibilities, and affiliations
- assign privileges, access, and entitlements based on identity and associations

22. What are the four main areas of IAM?
- Credentialing: assignment of a unique token to an entity needing access to resources.
- Authentication: the act of validating proof of identity.
- Authorization: the act of affording access to only appropriate resources and functions.
- Accountability: ensuring against illegitimate utilization of an entity's authority; it flows from the first three functions.
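The four IAM areas from question 22 can be traced through one small object. This is a minimal sketch under stated assumptions, not a real IAM product: the class `ToyIAM`, its methods, and the role and action names are all hypothetical. Accountability is modelled as an audit log that records every authorization decision.

```python
import secrets


class ToyIAM:
    """Hypothetical model of credentialing, authentication, authorization, accountability."""

    def __init__(self, permissions):
        self.permissions = permissions   # role -> set of allowed actions
        self.tokens = {}                 # token -> identity
        self.roles = {}                  # identity -> set of roles
        self.audit_log = []              # (identity, action, allowed)

    def credential(self, identity, roles):
        """Credentialing: assign a unique token and link the identity with roles."""
        token = secrets.token_hex(8)
        self.tokens[token] = identity
        self.roles[identity] = set(roles)
        return token

    def authenticate(self, token):
        """Authentication: validate the proof of identity; None if unknown."""
        return self.tokens.get(token)

    def authorize(self, token, action):
        """Authorization: allow only actions granted to one of the identity's roles."""
        identity = self.authenticate(token)
        allowed = identity is not None and any(
            action in self.permissions.get(role, set())
            for role in self.roles[identity]
        )
        # Accountability: every decision is recorded, including denials.
        self.audit_log.append((identity, action, allowed))
        return allowed


iam = ToyIAM({"admin": {"start_vm", "stop_vm"}, "viewer": {"list_vms"}})
alice_token = iam.credential("alice", ["viewer"])
print(iam.authorize(alice_token, "list_vms"))
print(iam.authorize(alice_token, "stop_vm"))
print(iam.authorize("not-a-real-token", "list_vms"))
print(iam.audit_log)
```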

PART-B
1. What are the trust models for a grid security environment?
2. Explain authentication and authorization methods.
3. Explain grid security infrastructure and cloud infrastructure security.
4. Explain IAM practices in the cloud and SaaS, PaaS, and IaaS availability in the cloud.
5. List the key privacy issues in the cloud.
6. Differentiate between grid security and cloud security.