1z0-449.exam — Oracle 1z0-449: Oracle Big Data 2017 Implementation Essentials
Number: 1z0-449 | Passing Score: 800 | Time Limit: 120 min | File Version: 1.0
Oracle 1z0-449: Oracle Big Data 2017 Implementation Essentials (Version 1.0)
Exam A

QUESTION 1
The NoSQL KVStore experiences a node failure. One of the replicas is promoted to primary. How will the NoSQL client that accesses the store know that there has been a change in the architecture?

A. The KVLite utility updates the NoSQL client with the status of the master and replica.
B. KVStoreConfig sends the status of the master and replica to the NoSQL client.
C. The NoSQL admin agent updates the NoSQL client with the status of the master and replica.
D. The Shard State Table (SST) contains information about each shard and the master and replica status for the shard.

Correct Answer: D

Explanation:
Given a shard, the Client Driver next consults the Shard State Table (SST). For each shard, the SST contains information about each replication node comprising the group (step 5). Based upon information in the SST, such as the identity of the master and the load on the various nodes in a shard, the Client Driver selects the node to which to send the request and forwards the request to the appropriate node. In this case, since we are issuing a write operation, the request must go to the master node.
Note: If the machine hosting the master should fail in any way, then the master automatically fails over to one of the other nodes in the shard. That is, one of the replica nodes is automatically promoted to master.

QUESTION 2
Your customer is experiencing significant degradation in the performance of Hive queries. The customer wants to continue using SQL as the main query language for the HDFS store. Which option can the customer use to improve performance?

A. native MapReduce Java programs
B. Impala
C. HiveFastQL
D. Apache Grunt

Correct Answer: B

Explanation:
Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation.

QUESTION 3
Your customer keeps getting an error when writing a key/value pair to a NoSQL replica. What is causing the error?

A. The master may be in read-only mode and as a result, writes to replicas are not being allowed.
B. The replica may be out of sync with the master and is not able to maintain consistency.
C. The writes must be done to the master.
D. The replica is in read-only mode.
E. The data file for the replica is corrupt.

Correct Answer: C

Explanation:
Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations.
Note: Oracle NoSQL Database provides multi-terabyte distributed key/value pair storage that offers scalable throughput and performance. That is, it services network requests to store and retrieve data which is organized into key-value pairs.
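The master/replica routing described in the explanations above can be modeled with a small toy sketch. This is illustrative only, not Oracle NoSQL client-driver code; the shard and node names are invented:

```python
# Toy model of the Shard State Table (SST) routing described above.
# Not real Oracle NoSQL code; names are made up for illustration.

sst = {
    "shard1": {"master": "rn1", "replicas": ["rn2", "rn3"]},
}

def route(shard, op):
    """Return the node a request should go to: writes must hit the master."""
    entry = sst[shard]
    if op == "write":
        return entry["master"]
    # Reads may be serviced by any node; a real driver also weighs node load.
    return entry["replicas"][0]

def promote(shard, new_master):
    """Model a master failure: a replica is promoted and the SST is updated."""
    entry = sst[shard]
    entry["replicas"].remove(new_master)
    entry["master"] = new_master
```

Because the client consults the SST on every request, a write issued after `promote("shard1", "rn2")` is automatically routed to the newly promoted master; no explicit notification of the client is needed.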
QUESTION 4
The log data for your customer's Apache web server has seven string columns. What is the correct command to load the log data from the file 'sample.log' into a new Hive table LOGS that does not currently exist?

A. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
B. hive> create table logs as select * from sample.log;
C. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
   hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;
D. hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;
   hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
E. hive> create table logs as load sample.log from hadoop;

Correct Answer: C

Explanation:
The CREATE TABLE command creates a table with the given name. Load files into existing tables with the LOAD DATA command.

QUESTION 5
Your customer's Oracle NoSQL store has a replication factor of 3. One of the customer's replica nodes goes down. What will be the long-term performance impact on the customer's NoSQL database if the node is replaced?

A. There will be no performance impact.
B. The database read performance will be impacted.
C. The database read and write performance will be impacted.
D. The database will be unavailable for reading or writing.
E. The database write performance will be impacted.

Correct Answer: C

Explanation:
The number of nodes belonging to a shard is called its Replication Factor. The larger a shard's Replication Factor, the faster its read throughput (because there are more machines to service the read requests) but the slower its write performance (because there are more machines to which writes must be copied).
Note: Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations.

QUESTION 6
Your customer is using the IKM SQL to HDFS File (Sqoop) module to move data from Oracle to HDFS. However, the customer is experiencing performance issues. What change should you make to the default configuration to improve performance?

A. Change the ODI configuration to high performance mode.
B. Increase the number of Sqoop mappers.
C. Add additional tables.
D. Change the HDFS server I/O settings to duplex mode.

Correct Answer: B

Explanation:
Controlling the amount of parallelism that Sqoop will use to transfer data is the main way to control the load on your database. Using more mappers will lead to a higher number of concurrent data transfer tasks, which can result in faster job completion. However, it will also increase the load on the database, as Sqoop will execute more concurrent queries.

QUESTION 7
What is the result when a flume event occurs for the following single node configuration?
(Configuration listing not reproduced in the source.)
A. The event is written to memory.
B. The event is logged to the screen.
C. The event output is not defined in this section.
D. The event is sent out on port
E. The event is written to the netcat process.

Correct Answer: B

Explanation:
This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
Note: A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination. The destination of the sink might be another agent or the central stores.
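The single-node configuration the question refers to is presumably the canonical Flume quick-start example; the agent, source, sink, and channel names below (a1, r1, k1, c1) are assumptions reconstructed from the explanation above, not the original exhibit:

```
# A single agent (a1) with a netcat source, a memory channel, and a logger sink
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listens for data on port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Sink: logs events to the console
a1.sinks.k1.type = logger

# Channel: buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

With a logger sink, every event received on port 44444 ends up printed to the agent's console/log, which is why the keyed answer is B.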
A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of Flume events.
Incorrect Answers:
D: Port 44444 is part of the source, not the sink.

QUESTION 8
What kind of workload is MapReduce designed to handle?

A. batch processing
B. interactive
C. computational
D. real time
E. commodity

Correct Answer: A

Explanation:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch and not real-time. With growing data, Hadoop enables you to horizontally scale your cluster by adding commodity nodes and thus keep up with queries. In Hadoop, MapReduce does the same job: it takes a large amount of data and processes it in batch. It will not give immediate output; processing time depends on the configuration of the system (NameNode, TaskTracker, JobTracker, and so on).

QUESTION 9
Your customer uses LDAP for centralized user/group management. How will you integrate permissions management for the customer's Big Data Appliance into the existing architecture?
A. Make Oracle Identity Management for Big Data the single source of truth and point LDAP to its keystore for user lookup.
B. Enable Oracle Identity Management for Big Data and point its keystore to the LDAP directory for user lookup.
C. Make Kerberos the single source of truth and have LDAP use the Key Distribution Center for user lookup.
D. Enable Kerberos and have the Key Distribution Center use the LDAP directory for user lookup.

Correct Answer: D

Explanation:
Kerberos integrates with LDAP servers, allowing the principals and encryption keys to be stored in the common repository. The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server set up already, which will then link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.

QUESTION 10
Your customer collects diagnostic data from its storage systems that are deployed at customer sites. The customer needs to capture and process this data by country in batches. Why should the customer choose Hadoop to process this data?

A. Hadoop processes data on large clusters (10-50 max) on commodity hardware.
B. Hadoop is a batch data processing architecture.
C. Hadoop supports centralized computing of large data sets on large clusters.
D. Node failures can be dealt with by configuring failover with clusterware.
E. Hadoop processes data serially.

Correct Answer: B

Explanation:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch and not real-time. With growing data, Hadoop enables you to horizontally scale your cluster by adding commodity nodes and thus keep up with queries. In Hadoop, MapReduce does the same job: it takes a large amount of data and processes it in batch. It will not give immediate output; processing time depends on the configuration of the system (NameNode, TaskTracker, JobTracker, and so on).
Incorrect Answers:
A: Yahoo! has by far the most nodes in its massive Hadoop clusters, at over 42,000 nodes as of July 2011.
C: Hadoop supports distributed computing of large data sets on large clusters.
E: Hadoop processes data in parallel.

QUESTION 11
Your customer wants to architect a system that helps to make real-time recommendations to users based on their past search history. Which solution should the customer use?

A. Oracle Container Database
B. Oracle Exadata
C. Oracle NoSQL
D. Oracle Data Integrator

Correct Answer: D

Explanation:
Oracle Data Integration (both Oracle GoldenGate and Oracle Data Integrator) helps to integrate data end-to-end between big data (NoSQL, Hadoop-based) environments and SQL-based environments. These data integration technologies are the key ingredient of Oracle's Big Data Connectors. Oracle Big Data Connectors provide integration from Oracle Big Data Appliance to relational Oracle Databases where in-database analytics can be performed. Oracle's data integration solutions speed the loads of the Oracle Exadata Database Machine by 500% while providing continuous access to business-critical information across heterogeneous sources.

QUESTION 12
How should you control the Sqoop parallel imports if the data does not have a primary key?

A. by specifying no primary key with the --no-primary argument
B. by specifying the number of maps by using the -m option
C. by indicating the split size by using the --direct-split-size option
D. by choosing a different column that contains unique data with the --split-by argument

Correct Answer: D
Explanation:
If the actual values for the primary key are not uniformly distributed across their range, then this can result in unbalanced tasks. You should explicitly choose a different column with the --split-by argument. For example, --split-by employee_id.
Note: When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

QUESTION 13
Your customer uses Active Directory to manage user accounts. You are setting up Hadoop Security for the customer's Big Data Appliance. How will you integrate Hadoop and Active Directory?

A. Set up Kerberos Key Distribution Center to be the Active Directory keystore.
B. Configure Active Directory to use Kerberos Key Distribution Center.
C. Set up a one-way cross-realm trust from the Kerberos realm to the Active Directory realm.
D. Set up a one-way cross-realm trust from the Active Directory realm to the Kerberos realm.

Correct Answer: C

Explanation:
If direct integration with AD is not currently possible, use the following instructions to configure a local MIT KDC to trust your AD server:
1. Run an MIT Kerberos KDC and realm local to the cluster and create all service principals in this realm.
2. Set up one-way cross-realm trust from this realm to the Active Directory realm.
Using this method, there is no need to create service principals in Active Directory, but Active Directory principals (users) can be authenticated to Hadoop.
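As a minimal sketch of the krb5.conf on the cluster side for such a setup (realm and host names here are hypothetical; a one-way trust also requires a matching cross-realm krbtgt principal created in both KDCs, which is not shown):

```
[realms]
  CLUSTER.EXAMPLE.COM = {
    kdc = kdc01.cluster.example.com
    admin_server = kdc01.cluster.example.com
  }
  AD.EXAMPLE.COM = {
    kdc = dc01.ad.example.com
  }

[domain_realm]
  .cluster.example.com = CLUSTER.EXAMPLE.COM
  .ad.example.com = AD.EXAMPLE.COM
```

Hadoop service principals live only in the local MIT realm, while AD users present tickets issued by the AD realm that the cluster realm has been configured to trust.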
Incorrect Answers:
B: The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server set up already, which will then link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.

QUESTION 14
What is the main purpose of the Oracle Loader for Hadoop (OLH) Connector?

A. runs transformations expressed in XQuery by translating them into a series of MapReduce jobs that are executed in parallel on a Hadoop cluster
B. pre-partitions, sorts, and transforms data into an Oracle-ready format on Hadoop and loads it into the Oracle database
C. accesses and analyzes data in place on HDFS by using external tables
D. performs scalable joins between Hadoop and Oracle Database data
E. provides a SQL-like interface to data that is stored in HDFS
F. is the single SQL point-of-entry to access all data

Correct Answer: B

Explanation:
Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format.

QUESTION 15
Your customer has three XML files in HDFS with the following contents. Each XML file contains comments made by users on a specific day. Each comment can have zero or more likes from other users. The customer wants you to query this data and load it into the Oracle Database on Exadata. How should you parse this data?
(XML file contents not reproduced in the source.)
A. by creating a table in Hive and using MapReduce to parse the XML data by column
B. by configuring the Oracle SQL Connector for HDFS and parsing by using SerDe
C. by using the XML file module in the Oracle XQuery for Hadoop Connector
D. by using the built-in functions for reading JSON in the Oracle XQuery for Hadoop Connector

Correct Answer: B

Explanation:
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in Apache Hadoop in these formats:
- Data Pump files in HDFS
- Delimited text files in HDFS
- Delimited text files in Apache Hive tables
SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization, and also interprets the results of serialization as individual fields for processing. A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

QUESTION 16
Identify two ways to create an external table to access Hive data on the Big Data Appliance by using Big Data SQL. (Choose two.)

A. Use Cloudera Manager's Big Data SQL Query builder.
B. You can use the dbms_hadoop.create_extddl_for_hive package to return the text of the CREATE TABLE command.
C. Use a CREATE TABLE statement with ORGANIZATION EXTERNAL and the ORACLE_BDSQL access parameter.
D. Use a CREATE TABLE statement with ORGANIZATION EXTERNAL and the ORACLE_HIVE access parameter.
E. Use the Enterprise Manager Big Data SQL Configuration page to create the table.

Correct Answer: BD

Explanation:
CREATE_EXTDDL_FOR_HIVE returns a SQL CREATE TABLE ORGANIZATION EXTERNAL statement for a Hive table. It uses the ORACLE_HIVE access driver.

QUESTION 17
What are two of the main steps for setting up Oracle XQuery for Hadoop? (Choose two.)

A. unpacking the contents of oxh-version.zip into the installation directory
B. installing the Oracle SQL Connector for Hadoop
C. configuring an Oracle wallet
D. installing the Oracle Loader for Hadoop

Correct Answer: AD

Explanation:
To install Oracle XQuery for Hadoop:
1. Unpack the contents of oxh-version.zip into the installation directory.
2. To support data loads into Oracle Database, install Oracle Loader for Hadoop.

QUESTION 18
Identify two features of the Hadoop Distributed File System (HDFS). (Choose two.)

A. It is written to store large amounts of data.
B. The file system is written in C#.
C. It consists of Mappers, Reducers, and Combiners.
D. The file system is written in Java.

Correct Answer: AD

Explanation:
HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications. The Hadoop framework, which HDFS is a part of, is itself mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

QUESTION 19
What does the flume sink do in a flume configuration?
A. sinks the log file that is transmitted into Hadoop
B. hosts the components through which events flow from an external source to the next destination
C. forwards events to the source
D. consumes events delivered to it by an external source such as a web server
E. removes events from the channel and puts them into an external repository

Correct Answer: D

Explanation:
A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it's consumed by a Flume sink.

QUESTION 20
Your customer is spending a lot of money on archiving data to comply with government regulations to retain data for 10 years. How should you reduce your customer's archival costs?

A. Denormalize the data.
B. Offload the data into Hadoop.
C. Use Oracle Data Integrator to improve performance.
D. Move the data into the warehousing database.

Correct Answer: B
Explanation:
Extend Information Lifecycle Management to Hadoop. For many years, Oracle Database has provided rich support for Information Lifecycle Management (ILM). Numerous capabilities are available for data tiering, or storing data in different media based on access requirements and storage cost considerations. These tiers may scale from 1) in-memory for real-time data analysis, 2) Database Flash for frequently accessed data, 3) Database Storage and Exadata Cells for queries of operational data, and 4) Hadoop for infrequently accessed raw and archive data.

QUESTION 21
What access driver does the Oracle SQL Connector for HDFS use when reading HDFS data by using external tables?

A. ORACLE_DATA_PUMP
B. ORACLE_LOADER
C. ORACLE_HDP
D. ORACLE_BDSQL
E. HADOOP_LOADER
F. ORACLE_HIVE_LOADER

Correct Answer: B

Explanation:
Oracle SQL Connector for HDFS creates the external table definition for Data Pump files by using the metadata from the Data Pump file header. It uses the ORACLE_LOADER access driver with the preprocessor access parameter. It also uses a special access parameter named EXTERNAL VARIABLE DATA, which enables ORACLE_LOADER to read the Data Pump format files generated by Oracle Loader for Hadoop.
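An external table like the one described above has roughly the following shape. This is a hedged sketch, not connector-generated output: the table, column, directory, and location-file names are invented, and the exact access parameters vary by connector version:

```sql
-- Sketch of an OSCH-style external table over Data Pump files in HDFS
-- (names are illustrative; OSCH normally generates this DDL for you)
CREATE TABLE sales_dp_xtab (
  cust_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER                -- the access driver named in the answer
  DEFAULT DIRECTORY sales_dp_dir
  ACCESS PARAMETERS (
    EXTERNAL VARIABLE DATA
    PREPROCESSOR "OSCH_BIN_PATH":hdfs_stream
  )
  LOCATION ('osch-xtab-00001')      -- location files that point into HDFS
)
REJECT LIMIT UNLIMITED;
```

Once such a table exists, the HDFS data can be queried with ordinary SELECT statements while the files themselves stay in Hadoop.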
QUESTION 22
You recently set up a customer's Big Data Appliance. At the time, all users wanted access to all the Hadoop data. Now, the customer wants more control over the data that is stored in Hadoop. How should you accommodate this request?

A. Configure Audit Vault and Database Firewall protection policies for the Hadoop data.
B. Update the MySQL metadata for Hadoop to define access control lists.
C. Configure an /etc/sudoers file to restrict the Hadoop data.
D. Configure Apache Sentry policies to protect the Hadoop data.

Correct Answer: D

Explanation:
Apache Sentry is a new project that delivers fine-grained access control; both Cloudera and Oracle are the project's founding members. Sentry satisfies the following three authorization requirements:
- Secure Authorization: the ability to control access to data and/or privileges on data for authenticated users.
- Fine-Grained Authorization: the ability to give users access to a subset of the data (e.g. a column) in a database.
- Role-Based Authorization: the ability to create/apply template-based privileges based on functional roles.
Incorrect Answers:
C: The file /etc/sudoers contains a list of users or user groups with permission to execute a subset of commands while having the privileges of the root user or another specified user. The program may be configured to require a password.

QUESTION 23
You are working with a client who does not allow the storage of user or schema passwords in plain text. How can you configure the Oracle Loader for Hadoop configuration file to meet the requirements of this client?

A. Store the password in an Access Control List and configure the ACL location in the configuration file.
B. Encrypt the password in the configuration file by using Transparent Data Encryption.
C. Configure the configuration file to prompt for the password during remote job executions.
D. Store the information in an Oracle wallet and configure the wallet location in the configuration file.

Correct Answer: D

Explanation:
In online database mode, Oracle Loader for Hadoop can connect to the target database using the credentials provided in the job configuration file or in an Oracle wallet. Oracle Wallet Manager is an application that wallet owners use to manage and edit the security credentials in their Oracle wallets. A wallet is a password-protected container used to store authentication and signing credentials, including private keys, certificates, and trusted certificates needed by SSL.
Note: Oracle Wallet Manager provides the following features:
- Wallet Password Management
- Strong Wallet Encryption
- Microsoft Windows Registry Wallet Storage
- Backward Compatibility
- Public-Key Cryptography Standards (PKCS) Support
- Multiple Certificate Support
- LDAP Directory Support

QUESTION 24
Your customer needs the data that is generated from social media such as Facebook and Twitter, and the customer's website, to be consumed and sent to an HDFS directory for analysis by the marketing team. Identify the architecture that you should configure.

A. multiple flume agents with collectors that output to a logger that writes to the Oracle Loader for Hadoop agent
B. multiple flume agents with sinks that write to a consolidated source with a sink to the customer's HDFS directory
C. a single flume agent that collects data from the customer's website, which is connected to both Facebook and Twitter, and writes via the collector to the customer's HDFS directory
D. multiple HDFS agents that write to a consolidated HDFS directory
E. a single HDFS agent that collects data from the customer's website, which is connected to both Facebook and Twitter, and writes via the Hive to the customer's HDFS directory

Correct Answer: B
Explanation:
Apache Flume - Fetching Twitter Data. Flume in this case will be responsible for capturing the tweets from Twitter at very high velocity and volume, buffering them in a memory channel (maybe doing some aggregation, since we're getting JSONs), and eventually sinking them into HDFS.

QUESTION 25
What are the two advantages of using Hive over MapReduce? (Choose two.)

A. Hive is much faster than MapReduce because it accesses data directly.
B. Hive allows for sophisticated analytics on large data sets.
C. Hive does not require MapReduce to run in order to analyze data.
D. Hive is a free tool; Hadoop requires a license.
E. Hive simplifies Hadoop for new users.

Correct Answer: BE

Explanation:
E: A comparison of the performance of the Hadoop/Pig implementation of MapReduce with Hadoop/Hive: both Hive and Pig are platforms optimized for analyzing large data sets and are built on top of Hadoop. Hive is a platform that provides a declarative SQL-like language, whereas Pig requires users to write a procedural language called Pig Latin. Writing MapReduce jobs in Java can be difficult; Hive and Pig have been developed and work as platforms on top of Hadoop, allowing users easy access to data compared to implementing their own MapReduce in Hadoop.
Incorrect Answers:
A: Hive and Pig have been developed and work as platforms on top of Hadoop.
C: Apache Hive provides an SQL-like query language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez, and Spark jobs.
D: Apache Hadoop is an open-source software framework, licensed through Apache License, Version 2.0 (ALv2), which is a permissive free software license written by the Apache Software Foundation (ASF).

QUESTION 26
During a meeting with your customer's IT security team, you are asked the names of the main OS users and groups for the Big Data Appliance. Which users are created automatically during the installation of the Oracle Big Data Appliance?

A. flume, hbase, and hdfs
B. mapred, bda, and engsys
C. hbase, cdh5, and oracle
D. bda, cdh5, and oracle

Correct Answer: A

QUESTION 27
Which command should you use to view the contents of the HDFS directory, /user/oracle/logs?

A. hadoop fs -cat /user/oracle/logs
B. hadoop fs -ls /user/oracle/logs
C. cd /user/oracle
   hadoop fs -ls logs
D. cd /user/oracle/logs
   hadoop fs -ls *
E. hadoop fs -listfiles /user/oracle/logs
F. hive> select * from /user/oracle/logs

Correct Answer: B

Explanation:
To list the contents of a directory named /user/training/hadoop in HDFS:
# hadoop fs -ls /user/training/hadoop
Incorrect Answers:
A: hadoop fs -cat displays the content of a file.
QUESTION 28
Your customer receives data in JSON format. Which option should you use to load this data into Hive tables?

A. Python
B. Sqoop
C. a custom Java program
D. Flume
E. SerDe

Correct Answer: E

Explanation:
SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization, and also interprets the results of serialization as individual fields for processing. A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats. The JsonSerDe for JSON files is available in Hive 0.12 and later.

QUESTION 29
Your customer needs to move data from Hive to the Oracle database but does not have any connectors purchased. What is another architectural choice that the customer can make?

A. Use Apache Sqoop.
B. Use Apache Sentry.
C. Use Apache Pig.
D. Export data from Hive by using export/import.

Correct Answer: A

Explanation:
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export from the Hadoop file system to relational databases.
Incorrect Answers:
B: Apache Sentry is an authorization module for Hadoop that provides the granular, role-based authorization required to provide precise levels of access to the right users and applications.
C: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

QUESTION 30
Your customer is setting up an external table to provide read access to the Hive table to Oracle Database. What does hdfs:/user/scott/data refer to in the external table definition for the Oracle SQL Connector for HDFS?

A. the default directory for the Oracle external table
B. the local file system location for the data
C. the location for the log directory
D. the location of the HDFS input data
E. the location of the Oracle data file for SALES_DP_XTAB

Correct Answer: D
23 /Reference: hdfs:/user/scott/data/ is the location of the HDFS data. References: QUESTION 31 Your customer has 10 web servers that generate logs at any given time. The customer would like to consolidate and load this data as it is generated into HDFS on the Big Data Appliance. Which option should the customer use? A. Set up a zookeeper agent to capture the transactions and write them to HDFS. B. Write a hive query to listen for new logs and save them in a Hive table. C. Set up a flume agent to capture the transactions and write them to HDFS. D. Set up an hbase agent to capture the transactions and write them to HDFS. E. Set up a web server agent in Apache Oozie to write the data to HDFS. Correct Answer: C /Reference: Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The use of Apache Flume is not only restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data including but not limited to network traffic data, social-media-generated data, messages and pretty much any data source possible. Example:
24 A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it s consumed by a Flume sink. The file channel is one example it is backed by the local filesystem. The sink removes the event from the channel and puts it into an external repository like HDFS (via Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within the given agent run asynchronously with the events staged in the channel. Incorrect Answers: A: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. References: QUESTION 32 The Hadoop NameNode is running on port #3001, the DataNode on port #4001, the KVStore agent on port #5001, and the replication node on port #6001. All the services are running on localhost. What is the valid syntax to create an external table in Hive and query data from the NoSQL Database? A. CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_tit1e STRING, overview STRING) STORED BY 'oracle.kv.hadoop.hive.table.tablestoragehandler' TBLPROPERTIES ("oracle.kv.kvstore"="kvscore", "oracle.kv.hosts"="localhost:3001", "oracle.kv.hadoop.hosts"="localhost",
"oracle.kv.tablename"="MOVIE");
B. CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_title STRING, overview STRING) STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' TBLPROPERTIES ("oracle.kv.kvstore"="kvstore", "oracle.kv.hosts"="localhost:5001", "oracle.kv.hadoop.hosts"="localhost", "oracle.kv.tablename"="MOVIE");
C. CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_title STRING, overview STRING) STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' TBLPROPERTIES ("oracle.kv.kvstore"="kvstore", "oracle.kv.hosts"="localhost:4001", "oracle.kv.hadoop.hosts"="localhost", "oracle.kv.tablename"="MOVIE");
D. CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_title STRING, overview STRING) STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' TBLPROPERTIES ("oracle.kv.kvstore"="kvstore", "oracle.kv.hosts"="localhost:6001", "oracle.kv.hadoop.hosts"="localhost", "oracle.kv.tablename"="MOVIE");
Correct Answer: C
/Reference:
The following is the basic syntax of a Hive CREATE TABLE statement for a Hive external table over an Oracle NoSQL table:
CREATE EXTERNAL TABLE tablename (colname coltype[, colname coltype,...])
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES (
  "oracle.kv.kvstore" = "database",
  "oracle.kv.hosts" = "nosql_node1:port[, nosql_node2:port...]",
  "oracle.kv.hadoop.hosts" = "hadoop_node1[,hadoop_node2...]",
  "oracle.kv.tablename" = "table_name");
where oracle.kv.hosts is a comma-delimited list of host names and port numbers in the Oracle NoSQL Database cluster. Each string has the format hostname:port. Enter multiple names to provide redundancy in the event that a host fails.
References:

QUESTION 33
What are the two roles performed by the Big Data Appliance and the Exadata Database Machine in an Oracle Big Data Management solution? (Choose two.)
A. Data Warehouse
B. Data Definer
C. Data Analyzer
D. Data Reservoir
E. Data Connector
F. Data Integrator
Correct Answer: EF
/Reference:
E:
F: Oracle SQL Connector for Hadoop Distributed File System (HDFS) is an example of an application that pulls data into Oracle Exadata Database Machine. The connector enables an Oracle external table to access data stored in either HDFS files or a Hive table.

QUESTION 34
Your customer completed all the Kerberos installation prerequisites when the Big Data Appliance was set up. However, when the customer tries to use Kerberos authentication, it gets an error. Which command did the customer fail to run?
A. install.sh option kerberos
B. emcli enable kerberos
C. bdacli enable kerberos
D. bdasetup kerberos
Correct Answer: C
/Reference:
When installing the Oracle Big Data Appliance software, the following procedure configures Kerberos authentication. To support Kerberos authentication:
1. Ensure that you complete the Kerberos prerequisites.
2. Log into the first NameNode (node01) of the primary rack.
3. Configure Kerberos:
# bdacli enable kerberos
Etc.
References:

QUESTION 35
What happens if an active NameNode fails in the Oracle Big Data Appliance?
A. The role of the active NameNode fails over automatically to the standby NameNode.
B. Cloudera Manager starts a NameNode process on a surviving node.
C. The entire cluster fails.
D. The ResourceManager starts a NameNode process on a surviving node.
E. All traffic is directed to the master DataNode until the NameNode is restarted.
Correct Answer: A
/Reference:
If the active NameNode fails, then the role of active NameNode automatically fails over to the standby NameNode.
References:

QUESTION 36
Your customer's IT staff is made up mostly of SQL developers. Your customer would like you to design a system to analyze the spending patterns of customers in the web store. The data resides in HDFS. What tool should you use to meet their needs?
A. Oracle Database 12c
B. SQL Developer
C. Apache Hive
D. MapReduce
E. Oracle Data Integrator
Correct Answer: B
/Reference:
Oracle SQL Developer is one of the most common SQL client tools used by developers, data analysts, data architects, and others for interacting with Oracle and other relational systems. SQL Developer and Data Modeler (version 4.0.3) now support Hive and Oracle Big Data SQL. The tools allow you to connect to Hive, use the SQL Worksheet to query, create, and alter Hive tables, and automatically generate Big Data SQL-enabled Oracle external tables that dynamically access data sources defined in the Hive metastore.
Incorrect Answers:
E: Oracle Data Integrator (ODI) is an extract, load, and transform (ELT) tool (in contrast to the more common ETL approach) produced by Oracle that offers a graphical environment to build, manage, and maintain data integration processes in business intelligence systems.
References:

QUESTION 37
Which statement is true about the NameNode in Hadoop?
A. A query in Hadoop requires a MapReduce job to be run, so the NameNode gets the location of the data from the JobTracker.
B. If the NameNode goes down and a secondary NameNode has not been defined, the cluster is still accessible.
C. When loading data, the NameNode tells the client or program where to write the data.
D. All data passes through the NameNode; so if it is not sized properly, it could be a potential bottleneck.
Correct Answer: B
/Reference:
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.
References:

QUESTION 38
How does increasing the number of storage nodes and shards impact the efficiency of Oracle NoSQL Database?
A. The number of shards or storage nodes does not impact performance.
B. Having more shards reduces the write throughput because of increased node forwarding.
C. Having more shards results in reduced read throughput because of increased node forwarding.
D. Having more shards increases the write throughput because more master nodes are available for writes.
Correct Answer: D
/Reference:
The more shards that your store contains, the better your write performance is, because the store contains more nodes that are responsible for servicing write requests.
References:
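The explanation above can be illustrated with a toy sketch (plain Python, not Oracle NoSQL product code): keys hash to partitions, partitions map to shards, and each shard has exactly one master servicing writes, so spreading the same write load over more shards leaves each master with fewer writes. The hashing scheme and key names below are illustrative assumptions.

```python
import hashlib

def shard_for_key(key: str, num_shards: int) -> int:
    """Toy stand-in for the partition-to-shard mapping: hash the key's
    major path and pick a shard; that shard's master services the write."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def write_load_per_master(keys, num_shards):
    """Count how many writes land on each shard's master node."""
    load = [0] * num_shards
    for k in keys:
        load[shard_for_key(k, num_shards)] += 1
    return load

# The same 10,000 writes spread over 1, 3, and 9 shards:
keys = ["/user/%d" % i for i in range(10_000)]
for shards in (1, 3, 9):
    print(shards, max(write_load_per_master(keys, shards)))
# With one shard, a single master absorbs all 10,000 writes;
# with nine shards, each master handles only a fraction of them.
```

This is why answer D holds: write throughput scales with the number of masters, while each read can be serviced by any replica in the relevant shard.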
More informationIntroduction to the Hadoop Ecosystem - 1
Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Introduction to the
More information