exam. Number: Passing Score: 800 Time Limit: 120 min File Version: Microsoft

Size: px

Start display at page:

Download "exam. Number: Passing Score: 800 Time Limit: 120 min File Version: Microsoft"

Richard Nichols
5 years ago
Views:

1 exam Number: Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft Analyzing Big Data with Microsoft R Version 1.0

2 Exam A QUESTION 1 You plan to read data from an Oracle database table and to store the data in the file system for later processing by dplyrxdf. The size of the data is larger than the memory on the server to be used for modelling. You need to ensure that the data can be processed by dplyrxdf in the least amount of time possible. How should you transfer the data from the Oracle database? A. Define a data source to the Oracle database server by using RxOdbcData. Use rximport to save the data to a comma-separated values(csv) file. B. Use the RODBC library, connect to the Oracle database server by using odbcconnect, and then use rxdatastep to export the data to a comma-separated values (CSV) file. C. Define a data source to the Oracle database server by using RxOdbcData,and then use rximport to save the data to an XDF file. D. Use the RODBC library, connect to the Oracle database server by using odbcconnect, and then use rxsplit to save the data to multiple comma-separated values (CSV) file. Correct Answer: C /Reference: : QUESTION 2 You are running a parallel function that uses the following R code segment. (Line numbers are included for reference only.) 01 cp < xval <- 0 maxdepth < (form, data = segmentationdatabig, maxdepth = maxdepth, cp = cp, xval = xval, blocksperread = 250 You need to complete the R code. The solution must support chunking. Which function should insert at line 02?

3 A. rxbtrees B. rxexec C. rxdforest D. rxdtree Correct Answer: D /Reference: : QUESTION 3 DRAG DROP You are using rxpredict for a logistic regression model. You need to obtain prediction standard errors and confidence intervals. Which R code segment should you use? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:

Correct Answer: /Reference: : References: https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-logistic-regression QUESTION 4 You have one-class support vector machines (SVMs).

4 Correct Answer: /Reference: : References: QUESTION 4 You have one-class support vector machines (SVMs). You have a large dataset, but you do not have enough training time to fully test the model. What is an alternative method to validate the model? A. Use Principal Components Analysis (PCA)-Based Anomaly Detection. B. Replace the SVMs with two-class SVMs. C. Perform feature selection. D. Use outlier detection. Correct Answer: A

5 /Reference: : QUESTION 5 You need to build a model that looks at the probability of an outcome. You must regulate between L1 and L2. Which classification method should you use? A. Two-Class Neutral Network B. Two-Class Support Vector Machine C. Two-Class Decision Forest D. Two-Class Logistic Regression Correct Answer: D /Reference: : References: QUESTION 6 You have a dataset. You need to repeatedly split randomly the dataset so that 80 percent of the data is used as a training set and the remaining 20 percent is used as a test set. Which method should you use? A. threshold B. binary classification C. imputation D. cross validation E. pruning

6 Correct Answer: D /Reference: : QUESTION 7 You perform an analysis that produces the decision tree shown in the exhibit. (Click the Exhibit button.) How many leaf nodes are there on the tree? A. 2 B. 3 C. 5 D. 7 Correct Answer: C

7 /Reference: : References: QUESTION 8 You need to run a large data tree model by using rxdforest. The model must use cross validation. Which rxdforest option should you use? A. maxsurrogate B. maxnumbins C. maxdepth D. maxcompete E. xval Correct Answer: E /Reference: : References: QUESTION 9 You have the following regression forest.

8 Which variable contributes the most to the dependent variable? A. stack.loss B. Water.Temp C. Air.Flow D. Acid.Conc Correct Answer: D /Reference: : QUESTION 10 Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question. You have a data source that is larger than memory. You need to visualize the distribution of the values for a variable in the data source. What should you use? A. the Describe package B. the rxhistogram function C. the rxsummary function D. the rxquantile function E. the rxcube function F. the summary function G. the rxcrosstabs function H. the ggplot2 package Correct Answer: B

9 /Reference: : QUESTION 11 Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question. You have a dataset that contains the physical characteristics of people. You need to visualize a relationship between height and weight for a subset of observations in the dataset. What should you use? A. the Describe package B. the rxhistogram function C. the rxsummary function D. the rxquantile function E. the rxcube function F. the summary function G. the rxcrosstabs function H. the ggplot2 package Correct Answer: H /Reference: : QUESTION 12 Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to

10 that question. You need to get all of the deciles for a variable in a data frame. What should you use? A. the Describe package B. the rxhistogram function C. the rxsummary function D. the rxquantile function E. the rxcube function F. the summary function G. the rxcrosstabs function H. the ggplot2 package Correct Answer: D /Reference: : QUESTION 13 Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question. You need to calculate a measure of central tendency and variability for the variables in a dataset that is grouped by using another categorical variable. What should you use? A. the Describe package B. the rxhistogram function C. the rxsummary function D. the rxquantile function E. the rxcube function F. the summary function G. the rxcrosstabs function

11 H. the ggplot2 package Correct Answer: C /Reference: : QUESTION 14 HOTSPOT Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in this series. Start of repeated scenario You are developing a Microsoft R Open solution that will leverage the computing power of the database server for some of your datasets. You are performing feature engineering and data preparation for the datasets. The following is a sample of the dataset.

12 End of repeated scenario. You plan to score some data to create data features to address empty rows. You have the following R code. You need to transform the data and overwrite the current dataset. Which R code segment should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:

13 Correct Answer: /Reference: QUESTION 15 You are planning the compute contexts for your environment. You need to execute rx-function calls in parallel. What are three possible compute contexts that you can use to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

14 A. local parallel B. Spark C. local sequential D. Map Reduce E. SQL Correct Answer: ABD /Reference: : References: QUESTION 16 You need to use the ScaleR distributed processing in an Apache Hadoop environment. Which data source should you use? A. Microsoft SQL Server database B. XDF data files C. ODBC data D. Teradata database Correct Answer: B /Reference: : References: QUESTION 17 You plan to analyze data on a local computer. To improve performance, you plan to alternate the operation between a Microsoft SQL Server and the local computer. You need to run complex code on the SQL Server, and then revert to the local compute context. Which R code segment should you use?

A. sqlcompute <- RxInSqlServer(connectionString = Driver=SQL Server;Server = myserver; Database = TestDB; Uid = myid; Pwd = mypwd; )sqlpackagepaths <- RxFindPackage(package = RevoScaleR,

15 A. sqlcompute <- RxInSqlServer(connectionString = Driver=SQL Server;Server = myserver; Database = TestDB; Uid = myid; Pwd = mypwd; )sqlpackagepaths <- RxFindPackage(package = RevoScaleR, computecontext = sqlservercompute) B. sqlcompute <- RxInSqlServer(connectionstring = sqlconnstring, sharedir = sqlsharedir,wait = sqlwait, consoleoutput = sqlconsoleoutput) rxsetcomputecontext( local )x <- 1:10rxExec(print, x, elemtype = cores, timestorun = 10)rxSetComputeContext( RxLocalParallel ) C. sqlcompute <- RxInSqlServer(connectionstring = sqlconnstring, sharedir = sqlsharedir,wait = sqlwait, consoleoutput = sqlconsoleoutput) rxsetcomputecontext( sqlcompute )x <- 1:10rxExec(print, x, elemtype = cores, timestorun = 10)rxSetComputeContext( local ) D. sqlcompute <- RxInSqlServer(connectionstring = sqlconnstring, sharedir = sqlsharedir,wait = sqlwait, consoleoutput = sqlconsoleoutput) rxsetcomputecontext( local )x <- 1:10rxExec(print, x, elemtype = cores, timestorun = 10)rxSetComputeContext( sqlcompute ) Correct Answer: D /Reference: : References: QUESTION 18 DRAG DROP You need to set the compute context for three different target environments. Which statement should you use for each environment? To answer, drag the appropriate statements to the correct execution contexts. Each statement may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: Correct Answer:

16 /Reference: : References: QUESTION 19 You have a dataset that has a character variable. You need to create a bag of counts of n-grams. Which function should you use? A. featurizetext() B. categoricalhash() C. concat() D. selectfeatures() E. categorical() Correct Answer: A /Reference: : References:

17 QUESTION 20 You have a dataset that has multiple blocks and only numeric variables. You are computing in a local compute context. You plan to lag a variable named x to create a new variable named x_lagged by using a transform function. You will create a new element in the output of the function. You need to minimize the number of missing values. Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point. A. Assign a value to the first value of x_lagged in the current block. B. Use rxset to store the last value of x_lagged in the current block. C. Use rxset to store thelast value of x in the current block. D. Use rxget to retrieve the first value of x in the next block to be processed. E. Use rxget to retrieve a value stored in processing of the prior block. Correct Answer: ACD /Reference: :

Microsoft Analyzing Big Data with Microsoft R.

Microsoft Analyzing Big Data with Microsoft R. Microsoft 70-773 Analyzing Big Data with Microsoft R http://killexams.com/pass4sure/exam-detail/70-773 QUESTION: 34 You plan to read data from an Oracle database table and to store the data in the file