
DS-200 Online Practice Questions and Answers

Question 4

What is the default field delimiter for Hive tables?

A. ^A (Control-A)

B. , (comma)

C. \t (tab)

D. : (colon)
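For reference, a Hive table created without a ROW FORMAT clause stores text rows with the non-printing byte \001 (Ctrl-A, displayed as ^A) between fields and a newline between rows. A minimal Python sketch of reading and writing such rows, assuming plain text files with one record per line; the helper names are hypothetical:

```python
# Hive's default text storage format separates fields with the
# non-printing character \001 (Ctrl-A, shown as ^A).
HIVE_FIELD_DELIMITER = "\x01"


def parse_hive_row(line: str) -> list[str]:
    """Split one line of a default-format Hive text file into its fields."""
    # Strip only the trailing newline (the row terminator); keep empty fields.
    return line.rstrip("\n").split(HIVE_FIELD_DELIMITER)


def to_hive_row(fields: list[str]) -> str:
    """Join fields the way a default-format Hive text table stores them."""
    return HIVE_FIELD_DELIMITER.join(fields) + "\n"


if __name__ == "__main__":
    row = to_hive_row(["42", "alice", "2024-11-14"])
    print(repr(row))            # '42\x01alice\x012024-11-14\n'
    print(parse_hive_row(row))  # ['42', 'alice', '2024-11-14']
```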

Question 5

Refer to the exhibit.

Which point in the figure is the mode?

A. A

B. B

C. C
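The mode is the most frequently occurring value; on a plotted density or histogram it sits under the highest peak. For a discrete sample it can be found by counting. A minimal sketch with hypothetical toy data; because a sample can have more than one mode, the helper returns every tied value:

```python
from collections import Counter


def modes(sample):
    """Return every value that occurs most often in the sample (ties included)."""
    counts = Counter(sample)
    if not counts:
        raise ValueError("modes() needs at least one observation")
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


if __name__ == "__main__":
    print(modes([2, 3, 3, 5, 7, 7, 7, 9]))  # [7]
    print(modes([1, 1, 2, 2, 3]))           # [1, 2]
```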

Question 6

Under what two conditions does stochastic gradient descent outperform 2nd-order optimization techniques such as iteratively reweighted least squares?

A. When the volume of input data is so large and diverse that a 2nd-order optimization technique can be fit to a sample of the data

B. When the model's estimates must be updated in real-time in order to account for new observations.

C. When the input data can easily fit into memory on a single machine, but we want to calculate confidence intervals for all of the parameters in the model.

D. When we are required to find the parameters that return the optimal value of the objective function.
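The practical difference is cost per update: SGD folds in a new observation with one cheap gradient step, whereas iteratively reweighted least squares solves a weighted least-squares system over the whole data set on every iteration. A minimal sketch of logistic regression fitted online by SGD, using only the standard library; the class name, learning rate, and toy stream are hypothetical:

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression whose parameters are updated by SGD, one observation at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that the label is 1 for feature vector x."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y):
        """One SGD step on the log-loss of a single observation (y must be 0 or 1)."""
        error = self.predict_proba(x) - y  # gradient of the log-loss w.r.t. the linear score
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=1)
    # Hypothetical stream in which the label is 1 exactly when the feature is positive.
    stream = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)] * 50
    for x, y in stream:
        model.update(x, y)  # the model is usable after every single observation
    print(model.predict_proba([1.5]), model.predict_proba([-1.5]))
```

Because each update touches only one observation, the cost of absorbing a new data point stays constant no matter how much data has already been seen.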

Question 7

There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer.

The makeup of the groups is as follows:

Each individual has an expression value for each of 10,000 different genes. The expression value for each gene is a continuous value between -1 and 1.

Which type of plot can visually encode the greatest amount of the data?

Rather than use all 10,000 features to separate AML from ALL, you pick a small subset of features that separates them optimally. Your feature vectors have 10,000 dimensions, while you have only 52 data points. You use cross-validation to test your chosen set of features. Which three methods will choose the features in an optimal way?

A. Singular Value Decomposition

B. Bootstrapping

C. Markov chain Monte Carlo

D. Hidden Markov Models

E. Bayesian Information Criterion

F. Mutual Information
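Mutual information is one way to score a feature against the class label: a gene whose discretized expression level says a lot about AML versus ALL scores high. A minimal filter-style sketch, assuming each expression value is binarized at a hypothetical threshold of 0; the function names and toy data are illustrative. With only 52 samples, the selection step should run inside each cross-validation fold, otherwise the cross-validated estimate is optimistic.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information, in bits, between a discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = label_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def top_k_features(rows, labels, k, threshold=0.0):
    """Indices of the k features with the highest MI after binarizing each at `threshold`."""
    scores = []
    for j in range(len(rows[0])):
        binarized = [row[j] > threshold for row in rows]  # high vs. low expression
        scores.append((mutual_information(binarized, labels), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients x 3 genes, expression values in [-1, 1].
    rows = [[0.9, 0.1, -0.5], [0.8, -0.2, 0.4], [-0.7, 0.3, -0.6], [-0.6, -0.4, 0.5]]
    labels = ["AML", "AML", "ALL", "ALL"]
    print(top_k_features(rows, labels, k=1))  # [0]
```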

Question 8

You have a directory containing a number of comma-separated files. Each file has three columns, and each filename has a .csv extension. You want a single tab-separated file (all.tsv) that contains all the rows from all the files.

Which command is guaranteed to produce the desired output if you have more than 20,000 files to process?

A. find . -name '*.csv' -print0 | xargs -0 cat | tr ',' '\t' > all.tsv

B. find . -name '*.csv' | cat | awk 'BEGIN {FS = "," OFS = "\t"} {print $1, $2, $3}' > all.tsv

C. find . -name '*.csv' | tr ',' '\t' | cat > all.tsv

D. find . -name '*.csv' | cat > all.tsv

E. cat *.csv > all.tsv
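With tens of thousands of files, a shell glob such as *.csv can expand past the operating system's argument-length limit, which is why streaming file names (for example with find -print0 into xargs -0) is the robust pattern. The same streaming idea as a Python sketch; the function name is hypothetical, and like find . it recurses into subdirectories:

```python
import csv
from pathlib import Path


def merge_csv_to_tsv(directory: str, output: str) -> int:
    """Stream every *.csv file under `directory` into one tab-separated file.

    Files are opened one at a time, so the number of input files is not
    limited by the shell's maximum command-line length. Returns rows written.
    """
    rows_written = 0
    with open(output, "w", newline="") as out_file:
        writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
        for path in sorted(Path(directory).rglob("*.csv")):
            with open(path, newline="") as in_file:
                for row in csv.reader(in_file):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

One design difference from tr: csv.reader honors quoted fields, so a value such as "Smith, J." stays in a single column, whereas tr replaces every comma. For plain unquoted three-column files the two give the same rows.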

Question 9

When optimizing a function using stochastic gradient descent, how frequently should you update your estimate of the gradient?

A. Once after every pass through the data set

B. Once per observation

C. For each observation with a probability that you choose ahead of time

D. After a random number of observations

E. Once every N observations, where you decide N ahead of time
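In plain stochastic gradient descent the gradient is estimated from a single observation and the parameters are updated immediately, giving one update per observation; visiting the observations in a freshly shuffled order on each pass is a common practical choice. A minimal sketch for one-feature linear regression, assuming a squared-error loss; the function name, learning rate, epoch count, and toy data are hypothetical:

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, epochs=1000, seed=0):
    """Fit y ~ w*x + b by SGD, updating the parameters after every observation."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit observations in a new random order each pass
        for i in indices:
            residual = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * residual**2 for this single observation.
            w -= learning_rate * residual * xs[i]
            b -= learning_rate * residual
    return w, b


if __name__ == "__main__":
    # Hypothetical noiseless data generated from y = 2x + 1.
    w, b = sgd_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
    print(w, b)
```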

Question 10

You are about to sample a 100-dimensional unit cube. To adequately sample any single given dimension, you need to capture only 10 points. How many points do you need in order to sample the complete 100-dimensional unit cube adequately?

A. $100^{10}$

B. $10^{10}$

C. $\log_2(100)$

D. 100

E. 1000

F. $10^{100}$
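If 10 points per axis are enough, a grid covering every combination of axis positions needs $10 \times 10 \times \cdots \times 10 = 10^d$ points in $d$ dimensions, so the count grows exponentially with dimension (the curse of dimensionality). A small Python check of that product; the function name is hypothetical:

```python
def grid_points_needed(points_per_dimension: int, dimensions: int) -> int:
    """Points in a full grid placing `points_per_dimension` samples along every axis."""
    # Every combination of per-axis positions is a distinct grid point,
    # so the count multiplies across dimensions.
    return points_per_dimension ** dimensions


if __name__ == "__main__":
    print(grid_points_needed(10, 1))                # 10
    print(grid_points_needed(10, 2))                # 100
    print(grid_points_needed(10, 3))                # 1000
    print(len(str(grid_points_needed(10, 100))))    # 101 (digits in 10**100)
```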

Question 11

Consider the following sample from a distribution that contains a continuous X and label Y that is either A or B:

Which is the best cut point for X if you want to discretize these values into two buckets in a way that minimizes the sum of chi-square values?

A. X 8

B. X 6

C. X 5

D. X 4

E. X 2
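For any candidate cut point c, a chi-square statistic can be computed from the 2 × 2 table of bucket (X at or below c versus above c) against label; it is the sum over the four cells of (observed − expected)² / expected, with expected counts taken under independence of bucket and label. A minimal sketch that scores candidate cuts; treating the cut as X ≤ c is an assumption, and the toy sample is hypothetical, not the data from the question:

```python
def chi_square_for_cut(xs, ys, cut):
    """Pearson chi-square of the 2 x 2 table (X <= cut vs X > cut) by label."""
    labels = sorted(set(ys))
    if len(labels) != 2:
        raise ValueError("expected exactly two label values")
    table = [[0, 0], [0, 0]]  # rows: bucket (<= cut, > cut); columns: label
    for x, y in zip(xs, ys):
        table[0 if x <= cut else 1][labels.index(y)] += 1
    n = len(xs)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]                 # hypothetical sample
    ys = ["A", "A", "A", "B", "B", "B"]
    for cut in (1, 2, 3, 4, 5):
        print(cut, chi_square_for_cut(xs, ys, cut))
```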

Question 12

Why is the naive Bayes classifier "naive"?

A. It generally performs worse than more complex methods

B. It is an unbiased estimator

C. It assumes independence between all features

D. It makes no assumptions on the underlying distributions (i.e., it is non-parametric)
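The simplifying assumption is that features are conditionally independent given the class, so the posterior factorizes as $P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$. A minimal categorical sketch with add-one (Laplace) smoothing; the class name and weather-style toy data are hypothetical:

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # feature_counts[c][j][v]: how often feature j took value v in class c
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for each feature
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.feature_counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row, c):
        """Unnormalized log P(c | row)."""
        score = math.log(self.class_counts[c] / self.n)
        # The "naive" step: each feature contributes an independent factor
        # given the class, so the likelihood is a product (a sum of logs).
        for j, v in enumerate(row):
            count = self.feature_counts[c][j][v]
            score += math.log((count + 1) / (self.class_counts[c] + len(self.values[j])))
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


if __name__ == "__main__":
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict(("rainy", "cool")))  # yes
    print(model.predict(("sunny", "hot")))   # no
```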

Question 13

Which three metrics are useful in measuring the accuracy and quality of a recommender system?

A. Mutual Information

B. RMSE

C. Tanimoto coefficient

D. Pearson correlation

E. Precision

F. Recall
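RMSE scores predicted ratings against held-out ratings, while precision and recall score a ranked list of recommendations against the items a user actually found relevant. A minimal sketch of all three; the @k cutoff, function names, and toy ratings and item IDs are hypothetical:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0  # share of recommendations that hit
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant items found
    return precision, recall


if __name__ == "__main__":
    print(round(rmse([4, 3, 5], [5, 3, 4]), 4))                            # 0.8165
    print(precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, 3))  # both 2/3
```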

Exam Code: DS-200
Exam Name: Data Science Essentials
Last Update: Nov 14, 2024
Questions: 60