A data engineer is asked to process several large datasets using MapReduce. Upon initial inspection, the engineer realizes that there are complex interdependencies between the datasets.
Why is this a problem?
A. MapReduce works best on unstructured data
B. There is no problem; MapReduce accommodates all the data
C. MapReduce can only parse one file at a time
D. MapReduce is not ideal when the processing of one dataset depends on another
Which graph structure would best model the relationship between job seekers and employers?
A. Bipartite
B. Weighted
C. Directed acyclic
D. Ranked
What is an ideal use case for HDFS?
A. Storing files that are updated frequently
B. Storing files that are written once and read many times
C. Storing results between Map steps and Reduce steps
D. Storing application files in memory
A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20.
What is the lie factor of the graph?
A. 1
B. 2
C. 3
D. 6
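The arithmetic behind this question can be sketched as follows. This is a minimal sketch that assumes Tufte's definition of the lie factor (size of the effect shown in the graphic divided by the size of the effect in the data) and assumes that readers perceive a square by its area. The function name lie_factor is hypothetical.

```python
def lie_factor(data_values, shown_sizes):
    """Tufte's lie factor: relative change shown / relative change in data."""
    d0, d1 = data_values
    s0, s1 = shown_sizes
    effect_in_data = (d1 - d0) / d0
    effect_in_graphic = (s1 - s0) / s0
    return effect_in_graphic / effect_in_data

values = (10, 20)
# Assumption: the perceived size of each square is its area (side squared).
areas = tuple(v ** 2 for v in values)
print(lie_factor(values, areas))
```

Under a different assumption about what readers perceive (for example side length instead of area), the same function would give a different result.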
You develop a Python script "logistic.py" to evaluate the logistic function, denoted f(y), for a given value y, and use it in the following Pig code:
REGISTER 'logistic.py' USING jython AS udf;
z = FOREACH y GENERATE $0, udf.logistic($0);
DUMP z;
What is the expected output when the Pig code is executed?
A. 0
B. Jython is not a supported language
C. Values of f(y) for all y
D. Tuples (y, f(y))
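The contents of logistic.py are not given in the question. A minimal sketch, assuming the standard logistic function f(y) = 1 / (1 + e^(-y)), might look like the block below. In an actual Pig Jython UDF the function would usually also carry an @outputSchema decorator; it is omitted here so the sketch runs as plain Python. The list comprehension only imitates the shape of FOREACH ... GENERATE $0, udf.logistic($0) and is not Pig itself.

```python
import math

def logistic(y):
    """Assumed standard logistic function: 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical input relation with one field per row.
y_relation = [(-2.0,), (0.0,), (2.0,)]

# Imitates: z = FOREACH y GENERATE $0, udf.logistic($0);
z = [(row[0], logistic(row[0])) for row in y_relation]
for record in z:
    print(record)
```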
You conduct a TFIDF analysis on 3 documents containing raw text and derive TFIDF ("data", document y) = 1.908. You know that the term "data" only appears in document 2.
What is the TF of "data" in document 2?
A. 2 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3^2) = 0.954. Therefore, TFIDF = TF*0.954 = 1.908. TF will then round to 2.
B. 4 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/1) = 0.477. Therefore, TFIDF = TF*0.477 = 1.908. TF will then round to 4.
C. 6 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal 3/1 = 3. Therefore, TFIDF = TF/3 = 1.908. TF will then round to 6.
D. 11 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/2) = 0.176. Therefore, TFIDF = TF*0.176 = 1.908. TF will then round to 11.
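The reasoning in the options can be checked numerically. This is a minimal sketch that assumes the common definition IDF = log10(N / df), where N is the number of documents and df is the number of documents containing the term; other IDF variants exist and would give different numbers. The helper names are hypothetical.

```python
import math

def idf(n_docs, docs_with_term):
    """Assumed IDF definition: base-10 log of N / document frequency."""
    return math.log10(n_docs / docs_with_term)

def tf_from_tfidf(tfidf, n_docs, docs_with_term):
    """Solve TFIDF = TF * IDF for TF."""
    return tfidf / idf(n_docs, docs_with_term)

# 3 documents, and "data" appears in only one of them.
tf = tf_from_tfidf(1.908, n_docs=3, docs_with_term=1)
print(round(idf(3, 1), 3), round(tf))
```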
What best describes tokenization?
A. Adding lexical relations to the raw text
B. Converting text into the list of terms
C. Converting text into a list of unique terms
D. Reducing variant forms of tokens to their base forms
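To make the distinction between the options concrete, the sketch below tokenizes a short string with a simple regular expression. The rule used here (lowercase, split on anything that is not a letter or digit) is an assumption for illustration; real tokenizers handle punctuation, hyphens, and contractions in more careful ways.

```python
import re

def tokenize(text):
    """Split raw text into a list of terms, keeping repeats and order."""
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("Big data is big, and data is everywhere.")
print(tokens)               # the sequence of terms, repeats included
print(sorted(set(tokens)))  # the vocabulary of unique terms, a separate step
```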
What are the major components of the YARN architecture?
A. ResourceManager and NodeManager
B. Task Tracker and NameNode
C. HDFS, Tez, and Spark
D. Avro, ZooKeeper, and HDFS
What is a characteristic of Spark?
A. Unable to run map -> reduce execution plans
B. Supports applications written in Python, Java, and Scala
C. Less efficient at processing small files than Hadoop MapReduce
D. Supports workflows that can return to previous work steps
What do first-order and second-order Markov processes have in common concerning next word prediction?
A. Both use WordNet to model the probability of the next word
B. Both are unsupervised methods
C. Both provide the foundation to build a trigram language model
D. Neither makes assumptions about the probability of the next word
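For reference, the difference between the two orders can be sketched with simple counts: a first-order model conditions the next word on the previous word, and a second-order model conditions it on the previous two words. The corpus and the function names ngram_model and predict are hypothetical, and the counts are unsmoothed.

```python
from collections import Counter, defaultdict

def ngram_model(tokens, order):
    """Count next-word frequencies given the previous `order` words."""
    counts = defaultdict(Counter)
    for i in range(order, len(tokens)):
        context = tuple(tokens[i - order:i])
        counts[context][tokens[i]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent next word for a context, or None if unseen."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

tokens = "the cat sat on the mat the cat ran".split()
first_order = ngram_model(tokens, 1)   # context: previous word
second_order = ngram_model(tokens, 2)  # context: previous two words

print(predict(first_order, ["the"]))
print(predict(second_order, ["on", "the"]))
```

Both models are estimated from raw token sequences alone; no labeled examples are supplied in this sketch.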