You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?
A. Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
B. Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
C. When submitting the job on the command line, specify the -libjars option followed by the JAR file path.
D. Package your code and the Apache Commons Math library into a zip file named JobJar.zip.
What does the following WebHDFS command do?
curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
A. Make a directory /foo/bar
B. Read a file /foo/bar
C. List a directory /foo
D. Delete a directory /foo/bar
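For reference, the URL in the command follows the WebHDFS layout http://<host>:<port>/webhdfs/v1/<path>?op=<operation>. The sketch below is only an illustration of how a few common operations pair with HTTP verbs; the class and method names are hypothetical, and "host:port" is the placeholder from the question rather than a real address.

```java
// Illustrative helper, not part of any Hadoop client library.
class WebHdfsUrlSketch {

    // HTTP verb used by a few WebHDFS operations.
    static String httpMethodFor(String op) {
        switch (op) {
            case "OPEN":        // read a file
            case "LISTSTATUS":  // list a directory
                return "GET";
            case "MKDIRS":      // create a directory
                return "PUT";
            case "DELETE":      // delete a file or directory
                return "DELETE";
            default:
                throw new IllegalArgumentException("operation not covered by this sketch: " + op);
        }
    }

    // Assemble the request URL for a path and operation.
    static String urlFor(String hostAndPort, String path, String op) {
        return "http://" + hostAndPort + "/webhdfs/v1" + path + "?op=" + op;
    }

    public static void main(String[] args) {
        String hostAndPort = "host:port"; // placeholder, as in the question
        for (String op : new String[] {"OPEN", "LISTSTATUS", "MKDIRS", "DELETE"}) {
            System.out.println(httpMethodFor(op) + " " + urlFor(hostAndPort, "/foo/bar", op));
        }
    }
}
```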
Analyze each scenario below and identify which best describes the behavior of the default partitioner?
A. The default partitioner assigns key-value pairs to reducers based on an internal random number generator.
B. The default partitioner implements a round-robin strategy, shuffling the key-value pairs to each reducer in turn. This ensures an even partition of the key space.
C. The default partitioner computes the hash of the key. Hash values between specific ranges are associated with different buckets, and each bucket is assigned to a specific reducer.
D. The default partitioner computes the hash of the key and divides that value modulo the number of reducers. The result determines the reducer assigned to process the key-value pair.
E. The default partitioner computes the hash of the value and takes the mod of that value with the number of reducers. The result determines the reducer assigned to process the key-value pair.
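As a point of reference, hash-modulo partitioning on the key can be sketched in a few lines of plain Java. This is a standalone illustration with no Hadoop dependency: the class name, the reducer count and the sample keys are assumptions made for the demo.

```java
// Illustrative stand-in for hash-modulo partitioning; not the Hadoop class itself.
class HashPartitionSketch {

    // Clear the sign bit so the remainder is never negative,
    // then take the key's hash modulo the number of reducers.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 4; // assumed reducer count for the demo
        String[] keys = {"apple", "banana", "apple", "cherry"};
        for (String key : keys) {
            // Equal keys always land on the same reducer.
            System.out.println(key + " -> reducer " + partitionFor(key, numReducers));
        }
    }
}
```

Only the key takes part in the computation, so every record that shares a key reaches the same reducer regardless of its value.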
Which one of the following Hive commands uses an HCatalog table named x?
A. SELECT * FROM x;
B. SELECT x.* FROM org.apache.hcatalog.hive.HCatLoader('x');
C. SELECT * FROM org.apache.hcatalog.hive.HCatLoader('x');
D. Hive commands cannot reference an HCatalog table
On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker it has an open map task slot.
What determines how the JobTracker assigns each map task to a TaskTracker?
A. The amount of RAM installed on the TaskTracker node.
B. The amount of free disk space on the TaskTracker node.
C. The number and speed of CPU cores on the TaskTracker node.
D. The average system load on the TaskTracker node over the past fifteen (15) minutes.
E. The location of the InputSplit to be processed in relation to the location of the node.
To use a Java user-defined function (UDF) with Pig what must you do?
A. Define an alias to shorten the function name
B. Pass arguments to the constructor of the UDF's implementation class
C. Register the JAR file containing the UDF
D. Put the JAR file into the user's home folder in HDFS
Given the following Hive command:
Which one of the following statements is true?
A. The files in the mydata folder are copied to a subfolder of /apps/hive/warehouse
B. The files in the mydata folder are moved to a subfolder of /apps/hive/warehouse
C. The files in the mydata folder are copied into Hive's underlying relational database
D. The files in the mydata folder do not move from their current location in HDFS
A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?
A. The file will be marked as corrupted if data node B fails during the creation of the file.
B. Each data node locks the local file to prohibit concurrent readers and writers of the file.
C. Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
D. The file can be accessed if at least one of the data nodes storing the file is available.
You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array?
A. combine
B. map
C. init
D. configure
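The general pattern described in the question (load a small lookup table once, then consult it for every record) can be sketched in plain Java. This is a simulation without Hadoop classes: the class name, the tab-separated file layout and the sample data are assumptions, and the one-time hook here takes a reader directly instead of resolving a DistributedCache path.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Simulated mapper: one-time initialization followed by per-record lookups.
class MapSideJoinSketch {
    private final Map<String, String> lookup = new HashMap<>();

    // Runs once, before any record is processed: fill the associative array
    // from the cached text file (assumed format: key<TAB>value per line).
    void configure(BufferedReader cachedFile) {
        cachedFile.lines().forEach(line -> {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        });
    }

    // Runs once per record: join the record against the in-memory table.
    // Returns null when the key has no match (an inner join drops the record).
    String map(String key, String value) {
        String match = lookup.get(key);
        return match == null ? null : key + "\t" + value + "\t" + match;
    }

    public static void main(String[] args) {
        MapSideJoinSketch mapper = new MapSideJoinSketch();
        mapper.configure(new BufferedReader(new StringReader("u1\tAlice\nu2\tBob\n")));
        System.out.println(mapper.map("u1", "click")); // matched record
        System.out.println(mapper.map("u3", "view"));  // no match in the lookup table
    }
}
```

Loading the table in the one-time hook rather than in the per-record method avoids re-reading the file for every input record.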
When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.