Table schemas in Hive are:
A. Stored as metadata on the NameNode
B. Stored along with the data in HDFS
C. Stored in the Metastore
D. Stored in ZooKeeper
What does CDH packaging do on install to facilitate Kerberos security setup?
A. Automatically configures permissions for log files at and MAPRED_LOG_DIR/userlogs
B. Creates users for hdfs and mapreduce to facilitate role assignment
C. Creates directories for temp, hdfs, and mapreduce with the correct permissions
D. Creates a set of pre-configured Kerberos keytab files and their permissions
E. Creates and configures your kdc with default cluster values
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives the flexibility to do this?
A. CSV
B. XML
C. HTML
D. Avro
E. SequenceFiles
F. JSON
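To make the scale of the small-files problem concrete, here is a rough sizing sketch in Python. The 128 MB container size is an assumption chosen for illustration (the question does not state one), and the calculation ignores any per-record or header overhead of the container format.

```python
# Back-of-the-envelope sizing for the image set described in the question.
NUM_IMAGES = 60_000_000
IMAGE_SIZE_KB = 25

# Assumption: a hypothetical target size for each grouped file.
CONTAINER_SIZE_MB = 128

total_kb = NUM_IMAGES * IMAGE_SIZE_KB
total_gb = total_kb / (1024 * 1024)

# How many 25 KB images fit in one container, ignoring format overhead.
images_per_container = (CONTAINER_SIZE_MB * 1024) // IMAGE_SIZE_KB

# Ceiling division: number of container files needed for all images.
containers = -(-NUM_IMAGES // images_per_container)

print(f"total data: about {total_gb:.0f} GB")
print(f"images per container: {images_per_container}")
print(f"container files needed: {containers}")
```

Under these assumptions, 60,000,000 separate files shrink to roughly eleven thousand larger ones, which is far fewer objects for the NameNode to track.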
Which process instantiates user code and executes map and reduce tasks on a cluster running MapReduce v2 (MRv2) on YARN?
A. NodeManager
B. ApplicationMaster
C. TaskTracker
D. JobTracker
E. NameNode
F. DataNode
G. ResourceManager
Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their defaults, what do you need to do when adding a new slave node to the cluster?
A. Nothing, other than ensuring that DNS (or the /etc/hosts files on all machines) contains an entry for the new node.
B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs.
C. Add a new entry to /etc/nodes on the NameNode host.
D. Restart the NameNode of dfs.number.of.nodes in hdfs-site.xml
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?
A. The directory will appear to be empty until the entire file write is completed on the cluster
B. They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see contents of the file up to the last completed block (as each 64MB block is written, that block becomes available)
C. They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
D. They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
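The block arithmetic behind this scenario can be worked through in a few lines of Python. This is only a sketch of the numbers stated in the question; it says nothing about what a reader of the file actually sees.

```python
# Figures taken from the question.
FILE_SIZE_MB = 300
BLOCK_SIZE_MB = 64
WRITTEN_MB = 200

# Full blocks in the finished file, plus the size of the final partial block.
full_blocks, last_block_mb = divmod(FILE_SIZE_MB, BLOCK_SIZE_MB)
total_blocks = full_blocks + (1 if last_block_mb else 0)

# State of the write at the 200 MB mark.
completed_blocks = WRITTEN_MB // BLOCK_SIZE_MB
completed_mb = completed_blocks * BLOCK_SIZE_MB
in_progress_mb = WRITTEN_MB - completed_mb

print(f"finished file: {total_blocks} blocks, last one {last_block_mb} MB")
print(f"at {WRITTEN_MB} MB: {completed_blocks} complete blocks ({completed_mb} MB)")
print(f"block being written holds {in_progress_mb} MB so far")
```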
Your cluster is running MapReduce version 2 (MRv2) on YARN. Your ResourceManager is configured to use the FairScheduler. Now you want to configure your scheduler such that a new user on the cluster can submit jobs into their own queue at application submission. Which configuration should you set?
A. You can specify a new queue name when the user submits a job, and the new queue can be created dynamically if the property yarn.scheduler.fair.allow-undeclared-pools = true
B. yarn.scheduler.fair.user-as-default-queue = false and yarn.scheduler.fair.allow-undeclared-pools = true
C. You can specify a new queue name when the user submits a job, and the new queue can be created dynamically if yarn.scheduler.fair.user-as-default-queue = false
D. You can specify new queue name per application in allocations.xml file and have new jobs automatically assigned to the application queue
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?
A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the developer
C. 30
D. 3
E. 10
F. We cannot say; the number of mappers is determined by the ApplicationMaster
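For reference, a common rule of thumb for estimating input splits can be sketched in Python. It assumes uncompressed, splittable text files, a split size equal to the HDFS block size, and one map task per split. These assumptions are not stated in the question, and `estimate_splits` is a hypothetical helper, not part of any Hadoop API.

```python
def estimate_splits(blocks_per_file):
    """Estimate the split count, assuming one split per HDFS block.

    blocks_per_file: list with the number of blocks in each input file.
    Splits never span files, so the per-file counts are simply summed.
    """
    return sum(blocks_per_file)


# The input directory from the question: 10 files of 3 blocks each.
input_files = [3] * 10
estimated_map_tasks = estimate_splits(input_files)

print(f"estimated map tasks: {estimated_map_tasks}")
```

A different minimum or maximum split size, or a non-splittable compression codec, would change this estimate.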
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
A. HDFS is almost full
B. The NameNode goes down
C. A DataNode is disconnected from the cluster
D. Map or reduce tasks that are stuck in an infinite loop
E. MapReduce jobs are causing excessive memory swaps
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?
A. hadoop fs -getmerge -R westUsers.txt
B. hadoop fs -getmerge westUsers westUsers.txt
C. hadoop fs -cp westUsers/* westUsers.txt
D. hadoop fs -get westUsers westUsers.txt