Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a "split-brain" scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?
A. Two active NameNodes and two Standby NameNodes
B. One active NameNode and one Standby NameNode
C. Two active NameNodes and one Standby NameNode
D. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy
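For reference, with Quorum-based storage hdfs-site.xml defines exactly two NameNodes per nameservice, only one of which is active at a time. A minimal sketch, assuming a nameservice named mycluster and hypothetical hosts nn1host, jn1, jn2, and jn3:

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <!-- exactly two NameNodes: one active, one standby -->
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1host:8020</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>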
You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary NameNode on host mysecondarynamenode, and several DataNodes.
Which best describes how you determine when the last checkpoint happened?
A. Execute hdfs namenode report on the command line and look at the Last Checkpoint information
B. Execute hdfs dfsadmin -saveNamespace on the command line, which returns the last checkpoint value in the fstime file
C. Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090/) and look at the "Last Checkpoint" information
D. Connect to the web UI of the NameNode (http://mynamenode:50070) and look at the "Last Checkpoint" information
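For context, the Secondary NameNode's web UI surfaces checkpoint timing. A quick command-line way to inspect it, assuming the default 50090 port from the question (the page layout varies by Hadoop version):

curl -s http://mysecondarynamenode:50090/ | grep -i checkpoint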
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process for gathering these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs from the web servers and copy them into HDFS using curl
B. Ingest the server web logs into HDFS using Flume
C. Channel these clickstreams into Hadoop using Hadoop Streaming
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
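For context on option B, a minimal Flume agent sketch that tails an access log on each web server into HDFS; the agent name a1, the log path, and the HDFS target path are illustrative assumptions:

# flume.conf on each web server
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream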
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files.
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives you the flexibility to do this?
A. CSV
B. XML
C. HTML
D. Avro
E. SequenceFiles
F. JSON
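As a sketch of step 2, Hadoop Streaming can read a non-text container format when you supply an -inputformat class; shown here with Avro's AvroAsTextInputFormat, where the jar names, HDFS paths, and mapper script are assumptions for illustration:

hadoop jar hadoop-streaming.jar \
  -files analyze_images.py \
  -libjars avro-mapred.jar \
  -inputformat org.apache.avro.mapred.AvroAsTextInputFormat \
  -input /images/containers \
  -output /images/analysis \
  -mapper analyze_images.py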
Which is the default scheduler in YARN?
A. YARN doesn't configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler
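For reference, the scheduler in use is controlled by the yarn.resourcemanager.scheduler.class property in yarn-site.xml; stock Apache Hadoop ships with the Capacity Scheduler there by default, while some distributions (e.g., CDH) default to the Fair Scheduler. To pin it explicitly:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>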
Cluster Summary:
45 files and directories, 12 blocks = 57 total. Heap size is 15.31 MB / 193.38 MB (7%)
Refer to the above screenshot.
You configure a Hadoop cluster with seven DataNodes, and one of your monitoring UIs displays the details shown in the exhibit.
What does this tell you?
A. The DataNode JVM on one host is not active
B. Because your under-replicated block count matches the Live Nodes count, one node is dead, and your DFS Used % equals 0%, you can't be certain that your cluster has all the data you've written to it.
C. Your cluster has lost all HDFS data which had blocks stored on the dead DataNode
D. The HDFS cluster is in safe mode
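To cross-check a display like this from the command line, two standard commands are useful (no assumptions beyond a configured HDFS client):

hdfs dfsadmin -report        # live/dead DataNode counts, DFS Used%, under-replicated blocks
hdfs dfsadmin -safemode get  # confirms or rules out safe mode (option D)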
Assume you have a file named foo.txt in your local directory. You issue the following three commands:
hadoop fs -mkdir input
hadoop fs -put foo.txt input/foo.txt
hadoop fs -put foo.txt input
What happens when you issue the third command?
A. The write succeeds, overwriting foo.txt in HDFS with no warning
B. The file is uploaded and stored as a plain file named input
C. You get a warning that foo.txt is being overwritten
D. You get an error message telling you that foo.txt already exists, and asking you if you would like to overwrite it.
E. You get an error message telling you that foo.txt already exists. The file is not written to HDFS
F. You get an error message telling you that input is not a directory
G. The write silently fails
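A quick way to observe the behavior yourself; the transcript below is illustrative, and the exact wording of the message varies by Hadoop version:

$ hadoop fs -mkdir input
$ hadoop fs -put foo.txt input/foo.txt
$ hadoop fs -put foo.txt input
put: `input/foo.txt': File exists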
You are configuring a server running HDFS and MapReduce version 2 (MRv2) on YARN on Linux. How must you format the underlying file system of each DataNode?
A. They must be formatted as HDFS
B. They must be formatted as either ext3 or ext4
C. They may be formatted in any Linux file system
D. They must not be formatted -- HDFS will format the file system automatically
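For context, a DataNode's data directories live on ordinary local file systems. A hypothetical setup for one data disk, where the device name and mount point are assumptions:

mkfs -t ext4 /dev/sdb1       # a native Linux file system such as ext3/ext4
mkdir -p /data/1
mount /dev/sdb1 /data/1
mkdir -p /data/1/dfs/dn
# then list /data/1/dfs/dn under dfs.datanode.data.dir in hdfs-site.xml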
You have a 20-node Hadoop cluster with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
A. Add another master node to increase the number of nodes running the JournalNode, which increases the number of machines available to HA for forming a quorum
B. Set an HDFS replication factor that provides data redundancy, protecting against node failure
C. Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure.
D. Run the ResourceManager on a different master from the NameNode in order to load-share HDFS metadata processing
E. Configure the cluster's disk drives with an appropriate fault tolerant RAID level
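For reference on option B, replication is set per file with a cluster-wide default of 3 via dfs.replication in hdfs-site.xml; an existing path can also be re-replicated from the command line (the path below is an assumption):

hdfs dfs -setrep -w 3 /important/data   # raise replication on an existing path and wait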
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive JVM garbage collection. How do you increase the JVM heap size to 3 GB to optimize performance?
A. yarn.application.child.java.opts=-Xsx3072m
B. yarn.application.child.java.opts=-Xmx3072m
C. mapreduce.map.java.opts=-Xms3072m
D. mapreduce.map.java.opts=-Xmx3072m
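For reference, the map-task heap in MRv2 is governed by mapreduce.map.java.opts, with the YARN container cap (mapreduce.map.memory.mb) normally set somewhat higher to leave headroom. A minimal mapred-site.xml sketch; the 3584 MB container size is an assumed headroom value:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3584</value>
</property>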