The code block displayed below contains multiple errors. The code block should return a DataFrame that
contains only columns transactionId, predError, value and storeId of DataFrame
transactionsDf. Find the errors.
Code block:
transactionsDf.select([col(productId), col(f)])
Sample of transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+
A. The column names should be listed directly as arguments to the operator and not as a list.
B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.
C. The select operator should be replaced by a drop operator.
D. The column names should be listed directly as arguments to the operator and not as a list, and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.
E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
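For reference, a minimal runnable sketch contrasting select and drop in PySpark. The sample DataFrame is recreated by hand here; the DDL schema string is an assumption used to keep the all-null column f typed.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recreate the sample; an explicit DDL schema keeps the all-null column f typed
transactionsDf = spark.createDataFrame(
    [(1, 3, 4, 25, 1, None), (2, 6, 7, 2, 2, None), (3, 3, None, 25, 3, None)],
    "transactionId INT, predError INT, value INT, storeId INT, productId INT, f STRING",
)

# select keeps only the named columns; names are passed directly as arguments,
# either as strings or as col() objects
transactionsDf.select("transactionId", "predError", "value", "storeId").show()

# drop removes the named columns and takes plain string names
transactionsDf.drop("productId", "f").show()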
The code block shown below should return a copy of DataFrame transactionsDf with an added column
cos. This column should contain the cosine of the values in column value after converting them to degrees, rounded to two decimals. Choose the answer that correctly fills the blanks in
the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))
A. 1. withColumn
   2. col("cos")
   3. cos
   4. degrees
   5. transactionsDf.value
B. 1. withColumnRenamed
   2. "cos"
   3. cos
   4. degrees
   5. "transactionsDf.value"
C. 1. withColumn
   2. "cos"
   3. cos
   4. degrees
   5. transactionsDf.value
D. 1. withColumn
   2. col("cos")
   3. cos
   4. degrees
   5. col("value")
E. 1. withColumn
   2. "cos"
   3. degrees
   4. cos
   5. col("value")
The code block displayed below contains an error. The code block below is intended to add a column itemNameElements to DataFrame itemsDf that includes an array of all words in column itemName. Find the error.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.withColumnRenamed("itemNameElements", split("itemName"))
A. All column names need to be wrapped in the col() operator.
B. Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.
C. Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.
D. Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.
E. The expressions "itemNameElements" and split("itemName") need to be swapped.
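For reference, a minimal sketch of withColumn combined with split on a whitespace separator; the single sample row is taken from the table above.

from pyspark.sql import SparkSession
from pyspark.sql.functions import split

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame(
    [(1, "Thick Coat for Walking in the Snow", "Sports Company Inc.")],
    ["itemId", "itemName", "supplier"],
)

# split(column, pattern) returns an array column of the pieces;
# withColumn (not withColumnRenamed) attaches it under a new name
itemsDf.withColumn("itemNameElements", split("itemName", " ")).show(truncate=False)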
Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?
A. spark.read.json(filePath)
B. spark.read.path(filePath, source="json")
C. spark.read().path(filePath)
D. spark.read().json(filePath)
E. spark.read.path(filePath)
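As a quick illustration (filePath is a placeholder): spark.read is a property that returns a DataFrameReader, so it is not called with parentheses.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
filePath = "/tmp/example.json"  # hypothetical location

# spark.read is an attribute, not a method; json() loads the file into a DataFrame
df = spark.read.json(filePath)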
The code block displayed below contains an error. The code block should write DataFrame transactionsDf
as a parquet file to location filePath after partitioning it on column storeId.
Find the error.
Code block:
transactionsDf.write.partitionOn("storeId").parquet(filePath)
A. The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.
B. The partitionOn method should be called before the write method.
C. The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.
D. Column storeId should be wrapped in a col() operator.
E. No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.
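For reference, a hedged sketch of partitioned parquet output; the DataFrame contents and output path are placeholders assumed from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25), (2, 2)], ["transactionId", "storeId"])
filePath = "/tmp/transactions_parquet"  # hypothetical location

# DataFrameWriter.partitionBy splits the output into one directory per storeId value
transactionsDf.write.partitionBy("storeId").parquet(filePath)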
Which of the following describes Spark's standalone deployment mode?
A. Standalone mode uses a single JVM to run Spark driver and executor processes.
B. Standalone mode means that the cluster does not contain the driver.
C. Standalone mode is how Spark runs on YARN and Mesos clusters.
D. Standalone mode uses only a single executor per worker per application.
E. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.
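As an illustration, an application connects to Spark's built-in standalone cluster manager through a spark:// master URL; the host and port below are placeholders.

from pyspark.sql import SparkSession

# spark:// URLs address the standalone cluster manager, as opposed to
# yarn or local[*] (host and port are placeholders)
spark = (SparkSession.builder
         .master("spark://master-host:7077")
         .appName("standalone-example")
         .getOrCreate())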
Which of the following describes Spark's way of managing memory?
A. Spark uses a subset of the reserved system memory.
B. Storage memory is used for caching partitions derived from DataFrames.
C. As a general rule for garbage collection, Spark performs better on many small objects than few big objects.
D. Disabling serialization potentially greatly reduces the memory footprint of a Spark application.
E. Spark's memory usage can be divided into three categories: Execution, transaction, and storage.
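For context, a minimal sketch of how storage memory comes into play when caching a DataFrame's partitions; the data here is made up.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.range(10).toDF("transactionId")

# cache() marks the DataFrame for storage in memory; its partitions are
# materialized into storage memory on the first action
transactionsDf.cache()
transactionsDf.count()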
Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?
Sample of itemsDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
A. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", StringType()),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
B. itemsDfSchema = StructType([
       StructField("itemId", IntegerType),
       StructField("attributes", ArrayType(StringType)),
       StructField("supplier", StringType)])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
C. itemsDf = spark.read.schema('itemId integer, attributes
D. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType(StringType())),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
E. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType([StringType()])),
       StructField("supplier", StringType())])
   itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
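For reference, a hedged sketch of a schema matching the sample layout; filePath is a placeholder. Note that type objects are instantiated with parentheses and that ArrayType takes the element type directly.

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
filePath = "/tmp/items.parquet"  # hypothetical location

# attributes holds an array of strings, hence ArrayType(StringType());
# each type is instantiated (IntegerType(), not IntegerType)
itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType()),
])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)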
The code block displayed below contains an error. The code block should read the csv file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as column header and casting the columns to the most appropriate type. Find the error.
First 3 rows of transactions.csv:
transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass
Code block:
transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)
A. The DataFrameReader is not accessed correctly.
B. The transaction is evaluated lazily, so no file will be read.
C. Spark is unable to understand the file type.
D. The code block is unable to capture all columns.
E. The resulting DataFrame will not have the appropriate schema.
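For context, a hedged sketch of the csv read with schema inference enabled; inferSchema asks Spark to sample the file and cast columns to fitting types.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Without inferSchema, every csv column stays a string;
# inferSchema=True lets Spark cast columns to the most appropriate types
transactionsDf = spark.read.load(
    "data/transactions.csv", format="csv", sep=";", header=True, inferSchema=True
)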
Which of the elements in the labeled panels represent the operation performed for broadcast variables?
[Figure: diagram with labeled panels 1-5, not included in this document]
A. 2, 5
B. 3
C. 2, 3
D. 1, 2
E. 1, 3, 4
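For context, a minimal sketch of how a broadcast variable is created on the driver and read on the executors; the lookup data here is made up.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# The driver broadcasts the lookup table once; each executor caches a
# read-only copy instead of receiving it with every task
lookup = sc.broadcast({1: "green grass", 2: "yellow sun"})

names = sc.parallelize([1, 2, 1]).map(lambda pid: lookup.value[pid]).collect()
print(names)  # ['green grass', 'yellow sun', 'green grass']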