A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data.
Which visualization solution will meet these requirements?
A. Select Amazon OpenSearch Service (Amazon Elasticsearch Service) as the endpoint for Kinesis Data Firehose. Set up OpenSearch Dashboards (Kibana) using the data in Amazon OpenSearch Service (Amazon Elasticsearch Service) to create the desired analyses and visualizations.
B. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker Jupyter notebook and carry out the desired analyses and visualizations.
C. Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with SPICE to Amazon Redshift to create the desired analyses and visualizations.
D. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired analyses and visualizations.
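For the scenario above, the following is a minimal boto3 sketch of a Firehose delivery stream with the 60-second buffer interval and an OpenSearch Service destination. All names and ARNs are hypothetical, and the parameter layout follows the ElasticsearchDestinationConfiguration structure of the Firehose API, which also targets OpenSearch Service domains.

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # Hypothetical names and ARNs for illustration only.
    firehose.create_delivery_stream(
        DeliveryStreamName="dashboard-events",
        DeliveryStreamType="DirectPut",
        ElasticsearchDestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-opensearch",
            "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/dashboard-domain",
            "IndexName": "events",
            "IndexRotationPeriod": "OneDay",
            # The 60-second buffer interval from the scenario keeps delivery near real time.
            "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
            "S3BackupMode": "FailedDocumentsOnly",
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-opensearch",
                "BucketARN": "arn:aws:s3:::dashboard-firehose-backup",
            },
        },
    )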
An online gaming company is using an Amazon Kinesis Data Analytics SQL application with a Kinesis data stream as its source. The source sends three non-null fields to the application: player_id, score, and us_5_digit_zip_code.
A data analyst has a .csv mapping file that maps a small number of us_5_digit_zip_code values to a territory code. The data analyst needs to include the territory code, if one exists, as an additional output of the Kinesis Data Analytics application.
How should the data analyst meet this requirement while minimizing costs?
A. Store the contents of the mapping file in an Amazon DynamoDB table. Preprocess the records as they arrive in the Kinesis Data Analytics application with an AWS Lambda function that fetches the mapping and supplements each record to include the territory code, if one exists. Change the SQL query in the application to include the new field in the SELECT statement.
B. Store the mapping file in an Amazon S3 bucket and configure the reference data column headers for the .csv file in the Kinesis Data Analytics application. Change the SQL query in the application to include a join to the file's S3 Amazon Resource Name (ARN), and add the territory code field to the SELECT columns.
C. Store the mapping file in an Amazon S3 bucket and configure it as a reference data source for the Kinesis Data Analytics application. Change the SQL query in the application to include a join to the reference table and add the territory code field to the SELECT columns.
D. Store the contents of the mapping file in an Amazon DynamoDB table. Change the Kinesis Data Analytics application to send its output to an AWS Lambda function that fetches the mapping and supplements each record to include the territory code, if one exists. Forward the record from the Lambda function to the original application destination.
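For the reference-data approach, the following sketch registers the .csv mapping file in Amazon S3 as a reference data source for a SQL-based Kinesis Data Analytics (v1) application. Application, bucket, role, and column names are hypothetical, and the in-application join is shown only as a commented example.

    import boto3

    kda = boto3.client("kinesisanalytics")  # SQL (v1) applications

    # Hypothetical application, bucket, and role names for illustration only.
    # CurrentApplicationVersionId would normally come from describe_application.
    kda.add_application_reference_data_source(
        ApplicationName="player-scores-app",
        CurrentApplicationVersionId=1,
        ReferenceDataSource={
            "TableName": "ZIP_TO_TERRITORY",
            "S3ReferenceDataSource": {
                "BucketARN": "arn:aws:s3:::mapping-files-bucket",
                "FileKey": "zip_to_territory.csv",
                "ReferenceRoleARN": "arn:aws:iam::123456789012:role/kda-reference-data",
            },
            "ReferenceSchema": {
                "RecordFormat": {
                    "RecordFormatType": "CSV",
                    "MappingParameters": {
                        "CSVMappingParameters": {
                            "RecordRowDelimiter": "\n",
                            "RecordColumnDelimiter": ",",
                        }
                    },
                },
                "RecordColumns": [
                    {"Name": "us_5_digit_zip_code", "SqlType": "VARCHAR(5)"},
                    {"Name": "territory_code", "SqlType": "VARCHAR(8)"},
                ],
            },
        },
    )

    # The application SQL would then left join the stream to the reference table, e.g.:
    #   SELECT STREAM s.player_id, s.score, r.territory_code
    #   FROM "SOURCE_SQL_STREAM_001" AS s
    #   LEFT JOIN "ZIP_TO_TERRITORY" AS r
    #     ON s.us_5_digit_zip_code = r.us_5_digit_zip_code;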
A real estate company maintains data about all properties listed in a market. The company receives data about new property listings from vendors who upload the data daily as compressed files into Amazon S3. The company's leadership team wants to see the most up-to-date listings as soon as the data is uploaded to Amazon S3. The data analytics team must automate and orchestrate the data processing workflow of the listings to feed a dashboard. The team also must provide the ability to perform one-time queries and analytical reporting in a scalable manner.
Which solution meets these requirements MOST cost-effectively?
A. Use Amazon EMR for processing incoming data. Use AWS Step Functions for workflow orchestration. Use Apache Hive for one-time queries and analytical reporting. Bulk ingest the data in Amazon OpenSearch Service (Amazon Elasticsearch Service). Use OpenSearch Dashboards (Kibana) on Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboard.
B. Use Amazon EMR for processing incoming data. Use AWS Step Functions for workflow orchestration. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
C. Use AWS Glue for processing incoming data. Use AWS Step Functions for workflow orchestration. Use Amazon Redshift Spectrum for one-time queries and analytical reporting. Use OpenSearch Dashboards (Kibana) on Amazon OpenSearch Service (Amazon Elasticsearch Service) for the dashboard.
D. Use AWS Glue for processing incoming data. Use AWS Lambda and S3 Event Notifications for workflow orchestration. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
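The event-driven orchestration in option D boils down to a small Lambda handler wired to S3 event notifications. The sketch below assumes a hypothetical Glue job named process-listings whose script reads an --input_path job argument.

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        """Invoked by an S3 event notification when a vendor uploads a listings file."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # 'process-listings' is a hypothetical Glue job name; '--input_path' is a
            # job argument the Glue script would be written to read.
            glue.start_job_run(
                JobName="process-listings",
                Arguments={"--input_path": f"s3://{bucket}/{key}"},
            )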
A retail company that is based in the United States has launched a global website. The website's historic transaction data is stored in an Amazon Redshift cluster in a VPC in the us-east-1 Region. The company's business intelligence (BI) team wants to enhance user experience by providing a dashboard to visualize trends.
The BI team decides to use Amazon QuickSight to render the dashboards. During development, a team in Japan provisioned QuickSight in the ap-northeast-1 Region. However, the team cannot connect from QuickSight in ap-northeast-1 to the Amazon Redshift cluster in us-east-1.
Which solution will resolve this issue MOST cost-effectively?
A. In the Amazon Redshift console, configure Cross-Region snapshots. Set the destination Region as ap-northeast-1. Restore the Amazon Redshift cluster from the snapshot. Connect to QuickSight in ap-northeast-1.
B. Create a VPC endpoint from the QuickSight VPC to the Amazon Redshift VPC.
C. Create an Amazon Redshift endpoint connection string with Region information in the string. Use this connection string in QuickSight to connect to Amazon Redshift.
D. Create a new security group for the Amazon Redshift cluster in us-east-1. Add an inbound rule that allows access from the appropriate IP address range for the QuickSight servers in ap-northeast-1.
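The rule described in option D can be added with a single EC2 API call against the Redshift cluster's security group in us-east-1. The security group ID and CIDR below are placeholders; the real QuickSight IP range for ap-northeast-1 is published in the QuickSight documentation.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Placeholder values: look up the Redshift cluster's security group ID and
    # the published QuickSight IP range for ap-northeast-1 before running this.
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": 5439,  # default Amazon Redshift port
                "ToPort": 5439,
                "IpRanges": [
                    {
                        "CidrIp": "198.51.100.0/27",  # placeholder, not the real QuickSight range
                        "Description": "Amazon QuickSight ap-northeast-1",
                    }
                ],
            }
        ],
    )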
A large digital advertising company has built business intelligence (BI) dashboards in Amazon QuickSight Enterprise edition to understand customer buying behavior. The dashboards use the Super-fast, Parallel, In-memory Calculation Engine (SPICE) as the in-memory engine to store the data. The company's Amazon S3 data lake provides the data for these dashboards, which are queried using Amazon Athena. The data files used by the dashboards consist of millions of records partitioned by year, month, and hour, and new data is continuously added. Every data file in the data lake has a timestamp column named CREATE_TS, which indicates when the data was added or updated.
Until now, the dashboards have been scheduled to refresh every night through a full reload. A data analyst must recommend an approach so the dashboards can be refreshed every hour, and include incremental data from the last hour.
How can the data analyst meet these requirements with the LEAST amount of operational effort?
A. Create new data partitions every hour in Athena by using the CREATE_TS column and schedule the QuickSight dataset to refresh every hour.
B. Use direct querying in QuickSight by using Athena to make refreshed data always available.
C. Use the CREATE_TS column to look back for incremental data in the last hour and schedule the QuickSight dataset to incrementally refresh every hour.
D. Create new datasets in QuickSight to do a full reload every hour and add the datasets to SPICE.
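As a sketch of option C, and assuming the QuickSight dataset refresh APIs (put_data_set_refresh_properties and create_refresh_schedule) with hypothetical account and dataset IDs, the incremental refresh uses CREATE_TS as a one-hour look-back column and an hourly schedule.

    import boto3

    qs = boto3.client("quicksight")
    ACCOUNT_ID = "123456789012"          # placeholder AWS account ID
    DATA_SET_ID = "sales-dashboard-ds"   # placeholder dataset ID

    # Configure a one-hour look-back window on CREATE_TS for incremental refresh.
    qs.put_data_set_refresh_properties(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATA_SET_ID,
        DataSetRefreshProperties={
            "RefreshConfiguration": {
                "IncrementalRefresh": {
                    "LookbackWindow": {
                        "ColumnName": "CREATE_TS",
                        "Size": 1,
                        "SizeUnit": "HOUR",
                    }
                }
            }
        },
    )

    # Schedule the dataset to refresh incrementally every hour.
    qs.create_refresh_schedule(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATA_SET_ID,
        Schedule={
            "ScheduleId": "hourly-incremental",
            "ScheduleFrequency": {"Interval": "HOURLY"},
            "RefreshType": "INCREMENTAL_REFRESH",
        },
    )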
A company hosts its analytics solution on premises. The analytics solution includes a server that collects log files. The analytics solution uses an Apache Hadoop cluster to analyze the log files hourly and to produce output files. All the files are archived to another server for a specified duration.
The company is expanding globally and plans to move the analytics solution to multiple AWS Regions in the AWS Cloud. The company must adhere to the data archival and retention requirements of each country where the data is stored.
Which solution will meet these requirements?
A. Create an Amazon S3 bucket in one Region to collect the log files. Use S3 event notifications to invoke an AWS Glue job for log analysis. Store the output files in the target S3 bucket. Use S3 Lifecycle rules on the target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
B. Create a Hadoop Distributed File System (HDFS) file system on an Amazon EMR cluster in one Region to collect the log files. Set up a bootstrap action on the EMR cluster to run an Apache Spark job. Store the output files in a target Amazon S3 bucket. Schedule a job on one of the EMR nodes to delete files that no longer need to be retained.
C. Create an Amazon S3 bucket in each Region to collect log files. Create an Amazon EMR cluster. Submit steps on the EMR cluster for analysis. Store the output files in a target S3 bucket in each Region. Use S3 Lifecycle rules on each target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
D. Create an Amazon Kinesis Data Firehose delivery stream in each Region to collect log data. Specify an Amazon S3 bucket in each Region as the destination. Use S3 Storage Lens for data analysis. Use S3 Lifecycle rules on each destination S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
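The per-Region retention in option C is implemented with an S3 Lifecycle expiration rule on each target bucket. The sketch below uses a hypothetical bucket name and retention period; the Days value would be set to each country's required retention.

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")  # repeat per Region as needed

    # Hypothetical bucket name, prefix, and retention period for illustration only.
    s3.put_bucket_lifecycle_configuration(
        Bucket="analytics-output-eu-west-1",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-output-files",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "output/"},
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )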
A company is running Apache Spark on an Amazon EMR cluster. The Spark job writes data to an Amazon S3 bucket and generates a large number of PUT requests. The number of objects has increased over time.
After a recent increase in traffic, the Spark job started failing and returned an HTTP 503 "Slow Down" (AmazonS3Exception) error.
Which combination of actions will resolve this error? (Choose two.)
A. Increase the number of S3 key prefixes for the S3 bucket.
B. Increase the EMR File System (EMRFS) retry limit.
C. Disable dynamic partition pruning in the Spark configuration for the cluster.
D. Increase the repartitioning number for the Spark job.
E. Increase the executor memory size on Spark.
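Choice B corresponds to raising the EMRFS retry limit through the emrfs-site classification when the cluster is launched; choice D is a change inside the Spark job. The snippet below is a sketch only: the retry value, column name, and S3 path are illustrative, and the configuration list would be passed in the Configurations parameter of boto3's emr.run_job_flow (or supplied via the EMR console/CLI).

    # Raise the EMRFS retry limit for S3 503 Slow Down responses.
    emrfs_retry_configuration = [
        {
            "Classification": "emrfs-site",
            "Properties": {
                "fs.s3.maxRetries": "20",  # illustrative value, higher than the default
            },
        }
    ]

    # Inside the Spark job, a higher repartition count spreads writes across more
    # tasks and S3 key prefixes (illustrative column name and path):
    # df.repartition(400, "partition_key").write.parquet("s3://bucket/output/")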
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.
The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.
Which solutions will improve query performance? (Choose two.)
A. Use MySQL Workbench on an Amazon EC2 instance. Connect to Athena by using a JDBC connector. Run the query from MySQL Workbench instead of Athena directly.
B. Configure Athena to use S3 Select to load only the files of the data subset.
C. Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
D. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
E. Create an S3 gateway endpoint. Configure VPC routing to access Amazon S3 through the gateway endpoint.
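A CTAS statement like the one in option C can be submitted through boto3's Athena start_query_execution. Database, table, column, and bucket names below are hypothetical; in Athena CTAS, the partition column must be the last column in the SELECT list.

    import boto3

    athena = boto3.client("athena")

    # Hypothetical database, table, column, and bucket names for illustration only.
    ctas_sql = """
    CREATE TABLE analytics_db.recent_subset_parquet
    WITH (
      format = 'PARQUET',
      external_location = 's3://analytics-curated/recent_subset/',
      partitioned_by = ARRAY['event_date']
    ) AS
    SELECT order_id, customer_id, amount, event_date  -- partition column last
    FROM analytics_db.raw_csv_data
    WHERE event_date >= date_format(current_date - interval '30' day, '%Y-%m-%d')
    """

    athena.start_query_execution(
        QueryString=ctas_sql,
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://analytics-query-results/"},
    )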
A company tracks its sales opportunities in Salesforce. The company is using stacked bar charts inside Salesforce to visualize quarterly trends of open, lost, and closed sales opportunities by each business line. The company wants to host these charts in the AWS Cloud outside Salesforce so that employees do not need a Salesforce license to view the charts.
Which solution will meet this requirement with the LEAST development effort?
A. Use Amazon AppFlow to schedule a nightly export of the data in CSV format from Salesforce to Amazon S3. Import the data from the file on Amazon S3 into SPICE. Build the stacked bar chart in Amazon QuickSight.
B. Schedule a nightly script that uses the Salesforce Bulk API to run on an Amazon EC2 instance and copy data in CSV format to Amazon S3. Import the data from the file on Amazon S3 into SPICE. Build the stacked bar chart in Amazon QuickSight.
C. Use AWS Data Pipeline to schedule a nightly export of the data in CSV format from Salesforce to Amazon S3. Import the data from the file on Amazon S3 into SPICE. Build the stacked bar chart in Amazon QuickSight.
D. Use Amazon AppFlow to schedule a nightly export of the data in Apache Parquet format from Salesforce to Amazon S3. Import the data from the file on Amazon S3 into SPICE. Build the stacked bar chart in Amazon QuickSight.
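Assuming a flow has already been defined in Amazon AppFlow (Salesforce source, S3 destination in CSV, nightly scheduled trigger), running it programmatically is a single call; the flow name below is hypothetical, and the nightly schedule itself lives in the flow's trigger configuration rather than in this code.

    import boto3

    appflow = boto3.client("appflow")

    # 'salesforce-opportunities-nightly' is a hypothetical flow name; the flow's
    # Salesforce source, S3/CSV destination, and nightly schedule are defined when
    # the flow is created (console or create_flow API).
    appflow.start_flow(flowName="salesforce-opportunities-nightly")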
A company uses Amazon Redshift for its data warehouse. The company is running an ETL process that receives data in data parts from five third-party providers. The data parts contain independent records that are related to one specific job.
The company receives the data parts at various times throughout each day.
A data analytics specialist must implement a solution that loads the data into Amazon Redshift only after the company receives all five data parts.
Which solution will meet these requirements?
A. Create an Amazon S3 bucket to receive the data. Use S3 multipart upload to collect the data from the different sources and to form a single object before loading the data into Amazon Redshift.
B. Use an AWS Lambda function that is scheduled by cron to load the data into a temporary table in Amazon Redshift. Use Amazon Redshift database triggers to consolidate the final data when all five data parts are ready.
C. Create an Amazon S3 bucket to receive the data. Create an AWS Lambda function that is invoked by S3 upload events. Configure the function to validate that all five data parts are gathered before the function loads the data into Amazon Redshift.
D. Create an Amazon Kinesis Data Firehose delivery stream. Program a Python condition that will invoke a buffer flush when all five data parts are received.
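A sketch of the Lambda function from option C follows, assuming the five providers upload into a shared per-job prefix and the load is issued through the Redshift Data API. Bucket layout, cluster, table, role, and user names are all hypothetical.

    import boto3

    s3 = boto3.client("s3")
    redshift_data = boto3.client("redshift-data")

    EXPECTED_PARTS = 5  # one data part per third-party provider

    def handler(event, context):
        """Invoked by S3 upload events; loads the job only when all parts are present."""
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]       # e.g. incoming/job-1234/provider-3.csv
        job_prefix = key.rsplit("/", 1)[0] + "/"  # hypothetical layout: one prefix per job

        parts = s3.list_objects_v2(Bucket=bucket, Prefix=job_prefix)
        if parts.get("KeyCount", 0) < EXPECTED_PARTS:
            return  # still waiting for the remaining providers

        # All five parts are present; COPY the whole prefix into Amazon Redshift.
        redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",  # hypothetical cluster name
            Database="dev",
            DbUser="etl_user",
            Sql=(
                f"COPY staging.job_records FROM 's3://{bucket}/{job_prefix}' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
                "FORMAT AS CSV"
            ),
        )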