What is the formula for measuring skewness in a dataset?
A. MEAN - MEDIAN
B. MODE - MEDIAN
C. 3(MEAN - MEDIAN) / STANDARD DEVIATION
D. (MEAN - MODE) / STANDARD DEVIATION
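Option C is Pearson's second (median) skewness coefficient, and option D is Pearson's first (mode) coefficient. The sketch below computes the median version with Python's standard `statistics` module. It assumes the sample standard deviation (n - 1 denominator); the data values are made up for illustration. A positive result indicates a right (positive) skew, and a symmetric dataset gives zero.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.

    Assumption: uses the sample standard deviation (statistics.stdev, n - 1 denominator).
    """
    sd = stdev(values)
    return 3 * (mean(values) - median(values)) / sd


# Hypothetical data with a long right tail (the value 10).
skewed = [1, 2, 2, 3, 3, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(pearson_median_skewness(skewed))     # positive: mean is pulled above the median
print(pearson_median_skewness(symmetric))  # mean equals median, so the coefficient is 0
```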
Which term describes the visual depiction of data through graphs, plots, and information graphics?
A. Data Interpretation
B. Data Virtualization
C. Data Visualization
D. Data Mining
Which of the following are known limitations of using external functions? Choose all that apply.
A. Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
B. Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
C. External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly.
D. An external function accessed through an AWS API Gateway private endpoint can be accessed only from a Snowflake VPC (Virtual Private Cloud) on AWS and in the same AWS region.
A Data Scientist acting as a data provider wants to allow consumers to access all databases and database objects in a share by granting a single privilege on the shared databases. Which of the following SnowSQL commands that she used for this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
USE ROLE accountadmin;
CREATE DIRECT SHARE product_s;
GRANT USAGE ON DATABASE product_db TO SHARE product_s;
GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
SHOW GRANTS TO SHARE product_s;
ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
SHOW GRANTS OF SHARE product_s;
A. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
B. CREATE DIRECT SHARE product_s;
C. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
D. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
Which type of Python UDF lets you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series?
A. MPP Python UDFs
B. Scalar Python UDFs
C. Vectorized Python UDFs
D. Hybrid Python UDFs
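To illustrate the batch-in, batch-out shape of a vectorized UDF, here is a minimal local sketch. In Snowflake the handler would also be decorated with `vectorized(input=pandas.DataFrame)` from the `_snowflake` module, which exists only inside Snowflake, so the decorator is left out here. The function name `add_inputs` and the assumption that UDF arguments arrive as positionally labeled columns (`0`, `1`, ...) are illustrative.

```python
import pandas as pd


# Hypothetical handler. Inside Snowflake it would carry
# @vectorized(input=pandas.DataFrame) imported from _snowflake (omitted: not available locally).
def add_inputs(df: pd.DataFrame) -> pd.Series:
    # One call processes a whole batch of rows; column 0 and column 1
    # are assumed to hold the first and second UDF arguments.
    return df[0] + df[1]


# Simulate one batch of three input rows.
batch = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
result = add_inputs(batch)  # one output value per input row, returned as a Series
```

A scalar UDF would instead be called once per row with plain Python values. The vectorized form lets pandas operate on the whole column at once.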
Consider a data frame df with 10 rows, columns A and B, and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})
A. Computes the sum of column A values
B. Computes the length of column A
C. Computes the length of column A and the sum of column B values for each group
D. Computes the length of column A and the sum of column B values
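The question does not give the contents of columns A and B, so the sketch below fills them with hypothetical values 1 to 10. It runs the grouping and the aggregation. The index labels have lengths 2, 4, and 5, so there are three groups containing 6, 3, and 1 rows. For each group, `len` counts the values in A and `np.sum` adds up the values in B.

```python
import numpy as np
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
# Hypothetical column values; only the index comes from the question.
df = pd.DataFrame({'A': range(1, 11), 'B': range(1, 11)}, index=index)

# Group rows by the character length of each index label.
g = df.groupby(df.index.str.len())

# Per group: number of entries in A, and the sum of B.
result = g.aggregate({'A': len, 'B': np.sum})
print(result)
```

Newer pandas versions may emit a FutureWarning about passing `np.sum`. Passing the string `'sum'` gives the same result.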
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?
A. Groups df based on index values
B. Groups df based on length of each index value
C. Groups df based on index strings
D. Data frames cannot be grouped by index values, so it results in an error.
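To see which rows end up together, the sketch below inspects the groups. The single column `A` holds hypothetical values, because only the index comes from the question. The grouping key is the string length of each label, so `'r1'` and `'r7'` share a group while `'row10'` sits alone.

```python
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
df = pd.DataFrame({'A': range(10)}, index=index)  # hypothetical column values

# Key for each row = length of its index label (2, 4 or 5).
g = df.groupby(df.index.str.len())

# Map each group key to the index labels it contains, in original order.
members = {length: list(labels) for length, labels in g.groups.items()}
sizes = g.size().tolist()  # rows per group, ordered by key
```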
Select the correct mappings:
I. W (weights or coefficients of the independent variables) in the linear regression model --> Model Parameter
II. K in the K-Nearest Neighbour algorithm --> Model Hyperparameter
III. Learning rate for training a neural network --> Model Hyperparameter
IV. Batch size --> Model Parameter
A. I, II
B. I, II, III
C. III, IV
D. II, III, IV
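The parameter/hyperparameter distinction shows up clearly in a training loop. The NumPy sketch below fits a linear regression by mini-batch gradient descent. `learning_rate`, `batch_size`, and `epochs` are fixed before training (hyperparameters), while `w` and `b` are learned from the data (model parameters). The function name and the noise-free synthetic data are assumptions made for illustration.

```python
import numpy as np


def train_linear_regression(X, y, learning_rate=0.1, batch_size=16, epochs=200, seed=0):
    """Hypothetical mini-batch gradient descent for linear regression.

    Hyperparameters (chosen before training): learning_rate, batch_size, epochs.
    Model parameters (learned from data): weights w and bias b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb
            # Gradients of the mean squared error on this mini-batch.
            grad_w = 2 * Xb.T @ err / len(idx)
            grad_b = 2 * err.mean()
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic, noise-free data: y = 3*x1 - 2*x2 + 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0
w, b = train_linear_regression(X, y)
```

Changing `batch_size` or `learning_rate` changes how training proceeds, but neither is part of the fitted model. Only `w` and `b` are.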
Which statement about using Snowflake streams and tasks is incorrect?
A. Snowflake automatically resizes and scales the compute resources for serverless tasks.
B. Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a DAG) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
C. Streams support repeatable read isolation.
D. A standard-only stream tracks row inserts only.
Which of the following cross-validation methods is suitable for quicker cross-validation on very large datasets with hundreds of thousands of samples?
A. k-fold cross-validation
B. Leave-one-out cross-validation
C. Holdout method
D. All of the above
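The holdout method is quick because it makes a single train/test split, so the model is fit once. k-fold cross-validation fits k models, and leave-one-out fits one model per sample. Below is a minimal NumPy sketch of a holdout split. The helper name `holdout_split` and the 20% test fraction are assumptions.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Hypothetical holdout split: one random partition into train and test indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffle row positions once
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]  # (train indices, test indices)


# A dataset with hundreds of thousands of rows needs just one split and one model fit.
train_idx, test_idx = holdout_split(300_000)
```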