Which functionality do regular expressions provide?
A. text pattern matching
B. underflow prevention
C. increased numerical precision D. decreased processing complexity
What would be considered "Big Data"?
A. An OLAP Cube containing customer demographic information about 100, 000, 000 customers
B. Daily Log files from a web server that receives 100, 000 hits per minute
C. Aggregated statistical data stored in a relational database table
D. Spreadsheets containing monthly sales data for a Global 100 corporation
Refer to the exhibit.
In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. Also in the exhibit, the pink represents borrowers that are known to have not defaulted on their loan, and the blue represents borrowers that are known to have defaulted on their loan.
Which analytical method could produce the probabilities needed to build this exhibit?
A. Logistic Regression
B. Linear Regression
C. Discriminant Analysis
D. Association Rules
The average purchase size from your online sales site is $17, 200. The customer experience team believes a certain adjustment of the website will increase sales. A pilot study on a few hundred customers showed an increase in average purchase size of $1.47, with a significance level of p=0.1.
The team runs a larger study, of a few thousand customers. The second study shows an increased average purchase size of $0.74, with a significance level of 0.03. What is your assessment of this study?
A. The change in purchase size is not practically important, and the good p-value of the second study is probably a result of the large study size.
B. The change in purchase size is small, but may aggregate up to a large increase in profits over the entire customer base.
C. The difference in the change in purchase size between the two studies is troubling; The team should run another, larger study.
D. The p-value of the second study shows a statistically significant change in purchase size. The new website is an improvement.
Consider a scale that has five (5) values that range from "not important" to "very important". Which data classification best describes this data?
A. Ordinal
B. Nominal
C. Real
D. Ratio
If R factors are categorical variables, which data classification level are they most closely related?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
Which ROC curve represents a perfect model fit?
A.
B.
C.
D.
A. Exhibit A
B. Exhibit B
C. Exhibit C
D. Exhibit D
Which word or phrase completes the statement?
Theater actor is to "Artistic and Expressive" as Data Scientist is to ________________
A. "Communicative and Collaborative"
B. "Introverted and Technical"
C. "Logical and Steadfast"
D. "Independent and Intelligent"
You are analyzing data in order to build a classifier model. You discover non-linear data and discontinuities that will affect the model. Which analytical method would you recommend?
A. Decision Trees
B. Logistic Regression
C. ARIMA
D. Linear Regression
Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1.
Multicollinearity is not an issue among the variables
2.
Only three variables--A, B, and C--have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent variables of
A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?
A. Variables A, B, and C are significantly impacting sales, but are not effectively estimating sales
B. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
C. Due to the R2 of 0.10, the model is not valid ?the linear regression should be re-run with all 15 variables forced into the model to increase the R2
D. Due to the R2 of 0.10, the model is not valid ?a different analytical model should be attempted