Refer to the lift chart:
At a depth of 0.1, Lift = 3.14. What does this mean?
A. Selecting the top 10% of the population scored by the model should result in 3.14 times more events than a random draw of 10%.
B. Selecting the observations with a response probability of at least 10% should result in 3.14 times more events than a random draw of 10%.
C. Selecting the top 10% of the population scored by the model should result in 3.14 times greater accuracy than a random draw of 10%.
D. Selecting the observations with a response probability of at least 10% should result in 3.14 times greater accuracy than a random draw of 10%.
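For reference, lift at a given depth is commonly defined as the event rate among the top-scored fraction of the population divided by the overall event rate. The sketch below is a minimal illustration under that definition; the function name `lift_at_depth` and the toy data are hypothetical and do not come from the lift chart referenced in the question.

```python
def lift_at_depth(scores, events, depth):
    """Event rate in the top `depth` fraction (ranked by score) / overall event rate."""
    # Rank observations from highest to lowest model score.
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    # Number of observations in the selected top fraction (at least one).
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(event for _, event in ranked[:n_top]) / n_top
    overall_rate = sum(events) / len(events)
    return top_rate / overall_rate


# Hypothetical toy data: 20 scored observations, 5 events overall (25%).
scores = [i / 20 for i in range(20, 0, -1)]
events = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1] + [0] * 9

# Top 10% = 2 observations, both events: rate 1.0 versus 0.25 overall.
lift_10 = lift_at_depth(scores, events, 0.1)
```

With these toy numbers the top 10% contains events at four times the overall rate, which is what a random draw of the same size would be expected to yield.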
What is a drawback of performing data cleansing (imputation, transformations, etc.) on raw data before partitioning the data for honest assessment, rather than after partitioning?
A. It violates assumptions of the model.
B. It requires extra computational effort and time.
C. It omits the training (and test) data sets from the benefits of the cleansing methods.
D. There is no ability to compare the effectiveness of different cleansing methods.
A company has branch offices in eight regions. Customers within each region are classified as either "High Value" or "Medium Value" and are coded using the variable name VALUE. The total amount of purchases per customer in the last year is used as the response variable.
Suppose there is a significant interaction between REGION and VALUE. What can you conclude?
A. More high value customers are found in some regions than others.
B. The difference between average purchases for medium and high value customers depends on the region.
C. Regions with higher average purchases have more high value customers.
D. Regions with higher average purchases have more medium value customers.
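As an arithmetic illustration of what an interaction between two categorical predictors looks like in cell means, the sketch below uses entirely hypothetical average purchases for two made-up regions. It only shows a value-level gap that changes across regions; it is not a significance test and does not reproduce any output from the question.

```python
# Hypothetical average purchases per (region, value) cell; all numbers are invented.
cell_means = {
    ("North", "High"): 900.0,
    ("North", "Medium"): 500.0,
    ("South", "High"): 650.0,
    ("South", "Medium"): 600.0,
}


def value_gap(region):
    """Difference in mean purchases between High and Medium customers in one region."""
    return cell_means[(region, "High")] - cell_means[(region, "Medium")]


# With no interaction the gap would be the same in every region.
gaps = {region: value_gap(region) for region in ("North", "South")}
```

Here the gap is 400 in one region and 50 in the other, so the effect of VALUE is not constant across REGION.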
Given the following GLM procedure output:
Which statement is correct at an alpha level of 0.05?
A. School*Gender should be removed because it is non-significant.
B. Gender should be removed because it is non-significant.
C. School should be removed because it is significant.
D. Gender should not be removed due to its involvement in the significant interaction.
Refer to the following exhibit:
What is a correct interpretation of this graph?
A. The association between the continuous predictor and the binary response is quadratic.
B. The association between the continuous predictor and the log-odds is quadratic.
C. The association between the continuous predictor and the continuous response is quadratic.
D. The association between the binary predictor and the log-odds is quadratic.
Refer to the REG procedure output:
The Intercept estimate is interpreted as:
A. The predicted value of the response when all the predictors are at their current values.
B. The predicted value of the response when all predictors are at their means.
C. The predicted value of the response when all predictors = 0.
D. The predicted value of the response when all predictors are at their minimum values.
Within PROC GLM, the interaction between the two categorical predictors, Income and Gender, was shown to be significant. An item store was saved from the GLM analysis.
Which statement from PROC PLM would test the significance of Gender within each level of Income and adjust for multiple tests?
A. sliceby Gender / adjust=tukey;
B. slice Income*Gender / sliceby=Gender adjust=tukey;
C. slice Income*Gender / sliceby=Income adjust=tukey;
D. sliceby Income / adjust=tukey;
Refer to the exhibit.
Which conclusion is justified concerning Sales, comparing stores A, B, and C?
A. Store B is significantly different from store A.
B. Store C is significantly different from store A.
C. Store B is significantly different from store C.
D. There is no significant difference between stores.
This question will ask you to provide a segment of missing code.
The following code is used to create missing-value indicator variables for the input variables fred1 to fred7.
Which segment of code would complete the task?
A. Option A
B. Option B
C. Option C
D. Option D
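The idea behind missing-value indicators can be sketched outside SAS as well. The Python snippet below is only an illustration of the technique, not the code the question refers to: the indicator names `mi_fred1` to `mi_fred7`, the helper `add_missing_indicators`, and the sample row are all hypothetical, and `None` stands in for a missing value.

```python
def add_missing_indicators(row, n_inputs=7):
    """Return a copy of `row` with mi_fred1..mi_fredN set to 1 if the input is missing, else 0."""
    flagged = dict(row)
    for i in range(1, n_inputs + 1):
        # An absent key is treated the same as an explicit missing value.
        flagged[f"mi_fred{i}"] = 1 if row.get(f"fred{i}") is None else 0
    return flagged


# Hypothetical observation with fred2 and fred5 missing.
row = {"fred1": 3.2, "fred2": None, "fred3": 0.0, "fred4": 7.5,
       "fred5": None, "fred6": 1.1, "fred7": 4.0}
flagged = add_missing_indicators(row)
```

Each input gets one companion flag, so a later imputation step can overwrite the missing values while the model still sees that they were originally absent.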
This question will ask you to provide a missing option.
A business analyst is investigating the differences in sales figures across 8 sales regions. The analyst is interested in viewing the regression equation parameter estimates for each of the design variables.
Which option completes the program to produce the regression equation parameter estimates?
A. Solve
B. Estimate
C. Solution
D. Est