In order to perform honest assessment of a predictive model, what is an acceptable division of the data among training, validation, and testing sets?
A. Training: 50% Validation: 0% Testing: 50%
B. Training: 100% Validation: 0% Testing: 0%
C. Training: 0% Validation: 100% Testing: 0%
D. Training: 50% Validation: 50% Testing: 0%
Including redundant input variables in a regression model can:
A. Stabilize parameter estimates and increase the risk of overfitting.
B. Destabilize parameter estimates and increase the risk of overfitting.
C. Stabilize parameter estimates and decrease the risk of overfitting.
D. Destabilize parameter estimates and decrease the risk of overfitting.
Refer to the REG procedure output:
A. 0.4115
B. 0.6994
C. 0.5884
D. 0.1372
A linear model has the following characteristics:
1. A dependent variable (y)
2. Three continuous predictor variables (x1-x3)
3. One categorical predictor variable (c1 with 3 levels)
Which SAS program fits this model?
A. Option A
B. Option B
C. Option C
D. Option D
One common approach for predicting rare events in the LOGISTIC procedure is to fit the model on a sample that over-represents the cases in which the event occurs (e.g., a 50-50 event/non-event split). What problem does this present?
A. All parameter estimates are biased.
B. Only the intercept estimate is biased.
C. Only the non-intercept parameter estimates are biased.
D. Sensitivity estimates are biased.
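The effect of oversampling events can be made concrete with a small calculation. The sketch below is in Python rather than SAS, and the function names and the 2% population event rate are illustrative assumptions, not part of the original question. It uses the standard prior-correction offset for a sample drawn separately from events and non-events, ln(rho1 * pi0 / (rho0 * pi1)), where pi1 is the event rate in the population and rho1 is the event rate in the modeling sample.

```python
import math

def oversampling_offset(pop_event_rate, sample_event_rate):
    """Log-odds shift introduced by sampling events at a different rate.

    pop_event_rate    -- event proportion in the population (pi1)
    sample_event_rate -- event proportion in the modeling sample (rho1)
    """
    pi1, rho1 = pop_event_rate, sample_event_rate
    pi0, rho0 = 1 - pi1, 1 - rho1
    return math.log((rho1 * pi0) / (rho0 * pi1))

def corrected_probability(sample_logit, offset):
    """Subtract the offset from a logit scored on the oversampled data,
    then convert the result back to a probability."""
    return 1 / (1 + math.exp(-(sample_logit - offset)))

# Assumed figures: a 2% event rate in the population, a 50-50 modeling sample.
offset = oversampling_offset(pop_event_rate=0.02, sample_event_rate=0.50)

# A case the oversampled model scores at 0.5 (logit 0) maps back to
# the population base rate once the offset is removed.
p = corrected_probability(sample_logit=0.0, offset=offset)
```

Under these assumptions the whole adjustment is a constant shift on the log-odds scale, which is why a single offset is enough to bring the scores back in line with the population rate.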
This question will ask you to provide a missing statement. Given the following SAS program:
Which SAS statement will complete the program to correctly score the data set NEW_DATA?
A. Score data data=MYDIR.NEW_DATA out=scores;
B. Score data data=MYDIR.NEW_DATA output=scores;
C. Score data=MYDIR.NEW_DATA output=scores;
D. Score data=MYDIR.NEW_DATA out=scores;
What is the default method in the LOGISTIC procedure to handle observations with missing data?
A. Missing values are imputed.
B. Parameters are estimated accounting for the missing values.
C. Parameter estimates are made on all available data.
D. Only cases with variables that are fully populated are used.
When working with smaller data sets (N<200), which method is preferred to perform honest assessment?
A. Training: 40% Validation: 30% Testing: 30%
B. K-fold cross validation
C. Cross validation using 4th quartile observations
D. Use the AIC goodness of fit statistic
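For readers unfamiliar with the term, k-fold cross validation splits the data into k folds and holds each fold out once while the model is fit on the remaining k-1 folds. The following is a minimal sketch of the index bookkeeping only, written in Python rather than SAS; the function name k_fold_indices, the seed, and the sizes used (150 observations, 5 folds) are assumptions chosen for illustration.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Yield (train_idx, holdout_idx) pairs, one pair per fold."""
    if not 2 <= k <= n_obs:
        raise ValueError("k must be between 2 and n_obs")
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)  # shuffle once, reproducibly
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for held_out, holdout in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, holdout

# Assumed sizes: a small data set of 150 observations, 5 folds.
splits = list(k_fold_indices(n_obs=150, k=5))
```

Every observation lands in exactly one holdout fold, so each case is used for both fitting and assessment without ever being scored by a model that was trained on it.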
This question will ask you to provide a segment of missing code.
The following code is used to create missing value indicator variables for the input variables fred1 to fred7.
Which segment of code would complete the task?
A. Option A
B. Option B
C. Option C
D. Option D
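The SAS program for this question is not reproduced here, so the following is only a sketch of the idea of a missing value indicator, written in Python rather than SAS. The indicator names (mi_fred1 to mi_fred7), the use of None to represent a missing value, and the sample row are all assumptions made for illustration; they are not taken from the original program or its answer options.

```python
def add_missing_indicators(row, names, prefix="mi_"):
    """Return a copy of row with one 0/1 indicator per input variable.

    An indicator is 1 when the input is missing (None, or key absent)
    and 0 otherwise. A value of 0.0 counts as present, not missing.
    """
    out = dict(row)
    for name in names:
        out[prefix + name] = 1 if row.get(name) is None else 0
    return out

inputs = [f"fred{i}" for i in range(1, 8)]  # fred1 .. fred7

# Hypothetical observation with two missing inputs.
row = {"fred1": 3.2, "fred2": None, "fred3": 0.0, "fred4": 7.1,
       "fred5": None, "fred6": 1.5, "fred7": 2.0}

flagged = add_missing_indicators(row, inputs)
indicators = [flagged["mi_" + name] for name in inputs]
```

The original inputs are left unchanged; the indicators are added alongside them so that a later imputation step does not erase the information about which values were missing.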
This question will ask you to provide a missing option. Given the following SAS program:
What option must be added to the program to obtain a data set containing Spearman statistics?
A. OUTCORR=estimates
B. OUTS=estimates
C. OUT=estimates
D. OUTPUT=estimates
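As background on the statistic itself, the Spearman correlation is the Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. Below is a minimal sketch in Python rather than SAS; the helper names average_ranks, pearson, and spearman and the sample values are illustrative assumptions and do not correspond to any SAS procedure option.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotone but non-linear relationship, used only as sample data.
rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Because only the ordering of the values matters, any strictly increasing relationship gives the same Spearman value as a perfectly linear one.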