ST3189 - Linear Regression Quiz (Part 2)

51. Why is it important to standardize predictors before fitting Ridge or Lasso models?
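
As a hint, a minimal sketch (assuming scikit-learn and synthetic data): the Ridge/Lasso penalty acts on coefficient magnitudes, so predictors measured on very different scales are penalized unevenly unless everything is standardized first.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * [1, 100, 0.01]   # wildly different scales
y = X @ [1.0, 0.01, 50.0] + rng.normal(size=100)

# Standardize inside the pipeline so the penalty treats predictors fairly
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # coefficients on the standardized scale
```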

52. In the context of the bias-variance trade-off, a very complex model (e.g., a high-degree polynomial) is likely to have:

53. What does a Variance Inflation Factor (VIF) value of 1 indicate for a predictor?
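
For illustration, a sketch assuming statsmodels: the VIF for predictor \(j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing \(X_j\) on the remaining predictors.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
X["x4"] = X["x1"] + 0.1 * rng.normal(size=200)  # nearly collinear with x1

vifs = [variance_inflation_factor(X.values, j) for j in range(X.shape[1])]
print(dict(zip(X.columns, vifs)))  # x1 and x4 inflated; x2, x3 near 1
```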

54. When using polynomial regression, what is a major risk of choosing a very high degree for the polynomial?

55. What is the primary advantage of using cross-validation to select the tuning parameter \(\lambda\) in Ridge or Lasso regression?
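
A minimal sketch, assuming scikit-learn's LassoCV: the tuning parameter is chosen by estimating test error through k-fold cross-validation, rather than by minimizing training error.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 10))
beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0], dtype=float)
y = X @ beta + rng.normal(size=150)

fit = LassoCV(cv=10).fit(X, y)
print(fit.alpha_)   # lambda chosen by 10-fold cross-validation
print(fit.coef_)    # sparse fit: irrelevant coefficients shrunk to exactly zero
```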

56. If a residual plot (residuals vs. fitted values) shows a distinct U-shape, what does this suggest?
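
To see the pattern in question, a quick synthetic example (matplotlib and scikit-learn assumed), where the true relationship is quadratic but the fitted model is linear:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(size=200)            # true relationship is quadratic

fit = LinearRegression().fit(x.reshape(-1, 1), y)
fitted = fit.predict(x.reshape(-1, 1))

# The residuals trace out a U-shape against the fitted values
plt.scatter(fitted, y - fitted)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```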

57. Comparing model selection criteria, which of the following tends to select the most parsimonious (simplest) model?

58. In Bayesian linear regression, what is a 'conjugate prior'?

59. What is a potential issue with using p-values from hypothesis tests for variable selection?

60. How do the degrees of freedom of a regression spline relate to its flexibility?

61. What is Cook's distance used to measure?
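
A short sketch assuming statsmodels, with one artificially influential observation: Cook's distance combines residual size and leverage to flag points that move the fit.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = 2 * x + rng.normal(size=50)
x[0], y[0] = 5.0, -10.0                    # plant one influential point

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(cooks_d.argmax(), cooks_d.max())     # observation 0 dominates
```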

62. What is the difference between an outlier and a high-leverage point?

63. What is the purpose of the 'hat matrix' H in linear regression?
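
A minimal numpy sketch of \(H = X(X^TX)^{-1}X^T\): it maps \(y\) to the fitted values ("puts the hat on \(y\)"), and its diagonal entries are the leverages.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ [1.0, 2.0] + rng.normal(size=30)

H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y                              # the OLS fitted values

print(np.allclose(y_hat, X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # True
print(np.isclose(np.diag(H).sum(), X.shape[1]))                      # trace(H) = p
```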

64. In a simple linear regression, the leverage of an observation \(x_i\) increases as:
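
For reference, the standard simple-regression leverage formula:

\[
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2},
\]

so \(h_i\) grows as \(x_i\) moves farther from the mean \(\bar{x}\).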

65. What is a key advantage of Principal Components Regression (PCR) over standard least squares?

66. What is a potential disadvantage of Principal Components Regression (PCR)?

67. How does Partial Least Squares (PLS) differ from Principal Components Regression (PCR)?
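
A side-by-side sketch, assuming scikit-learn (PCR built as PCA followed by least squares; PLS via PLSRegression): PCR forms components from X alone, while PLS picks directions that also covary with y.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)  # unsupervised components
pls = PLSRegression(n_components=3).fit(X, y)                           # supervised components
print(pcr.score(X, y), pls.score(X, y))    # in-sample R^2 for each
```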

68. What is meant by a 'studentized residual'?

69. If you have a categorical predictor with 4 levels, how many dummy variables should you include in the regression model?
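
A minimal pandas illustration (the column and level names here are hypothetical): a 4-level factor needs 3 dummies, with the omitted level serving as the baseline absorbed by the intercept.

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "west", "north"]})
dummies = pd.get_dummies(df["region"], drop_first=True)  # drops the baseline level
print(dummies.shape[1])   # 3 columns for 4 levels
print(dummies.head())
```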

70. What is a confounding variable?

71. In the context of GAMs, what does the 'additive' assumption mean?

72. What is the main benefit of the elastic net penalty over the Lasso penalty?

73. A 95% confidence interval for a coefficient \(\beta_1\) is found to be [-0.1, 0.8]. What is the correct conclusion at the 5% significance level?

74. What does it mean if the error terms in a time series regression are positively autocorrelated?

75. In Bayesian inference, what is a 'non-informative' or 'vague' prior?

76. What is the primary motivation for using a log transformation on the response variable Y?

77. If the goal of modeling is pure prediction, which of the following is generally more important?

78. What is the 'irreducible error' in the context of a statistical learning model?

79. In backward stepwise selection, what is the starting point?

80. What is a major limitation of using traditional subset selection methods (best, forward, backward) when the number of predictors p is very large, especially when p > n?

81. The shape of the constraint region for Ridge regression is a ________, while for Lasso it is a ________.

82. In a Bayesian context, the posterior predictive distribution represents:

83. What is the main purpose of a Q-Q (quantile-quantile) plot of the residuals?
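
A quick sketch assuming scipy and matplotlib, using deliberately heavy-tailed residuals: the plotted points bend away from the reference line in the tails when normality fails.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
residuals = rng.standard_t(df=3, size=200)   # heavy-tailed, not normal

# Sample quantiles vs. theoretical normal quantiles
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```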

84. If the true relationship between X and Y is \(Y = X^2\), and you fit a simple linear model \(Y = \beta_0 + \beta_1 X\), the model will have:

85. Which of the following is NOT a method for dealing with heteroscedasticity?

86. In a regression model with an interaction term between a quantitative predictor X and a binary dummy variable D, what does the coefficient of the interaction term (\(\beta_3\) in \(Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 XD\)) represent?
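
A sketch using the statsmodels formula API on synthetic data, where by construction the interaction coefficient is the difference in slopes between the \(D = 1\) and \(D = 0\) groups:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 200
df = pd.DataFrame({"X": rng.normal(size=n), "D": rng.integers(0, 2, size=n)})
df["Y"] = 1 + 2 * df["X"] + 0.5 * df["D"] + 3 * df["X"] * df["D"] + rng.normal(size=n)

fit = smf.ols("Y ~ X * D", data=df).fit()  # expands to X + D + X:D
print(fit.params)   # the X:D coefficient recovers the slope difference (about 3)
```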

87. What is the main drawback of k-fold cross-validation?

88. Leave-one-out cross-validation (LOOCV) is a special case of k-fold cross-validation where:

89. Compared to 5-fold or 10-fold CV, LOOCV generally has:
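
A minimal scikit-learn sketch of LOOCV as n-fold cross-validation: each fold leaves out exactly one observation, making it the most expensive variant, with an error estimate that has low bias but relatively high variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(40, 2))
y = X @ [1.0, -1.0] + rng.normal(size=40)

scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(), scoring="neg_mean_squared_error")
print(len(scores), -scores.mean())   # n = 40 folds; LOOCV estimate of test MSE
```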

90. What is the primary risk of data dredging or p-hacking?

91. In a Bayesian setting, if the prior and posterior distributions are very different, what does this imply?

92. What is the main difference between prediction and inference?

93. A key assumption of linear regression is that the predictors are...

94. If the tuning parameter \(\lambda\) in Ridge regression is set to 0, the resulting model is equivalent to:

95. As the tuning parameter \(\lambda\) in Ridge regression approaches infinity, what happens to the coefficient estimates?
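
A numerical sketch of the two limits, assuming scikit-learn (an exact \(\lambda = 0\) is replaced by a tiny value to avoid solver warnings): a near-zero penalty reproduces least squares, while a huge penalty shrinks every coefficient toward, but not exactly to, zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(10)
X = rng.normal(size=(100, 3))
y = X @ [2.0, -1.0, 0.5] + rng.normal(size=100)

print(LinearRegression().fit(X, y).coef_)
print(Ridge(alpha=1e-8).fit(X, y).coef_)   # essentially the OLS solution
print(Ridge(alpha=1e8).fit(X, y).coef_)    # all coefficients near zero
```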

96. Which statement about the relationship between training error and test error is true?

97. A simple linear regression model is fit to a dataset where the true relationship is not linear. The training RSS will be ________ and the test RSS will be ________.

98. A very flexible model is fit to a dataset. The training RSS will be ________ and the test RSS will be ________.

99. What is the main advantage of using a smoothing spline over a polynomial or regression spline?
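
A hedged sketch, assuming scipy >= 1.10: make_smoothing_spline places a knot at every data point and controls roughness through a penalty, so no manual knot or degree selection is needed; with no penalty supplied, it is chosen by generalized cross-validation.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(11)
x = np.sort(rng.uniform(0, 10, size=100))
y = np.sin(x) + 0.3 * rng.normal(size=100)

spline = make_smoothing_spline(x, y)       # penalty chosen by generalized CV
print(spline(np.array([2.5, 5.0, 7.5])))   # smooth predictions at new points
```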

100. In the context of Bayesian inference, what does MCMC stand for?
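
As a toy illustration (pure numpy, with a hypothetical prior and synthetic data), a random-walk Metropolis sampler, one of the simplest MCMC algorithms: it draws dependent samples whose long-run distribution is the posterior.

```python
import numpy as np

rng = np.random.default_rng(12)
data = rng.normal(loc=1.5, scale=1.0, size=50)

def log_posterior(mu):
    # N(0, 10^2) prior on mu; likelihood with known sigma = 1
    log_prior = -mu**2 / (2 * 10**2)
    log_lik = -np.sum((data - mu) ** 2) / 2
    return log_prior + log_lik

mu, samples = 0.0, []
for _ in range(5000):
    proposal = mu + 0.5 * rng.normal()             # random-walk proposal
    # Accept with probability min(1, posterior ratio), done on the log scale
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

print(np.mean(samples[1000:]))  # posterior mean, close to the sample mean of the data
```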