Splus handout: multicollinearity Formatted output _____________________________________________________________________ New Splus functions in this handout: -scale: Centers and then scales the columns of a matrix so each column in the result has mean 0 and s.d. 1. -VIF: new function defined below. Computes VIF values. _____________________________________________________________________ Effect of correlated predictors Recall the data from the previous handout (from Section 6.9 of NKNW). > #original model > summary(lm(Y ~ X1 + X2,sales)) Call: lm(formula = Y ~ X1 + X2, data = sales) Residuals: Min 1Q Median 3Q Max -18.42 -6.216 0.7449 9.436 20.22 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -68.8571 60.0170 -1.1473 0.2663 X1 1.4546 0.2118 6.8682 0.0000 X2 9.3655 4.0640 2.3045 0.0333 Residual standard error: 11.01 on 18 degrees of freedom Multiple R-Squared: 0.9167 F-statistic: 99.1 on 2 and 18 degrees of freedom, the p-value is 1.921e-10 Correlation of Coefficients: (Intercept) X1 X1 0.6881 X2 -0.9898 -0.7813 > #model with interaction term > summary(lm(Y ~ X1 + X2 + X1:X2,sales)) Call: lm(formula = Y ~ X1 + X2 + X1:X2, data = sales) Residuals: Min 1Q Median 3Q Max -18.92 -7.328 0.4411 9.411 20.81 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -12.4763 227.9628 -0.0547 0.9570 X1 0.5319 3.5980 0.1478 0.8842 X2 6.0933 13.4039 0.4546 0.6552 X1:X2 0.0529 0.2058 0.2569 0.8003 Residual standard error: 11.3 on 17 degrees of freedom Multiple R-Squared: 0.9171 F-statistic: 62.66 on 3 and 17 degrees of freedom, the p-value is 2.128e-09 Correlation of Coefficients: (Intercept) X1 X2 X1 -0.9497 X2 -0.9982 0.9338 X1:X2 0.9628 -0.9982 -0.9503 > #Standardize the data and refit the interaction model > scaled.sales<-data.frame(scale(sales)) > summary(lm(Y ~ X1 + X2 + X1:X2,scaled.sales)) Call: lm(formula = Y ~ X1 + X2 + X1:X2, data = scaled.sales) Residuals: Min 1Q Median 3Q Max -0.5227 -0.2025 0.01219 0.26 0.5749 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -0.0196 0.1024 -0.1918 0.8502 X1 0.7401 0.1164 6.3559 0.0000 X2 0.2513 0.1119 2.2457 0.0383 X1:X2 0.0264 0.1028 0.2569 0.8003 Residual standard error: 0.3124 on 17 degrees of freedom Multiple R-Squared: 0.9171 F-statistic: 62.66 on 3 and 17 degrees of freedom, the p-value is 2.128e-09 Correlation of Coefficients: (Intercept) X1 X2 X1 0.2064 X2 -0.0052 -0.7527 X1:X2 -0.7465 -0.2765 0.0069 #_____________________________________________________________________ #VIF #The "VIF" function below computes the VIF values for a predictor #matrix. Cut and paste this function into the command line version of #Splus (you only need to do this one time, unless you change .Data #directories). Then you can use the function any time you want to #compute VIF values. VIF<-function(X) { #Computes Variance Inflation Factors for a Predictor Matrix # See page 386 of NKNW for more on computations #INPUTS: #X is a matrix (or data frame) of the predictors (no column of ones). cat("REMINDER: Your input matrix should not include the response\n") a<-1/sqrt(dim(X)[1]-1)*scale(X) b<-cbind(diag(solve(t(a)%*%a))) dimnames(b)<-list(dimnames(X)[[2]],"VIF") return(b) } #Example using the VIF function: Note that I omitted the column #corresponding to the response. If you leave this in, your VIF values #will be incorrect. > VIF(cbind(sales[,-3],sales[,1]*sales[,2])) REMINDER: Your input matrix should not include the response VIF X1 702.45343 X2 26.47518 X2 926.92683 > VIF(cbind(scaled.sales[,-3],scaled.sales[,1]*scaled.sales[,2])) REMINDER: Your input matrix should not include the response VIF X1 2.779383 X2 2.567048 X2 1.204565 _______________________________________________________________________ Correlation of Coefficients: > #original model > summary(lm(Y ~ X1 + X2,sales)) Call: lm(formula = Y ~ X1 + X2, data = sales) Residuals: Min 1Q Median 3Q Max -18.42 -6.216 0.7449 9.436 20.22 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -68.8571 60.0170 -1.1473 0.2663 X1 1.4546 0.2118 6.8682 0.0000 X2 9.3655 4.0640 2.3045 0.0333 Residual standard error: 11.01 on 18 degrees of freedom Multiple R-Squared: 0.9167 F-statistic: 99.1 on 2 and 18 degrees of freedom, the p-value is 1.921e-10 Correlation of Coefficients: (Intercept) X1 X1 0.6881 X2 -0.9898 -0.7813 > #computing the coefficient correlations by hand: > a<-as.matrix(cbind(rep(1,21),sales[,-3])) > MSE<-summary(lm(Y ~ X1 + X2,sales))$sigma^2 > s.betahat<-MSE*solve(t(a)%*%a) > #correlation between beta0.hat and beta1.hat: > s.betahat[2,1]/sqrt(s.betahat[1,1]*s.betahat[2,2]) [1] 0.688088 > #correlation between beta1.hat and beta2.hat: > s.betahat[2,3]/sqrt(s.betahat[2,2]*s.betahat[3,3]) [1] -0.7812993