Splus handout: Multiple regression Formatted output We shall fit regression models for the data from Section 6.9 of NKNW. > sales<-read.table("DATA/NKNW/mine/CH06FI05.DAT",header=T) > sales X1 X2 Y 1 68.5 16.7 174.4 2 45.2 16.8 164.4 3 91.3 18.2 244.2 4 47.8 16.3 154.6 5 46.9 17.3 181.6 6 66.1 18.2 207.5 7 49.5 15.9 152.8 8 52.0 17.2 163.2 9 48.9 16.6 145.4 10 38.4 16.0 137.2 11 87.9 18.3 241.9 12 72.8 17.1 191.1 13 88.4 17.4 232.0 14 42.9 15.8 145.3 15 52.5 17.8 161.1 16 85.7 18.4 209.7 17 41.3 16.5 146.4 18 51.7 16.3 144.0 19 89.6 18.1 232.6 20 82.7 19.1 224.1 21 52.3 16.0 166.5 Note: I didn't like the format of the pairs plot, so I changed the order of the columns >pairs(sales[,c(3,1,2)]) > sales.lm<-lm(Y ~ X1 + X2,sales) > summary(sales.lm) Call: lm(formula = Y ~ X1 + X2, data = sales) Residuals: Min 1Q Median 3Q Max -18.42 -6.216 0.7449 9.436 20.22 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -68.8571 60.0170 -1.1473 0.2663 X1 1.4546 0.2118 6.8682 0.0000 X2 9.3655 4.0640 2.3045 0.0333 Residual standard error: 11.01 on 18 degrees of freedom Multiple R-Squared: 0.9167 F-statistic: 99.1 on 2 and 18 degrees of freedom, the p-value is 1.921e-10 _____________________________________________________________________ _____________________________________________________________________ RESIDUALS > sales.lsfit<-lsfit(cbind(sales$X1,sales$X2),sales$Y) > e.star<-ls.diag(sales.lsfit)$stud.res > par(mfrow=c(2,3)) > boxplot(e.star,ylab="estar") > title("Fig 1: Boxplot of the \n studentized residuals") > plot(fitted(sales.lm),e.star) > abline(h=0) > title("Fig 2: studentized resids vs y.hat") > plot(sales$X1,e.star) > abline(h=0) > title("Fig 3: studentized resids vs X1") > plot(sales$X2,e.star) > abline(h=0) > title("Fig 4: studentized resids vs X2") > plot(sales$X1*sales$X2,e.star) > abline(h=0) > title("Fig 5: studentized resids vs X1*X2") > qqnorm(e.star) > abline(0,1) > title("Fig 6: qqplot of studentized resids") _____________________________________________________________________ _____________________________________________________________________ FITTING OTHER MODELS _____________________________________________________________________ Omitting the intercept: > summary(lm(Y ~ X1 + X2 -1,sales)) Call: lm(formula = Y ~ X1 + X2 - 1, data = sales) Residuals: Min 1Q Median 3Q Max -17.28 -8.195 -0.3514 5.98 23.36 Coefficients: Value Std. Error t value Pr(>|t|) X1 1.6217 0.1549 10.4664 0.0000 X2 4.7504 0.5832 8.1448 0.0000 Residual standard error: 11.1 on 19 degrees of freedom Multiple R-Squared: 0.9968 F-statistic: 2917 on 2 and 19 degrees of freedom, the p-value is 0 _____________________________________________________________________ Interactions in lm: > summary(lm(Y ~ X1 + X2 + X1:X2,sales)) Call: lm(formula = Y ~ X1 + X2 + X1:X2, data = sales) Residuals: Min 1Q Median 3Q Max -18.92 -7.328 0.4411 9.411 20.81 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -12.4763 227.9628 -0.0547 0.9570 X1 0.5319 3.5980 0.1478 0.8842 X2 6.0933 13.4039 0.4546 0.6552 X1:X2 0.0529 0.2058 0.2569 0.8003 Residual standard error: 11.3 on 17 degrees of freedom Multiple R-Squared: 0.9171 F-statistic: 62.66 on 3 and 17 degrees of freedom, the p-value is 2.128e-09 _____________________________________________________________________ Note that these three expressions are equivalent: > summary(lm(Y ~ X1 + X2 + X1:X2,sales)) > summary(lm(Y ~ X1*X2,sales)) > ls.print(lsfit(cbind(sales$X1,sales$X2,sales$X1*sales$X2),sales$Y))