ST500                    January 3, 1996                    Session 2

Class will be held from 10-noon in C141, 1st Floor of Clark.
The CTSS Lab will also be open 9-1 weekdays.
Assignments are due at the beginning of the 2nd class period.

DATE     TOPIC                                                   ASSIGNMENT
Tue  2   Intro/mail/pico/ftp + telnet/unix
Wed  3   Unix/ftp/vi + pico/.login file/printing/sas commands    assign 1(a)-sas
Thu  4   Review/spss commands                                    assign 1(b)-spss
Fri  5   minitab commands/pc stat pgm info                       assign 1(c)-minitab
Mon  8   Regression: sas/spss/minitab                            assign 2-regression
Tue  9   Analysis of Variance: sas/spss/minitab                  assign 3-anova
Wed 10   Sas - Interactive Display Manager                       assign 4-land appraisal
Thu 11   anov + reg options/diagnostics
Fri 12   Xwindows: Insight(sas)/bmdp/xbmdp

Grading:  assignments 1-4 @ 20% each = 80%
          attendance                 = 20%

Second session

_Unix: man command > f1
     A Practical Guide to the Unix System, Mark G. Sobell   $27.95
     Unix in a Nutshell, O'Reilly & Assoc                   $19.50
_Get on Vector mailing list at ACNS and set up grad student account
_ftp: use either c:\temp to read and write files on the lab pcs
_vi: full screen editor + pico
_printing from lamar at studlO/msite (lpr -PstudIO or lpr -Pmsite)
_printing from lamar in C141 - need to use ftp
_sas commands: syntax + examples
_assign 1(a) - sas (note that this is due 2 class days later: Jan 5 at the beginning of class)

ST500                    Assignment 1(a) - sas

Assignments 1(b) and 1(c) will be similar to this exercise, but use spss and minitab, respectively.

Whenever you begin data analysis with a computer package, you must first check your data for invalid values or 'outliers', make sure that the data fields are being read correctly, verify that missing data, if any, are treated appropriately, and decide whether transformations of the response variables might be appropriate. The tasks below are suggested as some ways in which to do that checking.

The following tasks should be done with SAS using the assign1.dat data file. To get a copy in your home directory:

     cp ../data/assign1.dat assign1.dat
You need to email (or print) the (1) control statements and (2) xxx.log file by the beginning of class on Jan 5.

 1. Read assign1.dat into SAS, then list it.
    (NOTE: the file "../assign/a1a" should be a tremendous help!)
 2. Do a frequency distribution of all data fields. (PROC FREQ)
    (By now you should have noticed a negative value for weight. It should be treated as missing for further analyses. IF .. THEN .. =.;)
    (The 'small' value for weight should be 84.0; use vi or pico to fix the data file.)
 3. Compute means/minimums/maximums of height, weight and age for m and f (sex) separately. (PROC MEANS; CLASS ..;)
 4. Create two SAS data files: one with m and one with f. (DATA ..; SET ..; IF ..;)
 5. Recombine the two files into one file and print. (DATA ..; SET ....;)
 6. Sort the file by name within sex, and print. (PROC SORT; BY ....;)
 7. Plot weight against height and age, using two symbols on the plot to indicate sex. (PROC PLOT; PLOT ..*..=..;)
 8. Compute LOG10 of weight (lwt), then plot lwt against age with symbols for sex as in 7. (lwt=log10(weight);)
 9. Use PROC UNIVARIATE to get a stem + leaf plot, boxplot and normal probability plot of weight, height, and lwt. (PROC UNIVARIATE PLOT; VAR ..;)

Now, without using the assign1.dat data, we will look at several data distributions. Remember: if you see histograms similar to those for C and D, the data can be transformed to 'normal' with a square-root or log transformation. Other common transformations are the inverse (1/y) and arcsin-sqrt(p).

10. Generate 200 random normal observations, A.
    {DATA temp; DO i=1 TO 200; A=NORMAL(1234567); output; end;}
    Then compute B=A*.2+1 {Normal(1,.04)}.
    Then compute C=B*B.
    Then compute D=10**(B).
    Finally, run PROC UNIVARIATE as in 9 for variables A, B, C and D.
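
The reasoning behind item 10 can be checked outside SAS. The sketch below is written in Python (standard library only), not SAS, and is only an illustration: the helper `skewness` and the variable names are hypothetical, and Python's generator will not reproduce the numbers that SAS's NORMAL(1234567) gives, even with the same seed. It builds A, B, C and D as in item 10, then shows that the square root of C and the log10 of D bring the data back to the (roughly symmetric) scale of B.

```python
import math
import random
import statistics


def skewness(values):
    """Population-form sample skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Assumption: the seed is reused from the handout for illustration only;
# it does not reproduce the SAS random number stream.
rng = random.Random(1234567)

a = [rng.gauss(0.0, 1.0) for _ in range(200)]   # A: standard normal
b = [0.2 * x + 1.0 for x in a]                  # B: mean 1, variance .04
c = [x * x for x in b]                          # C = B*B   (right-skewed)
d = [10.0 ** x for x in b]                      # D = 10**B (right-skewed)

# The transformations suggested in the handout undo the skew.
sqrt_c = [math.sqrt(x) for x in c]              # equals |B|
log_d = [math.log10(x) for x in d]              # equals B

for name, vals in [("A", a), ("B", b), ("C", c), ("D", d),
                   ("sqrt(C)", sqrt_c), ("log10(D)", log_d)]:
    print(f"{name:9s} mean={statistics.fmean(vals):9.3f} "
          f"skew={skewness(vals):7.3f}")
```

In the printed summary, C and D should show larger skewness than B, while sqrt(C) and log10(D) should match B. This is the numerical counterpart of comparing the PROC UNIVARIATE histograms before and after transformation.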