[Last updated Jan 16, 1998; p.83 has an extra " in line 5.]

This is an errata to the second edition of the Sed & Awk book by Dougherty & Robbins. It contains corrections, additions and comments based on my own reading of the book. Feel free to ignore any or all of it. Bits in " " are directly from the text. Some entries are prefixed by a , which means 'A trifling matter, and fussy of me, but we all have our little ways.'

This errata covers the first 11 chapters and one small bit from the printf() spec in Appendix B. Chapters 12 and 13 and the appendices may be covered sometime in the future.

p. xiv
------
Under DOS versions, 'egrep' should be capitalized, as that would be consistent with the remainder of the book.

p. xix
------
tcsh should be mentioned along with csh.

p. 7
----
"...remember to escape any exclamation points with a backslash ("\!"). There is no other way to get csh to leave the exclamation point alone."

This is not entirely true. In csh, the shell variable histchars allows one to change the history-invoking character. See the csh man page for details.

p. 16
-----
"For instance, the first example could have been entered without them [single quotes] but in the next example they are required, since the substitution contains spaces:

    $ sed 's/ MA/, Massachusetts/' list "

Alas, one can also backslash the spaces to achieve the same effect:

    $ sed s/\ MA/,\ Massachusetts/ list

p. 20
-----
"Multiple command lines can be entered in the same way as shown for sed: separating commands with semicolons or using the multiline input capability of the Bourne shell."

What is not explicitly mentioned is that -e, which sed accepts for supplying multiple command lines, is not a valid multiple-command option (or indeed any option) in awk.

p. 21
-----
"This allows us to retrieve any of three fields: the full name, the street address, and the city and state."
The first 'and' should be an 'or' to match the 'any,' or the 'any of' should be 'all' to match the 'and.' Got that?

p. 24
-----
The byState script, as written, prints out the first person in each state twice. In short, the first and last sections of the second awk script should be reversed. Here's a longer explanation. Consider this fragment:

    $1 != LastState {
        LastState = $1
        print $1
        print "\t" $2
    }
    $1 == LastState {
        print "\t" $2
    }

Suppose we have, as the first line coming in:

    California, Amy Wilde, 334 Bayshore Pkwy, Mountain View, California

    $1 != LastState {       # True
        LastState = $1      # LastState = California
        print $1            # California
        print "\t" $2       # Amy Wilde
    }                       # ok so far
    $1 == LastState {       # oops, this is true now
        print "\t" $2       # Amy Wilde
    }                       # done

Total output:

    California
            Amy Wilde
            Amy Wilde

So, the final script should look like this:

    #!/bin/sh
    awk -F, '{ print $4 ", " $0 }' $* |
    sort |
    awk -F, '
    $1 == LastState {
        print "\t" $2
    }
    $1 != LastState {
        LastState = $1
        print $1
        print "\t" $2
    }'

--

$* in the /bin/sh script is not explained. It means "all of the command line parameters to the script," which in this case is none, since byState has input coming from a pipe, i.e. the output of 'sed -f nameState list'.

p. 28
-----
"... ls * will list all the files in the current directory."

Clearly not the ones beginning with '.' or '..'; also, this matches subdirectories, and ls will print out the files contained in those subdirectories.

p. 35
-----
The example with punctuation marks is a tough one to show, since different shells like to eat characters in different ways. Still, this worked under tcsh:

    grep '.[\!?;:,".] .' /etc/motd

i.e. the ! is the character that needs to be escaped even inside of ''.

p. 38
-----
SEDAWK-CH-3-TAB-3 means the table on the next page, Table 3-3.

p. 47
-----
"Our first attempt at writing a regular expression to search for a word concluded with the following expression: " book.* ". On p. 42, the last example uses " book.? ".

p. 48
-----
Many of the complex examples do not work in the {t}csh due to quoting problems. Save yourself the trouble and execute all the commands in a Bourne-type shell (sh, ksh, etc.).

p. 62
-----
The bit about shell scripts should probably have come earlier, considering that two scripts, byState and gres, have been used already. In addition, a one-line "chmod +x script" might also have been helpful.

p. 78
-----
The d command was previously introduced. :)

p. 81
-----
None of the 4 sed programs tested (SunOS 4.1.4, gnu sed v. 2.05 under SunOS 4.1.4, Solaris 2.5.1 /usr/ucb/sed and /usr/bin/sed) had any problem with using a blank as a delimiter.

p. 83
-----
"The twist in this problem is that the line needs to be preceded and followed by a blank line."

The script, as shown, puts two blank lines before the line. Remove the third line from the script to get the desired behavior. In addition, line 5 reads:

    s/""//g

which of course should be:

    s/"//g

[Thanks to joachim.keser@ubs.com for the last bit.]

p. 85
-----
There are two lines:

    /^\.XX/ /s/sed, substitution command/sed, substitute command/
    /^\.XX/ /s/substitution/substitute/

Clearly this cannot be correct. I think what the authors were going for was:

    /^\.XX /s/sed, substitution command/sed, substitute command/
    /^\.XX /s/substitution/substitute/

p. 87
-----
The "corpse of a deleted line" comment comes from the original Lee E. McMahon paper. It is not usually found in Unix man pages.

p. 88
-----
The footnote appears to be backwards in its explanation. Leading spaces and tabs *do not* get stripped using gnu sed or SYSV sed, but SunOS 4.1.x sed and /usr/ucb/sed from Solaris 2.x *do* strip the leading blank space.

p. 90
-----
If one doesn't use the -n option when using the list command, one doesn't get exactly the same output for both lines.
Instead, the output of the second line (the default print) has some interpretation of the special characters done to it, so under tcsh, xterm emulation, gnu sed and Solaris 2.5.1, the output of the second line looks like:

    ere is a string of special characters:

p. 101
------
"Closes a database file" should be "Closes a database"

p. 102
------
The substitution:

    s/There is no return value\.*/None./

matches sentences that end with zero or more periods. It is unclear where such a construct as "value....." would come from.

p. 103
------
"closes a database" should be "Closes a database"

p. 105
------
The second address in the revised getmac script should be /^\.\.$/, as should the line with q on it; otherwise, the pattern matches all lines that have .. at the beginning of a line. Adding the $ also makes the script consistent with the previously introduced getmac script.

p. 118
------
"Lenny's first script" is an enigma. If it is run on the file without the previous s command, the errant results are different from those mentioned in the book. If, however, it is added to the end of the sed.len script, it works perfectly.

p. 145
------
In many places, patterns specified for matching also match other, possibly unintended, patterns. For example, the rule for matching an area code also matches entries with a single parenthesis, e.g. 1(707 724-0000 and 1 707)724-0000.

p. 153
------
"OFMT does the same job [as CONVFMT], but controlling the conversion of numeric values when using the print statement."

It's an awkward statement. I think the authors are trying to say that OFMT does the same job [as CONVFMT], but it is used for numeric conversion *only* when using the print statement.

Also, the output of the new phonelist.awk is incorrect. It is missing a blank line between the names and the 'records processed' line.

p. 175
------
"(e.g., i > 0)"

This should be (e.g., i > 4)

p. 179
------
"The factorial of a number is the product of successively multiplying that number by one less than that number."

The authors should probably add, "stopping at one." Also, factorials are not defined for anything except positive integers and 0.

p. 191
------
The sixth line of the program has the incorrect "numberals" instead of numerals.

p. 194
------
The output from "awkro sample" has some lines that are wrapped due to terminal line length, not by awk magic (later acknowledged on p. 195).

p. 195
------
This bit of code:

    delete acro[acronym]

does not do anything, as the variable acronym has never been assigned. Hence the modification to replace an acronym only the first time it appears is incorrect. Add the line:

    acronym = $1

before the

    $i = acro[$i] " (" $i ")"

line to get the desired behavior.

p. 196
------
"You might refer to the element in the second column of the third row as "array[3,2]."

Only if you were a fortran programmer. Everyone else would refer to it as array[2,3], or perhaps array[1,2] if the indices started at zero.

p. 200
------
"The shell script works the same as the first example of invoking awk."

Alas, it doesn't. The shell script outputs 7 for ARGC, breaking up John and Wayne as two separate parameters.

p. 201
------
phones.data is not included anywhere, either in progs.tar.gz or in the book. It is mentioned as 'names' in chapter 7 and also as phones.block when in block format, but the actual file is nowhere to be seen.

p. 202
------
"As a special case, if the value of an ARGV element is the empty string, awk will skip over it and continue on to the next element."

I'm not sure what this means, but feeding the empty string to the command line of argv.awk does increment the ARGC counter, and an empty string is printed in the ARGV output.

p. 212
------
"Use "&" to output an ampersand."

That should be "... \& to output an ampersand." The next line then qualifies that to "\\&".

p. 228
------
The getline.awk script does not account for the possibility that "Name" could be in all caps in the man page (the gnu ls man page is like this). Hence, the pattern match should look like:

    /^\.SH "?[Nn][Aa][Mm][Ee]"?/

In addition, the script does not anticipate multiple names on the line following the .SH. Running the gnu ls man page through the modified script causes "ls," to be printed because the ls man page contains the line

    ls, dir, vdir \- list contents of directories

p. 229
------
The original pattern match in sorter.awk did not have the "? bits.

p. 231
------
For nis (nee yp) users, you should do a

    ypcat passwd > blah

and then run the getname script on blah, rather than /etc/passwd. nis+ users should use the niscat command instead of ypcat (niscat passwd.org_dir may be the actual command; I have no experience with nis+).

todays in the subdate.awk comment should be today's.

None of the native SunOS 4.1.x date commands (/usr/bin/date, /usr/5bin/date) has a %Y option. The best you can do is %y, which returns the last two digits of the year. The gnu version of date does have %Y.

p. 234
------
The output of the soelim.awk script assumes that test1 contains:

    first:second
    one:two

and that test2 contains:

    three:four
    five:six

This wasn't exactly clear to me on first reading.

p. 236
------
The line that demonstrates the -v option should end with a - to make it consistent with the previous command line.

p. 237
------
In invoke, the comment in the book for the clear screen line does not match the one from the example programs.

p. 241
------
The footnote states that it is "still more efficient to close your files when you're done with them," rather than let gawk attempt to open and close files as needed. While I'm sure it is cleaner and perhaps better programming style, I fail to see how this can be true, especially if the number of files open is less than the OS limit.
On a related note (SunOS 4.1.x), {n}awk seems to be better than gawk at detecting errors in close(). For example, consider this awk program:

    { print $0 > "/tmp/1" }

Now consider the following line:

    close(/tmp/1)

When inserted in the main program, {n}awk flags this as a syntax error, while gawk has no problem with it. When put in the END section, {n}awk still flags it as a syntax error, but gawk prints:

    gawk: test1:4: (FILENAME=/etc/motd FNR=23) fatal error: internal error
    Abort

[gnu awk v. 3.0.2]

p. 246
------
The filename "orders" mysteriously becomes "orders.today." It's the same file.

p. 253
------
Note that the script which *does not* work shows completely different characteristics if /usr/local/bin/gawk is used instead of /bin/awk. With gawk, executing myscript either by itself or with a filename allows the user to input keystrokes which are echoed to the screen, but nothing else happens.

p. 273
------
Thompson Automation Software can be found at http://www.tasoft.com/~thompson.

p. 378
------
The command summary for printf() seems to imply that unless one is printing variables without format specifications, the () bits must be used with printf. Alas, the invoke script uses printf with format specs without ()s.
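
For anyone who wants to check the p. 378 point on their own awk, here is a minimal sketch. It is not taken from the book or from the invoke script; the format string, the values and the shell variable names (with_parens, without_parens) are made up for illustration. It runs the same printf once with parentheses and once without, and both forms should produce the same line.

```shell
#!/bin/sh
# printf with parentheses around the format and its arguments
with_parens=$(awk 'BEGIN { printf("%s has %d fields\n", "record", 3) }')

# the same printf without parentheses, the form the invoke script uses
without_parens=$(awk 'BEGIN { printf "%s has %d fields\n", "record", 3 }')

echo "$with_parens"
echo "$without_parens"
```

One case where the parentheses do matter is when the output is redirected and an argument contains a > comparison; wrapping the arguments in () keeps awk from reading the > as a redirection.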