Logfile Parsing

When parsing logfiles on a Linux machine, several commands are useful for getting the appropriate results, e.g., for searching for specific events in firewall logs.

In this post, I list a few standard parsing commands such as grep, sort, uniq, or wc, and present a few examples of these small tools. However, it’s all about trial and error when building large command pipes. 😉

Of course, the two most important functions are cat for displaying a complete textfile on the screen (stdout), and the pipe | which is used after every call to forward the output to the next tool. For live viewing of log files, tail -f is used. Note that not all of the following tools can be used with such live viewing, e.g., the sort commands. However, at least grep and cut can.

Filter, Replace, Omit, etc.

  • grep [-v] <text>: Prints only the lines that contain the specified value. With -v, it prints only the lines that do NOT contain the specified value. Example: cat file | grep 1234 or cat file | grep -v 5678.
  • sort [-g] [-k <field>] [-r]: Sorts the input. -g sorts numbers by their real numerical value. -k selects the key, i.e., the start and stop fields to sort by. -r reverses the order. Example: sort by fields 23 through 24: cat file | sort -k 23,24.
  • uniq [-f <number>] [-s <number>] [-w <number>] [-c]: Deletes duplicate lines. Since it only compares adjacent lines, the input should be sorted first. -f skips the first n fields, -s skips the first n chars, -w compares at most n chars. Example: delete all lines that have the same value after the first 5 fields, comparing only the first 10 chars: cat file | uniq -f 5 -w 10. -c can be used to print the number of occurrences for each line.
  • wc [-l]: Word count; counts lines, words, and bytes. -l counts only the lines (the most common use).
  • comm [-1] [-2] [-3]: Compares two sorted files and prints three columns: entries only present in file 1, only in file 2, and in both. These columns can be suppressed with the -1, -2, -3 switches. Example: print the lines that are unique to file2: comm -13 file1 file2.
  • tr -s ' ': The tool “translate” can be used for many things. One of my default cases is to squeeze double spaces in logfiles with tr -s ' '. But it can also be used for other use cases, such as replacing uppercase letters with lowercase ones: tr '[:upper:]' '[:lower:]', e.g., to make IPv6 addresses look alike.
  • cut -d ' ' -f <field>: Prints only the field specified with -f; the field separator is set with -d, here a space. Example: print the 23rd field: cat file | cut -d ' ' -f 23.
  • head -n -<number> / tail -n +<number>: With a negative count, (GNU) head prints everything except the last n lines. To omit the first n lines instead, tail with +<n+1> is used. E.g., when a file starts with three comment lines that should be omitted: cat file | tail -n +4.
  • sed s/regexp/replacement/: Replaces the part of each line that matches the regexp. E.g., to remove everything before the keyword “hello” (and the keyword itself): cat file | sed 's/.*hello//'.
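One pitfall worth a quick demonstration: since uniq only collapses adjacent duplicates, a sort must come first. A minimal sketch with throwaway input:

```shell
# uniq alone misses non-adjacent duplicates:
printf 'b\na\nb\n' | uniq
# -> b
#    a
#    b

# sorted first, the duplicates become adjacent and are removed:
printf 'b\na\nb\n' | sort | uniq
# -> a
#    b

# uniq -c additionally prints the occurrence count per line:
printf 'b\na\nb\n' | sort | uniq -c
```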

A few Examples

Here are a few examples out of my daily business. Let’s grep through some firewall logs. The raw log format looks like the following:


Connections from Host X to Whom?

Let’s say I want to know how many destination IPs appeared in a certain policy rule. The relevant policy id in my log files is “219” (grep id=219). To avoid problems with double spaces, I squeeze them (tr -s ' '). The destination IP address is the 23rd field in the log entries – I only want to see that one. Since the delimiter in the log file is a space, I have to set it to ' ' (cut -d ' ' -f 23). Finally, I sort this list (sort) and filter out duplicate entries (uniq). Here is the result:
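Since the raw log excerpt is not reproduced here, the pipeline can be sketched against a made-up log format. The mk helper and all field contents below are invented for illustration; only the field positions match the post (policy id in one of the leading fields, destination IP in field 23):

```shell
# mk builds a hypothetical 23-field log line: policy id in field 5, dst IP in field 23
mk() { printf 'f1 f2 f3 f4 id=%s f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 %s\n' "$1" "$2"; }

{ mk 219 10.0.0.2; mk 219 10.0.0.1; mk 219 10.0.0.2; mk 512 10.9.9.9; } \
  | grep id=219 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq
# -> 10.0.0.1
#    10.0.0.2
```

Note that the line matching the other policy (id=512) is dropped by the grep, and the duplicate destination collapses to one entry.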

If I want to keep the whole log entry lines (and not only the IP addresses), I can sort by the 23rd field (sort -k 23,24) and run uniq starting at the 23rd field (= skip the first 22 fields) while comparing only the following 20 chars (uniq -f 22 -w 20). This is the result:
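A sketch of this whole-line variant, again with a hypothetical 23-field log format (the mk helper and field contents are invented; the destination IP sits in field 23):

```shell
# mk builds a hypothetical 23-field log line with the dst IP in field 23
mk() { printf 'f1 f2 f3 f4 id=%s f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 %s\n' "$1" "$2"; }

{ mk 219 10.0.0.2; mk 219 10.0.0.1; mk 219 10.0.0.2; } \
  | tr -s ' ' | sort -k 23,24 | uniq -f 22 -w 20
# two full log lines remain: one per distinct destination IP
```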

 

Count of Connections from Host Y

Another example is the count of connections from host y, sorted by destination. The starting point is the source IP address (grep src=192.168.113.11). Double spaces are squeezed (tr -s ' '). Only the destination IP address is relevant, which is the 23rd field (cut -d ' ' -f 23). The output is sorted (sort) and the unique entries are counted (uniq -c). To sort these counters by their numerical value, another (sort -g -r) is appended. This is it:
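This pipeline can also be sketched with a made-up log format (the mk helper and field contents are invented; here the source IP is assumed to sit in field 7 and the destination IP in field 23):

```shell
# mk builds a hypothetical 23-field log line: src IP in field 7, dst IP in field 23
mk() { printf 'f1 f2 f3 f4 f5 f6 src=%s f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 %s\n' "$1" "$2"; }

{ mk 192.168.113.11 10.0.0.2; mk 192.168.113.11 10.0.0.1; mk 192.168.113.11 10.0.0.2; mk 10.1.1.1 10.9.9.9; } \
  | grep src=192.168.113.11 | tr -s ' ' | cut -d ' ' -f 23 | sort | uniq -c | sort -g -r
# most-contacted destination first, e.g. "2 10.0.0.2" above "1 10.0.0.1"
```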

 

Summary of Session-End Reasons

Grep every log entry that contains the keyword “reason” (grep reason), then cut away the whole line up to the last field, which is the reason entry. This is done via a regex that is replaced by nothing (sed 's/.*reason.//'). Finally, similar to the examples above, the output is sorted, the unique entries are counted, and the counts are sorted. Here it is:
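A sketch with made-up log lines whose last field is reason=&lt;value&gt; (the sample reason values are invented). The single "." after "reason" in the regex swallows the "=" sign:

```shell
{ printf 'f1 f2 reason=timeout\n'; printf 'f1 f2 reason=aged-out\n'; printf 'f1 f2 reason=timeout\n'; } \
  | grep reason | sed 's/.*reason.//' | sort | uniq -c | sort -g -r
# most frequent reason first, e.g. "2 timeout" above "1 aged-out"
```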

 

Display Filter with Regex

Here is another example of how to “improve” a logfile output with sed in order to get a better view of it. The following output is from tcpdump, sniffing on a network for ICMPv6 DAD messages.

I only want to see the timestamps along with the MAC & IPv6 addresses. That is, I want to throw away all other words and symbols from this output. This can be done with sed s/regexp/replacement/, called with a regex and an empty replacement. In my example, I want to remove everything between the > sign and the “has” keyword. The regex for this is >.*has. which means: a literal >, followed by anything “.*” until “has”, followed by a single character “.” (which swallows the trailing space). And with a second run I remove everything to the end of the line starting with the comma:
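A sketch with a hand-made line that approximates tcpdump's output for a neighbor solicitation (the addresses and exact wording are invented; real tcpdump output may differ slightly):

```shell
line='22:10:02.384836 f4:5c:89:ab:cd:ef > 33:33:ff:ab:cd:ef, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffab:cdef: ICMP6, neighbor solicitation, who has fe80::f65c:89ff:feab:cdef, length 24'

# first sed: drop everything from the first ">" through "has" plus one char (the space)
# second sed: drop everything from the first remaining comma to the end of the line
echo "$line" | sed 's/>.*has.//' | sed 's/,.*//'
# -> 22:10:02.384836 f4:5c:89:ab:cd:ef fe80::f65c:89ff:feab:cdef
```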

That’s it. 😉
