Caching minidns

I’ve started using analog to build reports from the blueslugs.com server logs. One of the tools on the site is minidns, a small Perl script that runs through the logs and replaces each IPv4 address it finds with its DNS-resolved name (if one is defined). I’ve improved this slightly by adding caching in two places: inside the application (on the grounds that your site is likely visited by the same communities over time) and, via Storable, in a state file (on the grounds that you probably run the analyzer over your logs fairly regularly). [Get cachedns.pl.]
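For readers who want the shape of the idea without downloading anything, here is a minimal sketch of the two caching layers. It is not the text of cachedns.pl: the function names (make_resolver, dns_lookup, rewrite_line, run) are made up for illustration, the cache file is taken as a plain first argument rather than via -c, and the IPv4 pattern is deliberately loose.

```perl
#!/usr/bin/perl
# Sketch only: names and layout are illustrative, not taken from cachedns.pl.
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);
use Storable qw(retrieve nstore);

# Reverse-resolve one dotted-quad address; returns undef if there is no name.
sub dns_lookup {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Wrap a lookup function in a hash-backed cache. Addresses that do not
# resolve are cached as themselves, so they are not retried either.
sub make_resolver {
    my ($cache, $lookup) = @_;
    return sub {
        my ($ip) = @_;
        $cache->{$ip} = $lookup->($ip) // $ip unless exists $cache->{$ip};
        return $cache->{$ip};
    };
}

# Replace every IPv4-looking token in a line with its resolved name.
sub rewrite_line {
    my ($line, $resolve) = @_;
    $line =~ s/\b(\d{1,3}(?:\.\d{1,3}){3})\b/$resolve->($1)/ge;
    return $line;
}

# Filter STDIN to STDOUT, loading the cache before and saving it after.
sub run {
    my ($cache_file) = @_;
    my $cache   = -e $cache_file ? retrieve($cache_file) : {};
    my $resolve = make_resolver($cache, \&dns_lookup);
    while (my $line = <STDIN>) {
        print rewrite_line($line, $resolve);
    }
    nstore($cache, $cache_file);
}

# Only act as a filter when a cache file is named on the command line.
run($ARGV[0]) if @ARGV;
```

Saved under a hypothetical name such as sketch.pl, it would be run as perl sketch.pl dns.cache < access_log. Note that this sketch never expires an entry, so a host whose name changes stays stale until the cache file is deleted; whether cachedns.pl handles that differently is not something the sketch tries to mirror.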

Is it worth it? Read on.

Here are some simple timed runs with the original (minidns.pl) and with cachedns.pl.

$ time perl minidns.pl < ./al.1000 > /dev/null

  real    1m56.431s
  user    0m0.030s
  sys     0m0.030s

al.1000 is the first 1000 lines of an Apache server log. After that first run, our name service cache, nscd(1), and the DNS server we’re calling (and the perl(1) text) are reasonably warmed up for subsequent callers.

$ time perl minidns.pl < ./al.1000 > /dev/null

  real    0m11.210s
  user    0m0.030s
  sys     0m0.020s

$ time perl minidns.pl < ./al.1000 > /dev/null

  real    0m12.943s
  user    0m0.050s
  sys     0m0.010s

So with everything warmed up, 1000 log lines take between 11 and 13 seconds. Let’s run the caching version:

$ time perl cachedns.pl -c dns.cache < ./al.1000 > /dev/null

  real    0m8.579s
  user    0m0.090s
  sys     0m0.020s

So the internal caching makes only a modest difference on a first run: 8.6 seconds against 11 to 13. But let’s rerun with the now-populated cache file.

$ time perl cachedns.pl -c dns.cache < ./al.1000 > /dev/null

  real    0m0.096s
  user    0m0.070s
  sys     0m0.010s

Since the first part of the log file is processed again every night, our cache file means that we’re likely to perform DNS lookups only for new visitors to the site. (There are many sophisticated DNS resolvers for weblogs around that use C++ or Python or threading or whatever. I just felt that a simple, understandable Perl version, with a boost, was enough for this little site.)