Hacker News
Ask HN: what software do you use for analyzing apache logs?
33 points by hashtable on April 8, 2008 | hide | past | favorite | 21 comments
What software do you use for analyzing apache logs? I would prefer open source ones that will work on Linux, although this is not a deal breaker.


I use 'visitors' ( http://www.hping.org/visitors/ ), which was written by my friend, YC.news visitor antirez.

I tried Google Analytics too, but found it kind of annoying in that it's sort of a pain to get the information I want.


Thanks David, note that in most debian-based distributions all you need is apt-get install visitors.

Edit: Also, if you would like to play with real-time analysis and get detailed information about how your users interact with your web site, please feel free to write me at antirez -at- gmail ~dot~ com for a free invitation to http://lloogg.com


cool, you wrote Visitors? thanks for saving me a lot of work!

I used it from 2004-2005 for clicktrails... great idea and implementation. Graphviz makes everything better (corollary: anything that needs Graphviz is interesting enough to get better)

Google Analytics and/or Visitors are about all I would think a person needs. And now that you have lloogg.com out there, it looks like clicktrails are covered in a GA-style interface. Kick ass! Great idea, again, best of luck.

I haven't run into many websites that couldn't be improved by clickpath analysis and refocusing navigation on the ways people actually use a site -- optimizing for the common case, in other words. It baffles me that any site would fail to do so. Now they really have no excuse.


Nice to hear that visitors was useful :) and sure, Graphviz is very cool. It was a great help in our startup (the main product we developed is a digg/reddit-alike system for the Italian market) for visualizing voting patterns and trying to improve the algorithm for detecting fraudulent voting patterns.

LLOOGG is still pretty raw and we are developing the "filters" part of the UI, but it already seems pretty useful for checking what's really happening on a web site. We have a lot of success stories of people using LLOOGG for a few weeks and then modifying the site structure to optimize the user experience.

Thanks for the comment.


Started with Webalizer, switched to Google Analytics.

Webalizer's OK, but it's missing a bunch of features, like sane user-agent string parsing (to give an overview of the browsers accessing your site). It also displays most of the stats (such as country and user agent) by page hits rather than by "visits" or unique IPs, which I think is a better way to group.

I've been pretty happy with Google Analytics so far - it has a ton of options for sorting and grouping data (like viewing users' paths through the links on the site) and good IP geolocation. Plus, the JavaScript tracker gives you stats on visitors' screen resolution, which can be handy. On the downside, all our data is belong to Google.


If you have $3K to spare, you can purchase Google Urchin 5/6 (Analytics was based on it) and keep all the data. As a bonus, it also analyzes log files.



apachelog is a nice Python module for parsing log lines from Apache. It works as a great base for doing your own analysis. http://code.google.com/p/apachelog/

It's based on this Perl module: http://cpan.uwinnipeg.ca/~peterhi/Apache-LogRegex

To manage log files, use cronolog
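For reference, cronolog is typically hooked into Apache as a piped log so each day's traffic lands in its own file (the path below is illustrative; cronolog expands the date escapes as it writes):

```apache
# Pipe access logs through cronolog; a new file is opened
# automatically when the expanded date changes.
CustomLog "|/usr/sbin/cronolog /var/log/apache/%Y-%m-%d.access.log" combined
```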

edit: I use cronolog to break up the logfiles daily and then I run a python script (which uses apachelog to handle the nasty parsing) to create a summary dictionary of parameters from that day. For example:

{num_unique_ips:140, num_pageviews:532, ...}

I pickle that dictionary and save it as a file. So every day has a raw log file and a "summary dictionary" file. To make plots I go through the summary files and unpickle them to extract the quantities of interest.
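The daily-summary workflow above can be sketched in a few lines. This is a self-contained approximation: it parses combined-format lines with a simple regex rather than the apachelog module the comment actually uses, and the sample log lines and file path are made up for illustration.

```python
import os
import pickle
import re
import tempfile

# Minimal matcher for Apache "combined" log lines (IP, timestamp,
# request, status); apachelog would do this more robustly.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Build a daily summary dictionary like the one described above."""
    ips = set()
    pageviews = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip malformed lines
        ips.add(m.group('ip'))
        pageviews += 1
    return {'num_unique_ips': len(ips), 'num_pageviews': pageviews}

# Hypothetical sample of one day's log lines.
sample = [
    '1.2.3.4 - - [08/Apr/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '1.2.3.4 - - [08/Apr/2008:10:00:05 +0000] "GET /about HTTP/1.1" 200 256',
    '5.6.7.8 - - [08/Apr/2008:10:01:00 +0000] "GET / HTTP/1.1" 200 512',
]
summary = summarize(sample)

# Pickle the summary next to the raw log, one file per day.
path = os.path.join(tempfile.gettempdir(), '2008-04-08.summary.pkl')
with open(path, 'wb') as f:
    pickle.dump(summary, f)

# Later, for plotting: unpickle each day's file and pull out the
# quantities of interest.
with open(path, 'rb') as f:
    restored = pickle.load(f)
```

One nice property of this design is that plots only ever touch the small summary files, so regenerating a graph never requires re-parsing the raw logs.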


AWStats with GeoIPFree is pretty good, but a major hassle to configure.


#!/bin/bash
# Top 20 external referers: $11 is the quoted referer field in a
# combined-format access_log; the grep -v filters drop self-referrals,
# feed readers, and empty ("-") referers.
sudo awk '{print $11}' access_log | grep -v 'yourowndomain.com' | grep -v 'bloglines.com' | grep -v '"-"' | grep -v 'feedburner.com' | sort | uniq -c | sort -rn | head -20


We use Google Analytics, Webalizer, and AWStats, plus some custom Perl bits. Clicky looks pretty swish, but I haven't taken the time to try it.


I'm using W3Counter (http://www.w3counter.com). It's like Google Analytics minus the 1-24 hour delay, and all the goodness that comes from realtime reporting. But I keep GA installed as well for some of the more detailed back-reporting it doesn't have.


Stone Steps Webalizer: http://www.stonesteps.ca/projects/webalizer/

along with cronolog and a bunch of custom Python scripts to autogenerate webalizer configs.


I use http://polliwog.sourceforge.net: open source, runs on Java. It may not be suitable for large websites, since it generates a LOT of information about your site.


Webtrax (http://www.multicians.org/thvv/webtrax-help.html) is a good open-source tool.


A Windows solution, but WebLog Expert works great for us: http://www.weblogexpert.com/



I use analog.


I use a combination of analog and some custom scripts. analog is written by a professional statistician and is a steadfast tool. The custom scripts look for interesting searches. Classics include "What does the fur on a rat do?" and "How do I connect an airbrush to a scuba-diving tank?"


Use Awk or Perl. They're already available on every Linux system, so your scripts are highly portable, and they're very powerful.


I like bbclone a lot. It's not like the usual stats packages, but it gives me uniquely useful information, and it's better than many of the others because it's so direct.



