GoAccess Bulk Script
After putting up the Documentation for TaskFiend, by far my most checked out code, I got curious if anybody was actually accessing it. As a Free Software Zealot™, I wasn’t about to use Google Analytics. I don’t really even want to put more than the bare minimum of JavaScript on my sites (grumble grumble cookie notice), much less tracking cookies, so I was basically looking for something to make my Apache logs pretty. Enter GoAccess.
GoAccess’ page has a big picture of it as a CLI utility, but it will render an HTML file in real time if so requested:
goaccess [path to access log] -o [desired output file path - must be in an htdocs folder so it can be served] --log-format=COMBINED --real-time-html --daemonize
I wanted to keep it private from you snoops on the internet, so I set up Apache Basic Auth to protect the directory. That was kind of irritating to do, but I just have to do it once.
Except this is totally cool, and I should do this for all the sites on this server! I do not want to open all those config files, though. As an imperfect workaround, all the sites’ analytics are served from one website, so I only need to protect one directory.
After attempting to run this twice (for two different sites), I found that the real-time mode needs a port for its WebSocket server, so a second instance fails silently because the port is already in use.
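If live dashboards for several sites were the goal, one way around the collision would be to give each instance its own port with GoAccess’ --port option. Here is a minimal sketch that only prints the commands it would run. The function name, the starting port, and the site names are made up for illustration, and each port would also have to be reachable from the browser:

```shell
# Hypothetical helper: print one real-time goaccess command per site,
# each with its own WebSocket port so the instances don't collide.
# Nothing is executed here; review the output before running any of it.
print_realtime_commands() {
    local base="$1" out_dir="$2" port="$3" site
    shift 3
    for site in "$@"; do
        printf 'goaccess "%s" -o "%s" --log-format=COMBINED --real-time-html --daemonize --port=%d\n' \
            "$base/$site/logs/access.log" "$out_dir/$site.html" "$port"
        port=$((port + 1))   # next site gets the next port
    done
}
```

Calling `print_realtime_commands /var/www/vhosts /path/to/analytics 7891 site1 site2` would print two commands, one on port 7891 and one on 7892.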
I don’t really need up-to-the-second analytics - I’m just satisfying mild curiosity. I decided to write a script that generates a static file once per run and put it on a cron job. Even every 15 minutes is more than I need, but I decided on that based on the scientific principle of yeah sure why not:
*/15 * * * * /home/kj/bin/update_all_goaccess_sites.sh
At first I hard-coded an array of (site1 site2 etc), but that was unamusing. Instead, I wrote a script to go through all the directories in my vhosts directory searching for files named access.log. It runs GoAccess once per log it finds and names the resulting file sitename.html.
The need for the Dr. Binocs functionality (a list of directories not to process) got refactored away, but I kept it in here because it seemed like the kind of thing that might one day be useful:
#!/bin/bash
base=/var/www/vhosts
# Directory names Dr. Binocs says to skip; compared against the site name
dr_binocs=(dead)
# NUL-delimited find output on fd 3 keeps odd paths intact and leaves stdin alone
while IFS= read -r -d '' -u 3 full_path; do
    # https://stackoverflow.com/questions/918886/how-do-i-split-a-string-on-a-delimiter-in-bash
    # /var/www/vhosts/<site>/logs/access.log split on "/" puts <site> at index 4
    IFS='/' read -ra site <<< "$full_path"
    if printf '%s\n' "${dr_binocs[@]}" | grep -F -x -q -- "${site[4]}"; then
        echo "SKIP - Skipping $full_path because Dr. Binocs told me to."
    else
        echo "Found ${site[4]}"
        time goaccess "$full_path" -o "$base/[whatever directory corresponds to the site you want to host them from]/htdocs/analytics/${site[4]}.html" --log-format=COMBINED
    fi
    echo ''
done 3< <(find "$base"/*/logs/ -name access.log -print0)
echo ''
echo 'Fin! https://[your website]/index.php'
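One thing worth knowing about `${site[4]}`: the index only works because /var/www/vhosts is three directories deep, so the site name lands in slot 4 after splitting on "/". If the vhosts directory lived somewhere else, the index would shift. A sketch of an alternative that strips the base prefix instead (the function name is hypothetical, and this is not what the script above does):

```shell
# Hypothetical helper: derive the site name from a log path by stripping
# the vhosts prefix, so the result does not depend on how deep $base is.
site_from_log_path() {
    local base="$1" full_path="$2" rest
    rest=${full_path#"$base"/}    # e.g. site1/logs/access.log
    printf '%s\n' "${rest%%/*}"   # keep everything before the first "/"
}
```

For example, `site_from_log_path /var/www/vhosts /var/www/vhosts/site1/logs/access.log` prints site1.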
Note that it won’t create directories if the output path doesn’t exist, so mkdir before you get started.
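If you would rather not remember that step, the script could make the directory itself before the loop starts. A small sketch, assuming the analytics directory is passed in as an argument (the function name is made up):

```shell
# Hypothetical helper: make sure the analytics directory exists before
# goaccess writes into it. mkdir -p creates missing parent directories
# and does nothing if the directory is already there.
ensure_analytics_dir() {
    local out_dir="$1"
    mkdir -p -- "$out_dir"
}
```

Because mkdir -p is harmless when the directory already exists, calling this on every cron run costs nothing.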
Although I could deduce any given site’s path, I want to be able to just troll a list of sites whose analytics I can peer into. I wrote a PHP script that basically just runs ls. That’s the index.php file referred to on the last line of the prior script:
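<?php
// Create a list of files in this directory
$files = scandir(__DIR__);
// Skip the directory entries and this script itself
$ignore = ['.', '..', 'index.php'];
foreach ($files as $file) {
    if (in_array($file, $ignore)) {
        continue;
    }
    // Escape the name so an odd filename can't break the markup
    $safe = htmlspecialchars($file);
    echo '<p><a href="'.$safe.'">'.$safe.'</a></p>';
}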
A couple obvious improvements could be made but I don’t care that much:
- A full HTML file rather than just a random snippet
- Remove the .html off the name of the site
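Both could also be had without PHP by generating a static index from the cron script instead. This is a swapped-in approach rather than what the setup above uses, and the function name is hypothetical. It assumes the site names contain nothing that needs HTML escaping:

```shell
# Hypothetical alternative to the PHP listing: write a complete index.html
# that links to every report in a directory, showing names without ".html".
write_analytics_index() {
    local dir="$1" file name
    {
        printf '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8"><title>Analytics</title></head>\n<body>\n'
        for file in "$dir"/*.html; do
            [ -e "$file" ] || continue        # glob matched nothing
            name=$(basename "$file" .html)    # site1.html -> site1
            [ "$name" = index ] && continue   # don't list the index itself
            printf '<p><a href="%s.html">%s</a></p>\n' "$name" "$name"
        done
        printf '</body>\n</html>\n'
    } > "$dir/index.html"
}
```

Calling `write_analytics_index` on the analytics directory at the end of each run would keep the list in step with whatever reports exist.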
Will this give me actionable insights? No. Will it provide amusement for a few days before I forget about it completely? Yes. I share this in case anybody out there is in a similar boat and wants a couple days of unproductive amusement. Enjoy!