Command to find all image files which are not really image files!

Quick one this … so you’ve got a compromised webserver and you want to check the files on it. Many scanning tools will ignore images, but an image might not always be what it seems! Check them all with this command:

find /path/to/dir -regex ".*\.\(jpg\|png\|gif\)" -exec file {} \; | grep -i -v "image data"

If all is good, you won’t get any output. If your server is seriously borked, then you might see things like this …

./wp-content/uploads/2011/01/22.jpg: HTML document, ASCII text

This is a flag that the image is in fact a PHP file. Investigate!

If you get this kind of thing,
./wp-content/uploads/2011/01/221.jpg: Minix filesystem, V2, 46909 zones

its probably a bug in an old version of file, so check your OS version, copy the file to a more recent OS and try again.

Bash script to clean Bots out of Apache Logs

If you’ve ever spent some time looking at webserver logs, you know how much crap there is in there from crawlers, bots, indexers, and all the bottom feeders of the internet. If you’re looking for a specific problem with the webserver, this stuff can quickly become a nuisance, stopping you from finding the information you need. In addition, its often a surprise exactly how much of the traffic your website serves up to these bots.

The script below helps with both these problems. It takes stats of a logfile (apache, but should also work on nginx), makes a backup, counts the number of lines it removes and each kind of bot, and then repeats the new stats at the end. Copy the following into a file eg. cleanlog.sh and run with the name of the logfile as an argument. eg cleanlog.sh logfile.log (Use a copy of the logfile. It DELETES the lines)
You’ll definitely want to edit the LOCALTRAFFIC bit to fit your needs. You may also want to add bots to the BOTLIST. Run the script once on a copy of the logfile and then view it to see what bots are left …

#!/bin/bash

INFILE=$1
cp $INFILE $INFILE.bak

#filestats before
PRELINES=$(cat $INFILE | wc -l )
PRESIZE=$( stat -c %s $INFILE )
echo $INFILE is $PRESIZE bytes and contains $PRELINES lines
echo 

# List of patterns to delete from logfiles
LOCALTRAFFIC=" wp-cron.php 10.10.0.2 "

echo "-------- Removing local traffic ---------"
for TERM in $LOCALTRAFFIC; do
    TERMCOUNT=$( grep "$TERM" $INFILE | wc -l )
    echo Removing $TERMCOUNT instances of $TERM
    sed -i  "/$TERM/d" $INFILE
done

# List of patterns to delete from logfiles, space separated
BOTLIST="ahrefs Baiduspider bingbot Cliqzbot cs.daum.net DomainCrawler DuckDuckGo Exabot Googlebot linkdexbot magpie-crawler MJ12bot msnbot OpenLinkProfiler.org opensiteexplorer pingdom rogerbot SemrushBot SeznamBot sogou.com\/docs tt-rss Wotbox YandexBot YandexImages ysearch\/slurp "

echo "------- Removing Bots ---------"
for TERM in $BOTLIST; do
    TERMCOUNT=$( grep "$TERM" $INFILE | wc -l )
    echo Removing $TERMCOUNT instances of $TERM
    sed -i  "/$TERM/d" $INFILE
done

#filestats after

POSTLINES=$(cat $INFILE | wc -l )
POSTSIZE=$( stat -c %s $INFILE )
PERCENT=$(awk "BEGIN { pc=100*${POSTLINES}/${PRELINES}; i=int(pc); print (pc-i<0.5)?i:i+1 }")

echo 
echo $INFILE is now $POSTSIZE bytes and contains $POSTLINES lines
echo Log reduced to $PERCENT percent of its original size.

And here is a sample output.

~/temp/log $ ./cleanlog.sh 02Apr.log
02Apr.log is 2432560 bytes and contains 10238 lines

-------- Removing local traffic ---------
Removing 1054 instances of wp-cron.php
Removing 776 instances of 10.10.0.2
------- Removing Bots ---------
Removing 525 instances of Googlebot
Removing 226 instances of DomainCrawler
Removing 1061 instances of Baiduspider
Removing 377 instances of pingdom
Removing 1343 instances of OpenLinkProfiler.org
Removing 1087 instances of opensiteexplorer
Removing 212 instances of YandexBot
Removing 26 instances of YandexImages
Removing 163 instances of SemrushBot
Removing 17 instances of ysearch\/slurp
Removing 44 instances of msnbot
Removing 385 instances of bingbot
Removing 95 instances of Wotbox
Removing 51 instances of Cliqzbot
Removing 22 instances of Exabot
Removing 116 instances of SeznamBot
Removing 5 instances of magpie-crawler
Removing 57 instances of sogou.com\/docs
Removing 6 instances of rogerbot

02Apr.log is now 566074 bytes and contains 2590 lines
Log reduced to 25 percent of its original size.

So you can see that around 75% of the traffic on here is crap. And now the log file is much easier to read.

 

Turning off ipV6 in Ubuntu 16

My home router doesn’t handle IPv6, and for that matter, neither does my ISP, so I get a lot of IPv6 related garbage in my syslog and kern.log. To turn it off, you need to create a new file, rather than editing a system file, and then reload these settings.

sudo nano /etc/sysctl.d/95-disable-ipv6.conf
#add the following lines
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
# reload kernel parameters
sudo service procps reload
# Check that its taken ... the result of this should be 1
cat /proc/sys/net/ipv6/conf/all/disable_ipv6

Nice.

Secret WordPress Options Page

OK, so maybe its not a complete secret, but after around 10 years of running WordPress I only just found out about it. Here it is:

http://domain.com/wp-admin/options.php

Obviously you have to be logged in as admin. This basically gives you access to the wp_options table without having to go to a separate database management app, or do some mysql command line ninja.

Who knew? Its not on a menu, probably to hide it from fat fingers, but hey, I wish I’d known about this earlier. Would have saved me a few hours over the course of my work life.

Blocking irritating PLDT Billing popups

So you forgot to pay your bill. It happens. PLDT used to set an automated phone call to call you once a day until you paid. Now they have something much more irritating, and something that feels borderline illegal. What they do is they hijack your internet connection. Every four or five page loads, they inject an HTML frame with a monster ad, which sits on top of all your work. You have no choice but to click on the button which takes you to a page with the amount you owe (which means they know which IP is your account BTW).

However there is no way to remove the popup, which sits on top of your browser. There is no ‘close window’ button, which means your only choice is to refresh the page, or close the window and re-open it. Too bad if you were working on something and hadn’t pressed the save button: that’s now gone. And even after you pay, the popup sticks around for a day or two … (how hard can it be to automatically cancel it?)

So how do we block it? The frame is an iframe which uses an IP address 210.213.253.135, so we can’t use DNS blocking or hostfile blocking. So changing the routing table seems to be the way to go.

On Linux, using sudo if necessary:

# check routing table
route -n
# Add rule
route add 210.213.253.135 gw 127.0.0.1 lo
# check routing table again
route -n
# check desired result
ip route get 210.213.253.135

Gotcha