Tuesday, July 07, 2020

Removing hacking attempts from web logs

I needed to extract valid HTTP request URLs from a client's web access logs (IIS).  Whitelisting known good URLs wasn't sufficient, since many requests attempted directory traversal or tried to execute PHP scripts starting from legal-looking URLs.

A simple solution came to mind after some percolating.  I'd extracted timestamps, paths, HTTP method, remote (client) IP address and a few other things.  I didn't need perfect logs, meaning I didn't need to correctly classify every single log entry.  Having a few false positives and a few false negatives was fine.

The trick was to identify the IP addresses used by invalid requests and discard every request from those addresses.  In my data set, any URL containing .. or php was invalid: the first is a sign of someone attempting a directory traversal attack, and the second is a sign of someone probing for vulnerabilities in many PHP applications.  This query:

delete from [table] where ip in (select ip from [table] where url like '%..%' or url like '%php%')

worked very well.
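
As a rough illustration, the same delete can be tried against a throwaway in-memory SQLite database from Python.  The table name requests, the columns ip and url, and the sample rows are all made up for this sketch; they are not the client's data.  LIKE matching may also behave differently (for example in case sensitivity) on other databases.

```python
import sqlite3


def remove_suspect_ips(conn, table="requests"):
    """Delete every row whose IP made at least one suspicious request.

    A request is treated as suspicious if its URL contains '..' or 'php',
    mirroring the query in the post. The table name is a hypothetical
    placeholder and must come from trusted code, not user input.
    """
    cur = conn.execute(
        f"DELETE FROM {table} WHERE ip IN ("
        f"SELECT ip FROM {table} "
        f"WHERE url LIKE '%..%' OR url LIKE '%php%')"
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


# Hypothetical sample data: one clean client, two probing clients.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (ip TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("10.0.0.1", "/index.html"),
        ("10.0.0.1", "/about"),
        ("10.0.0.2", "/../../etc/passwd"),  # directory traversal attempt
        ("10.0.0.2", "/index.html"),        # same IP, harmless-looking URL
        ("10.0.0.3", "/wp-login.php"),      # PHP probe
    ],
)

deleted = remove_suspect_ips(conn)
remaining = conn.execute(
    "SELECT ip, url FROM requests ORDER BY url"
).fetchall()
print(deleted, remaining)
```

Note that the harmless-looking /index.html request from 10.0.0.2 is removed along with the traversal attempt.  That is the point of filtering by IP rather than by URL, and it is also where the occasional false positive comes from.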