I needed to extract valid HTTP request URLs from a client's web access logs (IIS). Whitelisting known-good URLs wasn't sufficient, since many requests attempted directory traversal or tried to execute PHP scripts from legitimate-looking URLs.
A simple solution came to mind after some percolating. I'd extracted timestamps, paths, HTTP methods, remote (client) IP addresses and a few other things. I didn't need perfect results: there was no need to classify every single log entry correctly, and a few false positives and a few false negatives were fine.
The approach was to identify the IP addresses used by invalid requests and delete every row logged from those addresses. In my data set, any URL containing .. or php was invalid: the first is a sign of someone attempting a directory traversal attack, and the second is a sign of someone probing for vulnerabilities in common PHP applications. The query
delete from [table] where ip in (select ip from [table] where url like '%..%' or url like '%php%')
worked very well.
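To see the effect on a small example, here is a minimal sketch in Python using an in-memory SQLite database. The table name requests, its columns and the sample rows are made up for illustration; they are not taken from the real logs, and the real data need not have been in SQLite.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (ip TEXT, method TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO requests (ip, method, url) VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "GET", "/products/list.aspx"),
        ("10.0.0.2", "GET", "/images/../../web.config"),  # directory traversal attempt
        ("10.0.0.2", "GET", "/products/list.aspx"),       # same IP, harmless-looking URL
        ("10.0.0.3", "GET", "/admin/setup.php"),          # PHP probe
        ("10.0.0.4", "POST", "/contact.aspx"),
    ],
)

# Same shape as the query above: delete every row from any IP
# that sent at least one URL containing ".." or "php".
conn.execute(
    """
    DELETE FROM requests
    WHERE ip IN (
        SELECT ip FROM requests
        WHERE url LIKE '%..%' OR url LIKE '%php%'
    )
    """
)
conn.commit()

# Only rows from IPs with no suspicious request remain.
remaining = conn.execute("SELECT ip, url FROM requests ORDER BY ip").fetchall()
print(remaining)
```

Note that the harmless-looking request from 10.0.0.2 is deleted along with the traversal attempt, because the filter works per IP address rather than per request. That is the kind of misclassification I was willing to accept.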