Let's fight back


Dave1
07-27-2001, 01:28 PM
I have had enough of people surfing my site using ad-blocking software, so I have turned to Apache to help me out. However, I am only halfway there.

I am using .htaccess directives to stop people browsing my site with this type of software. I have some already (from this forum), but I want more software to block.

Here's what I have so far:



# block offline browsers
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Offline [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Down [OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
# agent name censored by the forum; put the real name back before enabling
# RewriteCond %{HTTP_USER_AGENT} ^Web******** [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
# adblocking software
# every condition except the last carries [OR], so any single match triggers the rule
RewriteCond %{HTTP_USER_AGENT} Ad.*Muncher.*v4 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCaptor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWasher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} adsubtract [NC]
# Send leeches to denied page
RewriteRule ^.*$ /docs/robots_denied.html [L]


Can anyone add any ad-blocking software's HTTP_USER_AGENT to this list?

gethosted
07-27-2001, 01:35 PM
I would recommend blocking anything with the words "popup", "banner", "killer", or "ad" in the user agent field. Sure you may lose traffic from a legitimate browser named "popup killer" but I really doubt it :).
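
As a rough sketch of that suggestion in .htaccess form (the patterns and the 403 response are assumptions for illustration, not rules anyone in this thread has tested): "ad" is matched only as a standalone word here, since a bare "ad" substring would also catch agents with names like "Downloader".

```apache
RewriteEngine on
# Any of these words anywhere in the User-Agent, case-insensitive
RewriteCond %{HTTP_USER_AGENT} (popup|banner|killer) [NC,OR]
# "ad" only when it is not surrounded by other letters
RewriteCond %{HTTP_USER_AGENT} (^|[^A-Za-z])ad([^A-Za-z]|$) [NC]
# Leave the URL unchanged and answer 403 Forbidden
RewriteRule .* - [F]
```

Swapping the last line for a rewrite to a "denied" page, as in the first post, should work just as well.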

Manos
07-27-2001, 05:15 PM
Hey Dave1, thanks for those rules. I like to collect Apache rules... hehehehe.

You may try searching Google... I haven't checked in a while, but last time I went looking I found these rules to block bulk email collecting software.

#Block email collectors
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
#RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.*$ /we_hate_filthy_spammers.html [L]

I've commented out the Mozilla one since, as far as I remember from the comments on the web page, it could block any Mozilla-based browser created using some specific Visual Basic libraries. It was a seemingly low percentage of traffic, but I decided not to take a chance with it. I wish I could remember the page, but that was a while ago.

Also, people should check this thread out for more helpful ad-blocking discussion: http://66.33.83.213/forums/showthread.php?threadid=9915

mcsebraindumps
07-27-2001, 06:30 PM
Web ********, Offline Explorer, WebZIP, and some of the others you have there are actually spiders. In my opinion they are more destructive than anyone blocking ads, because they spider your entire site, which inflates your page views and gets you banned from programs. I can't use search boxes on my site for this reason; I've been booted from two places. I would definitely put Teleport Pro in your list, as it's the most common spider.

I've gone a step further than you and had an entire collection of scripts written to handle this problem. I don't use htaccess because my list would be too long :) Fortunately I have a dedicated server with root access, so I use ipchains, which completely bans an IP. One script parses the log file looking for those agents and bans them on sight. Another bans any IP accessing more than 40 files (not including .gif and .jpg) in 60 seconds. I also have a hidden tag in my site pointing to /cgi-bin/mustdie.pl, which bans any IP hitting it :) I think I catch about 99% of unwanted spiders.

My scripts also let me name domains that should never be banned, such as search engines. Last but not least, each IP is unbanned after 24 hours. After about a month of doing this I think I finally have the problem under control. Anyway, here's my list of agents to ban:

Teleport|Offline Explorer
DISCO Pump
WebZIP
HTTrack
MSIECrawler
FlashGet
libwww
Web********
WebCopier
ia_archiver
WebCapture
Downloader
GetRight
Fetch
NetAnts
SuperBot
Wget

If you have large files on your site I suggest allowing GetRight and NetAnts since they can also be used solely as download agents. The problem is, they can also be used to grab your entire site.
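
For anyone who would rather try this list in .htaccess than with ipchains, a sketch might look like the following. It is an assumption-laden translation, not the scripts described above: it matches names as case-insensitive substrings, leaves out the entry whose name the forum censored, and omits GetRight and NetAnts so they can still be used as download agents. Short names such as Fetch and Downloader may match more agents than intended.

```apache
RewriteEngine on
# Spiders and site grabbers from the list above, grouped to keep lines short
RewriteCond %{HTTP_USER_AGENT} (Teleport|Offline\ Explorer|DISCO\ Pump|WebZIP|HTTrack|MSIECrawler) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FlashGet|libwww|WebCopier|ia_archiver|WebCapture|Downloader) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Fetch|SuperBot|Wget) [NC]
# Leave the URL unchanged and answer 403 Forbidden
RewriteRule .* - [F]
```

Unlike an ipchains ban, this refuses only requests that carry a matching User-Agent, so a spider that changes its agent string gets through.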

harry
07-27-2001, 07:55 PM
Hi Folks

Just wish to ask whether it's possible to write a program in JavaScript that can be installed on HTML pages to block the mentioned ad-blocking software, or just to display a blank page if the surfer is using such software?

Thanks!

Harith
http://www.danex-exm.dk

demae
07-27-2001, 08:12 PM
You mean the ad blocking software actually gives a real USER_AGENT string? How sloppy of them. ^_^