Bots & Crawlers

1 year 8 months ago - 1 year 8 months ago #4362 by HBAGG
Bots & Crawlers was created by HBAGG
Hi there,
I have 13 bots/crawlers hitting my websites and causing server issues, which is one of the reasons I got this lovely component.
I have blocked the four that I managed to work out, but can anyone tell me how to identify the others so that I can block them?
Last edit: 1 year 8 months ago by HBAGG. Reason: Folder not attached

1 year 8 months ago #4363 by Jose
Replied by Jose on topic Bots & Crawlers
Hi Heather,

Thank you very much for your confidence in my extensions! I'm glad to hear it suits your needs! :)

The way to identify those bots is to analyze your server's access log, where you will see entries like these:
fcrawler.looksmart.com - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"
fcrawler.looksmart.com - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"

ppp931.on.bellglobal.com - - [26/Apr/2000:00:16:12 -0400] "GET /download/windows/asctab31.zip HTTP/1.0" 200 1540096 "http://www.htmlgoodies.com/downloads/freeware/webdevelopment/15.html" "Mozilla/4.7 [en]C-SYMPA  (Win95; U)"

123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

As you can see, every line includes information such as the IP address or hostname, the date, the resource accessed, and the user-agent. Most web crawlers describe themselves in the user-agent field (although this field can be modified to hide the real bot, or even left empty), so you can block unwanted bots using that information.
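
As a rough sketch of that idea, a short Python script can tally requests per user-agent so the busiest crawlers stand out. It assumes the log uses the combined format shown in the sample lines above; the `LOG_LINE` pattern and the `count_user_agents` function are illustrative names, not part of the component.

```python
import re
from collections import Counter

# Assumed layout (combined log format, as in the sample lines):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its number of requests."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            # An empty user-agent is itself a useful signal, so keep it visible.
            counts[match.group("agent") or "(empty)"] += 1
    return counts

# Sample entries copied from the log excerpt above.
sample = [
    'fcrawler.looksmart.com - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"',
    'fcrawler.looksmart.com - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"',
    '123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"',
]

for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file instead of `sample`, for example `count_user_agents(open("access.log"))`, where `access.log` stands in for wherever your host stores the log. Because the user-agent can be faked or left empty, it is also worth checking the host or IP column for the heaviest entries.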

In the next version of SCP, which will be released in a few days, I have updated the list of predefined malicious bots with more than 40 new entries.

Regards,
Jose
The following user(s) said Thank You: HBAGG
