Programming
Welcome to TNL.net. If you like this content, you may consider subscribing to the RSS feed.
The second part of a two-article series about RSS has been published on InformIT. The first one covers the history of RSS to date, including what it is and what it’s used for. The second article goes into [...]
Recently, the MyDoom virus affected the sites of two of the biggest opponents of the open source community: SCO and Microsoft. While vigilante action is plain wrong (a message that few in the open source community seem to be sending out), there are opportunities for the open source community to shine. Here’s how.
As many people [...]
For people who have been using my enhanced version of the webalizer.conf file, I’ve just created a new update. This new version includes a number of new user agents, as well as some cleaning up to lower the memory footprint. It is downloadable from its usual location. Enjoy!
I’ve just updated the webalizer configuration file I’ve been hacking to reflect new agents that hit my site in September. Other improvements include better handling of sites (I started categorizing things by ISP and sorting out some of the spider sites) and more improvements to the handling of referrers. The new version is now [...]
Paul Graham highlighted an interesting concept for fighting off spammers: make anti-spam tools strike back at the sites promoted by spammers. A blacklist would be created to include repeat offenders. When a spam is seen, the server would check the blacklist to see if [...]
Over the past few days, I’ve been spending a fair amount of time hacking my webalizer.conf file (see my GroupAgent, SearchEngine, and HideAgent entries on the subject). As a result, I ended up with something that no one else seems to have posted online. I’ve added a few extra goodies in the file so check [...]
Many people have written to me to point out that they still get duplicate entries in their user-agent table after using the GroupAgent trick I highlighted. To remove those, you need to use the HideAgent directive. Here is the list of HideAgent directives I have in my file:
HideAgent rv:1.4
HideAgent 3.01
HideAgent 3.02
HideAgent 4.01
HideAgent 5.0
HideAgent 5.01
HideAgent 5.12
HideAgent 5.13
HideAgent 5.14
HideAgent 5.15
HideAgent 5.16
HideAgent 5.17
HideAgent 5.21
HideAgent 5.22
HideAgent 5.23
HideAgent 5.5
HideAgent 6.0
HideAgent 348NorthNews
HideAgent Alcatel-
HideAgent almaden.ibm.com/cs/crawler
HideAgent AmphetaDesk
HideAgent antibot
HideAgent AppleWebKit
HideAgent http://Ask.24x.Info/
HideAgent ASPseek
HideAgent aspseek
HideAgent augurfind
HideAgent AvantGo
HideAgent Awasu
HideAgent Baiduspider
HideAgent BarraHomeCrawler
HideAgent BBot
HideAgent BFS_method
HideAgent Bilbo
HideAgent Bison
HideAgent Blazer
HideAgent blo.gs
HideAgent BlogBot
HideAgent Blogdigger
HideAgent Blogosphere
HideAgent BlogPulse
HideAgent BlogShares
HideAgent Blogwise
HideAgent boitho.com
HideAgent bookwatch@onfocus.com
HideAgent books@onfocus.com
HideAgent BorderManager
HideAgent brainoff.com/geoblog/
HideAgent www.business-socket.com
HideAgent Camino
HideAgent CE-Preload
HideAgent Check and Get
HideAgent china
HideAgent China
HideAgent CJNetworkQuality
HideAgent cloakBrowser
HideAgent combine
HideAgent COMBINE
HideAgent compatible)
HideAgent CoolBot
HideAgent CoologFeedSpider
HideAgent CopyHunter
HideAgent curl
HideAgent DA
HideAgent danux
HideAgent Dattatec.com-Sitios-Top
HideAgent daypopbot
HideAgent DoCoMo
HideAgent DTS
HideAgent Ecosystem/development
HideAgent EgotoBot
HideAgent Elaine
HideAgent EmailSiphon
HideAgent Ericsson
HideAgent ETS
HideAgent eXactSite
HideAgent Exalead
HideAgent exactseek.com
HideAgent EyeOnSite
HideAgent fantomBrowser
HideAgent fantomCrew
HideAgent FAST
HideAgent Fast
HideAgent FavOrg
HideAgent FeedDemon
HideAgent Feedreader
HideAgent FeedOnFeeds
HideAgent Feedster
HideAgent FeedValidator
HideAgent Fetch
HideAgent Finder
HideAgent FlickBot
HideAgent Franklin
HideAgent Frontier
HideAgent Gaisbot
HideAgent GalaxyBot
HideAgent Genome
HideAgent GetRight
HideAgent Gigabot
HideAgent grub-client
HideAgent Google*
HideAgent gossamer-threads.com
HideAgent htdig
HideAgent HTTrack
HideAgent ia_archiver
HideAgent iaea.org
HideAgent iCab
HideAgent Industry
HideAgent Indy
HideAgent INGRID/3.0
HideAgent InternetSeer
HideAgent internetseer
HideAgent IUFW
HideAgent IUPUI
HideAgent IXE
HideAgent Jakarta
HideAgent janes-blogosphere
HideAgent Java
HideAgent jBrowser
HideAgent jiffe
HideAgent junkbuster
HideAgent k2spider
HideAgent Lachesis
HideAgent lachesis
HideAgent larbin
HideAgent Leknor.com
HideAgent Liberate
HideAgent libwww-perl
HideAgent Lincoln
HideAgent Linkbot
HideAgent LinkHype
HideAgent Links
HideAgent LinksManager.com
HideAgent LinkSweeper
HideAgent LinkWalker
HideAgent LNSpiderguy
HideAgent Lynx*
HideAgent MagpieRSS
HideAgent Microcomputers
HideAgent Missauga
HideAgent Missigua
HideAgent Mitsu
HideAgent mogimogi
HideAgent MOT-
HideAgent Mozilla/3.04
HideAgent Mozilla/3.04Gold
HideAgent Mozilla/4.04
HideAgent Mozilla/4.05
HideAgent Mozilla/4.06
HideAgent Mozilla/4.08
HideAgent Mozilla/4.5
HideAgent Mozilla/4.51
HideAgent Mozilla/4.6
HideAgent Mozilla/4.61
HideAgent Mozilla/4.7
HideAgent Mozilla/4.8
HideAgent MSFrontPage
HideAgent MSNBOT
HideAgent MyHeadlines
HideAgent MyWireServiceBot
HideAgent NationalDirectory
HideAgent NaverRobot
HideAgent NCBrowser
HideAgent Netcraft
HideAgent NetNewsWire
HideAgent NetResearchServer
HideAgent NewsGator
HideAgent Newz
HideAgent NG/1.0
HideAgent NIF
HideAgent NITLE
HideAgent nntp//rss
HideAgent Nokia
HideAgent NPBot
HideAgent NRK-bruker
HideAgent Openbot
HideAgent Opera
HideAgent Oddbot
HideAgent Offline
HideAgent OPWV-SDK
HideAgent Oracle
HideAgent Panasonic
HideAgent PEAR
HideAgent PHILIPS-
HideAgent PHP
HideAgent Pix
HideAgent PocketFeed
HideAgent Pompos
HideAgent Popdexter
HideAgent PostNuke
HideAgent Powermarks
HideAgent psbot
HideAgent Python-urllib
HideAgent QuepasaCreep
HideAgent Radio*
HideAgent Rainbow
HideAgent rdflib
HideAgent Robozilla
HideAgent RPT-HTTPClient
HideAgent SAGEM-
HideAgent SAMSUNG
HideAgent Scrubby
HideAgent SHARP-
HideAgent SideWinder
HideAgent slurp@inktomi.com
HideAgent Scooter
HideAgent searchspider.com
HideAgent SearchSpider.com
HideAgent SEC-
HideAgent semanticdiscovery
HideAgent SIE-
HideAgent SharpReader
HideAgent Shareware
HideAgent SlimBrowser
HideAgent Snoopy
HideAgent SOFTWING_TEAR_AGENT
HideAgent SonyEricsson
HideAgent spider@spider.ilab.sztaki.hu
HideAgent SpiderKU
HideAgent Spinne
HideAgent SmartDownload
HideAgent stealthBrowser
HideAgent Steeler
HideAgent SuperBot
HideAgent SurveyBot
HideAgent Sweeper
HideAgent Syndic8
HideAgent Syndirella
HideAgent Syndigator
HideAgent Tagword
HideAgent Technoratibot
HideAgent Teleport
HideAgent Teoma
HideAgent Teradex
HideAgent Terrar
HideAgent T-H-U-N-D-E-R-S-T-O-N-E
HideAgent timboBot
HideAgent TurnitinBot
HideAgent http://www.tutorgig.com/
HideAgent UltraLiberalFeedParser
HideAgent Vagabondo
HideAgent verzamelgids
HideAgent VoilaBot
HideAgent W3C_Validator
HideAgent w3m
HideAgent www.walhello.com
HideAgent www.wapsilon.com
HideAgent WebCapture
HideAgent Webclipping
HideAgent WebFilter
HideAgent WebGather
HideAgent WebGo
HideAgent WebRACE
HideAgent websitealert.net
HideAgent WebStripper
HideAgent WebTV
HideAgent WebZIP
HideAgent WEP
HideAgent Wget
HideAgent Wildgrape
HideAgent WinHttp.WinHttpRequest
HideAgent Xenu
HideAgent Zealbot
HideAgent ZyBorg
I will publish a webalizer.conf file [...]
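To make the effect of the HideAgent list above easier to reason about, here is a minimal Python sketch of the matching rule as I understand it from the Webalizer documentation: a trailing asterisk anchors the pattern at the start of the string, a leading asterisk anchors it at the end, and a pattern with no asterisk can match anywhere. The function names, the sample agents, and the case-sensitive comparison are assumptions made for illustration; this is not Webalizer's own code.

```python
def matches(pattern: str, agent: str) -> bool:
    """Approximate a Webalizer-style pattern match (assumed rules, see text)."""
    if pattern.endswith("*"):
        # Trailing asterisk: the agent must start with the rest of the pattern.
        return agent.startswith(pattern[:-1])
    if pattern.startswith("*"):
        # Leading asterisk: the agent must end with the rest of the pattern.
        return agent.endswith(pattern[1:])
    # No asterisk: the pattern may appear anywhere in the agent string.
    return pattern in agent


def visible_agents(agents, hide_patterns):
    """Return the agents that no hide pattern matches (hypothetical helper)."""
    return [a for a in agents if not any(matches(p, a) for p in hide_patterns)]


if __name__ == "__main__":
    # Three patterns taken from the list above, applied to made-up log entries.
    patterns = ["Google*", "Wget", "Opera"]
    sample = [
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
        "Wget/1.8.2",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.11",
    ]
    for agent in visible_agents(sample, patterns):
        print(agent)
```

Under the case-sensitive assumption, entries such as ASPseek and aspseek each need their own line, which is consistent with the list listing both spellings.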
Something has been bugging me about the whole SoBig.F incident, and I believe that it has more to do with the self-congratulatory messages from people who eradicated most of it than with the virus itself. In a way, the virus is a clear representation of where things are headed. Back in 2001, I heard about [...]
Following up on last week’s entry, I’ve used some of the agents I found to refine my search results in webalizer. Based on this, I was able to enhance the search engine results I’m getting. Here are the lines to add to your webalizer.conf file in order for this to work.
SearchEngine 348north.com search=
SearchEngine abcsearch.com terms=
SearchEngine alltheweb.com q=
SearchEngine altavista.com q=
SearchEngine antisearch.net KEYWORDS=
SearchEngine aolsearch query=
SearchEngine ask.com ask=
SearchEngine ask.co.uk ask=
SearchEngine augurnet.ch q=
SearchEngine baidu.com word=
SearchEngine barrahome.org query=
SearchEngine blogdex.net q=
SearchEngine blogdigger.com queryString=
SearchEngine blogosphere.us s=
SearchEngine blogmatrix.com search=
SearchEngine blogwise.com query=
SearchEngine boitho.com query=
SearchEngine buscador.ya.com q=
SearchEngine by.com query=
SearchEngine daypop.com q=
SearchEngine dir.com req=
SearchEngine dmoz.org search=
SearchEngine dogpile.com q=
SearchEngine dpxml qkw=
SearchEngine egoto.com keywords=
SearchEngine elf8888.at query0=
SearchEngine eureka.com q=
SearchEngine excite search=
SearchEngine feedster.com q=
SearchEngine gais.cs.ccu.edu.tw q=
SearchEngine galaxy.com k=
SearchEngine gigablast.com q=
SearchEngine google q=
SearchEngine goo.ne.jp MT=
SearchEngine hotbot.com query=
SearchEngine infoseek.com qt=
SearchEngine ixquick.com query=
SearchEngine kobala.nl qr=
SearchEngine lycos.com query=
SearchEngine look.com q=
SearchEngine looksmart key=
SearchEngine mamma.com query=
SearchEngine metacrawler q=
SearchEngine msn.com q=
SearchEngine msxml qkw=
SearchEngine mysearch.com searchfor=
SearchEngine naver.com query=
SearchEngine netscape.com query=
SearchEngine northernlight.com qr=
SearchEngine ntlworld.com q=
SearchEngine openfind query=
SearchEngine overture.com Keywords=
SearchEngine picsearch.com q=
SearchEngine popdex query=
SearchEngine quepasa.com q=
SearchEngine search.com qt=
SearchEngine searchspider.com q=
SearchEngine search.earthlink q=
SearchEngine suchmaschine21.de search=
SearchEngine syndic8 ShowMatch=
SearchEngine technorati query=
SearchEngine teensearch query=
SearchEngine teoma.com q=
SearchEngine teradex.com q=
SearchEngine texis q=
SearchEngine voila kw=
SearchEngine walhello key=
SearchEngine waypath.com key=
SearchEngine webcrawler searchText=
SearchEngine webfanatic.lunarpages.com q=
SearchEngine whois.sc q=
SearchEngine wisenut.com q=
SearchEngine yahoo p=
Once again, enjoy
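For readers wondering what each SearchEngine line does, the following Python sketch shows the idea in simplified form: the first field is looked for in the referrer, and the second field names the query-string variable that carries the search terms. The function name, the three-entry table, and the sample referrers are assumptions made for illustration; Webalizer's actual parsing may differ in its details.

```python
from urllib.parse import unquote_plus

# A few (engine substring, query variable) pairs copied from the list above.
SEARCH_ENGINES = [
    ("google", "q="),
    ("yahoo", "p="),
    ("alltheweb.com", "q="),
]


def search_terms(referrer, engines=SEARCH_ENGINES):
    """Return the decoded search terms from a referrer URL, or None."""
    location, _, query = referrer.partition("?")
    if not query:
        return None
    for name, key in engines:
        if name not in location:
            continue
        # Look for the engine's query variable among the URL parameters.
        for field in query.split("&"):
            if field.startswith(key):
                return unquote_plus(field[len(key):])
    return None


if __name__ == "__main__":
    print(search_terms("http://www.google.com/search?hl=en&q=webalizer+conf"))
    print(search_terms("http://search.yahoo.com/search?p=hideagent&ei=UTF-8"))
```

A referrer from a site that is not in the table, or one with no query string, yields None in this sketch, so it would not contribute to the search string report.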
I’ve been working on cleaning up my webalizer.conf file in order to get better statistics. I haven’t seen anyone post the following information, so I figured I would, since it might interest people who are using the Webalizer stats tool. Adding the following lines to your webalizer configuration file (webalizer.conf) will allow you to get [...]