This is the first of a new column I’m calling SEO Legends, the place to dust off some of the “back-in-the-day” stories that are often told around the fireplace at Pubcon.
So let’s get started and kick off this series with a confession of sorts: I spiked a huge portion of the internet’s robots.txt files … hear me out.
WebmasterWorld hosted a very active Apache web server forum, diving deep into every nuance of web servers. One topic that constantly surfaced was robots.txt entries and their syntax.
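For anyone who hasn’t looked at one in a while, the format under debate is dead simple: records made up of User-agent lines followed by Disallow rules. Something like this (a made-up example for illustration, not WebmasterWorld’s actual file):

User-agent: Googlebot
Disallow: /search/

User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10

Simple as it looks, the edge cases – things like non-standard extensions such as Crawl-delay, which not every crawler honors – were exactly the kind of thing that kept that forum busy.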
Given the forum’s reputation, it wasn’t long before we became a magnet for every bot in the SEO world. After all, who has bots? Yes, SEOs do! We dealt with far more than our fair share. The onslaught was so relentless that I’d spend hours every week combing through logs, banning IPs, and shutting down rogue bots. Our robots.txt file grew massive, packed with entries specifically targeting these pests. It became a full-scale effort just to keep the unwanted crawlers at bay. Even at a million human page views a day, the bots could do double that.
Like any good SEO, I would run my own crawler against WebmasterWorld looking for 404s and other issues. I gave that bot a unique robot name, “RepoMonkey Bait & Tackle/v1.01”, so that I could ignore its entries in the log files. (Lord only knows where that name came from.) So I was a little surprised when I Googled the robot name and ended up finding my robots.txt copied and in use on dozens of sites around the web. Many of them still had my copyright notice in them.
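(For what it’s worth, one common way to keep your own bait bot out of the reports is to exclude it at logging time. A rough Apache sketch of the idea, assuming a stock combined-log setup rather than whatever we were actually running back then:

# Tag requests from our own crawler and leave them out of the access log
SetEnvIfNoCase User-Agent "RepoMonkey" own_bot
CustomLog logs/access_log combined env=!own_bot

Filtering after the fact with a quick search for the user-agent string works just as well.)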
I’d been called out in forums and on other sites whenever I would robots.txt-ban some SEO service’s bot but not ban their number one competitor’s. I eventually had to go so far as to cloak my robots.txt, serving a sanitized version to the public and the real one only to legit search engines. That worked until Mr. Cutts mentioned it was still cloaking, Brett. Meh.
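For the curious, that kind of robots.txt cloaking only takes a few lines of mod_rewrite. Here’s a rough sketch of the idea, not our actual rules; the crawler names and the decoy filename are just placeholders:

# Anyone who doesn't identify as a known search engine crawler
# silently gets the decoy file instead of the real robots.txt
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp|msnbot) [NC]
RewriteRule ^robots\.txt$ /robots-public.txt [L]

Of course, user-agent strings are trivially spoofed, which is part of why it still counts as cloaking in Google’s book.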
While going through all that, I realized how much traffic was going to my robots.txt from people just looking at it and copying it. For a while our robots.txt was one of our highest-linked pages. Go figure. That is when I got the idea of putting a blog inside my robots.txt to take advantage of that traffic. And oh my, what targeted traffic it was! The blog got so much traffic, earned so many backlinks, even picked up a DMOZ entry and links from big tech blogs, that it had to have contributed to the viral nature of that robots.txt.
I’m not really sure why, but I was so amused by this interest in our lowly robots.txt file that I started making up fake bot names and entering them into it. Long story short, there are thousands of copies of my viral robots.txt file still in use around the web. My best guess is that over 50,000 robots.txt files have “RepoMonkey Bait & Tackle/v1.01” in them.
Fake Robots
So let’s, for better or worse (gulp, ahem, I was much younger back then, don’t judge 😉), look at some of these fake entries and see them live on the web today:
User-agent: BackDoorBot/1.0
User-agent: Black Hole
User-agent: BlowFish/1.0
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: Flaming AttackBot
User-agent: Foobot
User-agent: LinkWalker
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: SiteSnagger
User-agent: SpankBot
User-agent: TightTwatBot
Who is still using it? Here is Mastercard’s robots.txt: https://www.mastercard.com/robots.txt, and who knows how many more are out there. (100k?)
I do believe that some of those names were later picked up and used by actual bots. If you look at sites like Darkvisitors.com, you’ll see some of them listed as real bots.
This list over here even has this site listed in it.
All in all, sometimes I just have to laugh at the internet of SEOs. Y’all are crazy.
As the CEO and founder of Pubcon Inc., Brett Tabke has been instrumental in shaping the landscape of online marketing and search engine optimization. His journey in the computer industry has spanned over three decades, making him a pioneering force behind digital evolution.