What are robots.txt files? Featuring 15 of our favourites
At MCM Net we like nerdy website stuff. Here is a prime example.
Robots.txt files are an important (if not very sexy) part of SEO, but that doesn’t mean that you can’t have fun with them.
Robots.txt files are there to prevent the crawling and indexing of particular areas of your website by web crawlers and spiders: internet bots, such as those run by Google and Yahoo, that systematically browse websites, typically for indexing purposes.
By telling the internet bots where not to go, you can prevent certain pages from appearing in search engine results. You also save bandwidth and server resources, which helps speed up your website. This is especially useful for larger sites, those with a lot of traffic, or ones on slow servers.
Just as the robots from famed author Isaac Asimov’s universe are governed by the Three Laws of Robotics, internet robots have their own rules to abide by, known as the Robots Exclusion Protocol. Before a bot visits your website, it must first check your robots.txt file.
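In practice, a compliant crawler parses the file before fetching anything. As a rough sketch (the file contents here are a made-up example, not our real robots.txt), Python’s standard library can perform the same check a polite bot would:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, similar in spirit to the examples in this post
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved bot runs this check before requesting each URL
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False: disallowed
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True: allowed
```

The same module can also fetch and parse a live file via `set_url()` and `read()`, which is handy if you want to see how a crawler interprets your own site’s rules.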
For a run-of-the-mill example of a robots.txt file, you can view our very own http://www.mcmnet.co.uk/robots.txt.
Our website’s functionality is straightforward, without many of the types of pages that Google recommends you disallow from being indexed (such as login areas and search results). In our case, we just ask bots not to crawl our admin pages.
User-agent: * means that the instruction applies to all robots. Some websites have defined rules for different robots (for example see http://www.bbc.co.uk/robots.txt).
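Put together, a simple robots.txt with both a catch-all rule and a bot-specific rule might look something like this (the paths and the Googlebot rule here are invented for illustration, not taken from any real site):

```
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /internal-search/
```

Each User-agent line starts a new group of rules, and a bot follows the most specific group that matches its name.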
Websites built on a plugin architecture, such as WordPress, have the option of simply installing an appropriate plugin that will generate the file for you. Otherwise, ask your developer if you need to create or change a robots.txt. You can find lots more information by visiting http://www.robotstxt.org/robotstxt.html.
Now for the fun part. Some websites like to get creative with their robots.txt files, and we’ve shared our favourites below. Anything after a hash (#) on a line in a robots.txt file is treated as a comment and ignored by crawlers, which is why websites can get away with Easter eggs such as these.
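For example, a file like this would sail straight past any crawler while still greeting a curious human (the message below is our own, not from a real site):

```
# Hello, human! The bots ignore this line entirely.
User-agent: *
Disallow:
```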
Just crawl it – Nike
Not content with including a pun on their own slogan at the top of the file, Nike reward you for scrolling all the way to the bottom with a not-so-subtle ASCII art of their logo.
Job posting for SEOs by TripAdvisor, White & Seer Interactive
Here are three examples of companies using their robots.txt files to hide job adverts. The idea is that if a human is reading your file, they might be the right fit for an SEO role. Or they just read a blog post on the subject.
I, Robots.txt featuring Yelp & last.fm
We mentioned Asimov earlier, and here is Yelp’s web team using their robots.txt to pay tribute to the sci-fi author. Meanwhile, last.fm use a more understated version of the same Asimovian concept.
Friendly robot messages from Wikipedia and Etsy
Wikipedia’s file is full of helpful, thoughtful comments such as this, while Etsy hide a cute little character at the bottom of their very long robots.txt.
There is a chilling, dystopian future (past?) relayed through YouTube’s robots.txt file.
Well-known for their hidden messages and inside jokes, Google do justice to the idea of robots.txt files being for actual (well, fictional) robots with this nod to the Terminator films.
Reddit have left instructions for the lead robots from the science fiction masterpiece The Day the Earth Stood Still and the legendary TV show Futurama. Tindeck’s homage to Bender from Futurama is slightly more over the top.
Flogging Fuchsias with Arena Flowers
Who can resist a DeLorean DMC-12? Arena Flowers know the target market for their Back to the Fuchsia bouquet and make the most of the valuable advertising platform a robots.txt file can provide. And maybe internet bots need to buy flowers sometimes too.
A Website Inside My Robots.txt
The heading says it all. This really is a step up from the previous examples. Vinna used this concept to put a game inside theirs.
Any we’ve missed? Get in touch @mcmnet.