

Robots

A photo of BB-8, a character from Star Wars: The Force Awakens

I'm not talking about BB-8 or any other robot you've seen in the movies. I'm talking about the robots (also called spiders or crawlers) that crawl all around the internet gathering data.

Even Google has one, and it's probably the best known: Googlebot. But did you know that you can give instructions to these robots?


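You can tell them what they may and may not do. If you upload a robots.txt file to the root of your website (so that it is reachable at /robots.txt), well-behaved bots will read it and follow its rules, which gives you a measure of control over what they crawl. For example: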

User-agent: *
Disallow: 

Sitemap: http://www.spektredesign.net/sitemap.xml

The rule above tells ALL incoming robots that they may crawl every page on your website. The asterisk means "all robots", and the empty Disallow line means nothing is off limits. You can also address a specific robot by name (for example, User-agent: googlebot). The last line tells robots where your XML sitemap is located, which helps search engines find all of your pages (important for SEO).
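If you want to check how a crawler might read this file, Python's standard library includes a robots.txt parser, urllib.robotparser. The sketch below feeds it the example above and asks whether a page may be crawled. The page path is a made-up example, and site_maps() needs Python 3.8 or newer. Real crawlers such as Googlebot use their own parsers, so treat this as an illustration rather than a guarantee of how any particular bot behaves.

```python
from urllib.robotparser import RobotFileParser

# The first robots.txt example from this article.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.spektredesign.net/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow line blocks nothing, so any path is allowed.
# "/any-page.html" is a hypothetical page used only for illustration.
print(parser.can_fetch("Googlebot", "http://www.spektredesign.net/any-page.html"))  # True

# site_maps() returns the Sitemap URLs listed in the file (Python 3.8+).
print(parser.site_maps())  # ['http://www.spektredesign.net/sitemap.xml']
```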

User-agent: *
Disallow: /cgi-bin/

This example tells all robots that they can crawl every page except for the "cgi-bin" folder on your website.
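The same parser can confirm that a Disallow rule works as a path prefix. In this sketch, example.com and the page names are hypothetical stand-ins for your own URLs:

```python
from urllib.robotparser import RobotFileParser

# The second robots.txt example from this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a hypothetical site.
for url in ("https://example.com/index.html",
            "https://example.com/cgi-bin/search.cgi",
            "https://example.com/cgi-binary/file.html"):
    # Prints True, False, True: only paths starting with /cgi-bin/ are blocked.
    print(url, parser.can_fetch("*", url))
```

Because matching is by prefix, the trailing slash matters: /cgi-bin/ blocks /cgi-bin/search.cgi but not a path like /cgi-binary/file.html.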

User-agent: *
Disallow: /images/
Allow: /images/pinkball.gif

And this one tells all robots: "Don't crawl the images folder, except for the pinkball.gif file inside it, which you may crawl."
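Allow was not part of the original robots.txt convention, and crawlers can resolve conflicts between Allow and Disallow differently. Google, and RFC 9309 (the robots.txt standard), use the most specific rule, meaning the longest matching path, which is what makes this example work. The sketch below implements that longest-match rule by hand. It is a simplified illustration that ignores the * and $ wildcards, and the is_allowed function and the blueball.gif path are hypothetical. For contrast, Python's urllib.robotparser applies the first rule that matches in file order, so it reports pinkball.gif as blocked for this file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, path):
    """Hypothetical helper: return True if `path` may be crawled.

    `rules` is a list of (directive, path_prefix) pairs. Assumes
    longest-match precedence as in RFC 9309; wildcards are NOT handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty Allow/Disallow value matches nothing
        if path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # The longer (more specific) prefix wins; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# The third example from this article, as (directive, prefix) pairs.
RULES = [("Disallow", "/images/"), ("Allow", "/images/pinkball.gif")]

for path in ("/images/pinkball.gif", "/images/blueball.gif", "/index.html"):
    print(path, is_allowed(RULES, path))  # True, False, True

# urllib.robotparser stops at the first matching rule in file order,
# so the Disallow line wins here.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /images/", "Allow: /images/pinkball.gif"])
print(parser.can_fetch("*", "https://example.com/images/pinkball.gif"))  # False
```

If you want the file to behave the same way for first-match parsers, list the Allow line before the Disallow line.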


Sat Oct 10th, 2015
