Robots.txt plays a vital role when you want crawlers to visit your site and index only the pages you choose. The Robots.txt file is a plain-text set of instructions placed in the root folder of your site. These instructions tell search engines which pages you want them to visit and index and which to skip. All major search engines, including Google, obey the instructions in the Robots.txt file.
The Robots.txt file has a simple structure and is easy to maintain; it can contain any number of user agents and disallowed files and directories. If your website doesn’t have a Robots.txt file, crawlers can crawl every page of the website (good and bad). Letting pages with broken links get indexed is not good practice. Managing Robots.txt for SEO falls under ON-PAGE SEO.
Let’s start exploring the Robots.txt file in more detail.
Basically, the syntax is as follows:
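(A general sketch; the bracketed values are placeholders.)

User-agent: [crawler name, or * for all crawlers]
Disallow: [file or directory path to exclude]
Sitemap: [full URL of the XML sitemap]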
“User-agent” specifies the search engine crawler the rules apply to.
“Disallow” lists the files and directories to be excluded from indexing.
“Sitemap” defines the location of the XML Sitemap file.
Along with “User-agent:” and “Disallow:” lines, you can also add comment lines by keeping a # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
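For instance, a minimal robots.txt combining a comment with these directives might look like this:

# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/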
The syntax above tells search engines to crawl every area of the website except the /temp/ directory.
Let me explain the Robots.txt code above in detail:
User-agent :- Specifies the bot the instructions are written for. In our example it is “*”, which means the instructions apply to all bots.
Disallow :- Specifies which area of the website should not be crawled. In our example, the /temp/ directory is not to be crawled.
Some key benefits of Robots.txt
- You can keep sensitive areas of a website out of search results; in the banking segment, for example, some pages are meant for private use only, and Robots.txt can keep crawlers away from them. Keep in mind, though, that Robots.txt only requests exclusion from compliant crawlers; it does not actually secure or password-protect anything.
Must-knows while dealing with the Robots.txt file
Check out the below example:
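Here is a reconstruction of the example being discussed (the exact directives are assumed from the description that follows):

User-agent: *
Disallow: /temp/
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /img/
Disallow: /css/
Disallow: /js/
Disallow: /temp/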
As per the discussion above, the first line addresses all bots, and the second and third lines tell them to skip the /temp/ and /cgi-bin/ directories. The fourth line then starts a separate group for Googlebot, disallowing the /img/, /css/, /js/ and /temp/ directories. You might expect Googlebot to combine both groups, but that is not how it works: a crawler obeys only the single group whose user-agent matches it most specifically. So when Googlebot visits this Robots.txt file, the group addressed to “Googlebot” is the one that applies; it crawls the site according to those four Disallow lines and ignores the rules written under the “*” group.
Points to take away about the Robots.txt file
- If you’re using subdomains, each one needs its own separate robots.txt file.
- “*” and “$” are the two pattern-matching (wildcard) characters supported by Google & Bing for URL exclusion: “*” matches any sequence of characters and “$” matches the end of a URL (see the example after this list).
- You can list only one path per “Disallow:” line; use a separate line for each URL you want to block.
- Generally supported protocols for robots.txt are “http” & “https”.
- The Robots.txt file can also mention the location of the XML sitemap (again, see the example after this list).
- Robots.txt is a recommended file, not a mandatory one; well-behaved crawlers honor it, but it cannot force compliance.
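To illustrate the wildcard and Sitemap points above, here is a short sketch; the paths and the sitemap URL are hypothetical, assumed purely for illustration:

User-agent: *
# "*" matches any sequence of characters, so this blocks /private, /private-docs/, etc. (hypothetical path)
Disallow: /private*
# "$" anchors the match to the end of the URL, so this blocks any URL ending in .pdf
Disallow: /*.pdf$

# Tell crawlers where the XML sitemap lives (hypothetical URL)
Sitemap: https://www.example.com/sitemap.xml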
Check out Allinformer.com’s Robots.txt file
I believe this article covers the basics of the Robots.txt file. If you liked it, kindly share it with your loved ones; your sharing gives me the courage to share more knowledge & experiences. If you have any query, kindly share it via a comment.