What are robots.txt commands?

The commands in robots.txt work similarly to those of HTML and the various programming languages on the market: they are plain-text instructions with a defined syntax.

They are commands that robots follow to decide how to navigate and find pages on your site.

Here are some of the main commands in the robots.txt file:

User-Agent Command

You can enter specific commands for each crawler on the market in your robots.txt file, using the User-agent command to determine which robot the rules apply to.

To find out the name of each User-agent, you can consult the Web Robots database, which lists the robots of the main search engines on the market.

Google’s main search robot is Googlebot.

If you wanted to give it specific commands, the line you would enter in your robots.txt would be this:

User-agent: Googlebot

If you wanted to leave specific commands for the Bing crawler, the command would be this:

User-agent: Bingbot

As you can see, you simply have to change the name of the User-agent.

And if you want to give general directions that all search robots should follow, simply replace the User-agent name with an asterisk. It would look like this:

User-agent: *
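You can also combine several of these blocks in a single robots.txt file. Each User-agent line opens a group of rules for that crawler. A hypothetical file, using the Disallow command explained in the next section and placeholder folder names, could look like this:

User-agent: Googlebot
Disallow: /drafts/

User-agent: Bingbot
Disallow: /archive/

User-agent: *
Disallow: /private/

Each crawler follows the group whose User-agent name matches it most specifically, falling back to the asterisk group when no named group applies.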

Disallow Command

The Disallow command is responsible for describing which directories or pages of your site should not be included in search results.

Just like the User-agent command, simply enter the page address after the command.

To direct robots not to access the “beta.php” page on your site, the command would be this:

Disallow: /beta.php

You can also prevent access to specific folders.

If you needed to block access to the “files” folder, the command would be this:

Disallow: /files/

You can also block access to content that begins with a specific letter.

To block access to all folders and files beginning with the letter “a”, this would be the command:

Disallow: /a
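If you want to test rules like these before publishing them, Python's standard urllib.robotparser module can evaluate a robots.txt against sample URLs. Here is a minimal sketch, assuming the example rules from this section and a placeholder domain:

from urllib import robotparser

# The example rules from this section; example.com is a placeholder.
rules = """
User-agent: *
Disallow: /beta.php
Disallow: /files/
Disallow: /a
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch() reports whether the named crawler may request each URL.
for url in ("https://example.com/beta.php",
            "https://example.com/files/report.pdf",
            "https://example.com/archive/",
            "https://example.com/index.html"):
    print(url, "->", rp.can_fetch("Googlebot", url))

The first three URLs print False (blocked). Note that /archive/ is caught by Disallow: /a, because the rule matches any path that begins with /a.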

Allow Command

The Allow command allows you to tell search robots which pages or directories on your site you want them to access and index.

By default, all pages on your site will be indexed, except those you block with the Disallow command.

Therefore, it is recommended to use the Allow command only when you have blocked a folder or directory with the Disallow command but would like a specific file or subfolder inside that blocked directory to be indexed.

If you want to block access to the “files” folder but need to allow access to the “products.php” page, the command would look like this:

Disallow: /files/

Allow: /files/products.php

If you want to block access to the “files” folder but need to allow access to the “projects” folder, the command would be like this:

Disallow: /files/

Allow: /files/projects/
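As a quick sanity check, the same urllib.robotparser module can confirm that the exception behaves as intended. One caveat: Python's parser applies rules in file order (first match wins), while Google resolves conflicts by the longest matching path, so in this sketch the Allow lines are placed before the Disallow line; the domain is again a placeholder:

from urllib import robotparser

# Allow lines come first because urllib.robotparser uses first-match
# order, unlike Google's longest-matching-path resolution.
rules = """
User-agent: *
Allow: /files/products.php
Allow: /files/projects/
Disallow: /files/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/files/products.php"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/files/projects/p.html")) # True
print(rp.can_fetch("Googlebot", "https://example.com/files/other.html"))      # False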

Sitemap Command

Another useful command for a robots.txt file is Sitemap, which specifies the location of your site's sitemap and is very useful for helping search engine robots identify all the pages on your site.

However, it's a command that has fallen into disuse, mainly because Google Search Console (formerly Google Webmaster Tools) lets you report the location of your sitemap file quickly, among other features.

To enter your Sitemap address, you must have saved your sitemap file in the root folder of your site. The command to declare its address would be this:

Sitemap: https://tusitioweb.com.mx/sitemap.xml

If your domain is from another country, replace the Mexican country code (.mx) with your own, such as .ar or .co.
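If you generate your robots.txt programmatically, you can also verify that the Sitemap line is picked up. urllib.robotparser exposes a site_maps() method (available since Python 3.8) that returns the sitemap URLs found in the file, or None if there are none; a minimal sketch:

from urllib import robotparser

rules = """
User-agent: *
Disallow: /files/
Sitemap: https://tusitioweb.com.mx/sitemap.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# site_maps() lists the Sitemap URLs declared in the file, or None.
print(rp.site_maps())  # ['https://tusitioweb.com.mx/sitemap.xml']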

What are the limitations of robots.txt?

While robots.txt is useful for directing search engine access to your pages, it's important to recognize that it has some limitations.

Knowing them is essential, especially to identify when you need other mechanisms to keep your URLs from being easily found in searches.
