Managing robots.txt in WordPress


In this post I will explain what a robots.txt file is and how to manage it inside your WordPress site.

What is a robot?

We’re not talking about R2-D2 or the Terminator here! A robot, also called a crawler or a spider, is a piece of software designed to scan all the web pages it can find so that they can be indexed by search engines. Every search engine has its own robot. The Google robot (bot for short) is called Googlebot, for example. Here you will find a big list of robots, along with the search engine each one belongs to: Search engine robots

What is robots.txt?

Robots.txt is a text file placed at the root of (almost) every website. In fact, it’s the first thing a robot “sees” when visiting the site. The file contains instructions that tell the robot what it may or may not do with the pages it is going to visit.

How do I know if I have a robots.txt file on my site?

Simply type your URL in your web browser, followed by “/robots.txt”. Example: www.mysite.com/robots.txt. You will then see what’s inside your robots.txt file. If your site is made using WordPress, you certainly have a robots.txt file.
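If you would rather check from a script than from the browser, the same address can be built in code. Here is a minimal Python sketch using only the standard library. The function names robots_url and fetch_robots are made up for this post, and www.mysite.com is the same placeholder site as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Return the robots.txt address for the site that hosts site_url."""
    parts = urlsplit(site_url)
    # robots.txt sits at the root of the host, whatever page we start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt and return its text (raises if the request fails)."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder address; replace it with a page of your own site.
    print(robots_url("https://www.mysite.com/blog/my-post/"))
    # prints https://www.mysite.com/robots.txt
```

Calling fetch_robots with your real site address should return the same text you see in the browser.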

What’s the use of having a robots.txt on my site?

Basically, you will use the robots.txt file to exclude some web crawlers from visiting your site, or to keep some of your pages from being indexed, for example the admin pages. By default, the robots.txt file in WordPress prevents the wp-admin directory from being indexed.

What exactly do we find in the robots.txt file?

There are 2 main things in a robots.txt file:

  • User-agent: names the robot that the instructions that follow apply to (* means all robots)
  • Allow or Disallow: allows or disallows the indexing of certain pages

Often, a robots.txt file will look like this:

User-agent: *
Disallow:

In this case, all pages can be indexed by all robots, because the empty Disallow line blocks nothing.

If you don’t want your site to be indexed at all:

User-agent: *
Disallow: /

It can be useful to do that if your site is still in development and you don’t want it to be found by the search engines for the moment.
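To see the difference between these two files without waiting for a real robot to come by, you can ask Python's standard urllib.robotparser module how it reads them. This is only a sketch: the helper is_allowed and the page address are placeholders of mine, and real search engines may interpret some rules a little differently than this module does.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, robot: str, url: str) -> bool:
    """Tell whether `robot` may visit `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, url)


page = "https://www.mysite.com/about/"  # placeholder URL
print(is_allowed(ALLOW_ALL, "Googlebot", page))  # True
print(is_allowed(BLOCK_ALL, "Googlebot", page))  # False
```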

I want to exclude certain robots from indexing my site

User-Agent: Googlebot
Disallow: /

In this case, the Google robot won’t index your pages.
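Here is a small Python check of that rule, again with the standard urllib.robotparser module. I compare Googlebot with Bingbot (the name of Bing's robot) to show that only the robot named in the file is turned away. The page address is a placeholder.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-Agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

page = "https://www.mysite.com/"  # placeholder URL
for robot in ("Googlebot", "Bingbot"):
    # can_fetch answers: may this robot visit this page?
    print(robot, parser.can_fetch(robot, page))
# Googlebot False
# Bingbot True
```

Robots that are not mentioned anywhere in the file are treated as allowed, which is why Bingbot gets through.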

I want to exclude certain pages or certain folders from being indexed

I don’t want my photos folder to be indexed:

User-Agent: *
Disallow: /photos/

I don’t want a specific page to be indexed:

User-Agent: *
Disallow: /mypage.html
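Rules like these match the beginning of the path, so it is worth testing a few addresses before you rely on them. In the sketch below I assume both rules are combined in a single file, and the test paths and the helper blocked_paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the folder rule and the page rule live in the same file.
RULES = """\
User-Agent: *
Disallow: /photos/
Disallow: /mypage.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked_paths(paths):
    """Return the paths that a generic robot ("*") is told not to visit."""
    return [p for p in paths if not parser.can_fetch("*", p)]


paths = ["/photos/holiday.jpg", "/photography-tips/", "/mypage.html", "/otherpage.html"]
print(blocked_paths(paths))
# ['/photos/holiday.jpg', '/mypage.html']
```

Notice that /photography-tips/ is not blocked: it starts with "/photo" but not with "/photos/", so the folder rule does not apply to it.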

Do I really need a robots.txt file?

You don’t need a robots.txt file if you want all your pages to be indexed. But one thing is important to mention: if the robot doesn’t see a robots.txt file, it will consider that it has full access to your site. That’s the way it is programmed. So if you want to keep robots away from certain pages, you must have a robots.txt file.

Access the robots.txt file in WordPress

You can easily access the robots.txt file by using an SEO plugin like All-in-One SEO or Yoast SEO. The example I give here uses Yoast SEO.

  • Go to the Yoast SEO menu and select Tools
  • Select File Editor
  • Here’s what your robots.txt will look like. You can modify it, then save it by clicking Save changes to Robots.txt

A basic robots.txt file for WordPress

Here is a basic robots.txt file that you can use for your WordPress site:

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin

This file protects the login page (wp-login.php) and the administration directory (wp-admin).
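Before saving a file like this one, you can check how a robot would read it. The sketch below feeds the three lines to Python's standard urllib.robotparser module. The three test paths are placeholders I picked: two that should be off limits and one ordinary post.

```python
from urllib.robotparser import RobotFileParser

WORDPRESS_RULES = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin
"""

parser = RobotFileParser()
parser.parse(WORDPRESS_RULES.splitlines())

# Placeholder paths: two that should be off limits, one ordinary post.
checks = {
    path: parser.can_fetch("Googlebot", path)
    for path in ("/wp-login.php", "/wp-admin/index.php", "/hello-world/")
}
for path, allowed in checks.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
# /wp-login.php: blocked
# /wp-admin/index.php: blocked
# /hello-world/: allowed
```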

You can find a lot of robots.txt examples online for your site. But before making any modification to your robots.txt file, be aware of the consequences of deindexing certain pages. Think twice!

Did you find this article useful? Do you have examples of robots.txt files you use for your site? Tell us in the comments!

4 Responses

  1. jazzy323

    November 12, 2015 1:19 pm

    Internet marketing is definitely full of adventures, so I like the direction of your site and the title. I never actually knew of the robots.txt file and the significance of having it, but in internet marketing you learn something new every day. Well done on the article, as it is helpful and laid out well.

  2. Xander

    November 12, 2015 1:58 pm

    I have used the robots.txt file before for privacy policies and so forth, never really knew what it was for, was just instructed to do so. Thanks for the in depth explanation and keeping it simple enough for me to understand. Internet marketing has definitely come a long way and I love to learn new things to stay ahead.
