In this post I will explain what a robots.txt file is and how to manage it on your WordPress site.
What is a robot?
We’re not talking about R2-D2 or the Terminator here! A robot, also called a crawler or a spider, is a program designed to scan all the web pages it can find so they can be indexed by search engines. Every search engine has its own robot. The Google robot (bot for short) is called Googlebot, for example. Here you will find a big list of robots, along with the search engine each one belongs to: Search engine robots
What is robots.txt?
Robots.txt is a text file found on (almost) every website. In fact, it’s the first thing a robot “sees” when visiting the site. This file contains instructions telling the robot what to do, and what not to do, with the pages it is about to visit.
How do I know if I have a robots.txt file on my site?
Simply type your URL in your web browser, followed by “/robots.txt”. Example: www.mysite.com/robots.txt. You will then see what’s inside your robots.txt file. If your site is made with WordPress, you certainly have a robots.txt file.
What’s the use of having a robots.txt on my site?
Basically, you will use the robots.txt file to exclude some web crawlers from visiting your site, or to keep some of your pages from being indexed (for example, the admin pages; by default, the robots.txt file in WordPress prevents the wp-admin directory from being indexed).
What exactly do we find in the robots.txt file?
There are two main directives in a robots.txt file:
- User-agent: gives specific instructions to one robot in particular
- Allow or Disallow: allows or disallows certain pages from being indexed
Often, a robots.txt file will look like this:
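This is the standard “allow everything” configuration:

```
# Applies to every robot; an empty Disallow blocks nothing
User-agent: *
Disallow:
```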
In this case, all pages can be indexed by all robots.
If you don’t want your site to be indexed at all:
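A single slash blocks everything:

```
# The slash blocks the entire site for every robot
User-agent: *
Disallow: /
```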
It can be useful to do this if your site is still in development and you don’t want it to be found by search engines yet.
I want to exclude certain robots from indexing my site
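For example, to target the Google robot specifically, use its name in the User-agent line:

```
# Only Googlebot is blocked; other robots are unaffected
User-agent: Googlebot
Disallow: /
```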
In this case, the Google robot won’t index your pages.
I want to exclude certain pages or certain folders from being indexed
I don’t want my photos folder to be indexed:
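Assuming the folder lives at /photos/ on your site (adapt the path to your own folder name):

```
# Blocks the /photos/ folder and everything inside it
User-agent: *
Disallow: /photos/
```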
I don’t want a specific page to be indexed:
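Here /my-page.html is a hypothetical page path; replace it with the address of the page you want to exclude:

```
# Blocks one specific page (example path)
User-agent: *
Disallow: /my-page.html
```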
Do I really need a robots.txt file?
You don’t need a robots.txt file if you want all your pages to be indexed. But one thing is important to mention: if a robot doesn’t find a robots.txt file, it will assume it has full access to your site. That’s the way it is programmed. So if you want to keep certain pages out of the search engines, you must have a robots.txt file.
Access the robots.txt file in WordPress
You can easily access the robots.txt file by using an SEO plugin like All in One SEO or Yoast SEO. The example I give here uses Yoast SEO.
- Go to the Yoast SEO menu and select Tools
- Select File Editor
- Here’s what your robots.txt file will look like. You can modify it and save your changes by clicking Save changes to Robots.txt
A basic robots.txt file for WordPress
Here is a basic robots.txt file that you can use for your WordPress site:
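```
# Applies to every robot
User-agent: *
# Block the login page and the admin directory
Disallow: /wp-login.php
Disallow: /wp-admin/
```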
This file protects the login page (wp-login.php) and the administration directory (wp-admin).
You can find plenty of robots.txt examples for your site. But before making any modifications to your robots.txt file, be aware of the consequences of deindexing certain pages. Think twice!
Did you find this article useful? Do you have examples of robots.txt files you use on your site? Tell us in the comments!