What is robots.txt in WordPress? The Complete Guide

As an experienced WordPress webmaster, I often get asked – what is robots.txt and why is it important for my site? This comprehensive guide will explain everything you need to know about crafting the perfect robots.txt file.

What is robots.txt?

Robots.txt is a simple text file that tells the automated bots crawling the web which parts of your site they may crawl. It lists the pages and files they should not request from your server.

The main bots that use robots.txt are search engine crawlers like:

  • Googlebot from Google
  • Bingbot from Microsoft Bing
  • Slurp from Yahoo

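Before crawling your site, these bots first check for a robots.txt file at the root of your domain (for example, https://example.com/robots.txt). If they find one, they read it and follow its directives. Compliance is voluntary: reputable search engine crawlers honor robots.txt, but badly behaved bots can ignore it.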
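To see which file governs a given page, here is a minimal Python sketch. The helper name `robots_url` is my own, not part of any library. It keeps only the scheme and host (plus port) of a URL and points it at /robots.txt, which also shows why a subdomain such as shop.example.com is governed by a different file from example.com.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Crawlers look for robots.txt at the root of the exact scheme + host + port,
    so every subdomain needs its own file.
    """
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host[:port]); replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in [
    "https://example.com/2024/05/hello-world/",
    "https://shop.example.com/cart/?item=42",
    "http://example.com:8080/wp-admin/",
]:
    print(url, "->", robots_url(url))
```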

How do the directives work?

The robots.txt file groups rules under a User-agent line, which names the bot they apply to, and uses Disallow and Allow directives to specify what to block or permit. For example:

User-agent: Googlebot
Disallow: /private-page/

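This tells Googlebot specifically not to crawl any URL whose path begins with /private-page/. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.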
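If you want to check rules like this programmatically, Python's standard library ships `urllib.robotparser`. Here is a minimal sketch, assuming the example file above and example.com as a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private-page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches the group, so paths under /private-page/ are off-limits.
print(parser.can_fetch("Googlebot", "https://example.com/private-page/"))        # False
print(parser.can_fetch("Googlebot", "https://example.com/private-page/a.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))                # True
# No group names Bingbot and there is no "*" group, so everything is allowed.
print(parser.can_fetch("Bingbot", "https://example.com/private-page/"))          # True
```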

Other examples include:

User-agent: *  # applies to all bots
Allow: /public-pages/

User-agent: Bingbot
Crawl-delay: 10 # set crawl rate limit

So robots.txt gives granular control over what gets crawled by providing a set of rules for the bots to follow.
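The same parser can read the second example, including the Crawl-delay value via `crawl_delay()`. This is a sketch with example.com as a placeholder. Keep in mind that Bingbot honors Crawl-delay while Googlebot ignores it, so the value only matters for crawlers that support it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *  # applies to all bots
Allow: /public-pages/

User-agent: Bingbot
Crawl-delay: 10 # set crawl rate limit
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # comments after "#" are stripped

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: falls back to the "*" group, which sets no delay
print(parser.can_fetch("Googlebot", "https://example.com/public-pages/about/"))  # True
```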

Why is robots.txt important?

Having a proper robots.txt file is crucial for:

  • Efficiency – Prevents wasting crawler resources on unimportant pages

  • Privacy – Discourages crawlers from fetching areas like login pages (robots.txt is publicly readable, so it is not a security control)

  • Quality – Keeps crawlers away from duplicate, thin, or low-value pages

  • Focus – Guides search bots to your best content

According to Moz, over 50% of the top 1 million websites now use robots.txt. And most SEO experts recommend having one.

Stats on robots.txt usage

  • Over 50% of the Alexa top 1 million sites use robots.txt
  • Average robots.txt size is 1.7 KB
  • Most popular directives are User-agent, Disallow, and Sitemap
  • Can reduce crawl requests by over 60% if configured properly

So implementing an optimized robots.txt file is considered a search engine optimization best practice. Let's look at the specifics for WordPress sites.

robots.txt on WordPress

For WordPress, some common things to include in your robots.txt are:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/uploads/

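This stops compliant crawlers from requesting:

  • wp-admin/ – The WordPress dashboard. WordPress's own default robots.txt pairs this rule with "Allow: /wp-admin/admin-ajax.php", because some front-end features call that file.
  • wp-includes/ – Core WordPress files. Be careful: this folder also serves JavaScript and CSS that themes load, and Google needs those files to render your pages.
  • Plugins directory – Plugin code, which likewise includes the front-end CSS and JavaScript of your active plugins, so the same rendering caution applies.
  • Uploads folder – Your images and media. Blocking it keeps them out of image search, so only do this if you don't want that traffic.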
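Before adopting these rules, it's worth checking what they actually block. This sketch feeds the WordPress example above to Python's `urllib.robotparser` and tests a handful of hypothetical URLs (the theme name and file paths are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

WP_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/uploads/
"""

parser = RobotFileParser()
parser.parse(WP_ROBOTS_TXT.splitlines())

# Hypothetical URLs on a typical WordPress site.
urls = [
    "https://example.com/hello-world/",
    "https://example.com/wp-admin/options.php",
    "https://example.com/wp-includes/js/jquery/jquery.min.js",
    "https://example.com/wp-content/uploads/2024/05/photo.jpg",
    "https://example.com/wp-content/themes/mytheme/style.css",
]
for url in urls:
    status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{status:8} {url}")
```

The jQuery file under /wp-includes/ comes back blocked, which is exactly the kind of resource Google needs to render a page.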

You can then re-allow selected folders inside uploads by adding Allow lines to the same User-agent group:

Allow: /wp-content/uploads/pdf/
Allow: /wp-content/uploads/brochures/
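This works because Google resolves conflicts by specificity rather than order: the matching rule with the longest path wins, and Allow wins a tie, as described in RFC 9309. The sketch below implements just that precedence rule with plain prefix matching (no wildcards); the function name `longest_match_allowed` is hypothetical. Python's `urllib.robotparser` has historically applied the first matching rule in file order instead, so it can disagree with Google on configurations like this one.

```python
def longest_match_allowed(rules, path):
    """Decide whether `path` may be crawled under RFC 9309-style precedence.

    `rules` is a list of ("allow" | "disallow", path_prefix) pairs from one
    User-agent group. The matching rule with the longest prefix wins; on a
    tie, Allow wins. No matching rule means the path is allowed.
    Plain prefix matching only: '*' and '$' are not handled here.
    """
    best_len, allowed = -1, True
    for kind, prefix in rules:
        # An empty Disallow value matches nothing, so skip empty prefixes.
        if prefix and path.startswith(prefix):
            length = len(prefix)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, (kind == "allow")
    return allowed


# The uploads rules from above, in the order they appear in the file.
RULES = [
    ("disallow", "/wp-content/uploads/"),
    ("allow", "/wp-content/uploads/pdf/"),
    ("allow", "/wp-content/uploads/brochures/"),
]

for path in [
    "/wp-content/uploads/pdf/price-list.pdf",
    "/wp-content/uploads/brochures/spring.pdf",
    "/wp-content/uploads/2024/05/photo.jpg",
]:
    print(path, "->", "allowed" if longest_match_allowed(RULES, path) else "blocked")
```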

Other common rules include:

  • Disallow author, date, tag and category archives
  • Disallow paginated archive pages beyond the first
  • Set Crawl-delay to limit the crawl rate (Bing honors it, but Googlebot ignores it)
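Here is one way to assemble those rules into a file, as a small Python sketch. The helper `build_robots_txt` is hypothetical, and the archive paths are assumptions that hold only for pretty permalinks with the default author, tag and category bases; `/*/page/` relies on the `*` wildcard that Google and Bing support. Because a crawler obeys only the group that best matches its name, the Bingbot group repeats the common rules rather than inheriting them from `*`.

```python
def build_robots_txt(groups, sitemap=None):
    """Render robots.txt text from {user_agent: [(directive, value), ...]}."""
    lines = []
    for agent, rules in groups.items():
        if lines:
            lines.append("")  # blank line between groups
        lines.append(f"User-agent: {agent}")
        for directive, value in rules:
            lines.append(f"{directive}: {value}")
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


# Hypothetical paths: check your own permalink structure before using them.
common = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/author/"),
    ("Disallow", "/tag/"),
    ("Disallow", "/category/"),
    ("Disallow", "/page/"),    # /page/2/ and beyond on the home page
    ("Disallow", "/*/page/"),  # paginated archives, e.g. /category/news/page/2/
]
groups = {
    "*": common,
    # A bot follows only its best-matching group, so repeat the common rules.
    "Bingbot": common + [("Crawl-delay", "10")],
}
print(build_robots_txt(groups, sitemap="https://example.com/sitemap.xml"))
```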

Pro Tip: Always test your robots.txt with Google's robots.txt tester (now the robots.txt report in Search Console) before deploying it, to check for errors.
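You can also run a quick local sanity check before uploading a new file. This sketch (`check_robots` is a hypothetical helper) uses `urllib.robotparser` to compare a draft against the URLs you expect to be allowed or blocked. It is only a smoke test: the standard-library parser uses first-match order and does not understand wildcards, so it does not replace Google's own tools.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, expectations, user_agent="Googlebot"):
    """Return (url, expected, actual) for every URL whose result differs from expectations."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for url, should_allow in expectations.items():
        actual = parser.can_fetch(user_agent, url)
        if actual != should_allow:
            failures.append((url, should_allow, actual))
    return failures


# A draft with an accidental "Disallow: /" line.
candidate = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /
"""
expectations = {
    "https://example.com/": True,
    "https://example.com/hello-world/": True,
    "https://example.com/wp-admin/": False,
}
for url, expected, actual in check_robots(candidate, expectations):
    print(f"MISMATCH {url}: expected allowed={expected}, got {actual}")
```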

Common robots.txt mistakes

Some common mistakes to avoid:

  • Blocking the entire website with 'Disallow: /'
  • Forgetting that each subdomain needs its own robots.txt
  • Conflicting Allow and Disallow directives for the same path
  • Blocking important files such as your XML sitemap, or the CSS and JavaScript files search engines need to render pages
  • Invalid or unsupported directives that Google can't understand
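A few of these mistakes are easy to catch mechanically. The following is a rough linter sketch, not a full parser: `lint_robots` and `KNOWN_DIRECTIVES` are my own names, it recognizes only the directives in that set, flags 'Disallow: /' in the catch-all group, and flags a path that is both allowed and disallowed in the same group. The sample's Noindex line is an example of a directive Google stopped supporting in robots.txt in 2019.

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return human-readable warnings for a few common robots.txt mistakes."""
    warnings = []
    agents, seen = [], {}  # current group's user-agents; path -> directive
    in_rules = False       # True once the current group has an Allow/Disallow line
    for num, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {num}: not a 'field: value' line")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_DIRECTIVES:
            warnings.append(f"line {num}: unknown directive '{field}'")
        elif field == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, seen, in_rules = [], {}, False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(f"line {num}: 'Disallow: /' blocks the whole site for all bots")
            other = seen.get(value)
            if other and other != field:
                warnings.append(f"line {num}: both Allow and Disallow for '{value}'")
            seen[value] = field
    return warnings


sample = """\
User-agent: *
Disallow: /
Noindex: /tag/
Allow: /blog/
Disallow: /blog/
"""
for warning in lint_robots(sample):
    print(warning)
```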

I recommend hand-crafting your robots.txt at first and then using a plugin like Yoast SEO to maintain it. Review what the plugin generates, though, because plugins can sometimes misconfigure robots.txt.

Advanced uses of robots.txt

Beyond basic indexing directives, you can also use robots.txt for:

  • Sitemap – Tell bots where your XML sitemap is located
  • Crawl Delay – Slow down bot crawl rate to reduce server load
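  • User-agent groups – Give specific bots their own rules; each bot follows only the group that most specifically matches its name, regardless of order in the file
  • Wildcards – Block URLs by pattern, such as every .jpg or .pdf file, using * and $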
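Wildcard patterns are easier to grasp with a concrete example. Google and Bing document `*` as matching any sequence of characters and a trailing `$` as anchoring the end of the URL; otherwise a rule matches as a prefix. This sketch translates a pattern into a regular expression to mimic that (the helper names `rule_to_regex` and `matches` are mine), and also reads a Sitemap line with `urllib.robotparser`'s `site_maps()`, available since Python 3.8. The standard-library parser does not implement the wildcards itself, which is why the matching is done by hand here.

```python
import re
from urllib.robotparser import RobotFileParser


def rule_to_regex(pattern):
    """Compile a robots.txt path pattern: '*' = any characters,
    trailing '$' = end of URL, otherwise a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    """True if the rule pattern applies to the URL path (plus query, if any)."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return rule_to_regex(pattern).match(path) is not None


for pattern, path in [
    ("/*.pdf$", "/wp-content/uploads/guide.pdf"),
    ("/*.pdf$", "/guide.pdf?download=1"),
    ("/*/page/", "/category/news/page/2/"),
    ("/*/page/", "/page/2/"),
]:
    print(f"{pattern:10} {path:32} {matches(pattern, path)}")

# Sitemap lines apply to the whole file, independent of User-agent groups.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /wp-admin/", "Sitemap: https://example.com/sitemap.xml"])
sitemaps = parser.site_maps()  # None if the file has no Sitemap lines
print(sitemaps)
```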

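And while robots.txt controls crawling, a meta robots noindex tag controls indexing of individual pages. Bear in mind that a crawler can only see a noindex tag on a page it is allowed to crawl, so don't block a page in robots.txt if you rely on noindex to keep it out of search results. Used together with that in mind, the two give you the most control over what search engines see.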
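That interaction is easy to audit if you know which pages carry a noindex tag. The sketch below (`unreachable_noindex` and the page list are hypothetical) reports noindex pages that robots.txt blocks, i.e. pages whose tag a compliant crawler will never read.

```python
from urllib.robotparser import RobotFileParser


def unreachable_noindex(robots_txt, noindex_urls, user_agent="Googlebot"):
    """List URLs whose noindex tag a compliant crawler can never see,
    because robots.txt stops it from fetching the page at all."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


robots_txt = """\
User-agent: *
Disallow: /tag/
"""
# Hypothetical pages that carry <meta name="robots" content="noindex">.
noindex_pages = [
    "https://example.com/tag/misc/",
    "https://example.com/thank-you/",
]
print(unreachable_noindex(robots_txt, noindex_pages))
# ['https://example.com/tag/misc/']
```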

Conclusion

Crafting the optimal robots.txt is crucial for every WordPress site. It improves SEO by focusing bots on your best content while avoiding pitfalls. I hope this guide covered everything you need to know to implement an effective robots.txt strategy on your WordPress site. Let me know if you have any other questions!

Written by Jason Striegel

C/C++, Java, Python, and Linux developer for 18 years; A-Tech enthusiast who loves to share useful tech hacks.