A robots.txt file and a sitemap are two essentials for making sure your technical SEO is not undermining your on-page efforts.
While most people have heard about sitemaps, robots.txt rarely gets the spotlight among people who’ve just started learning SEO.
In this article, I will break down both concepts and explain how and why you should add a sitemap to your robots.txt file.
Without further ado, let’s get to it!
What is a Robots.txt File?
The Robots.txt file is a text file that should be placed in the root directory of your website.
For example, https://ftdigital.ca/robots.txt
This text file uses a set of instructions to tell search engine robots which pages on your website they should and shouldn’t crawl.
Robots.txt is usually the first place crawlers visit when they come to your website. They check it to learn which parts of your website you want them to skip. You should set it up so that your crawl budget is not wasted on admin directories and other CMS folders.
Another use case for robots.txt is closing your website to bots while it’s still in development. It might seem like a minor step, but it can help your website considerably after launch.
I’ve personally seen many websites that were crawled before launch. This led to a lot of 404 errors afterwards and to indexing problems for relevant pages, because those pages were first crawled when they had either no content or duplicate content.
For a new website on a new domain, it can take a while to convince Google to crawl those pages again and finally index them.
Last but not least, even when people create a robots.txt file, they often forget to add the second vital piece for successful ranking: a sitemap.
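To make the idea concrete, here is a minimal sketch of how a crawler might decide whether a URL is off limits. It uses Python's standard `urllib.robotparser` module. The rules and the `example.com` URLs are assumptions chosen for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules (an assumption, not taken from a real site).
# They are parsed from a string, so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A regular page is not covered by any Disallow rule.
blog_allowed = parser.can_fetch("*", "https://example.com/blog/")

# Anything under /wp-admin/ is blocked for every user agent.
admin_allowed = parser.can_fetch("*", "https://example.com/wp-admin/options.php")

print(blog_allowed, admin_allowed)
```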
What is an XML Sitemap?
An XML sitemap is a file that contains a list of all pages on a website that you want robots to discover.
Just creating a sitemap is never enough. You need to check that it doesn’t contain pages, categories, tag pages or blog posts that shouldn’t be indexed.
If you skip this check and leave that clutter in, your sitemap will not only fail to help you but will also waste your crawl budget, decreasing your chances of ranking the pages that really matter.
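As a rough illustration of what such a file looks like and how the unwanted pages can be kept out, the sketch below builds a small XML sitemap with Python's standard library. The `pages` structure and its `indexable` flag are hypothetical. In practice that information would come from your CMS or SEO plugin.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML that lists only the pages marked as indexable.

    `pages` is a list of dicts with "url" and "indexable" keys
    (a hypothetical structure used only for this example).
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        if not page["indexable"]:
            continue  # keep tag pages, drafts, etc. out of the sitemap
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page["url"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


example_pages = [
    {"url": "https://example.com/", "indexable": True},
    {"url": "https://example.com/blog/my-post/", "indexable": True},
    {"url": "https://example.com/tag/misc/", "indexable": False},
]

print(build_sitemap(example_pages))
```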
What Should I Do With a Sitemap?
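Submit it to the webmaster tools of the search engines you care about, such as Google Search Console. The process for all of them is pretty much the same: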
- Create an account
- Verify domain ownership
- Submit your sitemap’s URL
How Are Robots.txt & Sitemaps Related?
Back in 2006, one of the most important parts of page indexing was manually submitting your XML sitemap to each search engine’s webmaster tools, like Google Search Console, and it’s still best practice today.
However, a year later, in 2007, Yahoo, Microsoft and Google agreed to support a common way of finding XML sitemaps through robots.txt, called Sitemaps Autodiscovery. Since then, listing your sitemap correctly in robots.txt has mattered even more, because crawlers can find it there without a manual submission.
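Autodiscovery simply means that a crawler reads the "Sitemap:" lines out of robots.txt. The sketch below imitates that step with `RobotFileParser.site_maps()`, which is available in Python 3.8 and later. The helper name `discover_sitemaps` is made up for this example, and this is not a description of how any particular search engine implements it.

```python
from urllib.robotparser import RobotFileParser


def discover_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://ftdigital.ca/sitemap_index.xml
"""

print(discover_sitemaps(EXAMPLE_ROBOTS_TXT))
```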
Now that you know what these things are and why they are important, let’s add a sitemap to your robots.txt file.
How To Add XML Sitemap To Your Robots.txt File?
No matter what CMS or hosting you use, these 3 steps are always the same:
#1: Determine Your Sitemap Location
WordPress XML Sitemaps
If your website is built with WordPress, chances are it also has the Yoast SEO plugin installed. Other popular alternatives are All in One SEO and Rank Math. With Yoast and Rank Math, your sitemap index file URL will be “/sitemap_index.xml”. All in One SEO typically uses “/sitemap.xml” instead.
For example, https://ftdigital.ca/sitemap_index.xml
If you use another CMS, your sitemap’s URL could be the same or located at “/sitemap.xml”. Alternatively, your website might not have one at all. In that case, you should either find an SEO plugin that can auto-generate it for you or create one manually.
Keep in mind that if you do it manually, you will have to update it manually every time you add or remove a page, article, blog post, category, tag page, etc.
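If you are not sure which of these locations your site uses, you can check them one by one. The sketch below builds the candidate URLs from the two paths mentioned above. The helper names are hypothetical, and `url_exists` is only an optional convenience that needs network access. Some servers reject HEAD requests, so opening the URL in a browser is the more reliable check.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# The two common locations mentioned in this article.
COMMON_SITEMAP_PATHS = ("/sitemap_index.xml", "/sitemap.xml")


def candidate_sitemap_urls(site_url: str) -> list:
    """Build the sitemap URLs worth checking for a given site."""
    return [urljoin(site_url, path) for path in COMMON_SITEMAP_PATHS]


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request with HTTP 200.

    Requires network access; it is defined here but not called.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


print(candidate_sitemap_urls("https://example.com"))
```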
#2: Access Your Robots.txt File
As I’ve mentioned before, your robots.txt file is always located at [yourdomain.com]/robots.txt. However, you can’t edit it by visiting that URL.
To edit it, access your files through either FTP or the File Manager on your hosting. If you’re unsure what either term means, contact your hosting company, and they should be able to help you.
#3: Add Sitemap URL To Robots.txt File
Once you have your sitemap’s URL and can edit your robots.txt file, it’s time to finally add it.
All you need to do is add the line “Sitemap: https://yourdomain.com/sitemap_index.xml”, for example below the “Disallow:” line.
The line containing the sitemap location can be placed anywhere in the robots.txt file. It is independent of the user-agent line.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://ftdigital.ca/sitemap_index.xml
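If you prefer to script this step rather than edit the file by hand, the sketch below shows one way to append the line only when it is missing. The function name `add_sitemap_line` is hypothetical, and the code works on the file's text only. Uploading the result through FTP or your File Manager is still up to you.

```python
def add_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Return robots.txt content with a Sitemap line for sitemap_url.

    If the same sitemap is already declared, the content is returned
    unchanged so the line is never duplicated.
    """
    lines = robots_txt.splitlines()
    for line in lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already present, nothing to do
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


original = "User-agent: *\nDisallow: /wp-admin/\n"
updated = add_sitemap_line(original, "https://yourdomain.com/sitemap_index.xml")
print(updated)
```

Running the function a second time on its own output leaves the file as it is, so it is safe to include in a deployment script.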
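There are two common mistakes to avoid here.
Using a relative URL: “Sitemap: /pages.xml”. The sitemap location must be a full URL, including the protocol and the domain.
User-agent: *
Disallow:
Sitemap: /pages.xml
Disallowing your entire website: “Disallow: /”. This blocks crawlers from every page, even though the sitemaps are listed.
User-agent: *
Disallow: /
Sitemap: https://example.com/pages.xml
Sitemap: https://example.com/posts.xml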
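Both patterns shown above, a relative sitemap URL and a blanket "Disallow: /", are easy to catch automatically. The following is a minimal sketch of such a check. The function name and the wording of the messages are made up for this example, and it looks for these two issues only.

```python
from urllib.parse import urlparse


def find_robots_problems(robots_txt: str) -> list:
    """Return human-readable warnings for two common robots.txt issues."""
    problems = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "sitemap":
            parsed = urlparse(value)
            # An absolute URL has both a scheme (https) and a host.
            if not (parsed.scheme and parsed.netloc):
                problems.append(f"line {number}: sitemap URL is not absolute: {value}")
        elif name == "disallow" and value == "/":
            problems.append(f"line {number}: 'Disallow: /' blocks the whole site")
    return problems


relative_url = "User-agent: *\nDisallow:\nSitemap: /pages.xml\n"
blocks_everything = "User-agent: *\nDisallow: /\nSitemap: https://example.com/pages.xml\n"

print(find_robots_problems(relative_url))
print(find_robots_problems(blocks_everything))
```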
What If You Have Multiple Sitemap Files?
If you have more than one sitemap, you can add all of them to your robots.txt file, each on its own “Sitemap:” line.
Even so, I strongly advise adding only the main one, the sitemap index. With individual sitemaps for each post type, you will constantly have to add or remove lines when you update your website, which is a massive waste of time.
So unless you have a large website with more than 50,000 individual URLs and have to split them across different sitemaps, add only the main one to your robots.txt file.
Now you know that adding a sitemap index file to your robots.txt helps search engines discover your pages and supports your SEO efforts.
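For sites that do cross the 50,000 URL mark, the usual pattern is to split the URLs into several sitemap files and list those files in one sitemap index, which then becomes the single entry in robots.txt. Here is a minimal sketch of that idea. The file names such as `sitemap-1.xml` and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into groups that each fit in one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML pointing at the individual sitemap files."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# 120,000 made-up URLs need three sitemap files.
all_urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks = chunk_urls(all_urls)
index_xml = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks))
```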
If you’re trying to do SEO yourself and find it extremely confusing, contact us for help! We will provide you with a free SEO audit and assessment of your current strategy, as well as give you tips on how to improve it so you can grow your organic traffic, appear higher in search results and get free leads.
A sitemap and robots.txt are the two main components for successfully indexing your pages. The sitemap is a list of all the posts and pages you want crawlers to discover and index, and robots.txt tells crawlers which parts of the website they may crawl and which they should skip entirely.
The path of a robots.txt file is always yourdomain.com/robots.txt. It is the only location of this file that is acceptable for search engines. If you can’t find your robots.txt file there, it means that it’s either set up incorrectly or has never been created.
To view the robots.txt file of any website, go to domain.com/robots.txt. Keep in mind that this file will not be available for editing there. If you want to create, edit or remove this file, you should do it through FTP or the File Manager of your hosting.
A robots.txt file doesn’t have to include a sitemap. However, adding one is very beneficial because it tells search engines where your sitemap is located.