Part two of our "robots.txt best practice guide + examples" series is about setting up your newly created robots.txt file.
If you are not sure how to create a robots.txt file, or what one is in the first place, head to part one of this series, "robots.txt best practice guide + examples," to learn the ins and outs of what a robots.txt file is and how to set it up correctly. Even if you have been in the SEO game for a while, the article offers a great refresher.
Add a robots.txt file to your site
A robots.txt file lives in the root of your website so that crawlers can find it. For example, if your site is https://www.mysite.com, your robots.txt file would be found at https://www.mysite.com/robots.txt. Placing the file in the root of your site lets you manage crawling for all URLs under the domain https://www.mysite.com.
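As an illustration, a minimal robots.txt file placed at the root might look like the sketch below (the directives and sitemap URL are hypothetical examples, not rules for your site):

```
# Served at https://www.mysite.com/robots.txt
User-agent: *
Disallow:

Sitemap: https://www.mysite.com/sitemap.xml
```

An empty Disallow line means all crawlers may fetch every URL on the site; the Sitemap line is optional but commonly included.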
It is also important to know that the robots.txt filename is case sensitive, so make sure you name the file 'robots.txt' and not something like Robots.txt, ROBOTS.TXT, robots.TXT or any other variation with capital letters.
Why a robots.txt file is important
A robots.txt file is just an ordinary text file, but that "ordinary" text file is extremely important: it is used to tell search engines exactly where they can and cannot go on your site. That is why it is such an important part of your website.
After you've added your brand new robots.txt file to your site, or made updates to your current one, it's important to test it to make sure it works the way you want.
Although there are many sites and tools you can use to test your robots.txt file, you can still use Google's robots.txt tester in the old version of Search Console. Log in to your site's Search Console, scroll down to the bottom of the page and click → Go to the old version
Then click Crawl → robots.txt tester
From here you can test your site's robots.txt file by adding the code from your file to the box and then clicking the "test" button.
If all goes well, the red Test button should now be green and should have switched to 'Allowed'. Once that happens, it means your newly created or modified robots.txt file is valid. You can now upload the file to the root of your site.
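If you prefer a quick local sanity check before (or alongside) Google's tester, Python's standard library can parse robots.txt rules and tell you whether a given URL is crawlable. The rules and URLs below are hypothetical examples:

```python
# Check robots.txt rules locally with Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A URL under the disallowed directory is blocked for all crawlers.
print(rp.can_fetch("*", "https://www.mysite.com/private/page.html"))  # False
# Any other URL remains crawlable.
print(rp.can_fetch("*", "https://www.mysite.com/blog/post.html"))     # True
```

This is only a syntax-level check; it does not replace verifying the live file at your-domain/robots.txt in Google's own tester.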
Google updates for robots.txt file standards effective from September 1
Google recently announced changes to how it handles some of the unsupported directives in robots.txt files.
From September 1, Google will stop supporting unsupported and unpublished rules in the Robots Exclusion Protocol. That means Google will no longer honor the noindex directive in robots.txt files.
If you have relied on the noindex directive in your robots.txt file in the past, there are a number of alternative options you can use:
Noindex in robots meta tags: supported both in HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
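For reference, the robots meta tag version is a single line in the page's head (the snippet below is a generic example, not specific to any site):

```html
<!-- Tells crawlers not to include this page in their index -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent directive can be sent as an `X-Robots-Tag: noindex` HTTP response header instead.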
404 and 410 HTTP status codes
Both status codes mean that the page does not exist, so URLs returning either code will be dropped from Google's index once they have been crawled and processed.
Adding password protection is a great way to prevent Google from seeing and crawling pages on your site (think of a dev version of the site). If you hide a page behind a login, it will generally be removed from Google's index, because Googlebot cannot enter the required credentials to see what is behind the login. (For subscription and paywalled content you can use the paywalled content markup instead, but that is a completely different topic for another time.)
Disallow in robots.txt
Search engines can only index pages they know about (can find and crawl), so blocking a page or pages from being crawled usually means their content will not be indexed. Keep in mind, though, that Google may still find and index those pages via links from other pages.
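A Disallow rule of this kind looks like the following sketch (the /private/ path is a hypothetical example):

```
User-agent: *
Disallow: /private/
```

Note again that this only blocks crawling: if other sites link to pages under /private/, Google may still index those URLs without their content.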
Search Console Remove URL tool
The Search Console removal tool offers a quick and easy way to temporarily remove a URL from Google's search results. We say temporarily because this option is only valid for about 90 days; after that, your URL can reappear in the search results.
To make the removal permanent, use one of the methods described above:
- Block access to the content (require a password)
- Add a noindex meta tag
- Return a 404 or 410 HTTP status code
Making small adjustments can sometimes have a major impact on your site's SEO, and using a robots.txt file is one of those adjustments that can make a significant difference.
Remember that your robots.txt file must be uploaded to the root of your site and must be named 'robots.txt' for crawlers to find it. This small text file is a must-have for any website, and adding one to the root of your site is a very simple process.
I hope this article has helped you learn how to add a robots.txt file to your site, as well as why having one matters. If you want to know more about robots.txt files and have not yet done so, you can read part one of this article series: "robots.txt best practice guide + examples".
What is your experience creating robots.txt files?
Michael McManus is Earned Media (SEO) Practice Lead at iProspect.