Earlier this week, Google removed its Robots.txt FAQ help document from its search developer documentation. When asked, John Mueller from Google replied to Alexis Rylko saying, “We update the documentation from time to time. Feel free to submit feedback if you feel something’s missing. Robots.txt is definitely still a thing.”
The Robots FAQ document lived over here: developers.google.com/search/docs/crawling-indexing/robots/robots-faq
That now redirects to the main Google robots.txt help page.
What did the Robots FAQ page say? Well, the Wayback Machine has a copy, so I’ll archive it here:
(Q) Does my website need a robots.txt file?
(A) No. When Googlebot visits a website, we first ask for permission to crawl by attempting to retrieve the robots.txt file. A website without a robots.txt file, robots meta tag, or X-Robots-Tag HTTP headers will generally be crawled and indexed normally.
(Q) Which method should I use to block crawlers?
(A) It depends. In short, there are good reasons to use each of these methods:
- robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. Don’t use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization. To make sure that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
- robots meta tag: Use it if you need to control how an individual HTML page is shown in search results or to make sure that it’s not shown.
- X-Robots-Tag HTTP header: Use it if you need to control how content is shown in search results or to make sure that it’s not shown.
(Q) Can I use robots.txt, robots meta tag, or the X-Robots-Tag HTTP header to remove someone else’s site from search results?
(A) No. These methods are only applicable to sites where you can modify the code or add files. Learn more about how to remove information from Google.
(Q) How can I slow down Google’s crawling of my website?
(A) You can generally adjust the crawl rate setting in your Google Search Console account.
(Q) I use the same robots.txt for multiple websites. Can I use a full URL instead of a relative path?
(A) No. The rules in the robots.txt file (with the exception of sitemap:) are only valid for relative paths.
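To illustrate that answer, here is a minimal Python sketch using the standard library’s urllib.robotparser. The hostnames, paths, and sitemap URL are made up for the example. The point is only that the rules are relative paths, so one shared file reads the same way on either host, while the sitemap line carries a full URL. Python’s parser is not Googlebot, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt shared by two sites. The rule uses a relative
# path; only the sitemap line uses a full URL.
SHARED_ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SHARED_ROBOTS_TXT)

# The same relative rule applies no matter which host serves the file.
for host in ("example.com", "example.org"):
    blocked = parser.can_fetch("Googlebot", f"https://{host}/calendar/2023/11/")
    allowed = parser.can_fetch("Googlebot", f"https://{host}/about.html")
    print(host, "calendar crawlable:", blocked, "about crawlable:", allowed)

# Sitemap lines are the one place where a full URL is expected.
print(parser.site_maps())
```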
(Q) Can I place the robots.txt file in a subdirectory?
(A) No. The file must be placed in the topmost directory of the website.
(Q) I want to block a private folder. Can I prevent other people from reading my robots.txt file?
(A) No. The robots.txt file may be read by various users. If folders or filenames of content aren’t meant for the public, don’t list them in the robots.txt file. It’s not recommended to serve different robots.txt files based on the user agent or other attributes.
(Q) Do I have to include an allow rule to allow crawling?
(A) No, you don’t need to include an allow rule. All URLs are implicitly allowed and the allow rule is used to override disallow rules in the same robots.txt file.
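Here is a rough sketch of that behavior in Python, again with urllib.robotparser and invented paths. One assumption to flag loudly: Python’s parser applies the first rule that matches, which is not necessarily how Google picks between conflicting rules, so the allow line is listed before the broader disallow line to keep the outcome unambiguous.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block a folder, but carve out one page inside it.
# The Allow line comes first because Python's parser uses the first
# matching rule; this ordering is a choice made for this sketch.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path):
    """Return True if the sketch's rules let a crawler fetch this path."""
    return parser.can_fetch("Googlebot", f"https://example.com{path}")


# No rule mentions /blog/, so it is implicitly allowed.
print("/blog/post.html:", crawlable("/blog/post.html"))
# The allow rule overrides the disallow rule for this one page.
print("/private/public-page.html:", crawlable("/private/public-page.html"))
# Everything else under /private/ stays blocked.
print("/private/secret.html:", crawlable("/private/secret.html"))
```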
(Q) What happens if I have a mistake in my robots.txt file or use an unsupported rule?
(A) Web crawlers are generally very flexible and typically will not be swayed by minor mistakes in the robots.txt file. In general, the worst that can happen is that incorrect or unsupported rules will be ignored. Bear in mind though that Google can’t read minds when interpreting a robots.txt file; we have to interpret the robots.txt file we fetched. That said, if you are aware of problems in your robots.txt file, they’re usually easy to fix.
(Q) What program should I use to create a robots.txt file?
(A) You can use anything that creates a valid text file. Common programs used to create robots.txt files are Notepad, TextEdit, vi, or emacs. Read more about creating robots.txt files. After creating your file, validate it using the robots.txt Tester.
(Q) If I block Google from crawling a page using a robots.txt disallow rule, will it disappear from search results?
(A) Blocking Google from crawling a page is likely to remove the page from Google’s index.
However, robots.txt disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant and show the URL in the results. If you wish to explicitly block a page from being indexed, use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, don’t disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed. Learn how to control what you share with Google.
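The conflict described in that answer (a noindex tag on a page that robots.txt blocks) is easy to check for. Below is a small sketch; the function name noindex_can_be_seen and the sample paths are my own inventions, and the check relies on Python’s urllib.robotparser rather than Google’s actual parser.

```python
from urllib.robotparser import RobotFileParser


def noindex_can_be_seen(robots_txt, url, user_agent="Googlebot"):
    """Return True if the URL is crawlable, so a noindex on it can be read.

    A page that is disallowed in robots.txt is never fetched, which means
    any noindex meta tag or X-Robots-Tag header on it goes unseen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site that disallows the very folder it wants de-indexed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign/
"""

for path in ("/old-campaign/landing.html", "/press/retired.html"):
    visible = noindex_can_be_seen(ROBOTS_TXT, f"https://example.com{path}")
    print(path, "-> noindex can be seen:", visible)
```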
(Q) How long will it take for changes in my robots.txt file to affect my search results?
(A) First, the cache of the robots.txt file must be refreshed (we generally cache the contents for up to one day). You can speed up this process by submitting your updated robots.txt to Google. Even after finding the change, crawling and indexing is a complicated process that can sometimes take quite some time for individual URLs, so it’s impossible to give an exact timeline. Also, keep in mind that even if your robots.txt file is disallowing access to a URL, that URL may remain visible in search results despite the fact that we can’t crawl it. If you wish to expedite removal of the pages you’ve blocked from Google, submit a removal request.
(Q) How can I temporarily suspend all crawling of my website?
(A) You can temporarily suspend all crawling by returning a 503 (service unavailable) HTTP status code for all URLs, including the robots.txt file. The robots.txt file will be retried periodically until it can be accessed again. We do not recommend changing your robots.txt file to disallow crawling.
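As a rough idea of what “503 for all URLs, including the robots.txt file” looks like in practice, here is a minimal sketch built on Python’s standard http.server module. The names MaintenanceHandler and make_server are hypothetical, and a real site would more likely do this in its web server or load balancer configuration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request, robots.txt included, with a 503."""

    def _unavailable(self):
        self.send_response(503)  # 503 Service Unavailable
        self.send_header("Content-Length", "0")
        self.end_headers()

    # The same response is used for every path and for both methods.
    do_GET = _unavailable
    do_HEAD = _unavailable

    def log_message(self, format, *args):
        """Keep the sketch quiet; drop this override to see request logs."""


def make_server(port=0):
    """Build the server on localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), MaintenanceHandler)
```

To try it locally, something like make_server(8080).serve_forever() would start it, and requesting /robots.txt or any other path should come back with a 503.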
(Q) My server is not case-sensitive. How can I disallow crawling of some folders completely?
(A) Rules in the robots.txt file are case-sensitive. In this case, it is recommended to make sure that only one version of the URL is indexed using canonicalization methods. Doing this allows you to have fewer lines in your robots.txt file, so it’s easier for you to manage it. If this isn’t possible, we recommend that you list the common combinations of the folder name, or shorten it as much as possible, using only the first few characters instead of the full name. For instance, instead of listing all upper and lower-case permutations of /MyPrivateFolder, you could list the permutations of /MyP (if you are certain that no other, crawlable URLs exist with those first characters). Alternately, it may make sense to use a robots meta tag or X-Robots-Tag HTTP header instead, if crawling is not an issue.
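To see why shortening the prefix helps, here is a quick sketch that generates the case permutations for the /MyP example. The helper names are made up for illustration. Three letters give 8 disallow lines, while the 15 letters of /MyPrivateFolder would give 32,768.

```python
from itertools import product


def case_permutations(prefix):
    """Return every upper/lower-case variant of the letters in a path prefix."""
    # Letters contribute two choices each; other characters such as "/" one.
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in prefix
    ]
    return ["".join(combo) for combo in product(*choices)]


def disallow_rules(prefix):
    """Turn each case variant into a robots.txt disallow line."""
    return [f"Disallow: {variant}" for variant in case_permutations(prefix)]


rules = disallow_rules("/MyP")
print("User-agent: *")
print("\n".join(rules))
print(len(rules), "rules for /MyP")
print(len(case_permutations("/MyPrivateFolder")), "rules for /MyPrivateFolder")
```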
(Q) I return 403 Forbidden for all URLs, including the robots.txt file. Why is the site still being crawled?
(A) The 403 Forbidden HTTP status code, as well as other 4xx HTTP status codes, is interpreted as the robots.txt file not existing. This means that crawlers will generally assume that they can crawl all URLs of the website. In order to block crawling of the website, the robots.txt must be returned with a 200 OK HTTP status code, and must contain an appropriate disallow rule.
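Put together with the 503 answer earlier in the FAQ, the status code handling can be summarized in a few lines. This is only my sketch of the cases the FAQ spells out, with invented names; it does not try to guess what happens for status codes the FAQ never mentions.

```python
from enum import Enum


class RobotsFetchOutcome(Enum):
    APPLY_RULES = "200: read the file and apply its rules"
    NO_ROBOTS_TXT = "4xx: treated as if no robots.txt exists, crawling allowed"
    SUSPEND_AND_RETRY = "503: crawling suspended, robots.txt retried later"
    NOT_COVERED = "not described in the FAQ"


def interpret_robots_status(status_code):
    """Map the HTTP status of a robots.txt fetch to the FAQ's description."""
    if status_code == 200:
        return RobotsFetchOutcome.APPLY_RULES
    if 400 <= status_code <= 499:
        # 403 Forbidden lands here, which is why it does not block crawling.
        return RobotsFetchOutcome.NO_ROBOTS_TXT
    if status_code == 503:
        return RobotsFetchOutcome.SUSPEND_AND_RETRY
    return RobotsFetchOutcome.NOT_COVERED


for code in (200, 403, 404, 503):
    print(code, "->", interpret_robots_status(code).value)
```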
(Q) Is the robots meta tag a replacement for the robots.txt file?
(A) No. The robots.txt file controls which pages are accessed. The robots meta tag controls whether a page is indexed, but to see this tag the page needs to be crawled. If crawling a page is problematic (for example, if the page causes a high load on the server), use the robots.txt file. If it is only a matter of whether or not a page is shown in search results, you can use the robots meta tag.
(Q) Can the robots meta tag be used to block a part of a page from being indexed?
(A) No, the robots meta tag is a page-level setting.
(Q) Can I use the robots meta tag outside of a <head> section?
(A) No, the robots meta tag needs to be in the <head> section of a page.
(Q) Does the robots meta tag disallow crawling?
(A) No. Even if the robots meta tag currently says noindex, we’ll need to recrawl that URL occasionally to check if the meta tag has changed.
(Q) How does the nofollow robots meta tag compare to the rel="nofollow" link attribute?
(A) The nofollow robots meta tag applies to all links on a page. The rel="nofollow" link attribute only applies to specific links on a page. For more information on the rel="nofollow" link attribute, see our documentation on user-generated spam and the rel="nofollow".
(Q) How can I check the X-Robots-Tag for a URL?
(A) A simple way to view the server headers is to use the URL Inspection Tool feature in Google Search Console. To check the response headers of any URL, try searching for “server header checker”.
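If you would rather check the header yourself, a few lines of Python can do it. This is a minimal sketch using only the standard library. The function names are my own, it sends a HEAD request (which some servers may answer differently from a GET), and the noindex check is a simple substring test rather than a full parser for the header’s syntax.

```python
import urllib.request


def x_robots_values(headers):
    """Return every X-Robots-Tag header value, or an empty list if none."""
    return headers.get_all("X-Robots-Tag") or []


def has_noindex(headers):
    """Rough check: does any X-Robots-Tag value mention noindex?"""
    return any("noindex" in value.lower() for value in x_robots_values(headers))


def fetch_x_robots_values(url, timeout=10):
    """Send a HEAD request to the URL and return its X-Robots-Tag values."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return x_robots_values(response.headers)
```

Calling fetch_x_robots_values with a URL from your own site returns an empty list when the header is not set.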
Google has discontinued the English version of the Robots FAQs (https://t.co/AJoGFD7if8), now redirecting to “Intro”. Versions in other languages remain unaffected at this time. Personally, I found the original page to be useful.
Are we on the path to leaving robots.txt? #SEO pic.twitter.com/ThgBl9K3Zi
— Alexis Rylko 🇺🇦 (@alekseo) November 22, 2023
We update the documentation from time to time. Feel free to submit feedback if you feel something’s missing. Robots.txt is definitely still a thing.
— I’m John – ⭐ ⭐ SRS BZNS ONLY ⭐ ⭐ (@JohnMu) November 22, 2023
I guess maybe Google thinks it’s redundant with what’s already published on the other pages?
Forum discussion at X.