Disable search engine indexing

Prevent search engines from indexing pages, folders, your entire site, or just your webflow.io subdomain.

You can tell search engines which pages to crawl by writing a robots.txt file. You can also prevent search engines from crawling and indexing specific pages, folders, your entire site, or your webflow.io subdomain. This is useful for keeping pages such as your site’s 404 page out of search results.

Important: Content from your site may still be indexed, even if it hasn’t been crawled. This happens when a search engine already knows about your content, either because it was published previously or because other content online links to it. To remove a previously indexed page from Google’s index, don’t list it in the robots.txt file. Instead, use the noindex meta code: a crawler has to fetch the page to see the noindex directive, and a robots.txt rule would block that fetch.

In this lesson: 

  1. How to disable indexing of the Webflow subdomain
  2. How to generate a robots.txt file
  3. Best practices for privacy
  4. FAQ and troubleshooting tips

How to disable indexing of the Webflow subdomain 

You can prevent Google and other search engines from indexing your site’s webflow.io subdomain by disabling indexing from your Site settings.

  1. Go to Site settings > SEO tab > Indexing section
  2. Set Disable Webflow subdomain indexing to “Yes” 
  3. Click Save changes and publish your site

This publishes a unique robots.txt file on the subdomain only, telling search engines to ignore that subdomain.

Note: You’ll need a Site plan or paid Workspace to disable search engine indexing of the Webflow subdomain. Learn more about Site and Workspace plans.

How to generate a robots.txt file 

The robots.txt is usually used to list the URLs on a site that you don't want search engines to crawl. You can also include the sitemap of your site in your robots.txt file to tell search engine crawlers which content they should crawl.

Just like a sitemap, the robots.txt file lives in the top-level directory of your domain. Webflow will generate the /robots.txt file for your site once you create it in your Site settings.

To create a robots.txt file:

  1. Go to Site settings > SEO tab > Indexing section
  2. Add the robots.txt rule(s) you want
  3. Click Save changes and publish your site

Robots.txt rules

You can use any of these rules to populate the robots.txt file.

  • User-agent: * means this section applies to all robots.
  • Disallow: tells the robot not to visit the site, page, or folder.

To hide your entire site

User-agent: *
Disallow: /
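To sanity-check a rule set like this before you publish, you can run it through a robots.txt parser. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; `example.com` and the `ExampleBot` user agent are placeholders, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The "hide your entire site" rules from this lesson.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and ExampleBot are placeholders, not real Webflow values.
for url in ("https://example.com/", "https://example.com/blog/post"):
    print(url, parser.can_fetch("ExampleBot", url))
```

Both URLs print `False`, meaning the parser considers them blocked. Note that the two directives sit on adjacent lines: this parser treats a blank line as the end of a rule group.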

To hide individual pages

User-agent: *
Disallow: /page-name

To hide an entire folder of pages

User-agent: *
Disallow: /folder-name/
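Disallow values are matched against the start of the URL path, so the page rule and the folder rule behave slightly differently. The sketch below illustrates this with Python's standard-library `urllib.robotparser`; the `allowed` helper, `example.com`, and `ExampleBot` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The page and folder rules from this lesson, combined in one group.
rules = """\
User-agent: *
Disallow: /page-name
Disallow: /folder-name/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path: str) -> bool:
    """Hypothetical helper: may a crawler fetch this path?"""
    return parser.can_fetch("ExampleBot", "https://example.com" + path)


for path in ("/page-name", "/page-name-2", "/folder-name/child",
             "/folder-name", "/about"):
    print(path, allowed(path))
```

Under this parser, `/page-name`, `/page-name-2`, and `/folder-name/child` are blocked, while `/folder-name` (no trailing slash) and `/about` stay crawlable. In other words, a page rule also catches any URL that begins with the same characters, and a folder rule with a trailing slash covers only what is inside the folder.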

To include a sitemap

Sitemap: https://your-site.com/sitemap.xml
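If you want to confirm that a Sitemap line is picked up alongside your other rules, a parser can report it back. This is a small sketch assuming Python 3.8 or later, where `RobotFileParser.site_maps()` is available; the rule set simply combines examples from this lesson.

```python
from urllib.robotparser import RobotFileParser

# A folder rule plus the sitemap line from this lesson.
rules = """\
User-agent: *
Disallow: /folder-name/

Sitemap: https://your-site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
print(parser.site_maps())
```

The Sitemap line is independent of any User-agent group, so it can sit anywhere in the file.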

Helpful resources

Check out more useful robots.txt rules.

Note: Anyone can access your site’s robots.txt file, so they may be able to identify and access your private content. 
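To see why this matters, consider how little effort it takes to read the paths a robots.txt file lists. The sketch below is illustrative only: `disallowed_paths` is a hypothetical helper, and `/secret-launch` is an invented path.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents; /secret-launch is an invented path.
sample = """\
User-agent: *
Disallow: /secret-launch
Disallow: /folder-name/
"""

print(disallowed_paths(sample))
```

Every path printed here is one that a visitor could type straight into a browser, which is why robots.txt is not a privacy tool.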

Best practices for privacy 

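If you’d like to prevent the discovery of a particular page or URL on your site, don’t use the robots.txt to disallow the URL from being crawled, because the file is public and the rule itself reveals the URL. Instead, keep the page out of search results with the noindex meta code described at the start of this lesson.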
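For reference, the noindex meta code mentioned earlier in this lesson is a robots meta tag, `<meta name="robots" content="noindex">`, placed in the page's head. The Python sketch below checks whether a page's markup carries that tag; the `RobotsMetaParser` class, the `has_noindex` helper, and the sample markup are hypothetical, and the code only inspects HTML rather than adding the tag for you.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the markup contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup, for illustration only.
page = ('<html><head><meta name="robots" content="noindex"></head>'
        '<body></body></html>')
print(has_noindex(page))
```

A check like this can be run against the published HTML of a page to confirm the tag is actually present after you publish.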

FAQ and troubleshooting tips

Can I use a robots.txt file to prevent my Webflow site assets from being indexed? 

It’s not possible to use a robots.txt file to prevent Webflow site assets from being indexed because a robots.txt file must live on the same domain as the content it applies to (in this case, where the assets are served). Webflow serves assets from our global CDN, rather than from the custom domain where the robots.txt file lives. 
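The same-domain requirement can be made concrete: a crawler looks for robots.txt at the root of whichever host serves the resource. The sketch below derives that location from a URL; `robots_txt_url` is a hypothetical helper and both hostnames are placeholders, since the actual CDN hostname is not given in this lesson.

```python
from urllib.parse import urlsplit


def robots_txt_url(resource_url: str) -> str:
    """Return the robots.txt URL that governs the given resource."""
    parts = urlsplit(resource_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Both hostnames are placeholders, not real Webflow values.
print(robots_txt_url("https://your-site.com/about"))
print(robots_txt_url("https://cdn.example.net/images/hero.png"))
```

The two calls return different robots.txt locations, so rules published on your custom domain never apply to files served from a separate asset host.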

I removed the robots.txt file from my Site settings, but it still shows up on my published site. How can I fix this? 

Once the robots.txt file has been created, it can’t be completely removed. However, you can replace it with new rules that allow the site to be crawled, for example:

User-agent: *
Disallow:

Make sure to save your changes and republish your site. If the issue persists and you still see the old robots.txt rules on your published site, please contact customer support.
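To confirm that the replacement rules really do open the site up, you can parse them the same way a crawler would. This is a minimal sketch with Python's standard-library `urllib.robotparser`; `example.com` and `ExampleBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed".
rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and ExampleBot are placeholders.
print(parser.can_fetch("ExampleBot", "https://example.com/any/page"))
```

This prints `True`. If your published robots.txt still blocks pages after republishing, comparing its live contents against the rules you saved is a reasonable first check before contacting support.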