While we recommend developing Magento sites in such a way that the development site is not publicly accessible, it is sometimes necessary to connect a Magento testing site to the web right away. Here are some tricks to make sure that search engines like Google do not start indexing this test site.
Resetting robots.txt
You might have a great robots.txt for your live site that tells search engines what to index and what not to index. For testing or development sites, this is not what you want. A robots.txt that tells search engines to move along and not index anything looks like this:
User-agent: *
Disallow: /
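If you are worried about the wrong robots.txt ending up on the wrong environment, you could also generate the file during deployment. The snippet below is only a minimal sketch; it assumes an APP_ENV environment variable to flag the environment, which is not something Magento sets for you.
<?php
// Deployment-time sketch: overwrite robots.txt with a "block everything"
// version on any environment that is not flagged as production.
// APP_ENV is an assumed environment variable; adapt it to your own setup.
if (getenv('APP_ENV') !== 'production') {
    file_put_contents(
        __DIR__ . '/robots.txt',
        "User-agent: *\nDisallow: /\n"
    );
}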
Sending the X-Robots-Tag header
Another way is to add the X-Robots-Tag header to all of your content. With the robots.txt file, you rely on search engines to read and interpret that file. Search engines also cache it, so changes might not take effect right away. With the X-Robots-Tag header the effect is immediate, because the HTTP header is added to every response your server sends to the search engine.
You can add the following to your .htaccess file (assuming you are running Apache):
Header set X-Robots-Tag "noindex, follow"
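To verify that the header actually comes through, request any page of the site and inspect the response headers. The following is a minimal PHP sketch, assuming your test site is reachable at https://test.example.com (a made-up hostname, replace it with your own):
<?php
// Fetch the response headers of the test site and report whether
// the X-Robots-Tag header is present.
$headers = get_headers('https://test.example.com/', true);
if ($headers === false) {
    exit('Could not reach the site' . PHP_EOL);
}
$headers = array_change_key_case($headers, CASE_LOWER);
if (isset($headers['x-robots-tag'])) {
    echo 'X-Robots-Tag header is present' . PHP_EOL;
} else {
    echo 'Warning: no X-Robots-Tag header found' . PHP_EOL;
}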
Using a PHP line for adding X-Robots-Tag
If you are running another webserver like Nginx, or if you have a setup where .htaccess files are simply ignored, you might be forced to add the statement to the virtual host configuration of your webserver. An alternative is to simply add the following PHP code to the top of the index.php file in the root of the Magento filesystem:
header('X-Robots-Tag: noindex, follow');
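If you are afraid of this line accidentally surviving a deployment to production, you could make it conditional on the hostname. This is only a sketch; the host names below are made-up examples that you would replace with your own test and staging domains:
<?php
// Send the noindex header only when the request hits a known
// non-production hostname, so the same index.php can be deployed anywhere.
$nonProductionHosts = array('test.example.com', 'staging.example.com');
$currentHost = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
if (in_array($currentHost, $nonProductionHosts, true)) {
    header('X-Robots-Tag: noindex, follow');
}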
Which method to use?
This guide gives you 3 methods to instruct search engines to skip your site. The question is which method is best. Personally, I think you should simply implement all 3 methods, to make sure that a change of environment does not suddenly cause your non-production site to be indexed. The index.php change perhaps classifies as a core hack, though I personally see the index.php file as part of the modifiable configuration: it also contains switches for debugging and maintenance that I modify most of the time. So my suggestion is to use all 3 methods to be sure.
Blocking access by IP
Probably the safest way of hooking your site to the web without any chance of search engines indexing anything is to block access altogether. The strategy here is to allow your own IP addresses and block all other IPs. This can be done by adding the following to the top of your .htaccess file:
Order deny,allow
Deny from all
Allow from 127.0.0.1
Allow from 127.0.0.2
In this case, only the IPs 127.0.0.1 and 127.0.0.2 are allowed access; you will need to add your own IP address to this list as well. This example uses the .htaccess file, which only works under Apache. For other webservers like Nginx, you will need to add deny and allow rules directly to your Nginx configuration files.
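If you cannot touch the webserver configuration at all, a similar check can be done in PHP at the top of index.php. The snippet below is a rough sketch with placeholder IP addresses; note that behind a reverse proxy or load balancer, REMOTE_ADDR may contain the proxy address instead of the visitor address:
<?php
// Deny access unless the visitor IP is on a whitelist.
// The addresses below are placeholders; use your own IPs.
$allowedIps = array('127.0.0.1', '127.0.0.2');
$remoteIp = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
if (!in_array($remoteIp, $allowedIps, true)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied');
}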
About the author
Jisse Reitsma is the founder of Yireo, extension developer, developer trainer and 3x Magento Master. His passion is for technology and open source. And he loves talking as well.