prevent Google from indexing media/images


Pascal Specht

Thursday 16 April 2009 3:00:44 am

Hi there,

I've recently seen that Google has indexed some folders with image content inside media/images. What can I do to prevent search engines from indexing the media content? Has anybody had success with a robots rule on www/var/ezwebin_site/storage/images/media, for example? Or is there another place I should look at?

Thanks in advance,
</Pascal>

Gaetano Giunta

Thursday 16 April 2009 5:23:06 am

robots.txt is surely your friend here.

Please note that it will not prevent malicious bots from indexing images - the only way to achieve that is to do for "content" images what is normally done for other binary content:
a - allow access to them via links to "content/download" instead of direct access (might involve creating a custom template operator and download handler)
b - set up a web server rule to block direct access to images

Principal Consultant International Business
Member of the Community Project Board

Andreas Kaiser

Thursday 16 April 2009 6:47:38 am

Some ways are:

1. Add a rule to robots.txt that keeps robots out of the "Media" directory:

User-agent: *
Disallow: /var/
Disallow: /Media/

2. Adding

<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">

in the head of the media pages. You can use the section ID to add this tag only in the media section.
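
To sanity-check the robots.txt rules from option 1 before deploying them, here is a minimal sketch using Python's standard urllib.robotparser. The helper name `allowed` and the sample paths are made up for illustration; only the three rule lines come from the post above. Note that path matching is case-sensitive, so "Disallow: /Media/" does not cover a lowercase /media/ URL.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in option 1 above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /var/
Disallow: /Media/
"""


def allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a compliant robot may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each rule.
    for path in (
        "/var/ezwebin_site/storage/images/media/photo.jpg",
        "/Media/Images",
        "/media/images",  # lowercase: not matched by "Disallow: /Media/"
        "/News",
    ):
        print(path, "->", "allowed" if allowed(path) else "blocked")
```

This only tells you what a well-behaved robot would do; as noted earlier in the thread, robots.txt does not stop bots that ignore it.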

eZ Partner in Madrid (Spain)
Web: http://www.atela.net/

Pascal Specht

Thursday 16 April 2009 8:15:34 am

Thank you both for your help!

</P>

Michael Gross

Saturday 11 July 2009 12:03:08 am

To follow up the original poster's question, what is the surest way to prevent any spiders from searching an entire site? Would a robots.txt with the following do it?

    User-Agent: *
    Disallow: [File Name a]

Thanks,

You've got a noob here, so I also need to know where in the site hierarchy the robots.txt file goes.

As far as editing the site configuration files, is there a mandatory or recommended text encoding that should be used?

Michael

André R.

Saturday 11 July 2009 3:59:15 am

robots.txt is not eZ Publish specific, so if you want to learn about it, the best way is to google it (Wikipedia has a good entry, at least the English one is*).
In short: just like favicon.ico, place it in the root of your installation and make sure you can access it in your browser, like http://ez.no/robots.txt
Rewrite rules in Apache are key here, but the ones recommended in the doc** allow it by default. And if you use .htaccess (a shared server where you don't have access to the Apache config), you just need to uncomment the lines about robots.txt / favicon.ico to allow access***.

*: http://en.wikipedia.org/wiki/Robots_exclusion_standard
**: http://ez.no/doc/ez_publish/technical_manual/4_0/installation/virtual_host_setup
***: http://pubsvn.ez.no/nextgen/trunk/.htaccess_root
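
On the whole-site question: in the exclusion standard linked above, "Disallow: /" under "User-agent: *" covers every path, while an empty "Disallow:" allows everything. A minimal sketch to check both with Python's standard urllib.robotparser follows; the helper name `may_fetch`, the agent name "ExampleBot" and the example.com URLs are placeholders, not anything from a real setup.

```python
from urllib.robotparser import RobotFileParser

# Asks every compliant robot to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def may_fetch(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, rules in (("BLOCK_ALL", BLOCK_ALL), ("ALLOW_ALL", ALLOW_ALL)):
        for url in ("http://example.com/", "http://example.com/Media/Images"):
            print(name, url, may_fetch(rules, url))
```

Keep in mind this is a request to robots, not access control: spiders that ignore robots.txt will still crawl the site.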

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Heath

Saturday 11 July 2009 4:17:46 am

But using a robots.txt file with eZ Publish most often requires some configuration changes before search engines can reach the file.

If you are using virtual host mode, you will want to add an exclusion rule for the /robots.txt file. Otherwise eZ Publish will try to resolve the URL internally and fail. To avoid this problem, exclude the file with mod_rewrite rules.

Snippet of the mod_rewrite rules from an Apache vhost configuration file:

    RewriteRule ^/robots\.txt - [L]
    RewriteRule .* /index.php
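
To see why the order of those two rules matters, here is a rough Python model of them. It is only a sketch of the matching order, not of mod_rewrite itself: it ignores conditions, other flags and per-directory behaviour, and the `RULES` table and `rewrite` function are names made up for this illustration.

```python
import re

# (pattern, substitution, last) mirroring the two rules above.
# "-" means "leave the URL untouched"; last=True models the [L] flag.
RULES = [
    (r"^/robots\.txt", "-", True),
    (r".*", "/index.php", False),
]


def rewrite(path):
    """Return the path a request ends up at after walking RULES in order."""
    for pattern, substitution, last in RULES:
        if re.search(pattern, path):
            if substitution != "-":
                # The substitution replaces the whole path.
                path = substitution
            if last:
                break  # [L]: stop processing further rules
    return path


if __name__ == "__main__":
    for path in ("/robots.txt", "/Media/Images", "/"):
        print(path, "->", rewrite(path))
```

If the robots.txt rule came after the catch-all, every request, including /robots.txt, would already have been sent to /index.php.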

Technically it would be more eZ Publish compatible to have the robots.txt file stored within an extension (extension/mydesign/design/standard/files/robots.txt) and use a transparent mod_rewrite/mod_proxy redirection to the actual file. I don't have a handy example of these rules though.

Cheers,
Heath

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

Michael Gross

Monday 13 July 2009 10:32:05 pm

Thanks for your replies. Based on the technical nature of the replies, and my lack of experience, I think I'll just put up some summary content and allow the indexing. I have put your replies into my notebook for further research.

Michael

