Can't get my robots.txt file recognised

Tony Coe

Thursday 24 August 2006 7:16:55 am

Apologies if this has been covered elsewhere. I have found a couple of articles on rewriting and tried what they suggested, but without success.

I am configuring a copy of eZ Publish to run across several domains and have it all running OK, but I can't get my robots.txt file to be picked up anywhere. I think the problem may be caused by my path prefix, but I can't work out how to get around it.

The domain in question is set to use the path prefix /noni_horses/.

I originally added the line
RewriteRule ^/robots.txt - [L]
to the rewrite section of my virtual host settings.
When this didn't work and thinking the prefix might be causing the problem, I changed it to
RewriteRule ^/noni_horses/robots.txt - [L]
Again with no joy.

I have a copy of my robots.txt file in the root of my site and have also tried putting a copy in a subfolder, /noni_horses/.

I always just get a kernel 20 error.

I'll be the first to admit that I'm not entirely sure what I'm doing here - I have a pretty good working knowledge of PHP, but very limited experience of Apache. Please can someone tell me what I'm doing wrong? Help!

Marcin Drozd

Thursday 24 August 2006 7:31:32 am

Hi
If you have
RewriteRule !\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf$ index.php
try to add:

|robots\.txt

and perhaps
<FilesMatch "(index\.php|robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">

http://ez-publish.pl
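
To make the suggestion above concrete, here is a minimal sketch of how the catch-all rule might look with robots.txt added as one more top-level alternative. It assumes the rule lives in a .htaccess file in the document root (so paths are matched without a leading slash) and that robots.txt sits in that same directory; neither assumption is confirmed in the thread. The added branch is anchored with ^ and $, which is slightly stricter than the unanchored |robots\.txt suggested above.

```apache
# Sketch only: assumes this is a .htaccess file in the document root.
RewriteEngine On

# Send every request to index.php EXCEPT:
#   - static assets (gif, jpg/jpeg, png, css, js, html)
#   - PDFs under var/.../storage
#   - robots.txt in the document root (the added third branch)
RewriteRule !\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf$|^robots\.txt$ index.php
```

Because the whole pattern is negated with !, a request that matches any branch is left alone and served as a plain file.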

Claudia Kosny

Thursday 24 August 2006 12:09:59 pm

Hello Tony,

Your robots.txt is not supposed to be picked up by eZ Publish, so your first rewrite rule
^/robots.txt - [L]
should be ok.

The search engine spiders will pick it up only at the root of your server, no matter where you installed eZ Publish. If you have installed eZ Publish in a subdirectory /noni_horses of your server docroot, the robots.txt still needs to go in the docroot, and you have to account for the subfolder in the rules inside your robots.txt.

So what you have to achieve is that your robots.txt is displayed if you call up http://serverroot/robots.txt.

This of course changes if you have virtual host settings that let you call up your server with http://www.noni_horses.<whatever tld you use >
In this case the spiders do not see that you are using a subdirectory.

Getting a kernel 20 error when trying to call up the robots.txt via eZ Publish is expected; after all, it is not a module or anything like that.

Greetings from Luxembourg

Claudia

Tony Coe

Thursday 31 August 2006 2:47:17 am

Hi Claudia/Marcin,

I just can't get it to work!
I've tried adding the following below 'RewriteEngine On' in my vhost.conf file:
RewriteRule !(^/design|^/var/.*/storage|^/var/storage|^/var/.*/cache|^/var/cache|^/noni_horses/robots\.txt|^/extension/.*/design|^/kernel/setup/packages|^/packages|^/share/icons).*\.(gif|css|jpg|png|jar|js|ico|pdf|swf)$ /index.php

I also tried
RewriteRule !(^/design|^/var/.*/storage|^/var/storage|^/var/.*/cache|^/var/cache|^/robots\.txt|^/extension/.*/design|^/kernel/setup/packages|^/packages|^/share/icons).*\.(gif|css|jpg|png|jar|js|ico|pdf|swf)$ /index.php

and also
RewriteRule ^/robots.txt - [L]
and
RewriteRule ^/noni_horses/robots.txt - [L]

I also tried changing AllowOverride None to AllowOverride All for the virtual host and tried putting the same in the .htaccess file, with no joy.

I also tried adding:
<FilesMatch "(index\.php|robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

with no difference
and also
<FilesMatch "(index\.php|\noni_horses\robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

I know I'm probably just being stupid and not understanding how the rewrites work here, but I really can't work out where I'm going wrong...

Incidentally, I notice that there doesn't seem to be a valid robots.txt on ez.no!
(at least trying to access ez.no/robots.txt gets a kernel 20 error, same as I'm getting....)
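
One possible reason the combined rules in this post never exempt robots.txt: the group of allowed prefixes is followed by .*\.(gif|css|...)$, so a path must still end in one of the listed extensions to escape the rewrite, and /robots.txt ends in .txt. The following is a sketch of one way to restructure the pattern, not a tested configuration. It assumes the rule sits in the virtual host (server) context, where paths keep their leading slash, and it copies the directory list unchanged from the post.

```apache
RewriteEngine On

# The outer group now has two branches, both anchored by the final $:
#   1. an allowed directory prefix followed by a static file extension
#   2. exactly /robots.txt
# Anything matching neither branch is handed to index.php.
RewriteRule !((^/design|^/var/.*/storage|^/var/storage|^/var/.*/cache|^/var/cache|^/extension/.*/design|^/kernel/setup/packages|^/packages|^/share/icons).*\.(gif|css|jpg|png|jar|js|ico|pdf|swf)|^/robots\.txt)$ /index.php
```

A separate pass-through rule for robots.txt achieves the same thing, provided it appears before the catch-all rule, since rules are processed in the order they are written.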

Claudia Kosny

Thursday 31 August 2006 9:29:34 am

Hello Tony

Unfortunately I made a mistake in my previous posting which might well be the cause of the problem you still have.
The rewrite rule must _not_ have a leading slash when it sits in a .htaccess file, as the leading path is part of the directory structure and will be stripped by the rewrite engine. So just remove the slash and it should work fine.
Forget about the path noni_horses as the robots.txt must be in the document root of your virtual host.
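
The leading-slash difference can be sketched side by side. According to the mod_rewrite documentation, the directory prefix is stripped only in per-directory context (.htaccess or a <Directory> block); in the virtual host context the pattern sees the full URL path, slash included. Which of the two files a given setup actually uses is an assumption here, so only one variant should be picked.

```apache
# --- Variant 1: inside the <VirtualHost> block (server context) ---
# The pattern is matched against the full URL path, e.g. "/robots.txt".
RewriteEngine On
RewriteRule ^/robots\.txt$ - [L]

# --- Variant 2: inside .htaccess in the document root (per-directory) ---
# The directory prefix is stripped first, so the path is "robots.txt".
RewriteEngine On
RewriteRule ^robots\.txt$ - [L]
```

The "-" substitution means "leave the URL unchanged" and [L] stops further rule processing, so the file is served as-is. In either variant the rule has to come before the catch-all rule that sends everything to index.php.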

If you still have problems, please check the rewrite.log - there you can see which rewrite rules are applied to which file.
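
Rewrite logging is off by default. Here is a minimal sketch of how it might be switched on for Apache 2.0/2.2, the versions current when this thread was written. The log path is a hypothetical example, and these directives are only allowed in the server or virtual host configuration, not in .htaccess.

```apache
# Apache 2.0 / 2.2: place inside the <VirtualHost> block.
RewriteEngine On

# Hypothetical log location; the Apache user must be able to write to it.
RewriteLog "/var/log/apache2/rewrite.log"

# 0 disables logging; higher values (up to 9) give more detail.
# Keep it low on production servers, as logging slows requests down.
RewriteLogLevel 3

# Apache 2.4 and later removed the two directives above. The equivalent
# there writes rewrite traces to the error log:
# LogLevel alert rewrite:trace3
```

Each log entry shows the path a pattern was applied to, which makes it easy to see whether the path carried a leading slash when the rule was tested.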

You are right that ez.no does not seem to have a robots.txt, although they might check the user agent in their .htaccess/virtual host and only deliver it to certain spiders, not to web browsers. On the other hand, I think the main reason for not having a robots.txt is that you don't need one if you have only an eZ Publish installation on your server. Provided you use the .htaccess or virtual host settings as recommended during installation, there is no need to forbid any folder to a search engine: anything worth protecting is protected by .htaccess or a required login.

To be on the safe side, here are the rules that work for me:

php_value allow_call_time_pass_reference 0

<FilesMatch ".">
order allow,deny
deny from all
</FilesMatch>

<FilesMatch "(^robots\.txt$)|(index\.php|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

RewriteEngine On

RewriteRule ^robots\.txt$ robots.txt [L]
RewriteRule !\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf$ index.php

DirectoryIndex index.php

Greetings from Luxembourg

Claudia
