Thursday 27 November 2003 1:59:43 pm
Hi all,

I have worked on a number of sites over the last 12 months and was becoming increasingly frustrated because they were not being spidered beyond the home page by Google. I found the reason this week!

Have you noticed that on some eZ Publish sites, the links on the first page visited have something like "?PHPSESSID=b0da36931dc38bd1f04e9a7af8c5b165" appended? Well, this is the issue!

From another CMS mailing list I'm on:

"We were having a problem getting our action app content indexed (by google search, not news), so i asked my brother who had just started working at Google. He said:
1. yes, they do index the query string (stuff after the ?).
2. in order to do so, they pay attention to the problem of session variables in the query string by assuming that anything that looks like a session variable is one.
3. the long item ids are thus assumed to be session variables, and aren't getting spidered (i don't know the exact rule, but probably any string longer than 16 chars is going to be assumed to be a session variable).
4. they were trying to improve their algorithm for figuring out what's a session variable and what isn't."

This issue is not specific to eZ Publish; it arises because eZ Publish uses sessions together with a default PHP configuration. The PHP configuration item is "session.use_trans_sid". Once it is turned off, the session information will disappear from the links, the site will work fine, and Google will get beyond your home page. See http://martin.f2o.org/php/session for details.
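For anyone who wants to see what the change looks like, here is a minimal sketch of the relevant php.ini lines. It assumes you are able to edit the server-wide php.ini and that your visitors accept cookies; the second setting is my addition for illustration and is not something the fix strictly requires.

```ini
; php.ini (sketch, assuming you can edit the server-wide PHP configuration)

; Stop PHP from rewriting links to carry the session ID (?PHPSESSID=...)
session.use_trans_sid = 0

; Assumption: visitors accept cookies, so the session ID travels in a cookie instead
session.use_cookies = 1
```

The web server usually needs a restart before a php.ini change takes effect. If you cannot edit php.ini and PHP runs as an Apache module, a line such as `php_flag session.use_trans_sid off` in .htaccess or the virtual host configuration should have the same effect, assuming your host allows PHP overrides there.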
Cheers
Bruce http://www.designit.com.au/
My Blog: http://www.stuffandcontent.com/
Follow me on twitter: http://twitter.com/brucemorrison
Consolidated eZ Publish Feed : http://friendfeed.com/rooms/ez-publish