eZ Publish Search Engine Optimization

This article describes the basic techniques of Search Engine Optimization (SEO) that can be used with eZ Publish sites. Before discussing specific tips and tricks, let's set the context by considering the importance of search engine ranking.

How often do you start a browser session by going to Google, AllTheWeb or MSN Search? Most likely you visit these sites several times per day. Do you remember the phrase: 'If something is not on the Internet, it does not exist'? Nowadays, the slogan is: 'If the website cannot be found on the first couple of pages of Google search results, it does not exist.'

Anyone who runs a commercial website knows that visibility in search rankings is critically important. The owner of a webshop wants its products to appear at the top of the search results seen by potential customers. A service company wants clients to find information about its services easily.

SEO is a set of techniques for adapting web pages so that search engine crawlers index them more effectively, which in turn can give them a higher position in the search results. When these techniques are applied properly, customers and visitors find your site more easily via search engines.

From the perspective of a search engine, indexing web resources consists of analyzing pages and ordering them in the search engine database by importance. When a user searches for keywords, the search engine returns links to pages that are relevant to those keywords. These "search results" are ordered according to each page's rank in the search engine database.

The indexing process can be divided into three steps:

  1. Getting the resource (usually a web page) for indexing.
  2. Scanning and analyzing the page content and the links related to it to determine what the resource is about and how important it is compared to other documents within the same topic.
  3. Storing a position in the rankings for all the keywords found in the document (according to the resource's estimated ability to deliver relevant results to the end user).

Search engines continuously index and re-rank resources on the web. Because content within existing resources changes, and because new resources are constantly being added, the search indexes must be constantly updated to reflect the current state of the web.

Three components affect a website's position in keyword rankings:

  • Content - the density of each keyword in certain HTML tags on a page (for instance the text in the title tag, the text within page headings, the content of alt attributes in img tags, etc.).
  • Links - the links between pages within a website, and the links to and from other websites.
  • Popularity - the number of links to the resource and the ranking of the sites from which the links originate; the number of users viewing the page, returning to the page and selecting links on the page.

Ideally, each resource on your site scores well on all three components. Ranking depends not on any single component but on their combined score. A page that scores high for links but low for content is therefore likely to rank lower than a page that scores well for both.

The following sections describe the eZ Publish features that support SEO.

eZ Publish complies with web standards defined by the World Wide Web Consortium (W3C). This has a positive impact on the overall HTML quality of an eZ Publish site. Because of this compliance, the source code of the user interface elements is valid XML (XHTML 1.0 Transitional by default) with a clean DOM structure. This avoids the problems caused by invalid and badly structured XHTML, such as deeply nested or unclosed tags. Because XML gives a document explicit structure, the same content can be reused in many formats (HTML, RTF, OpenDocument, etc.), and search engine robots can parse it easily.

While many browsers can display non-compliant code, other readers (such as search engine parsers) are not so forgiving and may not interpret the document correctly. The following snippet shows invalid markup that a browser may display but that a search engine will not necessarily parse correctly:

<Table>
 <TR>
   <TD>
     <P>
       <ul>
         <LI>
           <a HREF="#">
             <IMG srC=image.gif>
           </A>
         </UL>
   </TD>
 </tr>
</tABLE>

The following snippet, by contrast, is clean code that any modern parser supporting the XHTML 1.0 standard can understand. It can be interpreted by all kinds of browsers and by search engines.

<div>
 <ul>
   <li>
     <a href="#">
       <img src="image.gif" />
     </a>
   </li>
 </ul>
</div>
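One way to see the difference between the two snippets is to feed them to a strict XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree to test well-formedness only; it does not validate against the XHTML 1.0 DTD, and it is not meant to model how any particular search engine parses pages.

```python
import xml.etree.ElementTree as ET

INVALID = """<Table>
 <TR>
   <TD>
     <P>
       <ul>
         <LI>
           <a HREF="#">
             <IMG srC=image.gif>
           </A>
         </UL>
   </TD>
 </tr>
</tABLE>"""

VALID = """<div>
 <ul>
   <li>
     <a href="#">
       <img src="image.gif" />
     </a>
   </li>
 </ul>
</div>"""

def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

for name, markup in (("invalid", INVALID), ("valid", VALID)):
    print(name, is_well_formed(markup))
```

The invalid snippet already fails at the unquoted attribute value in the IMG tag; mismatched tag case and unclosed elements would be further errors.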

The structure of a web page consists of:

  • the content of the document (data and metadata)
  • navigation elements (links between pages)
  • additional information (banners, footer, information blocks not directly connected with the core content)

Different information blocks can be placed in different locations on the page. A typical layout contains (1) a header, (2) a horizontal menu, (3) a vertical menu, (4) the main content, (5) additional information and (6) a footer.

In the past, a page layout such as this one would have been implemented with an HTML table. All the components of the page (the content in HTML format, and the design in the form of table markup and CSS styles) were included in one document. This made it difficult for webmasters to change the page design, because items such as the information boxes were tied to specific table cells. It also made it difficult for search engines to understand the content and analyze it correctly.

Additionally, table-based design often prevented websites from displaying correctly in non-standard browsers such as:

  • text-only browsers
  • voice-driven accessibility interfaces
  • mobile and hand-held browsers with small screens

One of the advantages of the eZ Publish template system is that the layout is built from <div> blocks (a table-less layout) rather than tables. Web developers construct pages from information blocks (header, menu, main content, additional or related information, footer, etc.) and then specify the layout using external style sheets (CSS files).

The page layout with the vertical menu positioned as the left column block.

The page layout with the horizontal menu positioned as part of the header block.

Valid XHTML source code allows people and search engines to understand the content of the document precisely. XHTML and CSS connected via the page layout template (pagelayout.tpl, the main eZ Publish template) also reduce page size, because formatting such as fonts, sizes and colors is stored in reusable CSS files rather than repeated in every page.

The <title> tag is generated automatically by eZ Publish. It contains the reversed path to the document. For example, the <title> tag from the eZ Publish Demo page on the eZ website is:

<title> Demo / eZ Publish / Products </title>

The title of the page

Because all output in eZ Publish is generated by templates, you can modify the contents of the <title> tag if needed. The default template code that generates the <title> is contained in the page_head.tpl file:

{let name=Path
    path=$module_result.path
    reverse_path=array()}
 
 {section show=is_set($module_result.title_path)}
 {set path=$module_result.title_path}
 {/section}

 {section loop=$:path}
 {set reverse_path=$:reverse_path|array_prepend($:item)}
 {/section}

 {set-block scope=root variable=site_title}
 {section loop=$Path:reverse_path}
  {$:item.text|wash} {delimiter} / {/delimiter}
 {/section}
 {$site.title|wash}
 {/set-block}

{/let}

<title>{$site_title}</title>
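The template logic above can be restated as a short Python sketch, which may make the reversal easier to follow. It assumes each path element is a dictionary with a "text" key, mirroring $module_result.path (root first), and uses html.escape as a stand-in for the |wash operator. It is an illustration only, not eZ Publish code.

```python
from html import escape

def build_title(path, title_path=None, site_title=None):
    """Build a page title from a root-first path, current page first."""
    # Like the template, prefer title_path when the module result provides one.
    items = title_path if title_path is not None else path
    # Reverse so the current page comes first and the root comes last.
    parts = [escape(item["text"]) for item in reversed(items)]
    title = " / ".join(parts)
    if site_title:
        # The template appends $site.title after the path, separated only by whitespace.
        title = f"{title} {escape(site_title)}"
    return title

path = [{"text": "Products"}, {"text": "eZ Publish"}, {"text": "Demo"}]
print(f"<title>{build_title(path)}</title>")
```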

The <meta> tags (description, keywords, etc.) are determined by settings in the system configuration file (site.ini). eZ Publish adds them automatically to the <head> section of the generated XHTML document.

Metadata configuration in the site.ini file:

[SiteSettings]
# List of metadata to set in pagelayout
MetaDataArray[author]=eZ systems
MetaDataArray[copyright]=eZ systems
MetaDataArray[description]=Content Management System
MetaDataArray[keywords]=cms, publish, e-commerce, content management

The page_head.tpl template file extracts the meta tags from the above configuration:

{section name=meta loop=$site.meta}
   <meta name="{$meta:key|wash}" content="{$meta:item|wash}" />
{/section}
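For illustration, the following Python sketch performs roughly the same transformation as the template above: it reads the MetaDataArray entries from a site.ini fragment and renders one <meta> tag per entry. It relies on the standard configparser module, which only approximates eZ Publish's own INI handling (for example, it knows nothing about siteaccess overrides), and html.escape again stands in for |wash.

```python
import configparser
from html import escape

SITE_INI = """
[SiteSettings]
# List of metadata to set in pagelayout
MetaDataArray[author]=eZ systems
MetaDataArray[copyright]=eZ systems
MetaDataArray[description]=Content Management System
MetaDataArray[keywords]=cms, publish, e-commerce, content management
"""

PREFIX = "MetaDataArray["

def meta_tags(ini_text):
    """Render one <meta> tag per MetaDataArray[...] entry in [SiteSettings]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; the default would lowercase keys
    parser.read_string(ini_text)
    tags = []
    for key, value in parser.items("SiteSettings"):
        if key.startswith(PREFIX) and key.endswith("]"):
            name = key[len(PREFIX):-1]
            tags.append(f'<meta name="{escape(name)}" content="{escape(value)}" />')
    return tags

for tag in meta_tags(SITE_INI):
    print(tag)
```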

Adding keyword and description attributes to content class definitions provides an additional level of content information that search engines can use. Template code can then insert the values of these attributes into the meta tags.

Additional attributes in the definition of the content class.

This simple template script inserts a description and keywords into meta tags, using the class attributes with the identifiers meta_description and meta_keys:

{let node_metas=fetch(content, node, hash(node_id, $module_result.node_id))}
 <meta name="description" content="{$node_metas.object.data_map.meta_description.content.output.output_text}" />
 <meta name="keywords" content="{$node_metas.object.data_map.meta_keys.content.output.output_text}" />
{/let}

Headers (<h1>, <h2>, etc.) are very important for Search Engine Optimization. They are used to structure content via titles, subtitles and topics. Used properly, they give visitors and search engines clear information about the document's content.

By default, eZ Publish renders the article title as an <h1> heading and automatically uses the lower levels (<h2>, <h3>, <h4>) for headings nested in the article body.

Headers in eZ Online Editor.

Source code of an article with headers generated by eZ Online Editor.
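Whether the generated headings actually descend in order can be checked with a small script. The Python sketch below uses the standard html.parser module to collect the heading outline of a page and flag headings that skip a level (for example an <h4> directly after an <h2>). The "one level at a time" rule is a common accessibility convention assumed here; the sketch makes no claim about how search engines weigh heading levels.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

def skipped_levels(headings):
    """Return headings that jump more than one level below the previous heading."""
    problems = []
    previous = 0
    for level, text in headings:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems

page = ("<h1>Article title</h1><p>Intro</p><h2>Section</h2>"
        "<h3>Detail</h3><h2>Next section</h2><h4>Too deep</h4>")
outline = HeadingOutline()
outline.feed(page)
print(skipped_levels(outline.headings))  # [(4, 'Too deep')]
```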

URL addresses and links are among the main ranking factors during search engine indexing. Search engine robots analyze links because links generally:

  • consist of one or a few words
  • are related to the most important information
  • direct users to pages that are extensions or are at least related to the current content

The eZ Online Editor provides an interface for adding keywords within anchor (<a>) tags.

Adding a link in eZ Online Editor.

Internal links

On many dynamic websites, links to pages are long and complex, because they carry resource identifiers and session information. For example:

http://www.example.com/index.php?id_kat=123&id_art=234234&sess_id=a87df87g8sdfgs7d

eZ Publish provides a "friendly URL" feature that eliminates such complex addresses. Friendly URLs avoid special characters that search engines may handle poorly, such as '?', '&', '+' and '&xx'. Instead of "dirty" URLs, eZ Publish generates URLs containing a readable path to the resource. For example:

http://www.example.com/folder_name/article_title

But what happens when an editor changes the title of the article to "Updated Article Title"? eZ Publish changes the friendly URL:

http://www.example.com/folder_name/updated_article_title
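The transformation from titles to path segments can be sketched in Python. This is an approximation, not eZ Publish's actual URL alias code (whose rules differ between versions and can be configured): it assumes lowercase, underscore-separated segments as in the examples above, and the transliteration table is a hypothetical addition for letters such as the Polish "ł" that Unicode normalization alone does not convert to ASCII.

```python
import re
import unicodedata

# Hypothetical transliteration table for letters that Unicode NFKD does not
# decompose into ASCII (for example the Polish "ł"); extend as needed.
EXTRA_LETTERS = str.maketrans({"ł": "l", "Ł": "L"})

def url_segment(title):
    """Turn a title into a lowercase, underscore-separated URL segment."""
    text = unicodedata.normalize("NFKD", title.translate(EXTRA_LETTERS))
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" drops the marks and anything else non-ASCII.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters other than letters and digits into "_".
    text = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return text.lower()

def friendly_url(base, *titles):
    """Join a base URL and one URL segment per title along the node path."""
    return base.rstrip("/") + "/" + "/".join(url_segment(t) for t in titles)

print(friendly_url("http://www.example.com", "Folder name", "Updated Article Title"))
```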

If someone requests the original URL, eZ Publish automatically redirects the request to the page with the new name, thus preventing HTTP 404 Not Found errors.
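Conceptually, this works because the old alias is kept after the rename. The Python sketch below models that idea with a hypothetical in-memory alias table; eZ Publish actually stores URL aliases in its database, and the sketch makes no claim about that schema or about which HTTP status code eZ Publish sends for the redirect.

```python
# Hypothetical alias table: URL path -> (node ID, whether it is the current alias).
ALIASES = {
    "folder_name/article_title": (42, False),         # old alias kept after the rename
    "folder_name/updated_article_title": (42, True),  # current alias
}

def resolve(path, aliases=ALIASES):
    """Return ("serve", path), ("redirect", current_path) or ("not_found", None)."""
    path = path.strip("/")
    entry = aliases.get(path)
    if entry is None:
        return ("not_found", None)
    node_id, is_current = entry
    if is_current:
        return ("serve", path)
    # An outdated alias: look up the current alias of the same node.
    for candidate, (other_id, current) in aliases.items():
        if other_id == node_id and current:
            return ("redirect", candidate)
    return ("not_found", None)

print(resolve("/folder_name/article_title"))
```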

URL translation

In addition to friendly URLs, eZ Publish provides the ability to create especially short and memorable URLs. This address translation can be used to provide quick and intuitive addresses for important resources. As an example, the page located at the URL:

http://ez.no/product/ez_publish/info/ez_publish_camp_2005

...can also be accessed via this URL:

http://ez.no/camp2005

Outgoing links

Linking to other well-ranked sites can have a positive effect on the rank of the page containing the link. Search engines trust large, well-known information portals, especially if they are related to the topic of the page where the link originates. Links to other sites are easy to create (especially with the Online Editor) and easy to manage with the eZ Publish URL tool.

URL verification

Valid links are critical to search engine optimization. The eZ Publish URL verification tool makes it easy to check all the links within a site and to generate statistics regarding valid and invalid URLs.

Managing URLs in the administration panel.

One of the system maintenance tasks contained in the runcronjobs.php script (installed by default in the root folder of the eZ Publish installation) checks link validity and reports broken links as "Invalid". Administrators can use this information to edit and correct URLs.
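The idea behind such a link check can be sketched in Python with the standard urllib module. This is not the cron job's implementation, and the function names are hypothetical. The checker sends HEAD requests, which some servers reject, so a more robust version would retry with GET. The status lookup is passed in as a parameter so the classification logic can be tested without network access.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10.0):
    """Return the final HTTP status code for url, or None if it cannot be reached."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects, so this is the status of the final target.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx and 5xx responses
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # DNS, timeout, bad URL
        return None

def check_links(urls, fetch_status=http_status):
    """Split urls into (valid, invalid) lists based on their HTTP status codes."""
    valid, invalid = [], []
    for url in urls:
        status = fetch_status(url)
        if status is not None and 200 <= status < 400:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid
```

For example, check_links(["http://www.example.com/"]) performs a real request, while passing a dictionary's get method as fetch_status allows offline testing.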

Site relocation

If you have to change the location of your site, eZ Publish can be configured to respond with the HTTP 301 status code ("Moved Permanently"). This tells search engines about the new location of the site and helps you keep your position in the search engine rankings.
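At the HTTP level, a permanent relocation is simply a 301 response with a Location header pointing at the new address. The following Python WSGI sketch shows that response shape. It is a generic illustration, not eZ Publish configuration, and NEW_BASE_URL is a hypothetical value.

```python
# Hypothetical new location of the site.
NEW_BASE_URL = "http://www.example.com"

def relocation_app(environ, start_response):
    """WSGI app that answers every request with 301 to the same path on NEW_BASE_URL."""
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")
    location = NEW_BASE_URL + path + ("?" + query if query else "")
    body = ("Moved permanently to " + location).encode("utf-8")
    start_response("301 Moved Permanently", [
        ("Location", location),
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the app can be served with the standard library: wsgiref.simple_server.make_server("", 8000, relocation_app).serve_forever().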

Problems with dynamic navigation

When a website's navigation is built dynamically or with mechanisms other than standard links (for example JavaScript, Flash objects, frames, special characters or image maps), search engines may be unable to follow the links, because many crawlers read only the HTML source code. Non-standard navigation methods can therefore reduce a site's link ranking, because the search engine cannot identify the links.
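The difference can be illustrated with a crawler that, by assumption, reads only the HTML source. The Python sketch below collects href values from <a> tags with html.parser: the ordinary link is found, while the JavaScript link and the onclick handler are not. Many modern crawlers execute some JavaScript, so treat this as a model of a plain-HTML reader rather than of any specific search engine.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, as a crawler reading plain HTML would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # javascript: URLs do not point at a crawlable page.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(href)

PAGE = """
<a href="/products">Products</a>
<a href="javascript:openPage('services')">Services</a>
<span onclick="location.href='/contact'">Contact</span>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/products']
```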

Text within <strong> and <em> tags makes important keywords more visible to search engines.

Formatting content in the Online Editor using header, bold, italic and plain text.

Formatting content manually using header, bold, italic and plain text.

While "a picture is worth a thousand words", search engines can only use words, not pictures, to build their indexes. The "alt" attribute of the <img> tag is therefore essential, both for accessibility (to support readers such as text-based browsers) and for search engine indexing. Adding "alt" descriptions to images gives search engines information about the content of the page.

Browsing web pages with images enabled.

For example, the XHTML tag for the picture above is:

<img src="[...]" alt="eZ Publish 3.8 release" />

A search engine (or a site visitor who cannot view images) sees the alternative text:

Browsing web pages with images disabled: the content of the "alt" attribute of the <img> tag is displayed instead of the image.
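Missing alt attributes are easy to find automatically. This Python sketch, based on the standard html.parser module, lists the src of every <img> that has no alt attribute. It deliberately accepts an empty alt="", which accessibility guidelines recommend for purely decorative images.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Record the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags such as <img ... /> are routed here too,
        # because HTMLParser.handle_startendtag calls handle_starttag by default.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))

page = ('<img src="release.png" alt="eZ Publish 3.8 release" />'
        '<img src="logo.png">')
finder = MissingAltFinder()
finder.feed(page)
print(finder.missing)  # ['logo.png']
```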

A well-designed sitemap can be very useful for search engine optimization. This special page consists of links and keywords that describe the website. By default, eZ Publish automatically generates sitemaps in a simple, search-engine-friendly form:

<h1>Folder A sitemap</h1>
 <h2><a href="folder_a/folder_x">Folder X</a></h2>
 <ul>
 <li><a href="folder_a/folder_x/article_1">Article 1</a></li>
 <li><a href="folder_a/folder_x/article_2">Article 2</a></li>
 </ul>
 <h2><a href="folder_a/folder_y">Folder Y</a></h2>
 <ul>
 <li><a href="folder_a/folder_y/article_3">Article 3</a></li>
 <li><a href="folder_a/folder_y/article_4">Article 4</a></li>
 <li><a href="folder_a/folder_y/article_5">Article 5</a></li>
 </ul>
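The structure of this sitemap is simple enough to generate from a list of folders and their articles. The Python sketch below reproduces the markup above from hypothetical input data; in eZ Publish the sitemap is produced by the sitemap view, not by code like this.

```python
from html import escape

def render_sitemap(title, folders):
    """Render a two-level sitemap.

    folders is a list of (folder_name, folder_path, articles) tuples, where
    articles is a list of (article_name, article_path) pairs.
    """
    lines = [f"<h1>{escape(title)} sitemap</h1>"]
    for folder_name, folder_path, articles in folders:
        lines.append(f' <h2><a href="{escape(folder_path)}">{escape(folder_name)}</a></h2>')
        lines.append(" <ul>")
        for article_name, article_path in articles:
            lines.append(f' <li><a href="{escape(article_path)}">{escape(article_name)}</a></li>')
        lines.append(" </ul>")
    return "\n".join(lines)

tree = [
    ("Folder X", "folder_a/folder_x",
     [("Article 1", "folder_a/folder_x/article_1"),
      ("Article 2", "folder_a/folder_x/article_2")]),
    ("Folder Y", "folder_a/folder_y",
     [("Article 3", "folder_a/folder_y/article_3")]),
]
print(render_sitemap("Folder A", tree))
```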

Additionally, you can create a sitemap for any part of the site by specifying the ID of a content node in the URI. The following URI generates a map of the whole website, starting from the root node (which usually has node ID 2):

http://www.example.com/content/view/sitemap/2

Section sitemap

In dynamically growing information websites and portals, Sections provide a layer of content organization. In the simplest case, a section is a subtree of the information structure.

For example, eZ.no as a corporate site consists of pages about the company and its products and services. In addition, there are other complex sections designed for specific audiences, such as the eZ partners area and the eZ community pages. These sections need specially designed portal pages to link to all of the section's components, such as forums, news, blogs, articles, etc.

For such large sections, dedicated sitemaps can be used. eZ Publish enables developers to generate sitemaps for any subtree, for example:

http://www.example.com/content/view/sitemap/x

...where "x" is the node id.

Sitemaps for sections can be useful for new site visitors and for search engine robots because they are simple navigation maps, consisting of clean lists of headers and links without additional components such as design, pictures and text.

Google sitemap

To help Google index your site, you can build Google Sitemaps. These are similar to standard sitemaps but are expressed in a specially defined XML format, which the Google indexer uses to simplify the indexing process. To generate a Google sitemap, use the dedicated eZ Publish templates that conform to Google's XML format.
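A Google sitemap is an XML document listing URLs, optionally with a last-modification date. The Python sketch below produces one with xml.etree.ElementTree. It uses the sitemaps.org 0.9 namespace, which Google accepts (the earliest Google Sitemaps used a Google-specific namespace). The URLs and dates are hypothetical, and the sketch is not one of the eZ Publish templates.

```python
import xml.etree.ElementTree as ET

# sitemaps.org protocol 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML for (loc, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes '&' and '<'
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([
    ("http://www.example.com/", None),
    ("http://www.example.com/folder_name/updated_article_title", "2006-05-01"),
]))
```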

Content syndication is very popular with sites where information is frequently updated. RSS is an XML format that "feeds" links from news sites, blogs, etc., to subscribers. Search engines analyze RSS feeds because they are current, popular among end-users and often served, exchanged, reused and archived by information websites.

eZ Publish supports both the import and export of RSS channels. Use the eZ Publish Administration Interface to configure RSS.

Import and export RSS in the eZ Publish Administration Interface.
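To show what exported and imported data looks like, the following Python sketch builds a minimal RSS 2.0 feed with the standard xml.etree.ElementTree module and reads the item links back, as an importer would. It is independent of eZ Publish's own RSS implementation; the feed title, URLs and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_rss(title, link, description, items):
    """Return a minimal RSS 2.0 document; items are (title, link, description) tuples."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, item_description in items:
        item = ET.SubElement(channel, "item")
        for tag, text in (("title", item_title), ("link", item_link),
                          ("description", item_description)):
            ET.SubElement(item, tag).text = text
    return ET.tostring(rss, encoding="unicode")

def item_links(rss_text):
    """Return the <link> of every item, as an importer would need."""
    root = ET.fromstring(rss_text)
    return [item.findtext("link") for item in root.iter("item")]

feed = build_rss("Example news", "http://www.example.com/news", "Latest articles",
                 [("Updated Article Title",
                   "http://www.example.com/folder_name/updated_article_title",
                   "Summary of the article")])
print(item_links(feed))
```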

eZ Publish supports the development of multilingual websites. Many editors can work independently with different translations of a single object. Usually a separate siteaccess is created for each of the translated sites. Each has its own URL, as the following example of English, German and Polish versions shows:

http://www.example.com/en

http://www.example.com/de

http://www.example.com/pl

The language, localization and character set of each site are set in the main template (the pagelayout.tpl file) of each siteaccess. For example, an English-language site localized for Great Britain contains the following XHTML code:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
 <head>
 <meta http-equiv="Content-language" content="en-GB" />
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
 </head>
</html>

While search engines can (to a degree) determine the language of a site on their own, the additional parameters help search engines classify the site more exactly according to language, localization and charset. This is useful when people search for information specific to a country or language.
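These declarations are plain markup that any crawler can read. As an illustration, the following Python sketch uses html.parser to collect the lang, xml:lang, Content-language and charset declarations from a page. It assumes the charset is declared through a Content-Type meta tag, and it does not model how any search engine actually classifies languages.

```python
from html.parser import HTMLParser

class LanguageHints(HTMLParser):
    """Collect language and charset declarations from a page's markup."""

    def __init__(self):
        super().__init__()
        self.hints = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            for name in ("lang", "xml:lang"):
                if attributes.get(name):
                    self.hints[name] = attributes[name]
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            content = attributes.get("content") or ""
            if http_equiv == "content-language":
                self.hints["content-language"] = content
            elif http_equiv == "content-type" and "charset=" in content.lower():
                self.hints["charset"] = content.lower().split("charset=", 1)[1].strip()

page = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
 <head>
 <meta http-equiv="Content-language" content="en-GB" />
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
 </head>
</html>"""
hints = LanguageHints()
hints.feed(page)
print(hints.hints)
```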

Improved support for multilingual sites was added in the 3.8 release of eZ Publish. The most relevant aspect for search engine optimization is the generation of unique URLs for every object translation. For example:

http://www.example.com/en/first_folder/second_articles_title

http://www.example.com/pl/pierwszy_folder/tytul_drugiego_artykulu

Each address contains the folder and article names in the language of its translation, English or Polish.

While there are many tricks for raising a site's position in search results, not all of them are ethical, and some are treated as cheating by end-users and search engines. The consequences can be severe, including the removal of all records related to the website from the search engine's index.

As an example, one method of "fooling" search engines is to add <div> blocks full of keywords that are not related to the website's topic, and then to move the block off-screen or give the text the same color as the background.

HTML code:

<div id="keywords"> Many repeated keywords here in <h1> and more keywords </h1> for instance </div>

External CSS code:

#keywords {
 position: absolute;
 top:-2000px;
}

Techniques such as this can result in having your site removed from search engine results.

Search engine optimization is a valuable practice that should result in increased site visits and, in the case of e-commerce sites, increased sales. There are several mechanisms for verifying the effect of search engine optimization on a website's popularity and profits.

During optimization, webmasters can track the position of the website in search rankings. Many tools can be used for this purpose, for example GoogleRanks. The eZ Publish Webstats tool can be used to check the number of site visits, users, page views and referring URLs. An increase in these statistics can indicate that your search engine optimization is working.
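Referring URLs from search engines often carry the search phrase in the query string, so they can reveal which keywords bring visitors to the site. The Python sketch below counts such phrases in a list of referrers. It assumes that the listed engines pass the phrase in the "q" parameter, the host list is illustrative rather than complete, and the sketch does not use or describe the Webstats tool.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Assumption: these engines pass the search phrase in the "q" query parameter.
SEARCH_HOSTS = ("google.", "search.msn.", "alltheweb.")

def search_terms(referrers):
    """Count search phrases found in referring URLs from known search engines."""
    counts = Counter()
    for referrer in referrers:
        parts = urlparse(referrer)
        if not any(host in parts.netloc for host in SEARCH_HOSTS):
            continue
        # parse_qs decodes "+" and %-escapes into plain text.
        for phrase in parse_qs(parts.query).get("q", []):
            counts[phrase.strip().lower()] += 1
    return counts

referrers = [
    "http://www.google.com/search?q=ez+publish+seo",
    "http://www.google.de/search?q=EZ+Publish+SEO&hl=de",
    "http://www.example.org/links.html",
    "http://search.msn.com/results.aspx?q=cms",
]
print(search_terms(referrers).most_common())
```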