Helper Child Nodes: User-Friendly Search Results and Preventing Full Node Views

The target audience for this article is developers with advanced knowledge of eZ Publish.

There are two ways to associate helper nodes with a master node:

  • Via an explicit relation: using object relation or object relations attributes, general object relations, or <embed> tags in an XML text block.
  • Using the tree structure: putting helper nodes below the master node.

In this article, we discuss the latter type, where helper nodes are children of a master node. If you are already familiar with helper nodes and the issues involved, you can skip directly to the section "Indexing child nodes in their master nodes". Otherwise, we recommend reading the article in order.

Extensions used

The extensions presented in this article are written for eZ Publish 3 and can be downloaded from http://www.seeds.no/eng/ez-publish/downloads. Versions of the same extensions for eZ Publish 4 are available from the same location.

Below, we look at several common scenarios involving helper nodes.

Forums

If you have a forum in eZ Publish, you probably use classes for forums, forum threads and forum replies. A forum has forum threads as children and a forum thread has forum replies as children. The full view of a forum shows a list of threads under it and the full view of every thread shows all or some of its children.

However, you do not want to show the full view of a forum reply. More importantly, when you search for a term that occurs frequently in a few replies (for example, when these replies quote the same text), you do not want the search results to list multiple replies from the same thread.

Similar examples include article comments, glossaries, and lists of simple entries like basic information about employees, web addresses, and so on.

These examples have the following in common:

  • The full view of children must not be shown.
  • Search hits must return the master (parent) node.

Arrays of content

In eZ Publish, if you need content that is an array of strings, integers or other simple types and you do not need validation, you can use the Matrix datatype and override the attribute's edit template. However, if you need validation or other types such as images, XML text blocks or files, you need to use helper nodes.

Consider an information portal for a tax office. Your customer wants to be able to attach any number of electronic forms in PDF format to pages about tax requirements, and it must be easy to add, remove or replace forms at any time without editing the tax requirements page itself. You will probably have a class with an XML text block for the tax requirements pages and a class for the PDF forms.

Download links are shown directly in the full view of the requirements pages. As for the forms themselves, the customer may or may not want their full views shown: a separate page per form might be unnecessary and undesirable, or the customer might want to show metadata about each PDF document, such as keywords, the document author, or instructions on how to fill in the form. Either way, search hits should probably return only the requirements pages, so that users see the forms in the context of the related requirements.

To summarize:

  • The full view of children may or may not be shown.
  • Search hits must return the master (parent) node.

Galleries

An image gallery is a typical example where the full view of the children is necessary (showing an image at full size, together with links to the previous and next images). However, you might want to let users choose whether search results list individual images or the galleries that contain them.

In other words:

  • The full view of children must be shown.
  • Search hits may or may not return child nodes.

In all of these scenarios, it is important to use dedicated classes for helper nodes. If you used the same classes for both helper nodes and unrelated nodes, it would be much more difficult to override the full, line and other views for the helper nodes and to keep them out of search results.

Even if you avoid linking to helper nodes, users can still guess a helper node's URL and type it in directly. The solution is therefore to redirect to the parent node whenever a user accesses the full view of a helper child node. There are several ways to do this; we will start with a simple solution in the full view template:

<script type="text/javascript">
<!--
    location.pathname = {$node.parent.url|ezurl};
// -->
</script>
<p>You should be redirected automatically. If you are not, please click <a href={$node.parent.url|ezurl}>here</a>.</p>

The solution above uses JavaScript to redirect the user to the master parent node. If the user's browser does not support JavaScript (or has it disabled), the user has to click the link to the parent node.

Meta refresh

Another way to redirect the user automatically, without JavaScript, is the meta refresh tag: a <meta> tag in the HTML head that instructs the browser to load another page immediately or after a given number of seconds. This technique was originally introduced by Netscape and is now supported by most web browsers.

<html>
<head>
<meta http-equiv="Refresh"
      content="0; url=http://www.mysite.com/address/to/redirect" />
</head>
...

Note that the “0” in the “content” attribute means “wait 0 seconds, then redirect”.

Because the <meta> tag must appear in the HTML head, which is produced by pagelayout.tpl rather than by the full view template, we cannot do everything in the full view template. Instead, we can use persistent_variable, a variable that can be set in a full view template and is then available in the pagelayout, even when the view cache is used. Here, persistent_variable stores the path of the node to redirect to:

In the node/view/full.tpl override:

{set scope="global" persistent_variable=hash( 'redirect', $node.parent.url )}

In pagelayout.tpl:

{* ... *}
 
<head>
 
{* ... *}
 
{if and( is_set( $module_result.content_info.persistent_variable.redirect ),
         $module_result.content_info.persistent_variable.redirect )}
<meta http-equiv="Refresh"
      content="0; url={$module_result.content_info.persistent_variable.redirect|ezurl( 'no', 'full' )}" />
{/if}
 
{* ... *}
 
</head>
 
{* ... *}

HTTP redirects

The solutions above share a usability problem. After being redirected, if the user clicks the browser's Back button, the full view of the helper node loads again and immediately redirects them back to the parent node, so they cannot go back past it. Some modern browsers try to work around this, but their behavior differs.

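A better solution is to use HTTP redirects, that is, responses with status code 301 (permanent redirect) or 302 (temporary redirect). The server sends the redirect before any page content, so the browser does not keep the helper node's URL as a history entry and the Back button behaves as expected. There is no template operator that can do this, so we need to write our own.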

We will create an extension called “redirect” with the following files. (Alternatively, you can add the operator to an existing extension; in that case, change the extension name in the INI settings and the paths in the PHP files.)

autoloads/eztemplateautoload.php:

<?php
 
$eZTemplateOperatorArray = array();
 
$eZTemplateOperatorArray[] = array(
    'script' => 'extension/redirect/autoloads/redirectoperator.php',
    'class' => 'RedirectOperator',
    'operator_names' => array( 'redirect' )
);
 
?>

autoloads/redirectoperator.php:

<?php
 
class RedirectOperator
{
    function RedirectOperator()
    {
        $this->Operators = array( 'redirect' );
    }
 
    function &operatorList()
    {
        return $this->Operators;
    }
 
    function namedParameterPerOperator()
    {
        return true;
    }
 
    function namedParameterList()
    {
        return array(
            'redirect' => array(
                'url' => array(
                    'type' => 'string',
                    'required' => true
                )
            )
        );
    }
 
    function modify( &$tpl, &$operatorName, &$operatorParameters, &$rootNamespace,
                     &$currentNamespace, &$operatorValue, &$namedParameters )
    {
        include_once( 'lib/ezutils/classes/ezsys.php' );
        include_once( 'lib/ezutils/classes/ezhttptool.php' );
        include_once( 'lib/ezutils/classes/ezexecution.php' );
 
        $redirectUri = $namedParameters['url'];
        // if $redirectUri is not starting with scheme://
        if ( !preg_match( '#^\w+://#', $redirectUri ) )
        {
            // path to eZ Publish index
            $indexDir = eZSys::indexDir();
 
            /* We need to make sure we have one
               and only one slash at the concatenation point
               between $indexDir and $redirectUri. */
            $redirectUri = rtrim( $indexDir, '/' ) . '/' . ltrim( $redirectUri, '/' );
        }
 
        // Redirect to $redirectUri by returning status code 301 and exit.
        eZHTTPTool::redirect( $redirectUri, array(), 301 );
        eZExecution::cleanExit();
    }
}
 
?>
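The only non-trivial logic in modify() is how the target URI is built. As a sketch for checking that logic outside eZ Publish, the hypothetical functions below reproduce it in plain PHP: buildRedirectUri() takes an explicit $indexDir in place of eZSys::indexDir(), and sendRedirect() uses PHP's own header() function as a stand-in for eZHTTPTool::redirect() followed by eZExecution::cleanExit(). Neither function is part of the eZ Publish API.

```php
<?php

/**
 * Hypothetical helper mirroring the URI logic in RedirectOperator::modify().
 * URIs that start with scheme:// pass through unchanged; relative paths are
 * prefixed with the index directory, with exactly one slash at the join.
 */
function buildRedirectUri(string $indexDir, string $redirectUri): string
{
    if (preg_match('#^\w+://#', $redirectUri)) {
        return $redirectUri;
    }
    return rtrim($indexDir, '/') . '/' . ltrim($redirectUri, '/');
}

/**
 * Hypothetical plain-PHP redirect: sends a Location header with the given
 * status code (301 by default) and stops the script, roughly what the
 * operator achieves with eZHTTPTool::redirect() and eZExecution::cleanExit().
 */
function sendRedirect(string $uri, int $status = 301): void
{
    header('Location: ' . $uri, true, $status);
    exit;
}
```

For example, with an index directory of /index.php/eng, the path forum/thread-1 becomes /index.php/eng/forum/thread-1. If the index directory is empty (for example, on a site using URL rewriting at the web root), it becomes /forum/thread-1.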

settings/site.ini.append.php:

<?php /* #?ini charset="iso-8859-1"?
 
[TemplateSettings]
ExtensionAutoloadPath[]=redirect
 
*/ ?>

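Remember to enable the extension (for example, by adding ActiveExtensions[]=redirect to the [ExtensionSettings] group in settings/override/site.ini.append.php). To use the new operator, call it in an override of node/view/full.tpl like this: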

{redirect( $node.parent.url )}

Note that we do not need to switch off view caching (by setting the cache_ttl variable to “0”) because the redirect operator will terminate the processing of the request before the cache file can be saved.

Because no cache file is saved, however, the helper node, its object and its parent node are fetched on every request. To benefit from view caching instead, we can again use persistent_variable to remember which page to redirect to (in the node/view/full.tpl override):

{set scope="global" persistent_variable=hash( 'redirect', $node.parent.url )}

The redirection is then handled in pagelayout.tpl:

{if and( is_set( $module_result.content_info.persistent_variable.redirect ),
         $module_result.content_info.persistent_variable.redirect )}
{redirect( $module_result.content_info.persistent_variable.redirect )}
{/if}
 
{* Original pagelayout.tpl *}
{* ... *}

For more information on URL redirection, see http://en.wikipedia.org/wiki/URL_redirection.

From here on, we assume that master nodes have only one location. If a master node had multiple locations, each location would have its own set of children, so the locations would have different helper nodes. Because the eZ Publish search algorithms work with objects rather than nodes, it would be impossible to determine which location is the right one.

By default, eZ Publish shows search results using the line view, so we will also assume this in our examples.

If we search for content stored in helper child nodes, these helper nodes appear in the list of search results, and by default the results link to their full views. To link to the parent node instead, use the following code in an override of the node/view/line.tpl template for the helper node classes:

<a href={$node.parent.url|ezurl}>{$node.parent.name|wash}</a>

We could alternatively use this:

{node_view_gui content_node=$node.parent view='line'}

However, if a search term occurs in more than one helper node (for example, many forum replies quoting the same text), you get several results that show and link to the same master node, which is confusing to users. One way to address this is to alter the line template so that users can tell the hits apart. For example, in the line view override for forum replies, we can use the following:

<a href={$node.parent.url|ezurl}>{$node.parent.name|wash} (in reply by {$node.object.owner.name|wash} on {$node.object.published|l10n( 'datetime' )})</a>

Alternatively, we can use anchors to mark the helper nodes' content in the master node's full view. First, insert the following piece of code in the full view of the parent node, preceding the display of each child (assuming $child denotes a helper node):

<a name="{$child.url|trim( '/' )|explode( '/' )|reverse[0]}"></a>

Then, insert this code in the line view of the child in order to link to the anchor:

{def $url_last_part=$node.url|trim( '/' )|explode( '/' )|reverse[0]}
<a href={concat( $node.parent.url, '#', $url_last_part )|ezurl}>{$node.parent.name|wash} ({$node.name|wash})</a>
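Both templates must derive the same anchor name from the helper node's URL alias, otherwise the link misses its target. The chain trim( '/' )|explode( '/' )|reverse[0] strips slashes at both ends, splits on '/' and takes the last segment. A plain-PHP equivalent, under the hypothetical name urlLastPart(), makes the transformation easy to check:

```php
<?php

/**
 * Hypothetical PHP equivalent of the template chain
 * $url|trim( '/' )|explode( '/' )|reverse[0]:
 * returns the last segment of a URL alias, used here as the anchor name.
 */
function urlLastPart(string $url): string
{
    // Strip leading/trailing slashes, split into segments, keep the last one.
    $parts = explode('/', trim($url, '/'));
    return $parts[count($parts) - 1];
}
```

For example, urlLastPart('forum/tax-deadlines/reply-3') returns 'reply-3', so the line view links to the parent's URL followed by #reply-3.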

Indexing child nodes in their master nodes

The best solution is to show only the master nodes in search results. The following section describes a simple way to achieve this.

Every datatype in eZ Publish has a function named metaData(), which returns a string of words to be indexed (or a special array containing such a string). We will take advantage of this by creating a special datatype that stores no information. Its sole purpose is to return a string of all words indexed for the children of its object's main node. In addition, we will create a workflow event that, after an object is published, re-indexes the parent object if that object has an attribute of the new datatype.

For simplicity, our example assumes that the built-in eZ Publish search engine is used. However, the framework we present can be adapted to other search engines, or made to work regardless of the search engine used.

Let's take a look at the files in the example extension childrenindexer:

datatypes/childrenindexer/childrenindexertype.php:

<?php

include_once( 'kernel/classes/ezdatatype.php' );

define( 'DATATYPESTRING_CHILDRENINDEXER', 'childrenindexer' );

class ChildrenIndexerType extends eZDataType
{
    function ChildrenIndexerType()
    {
        $this->eZDataType( DATATYPESTRING_CHILDRENINDEXER, 'Children Indexer' );
    }

    function isIndexable()
    {
        return true;
    }

    function metaData( $contentObjectAttribute )
    {
        $db =& eZDB::instance();
        // Cast to int before interpolating the ID into the SQL query below
        $contentObjectID = (int) $contentObjectAttribute->attribute( 'contentobject_id' );
        // Find words indexed for children of the current object's main node
        $words = $db->arrayQuery( "SELECT word 
                                   FROM ezcontentobject_tree ot,
                                        ezcontentobject_tree t,
                                        ezsearch_object_word_link l,
                                        ezsearch_word w
                                   WHERE ot.contentobject_id=$contentObjectID
                                     AND ot.main_node_id=ot.node_id
                                     AND t.parent_node_id=ot.node_id
                                     AND l.contentobject_id=t.contentobject_id
                                     AND w.id=l.word_id", array( 'limit' => 1000 ) );
        $metaData = array();
        foreach ( $words as $word )
        {
            $metaData[] = $word['word'];
        }
        // implode() takes the separator first; the reversed order is deprecated and fails in PHP 8
        $metaDataString = implode( ' ', $metaData );
 
        return $metaDataString;
    }
}

eZDataType::register( DATATYPESTRING_CHILDRENINDEXER, 'childrenindexertype' );

?>
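Stripped of the database access, metaData() flattens the words indexed for the child objects into one space-separated string, capped by the query's limit of 1000 rows. The sketch below reproduces that step with in-memory arrays. The function name childrenMetaData() and its argument shape (one word list per child object) are assumptions for illustration, standing in for the rows returned by arrayQuery().

```php
<?php

/**
 * Hypothetical in-memory version of ChildrenIndexerType::metaData().
 * $childWords holds one list of indexed words per child object, standing
 * in for the rows returned by the SQL query. Like the query's
 * array( 'limit' => 1000 ), at most $limit words are kept.
 */
function childrenMetaData(array $childWords, int $limit = 1000): string
{
    $metaData = [];
    foreach ($childWords as $words) {
        foreach ($words as $word) {
            if (count($metaData) >= $limit) {
                break 2; // limit reached: stop collecting words
            }
            $metaData[] = $word;
        }
    }
    return implode(' ', $metaData);
}
```

Note that the SQL query has no ORDER BY clause, so when the children of a master node have more than 1000 indexed words in total, which words are kept is up to the database; the sketch simply keeps the first words it sees.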

eventtypes/event/reindexparent/reindexparenttype.php:

<?php
 
include_once( 'kernel/classes/ezworkflowtype.php' );
include_once( 'kernel/classes/ezcontentcachemanager.php' );
 
define( 'WORKFLOW_TYPE_REINDEXPARENT_ID', 'reindexparent' );
 
class ReindexParentType extends eZWorkflowEventType
{
    function ReindexParentType()
    {
        $this->eZWorkflowEventType( WORKFLOW_TYPE_REINDEXPARENT_ID, 'Reindex parent node' );
    }
 
    function execute( &$process, &$event )
    {
        $parameters = $process->attribute( 'parameter_list' );
        // Cast to int before interpolating the ID into the SQL query below
        $objectID = (int) $parameters['object_id'];
 
        $db =& eZDB::instance();
 
        $pathString = $db->arrayQuery( "SELECT path_string
                                        FROM ezcontentobject_tree
                                        WHERE main_node_id=node_id
                                          AND contentobject_id=$objectID" );
        if ( $pathString )
        {
            $pathString = $pathString[0]['path_string'];
            $path = array_reverse( explode( '/', trim( $pathString, '/' ) ) );
            array_shift( $path );
            // $path now contains node IDs of all ancestors (starting with the parent node)
 
            foreach( $path as $ancestorNodeID )
            {
                /* Find the object ID, but only if the ancestor's object contains a
                   searchable attribute of the childrenindexer datatype. */
                $ancestorObjectID = $db->arrayQuery( "SELECT a.contentobject_id
                                                      FROM ezcontentobject_tree t,
                                                           ezcontentobject_attribute a,
                                                           ezcontentclass_attribute ca
                                                      WHERE t.node_id=$ancestorNodeID
                                                        AND t.contentobject_id=a.contentobject_id
                                                        AND t.contentobject_version=a.version
                                                        AND a.contentclassattribute_id=ca.id
                                                        AND ca.version=0
                                                        AND ca.data_type_string='childrenindexer'
                                                        AND ca.is_searchable=1", array( 'limit' => 1 ) );
                if ( !$ancestorObjectID )
                {
                    break;
                }
 
                $ancestorObjectID = $ancestorObjectID[0]['contentobject_id'];
 
                require_once( 'kernel/content/ezcontentoperationcollection.php' );
                eZContentOperationCollection::registerSearchObject( $ancestorObjectID, false );
            }
        }
 
        return EZ_WORKFLOW_TYPE_STATUS_ACCEPTED;
    }
}
 
eZWorkflowEventType::registerType( WORKFLOW_TYPE_REINDEXPARENT_ID, 'reindexparenttype' );
 
?>
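The path handling in execute() relies on the format of path_string, which lists node IDs from the root down to the node itself, for example /1/2/59/60/. As a sketch, the hypothetical function ancestorNodeIDs() below isolates that step in plain PHP so it can be checked on its own; execute() then walks the resulting list from the parent upward and stops at the first ancestor without a searchable childrenindexer attribute.

```php
<?php

/**
 * Hypothetical helper mirroring the path handling in
 * ReindexParentType::execute(): given a path_string such as '/1/2/59/60/',
 * returns the ancestor node IDs nearest first, excluding the node itself.
 */
function ancestorNodeIDs(string $pathString): array
{
    $path = array_reverse(explode('/', trim($pathString, '/')));
    array_shift($path); // drop the node itself (the last ID in path_string)
    return array_map('intval', $path);
}
```

For example, ancestorNodeIDs('/1/2/59/60/') returns [59, 2, 1]: node 60 itself is dropped, and its parent, 59, comes first.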

settings/content.ini.append.php:

<?php /* #?ini charset="iso-8859-1"?
 
[DataTypeSettings]
ExtensionDirectories[]=childrenindexer
AvailableDataTypes[]=childrenindexer
 
*/ ?>

settings/workflow.ini.append.php:

<?php /* #?ini charset="iso-8859-1"?
 
[EventSettings]
ExtensionDirectories[]=childrenindexer
AvailableEventTypes[]=event_reindexparent
 
*/ ?>

In addition to the files listed above, you should also create “edit” and “view” templates for both the datatype and workflow event. If you are testing out the code above, you can either ignore the warnings or create these templates (they can be empty).

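To use this extension, enable it, then add a “Reindex parent node” event to a workflow connected to the trigger that runs after content is published (module content, function publish, connection type after). You also need to add an attribute of the “Children indexer” datatype to each class used for master nodes and mark it as searchable: both the search indexer and the workflow event only consider searchable attributes. If your site already has some content, re-index it by running bin/php/updatesearchindex.php.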

How the extension works

Let's have a closer look at how this works. Whenever eZ Publish indexes or re-indexes an object, it goes through all of the object's searchable attributes and calls metaData() on each of them to collect the words to index. Each datatype decides what to return: for the text string datatype it is simply the content of the string, while for binary files external programs are run to extract the words. Our datatype instead returns the words already indexed for the children of the object's main node: its SQL query finds all child objects and reads their words from the search index tables. The query is limited to 1000 words to avoid indexing too much data.

On large sites, you should also consider delayed indexing for performance reasons; see the DelayedIndexing and "The cronjob scripts" documentation pages for more information.

Whenever an object is published or re-published, the “reindexparent” workflow event checks whether the object of its parent node (the parent of the object's main location) has a searchable attribute of the “Children indexer” datatype. If it does, the event re-indexes that object and then repeats the check for the next ancestor up the tree, stopping at the first ancestor without such an attribute.

This enables you to use grandchildren as helper nodes if necessary. Consider the following example:

Node/object G (has a childrenindexer attribute)
 +- Node/object P (has a childrenindexer attribute)
     +- Node/object C

If object C is published or re-published, its parent P is re-indexed and picks up the words indexed for object C. After that, object G is re-indexed and picks up the words indexed for P, which include the words indexed for C. Thus, the words indexed for C are also indexed for G.
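The propagation can be illustrated with a small in-memory simulation. This is a sketch only: simulateReindex() and simulatePublish() and their array arguments are hypothetical stand-ins for the search index and the content tree, not eZ Publish APIs. Re-indexing an object that has the childrenindexer attribute adds the words currently indexed for its children, and publishing walks up the tree until it reaches an ancestor without the attribute, as execute() does.

```php
<?php

/**
 * Hypothetical re-index of one object. $own maps each node to its own
 * words, $parent maps a node to its parent, $hasIndexer marks objects with
 * a searchable childrenindexer attribute, $index is the simulated index.
 */
function simulateReindex(string $node, array $own, array $parent, array $hasIndexer, array &$index): void
{
    $words = $own[$node];
    if (!empty($hasIndexer[$node])) {
        // Like metaData(): add the words currently indexed for each child.
        foreach ($parent as $child => $p) {
            if ($p === $node && isset($index[$child])) {
                $words = array_merge($words, $index[$child]);
            }
        }
    }
    $index[$node] = $words;
}

/** Hypothetical publish: index the object, then run the reindexparent walk. */
function simulatePublish(string $node, array $own, array $parent, array $hasIndexer, array &$index): void
{
    simulateReindex($node, $own, $parent, $hasIndexer, $index);
    $current = $parent[$node] ?? null;
    // Stop at the root or at the first ancestor without the attribute.
    while ($current !== null && !empty($hasIndexer[$current])) {
        simulateReindex($current, $own, $parent, $hasIndexer, $index);
        $current = $parent[$current] ?? null;
    }
}
```

With G, P and C as in the example above, each owning one word, publishing C leaves C's word in the simulated index entries of both P and G.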

The extension does not cover the case where a helper node is removed, as eZ Publish provides no suitable trigger for it. However, helper child nodes are rarely removed, and the leftover words in the master's index do not usually cause problems. If you need to handle this, you can, for example, write a cronjob script that re-indexes the objects of master nodes. A more refined version would re-index only the master nodes that have had a child node removed since the cronjob last ran.

Omitting helper nodes in search results

When you search for terms contained in helper nodes, you get hits for the helper object (or objects) as well as for the master parent object, so you need to restrict the search to the master classes by class ID. You can use content/advancedsearch as the main search function on your site and pass a GET or POST parameter named "SearchContentClassID" to limit the classes. Alternatively, you can alter the search template so that it reads the class IDs from a configuration file.

First, use this setting in a site.ini override file:

[SearchSettings]
SearchViewHandling=template

Then, in an override of content/search.tpl, limit the search to the classes listed in a configuration file:

{set $page_limit=$search_page_limit|choose( 10, 5, 10, 20, 30, 50 )}
{if $page_limit|not}
    {set $page_limit=10}
{/if}
 
{def $search=fetch( 'content', 'search', hash(
    'text', $search_text,
    'section_id', $search_section_id,
    'class_id', ezini( 'Search', 'ClassID', 'search.ini' ),
    'subtree_array', $search_subtree_array,
    'publish_timestamp', $search_timestamp,
    'sort_by', array( 'modified', false() ),
    'offset', $view_parameters.offset,
    'limit', $page_limit
) )}
 
{set $search_result=$search['SearchResult']
     $search_count=$search['SearchCount']
     $stop_word_array=$search['StopWordArray']
     $search_data=$search}
 
{*
    Remaining lines are the same as in the original content/search.tpl 
    starting with <div class="content-search"> for the base design or
    <div class="border-box"> for the ezwebin design. Remove the /let
    at the very bottom of the template as well.
*}

Finally, list the class IDs to search in a search.ini.append.php override. The empty ClassID[] line clears any values defined in earlier settings files before the new ones are added:

<?php /* #?ini charset="utf-8"?
 
[Search]
ClassID[]
ClassID[]=2
# ...
 
*/ ?>

You can extend this model to enable the choice to include or exclude helper nodes from search results. One approach would involve:

  • introducing another array setting, such as “AltClassID”, which contains the classes in “ClassID” plus the helper nodes' classes;
  • having a checkbox or hidden field named “SearchAlt” on the search form; and
  • altering the fetch function's hash to use “AltClassID” or “ClassID”, depending on whether “SearchAlt” is checked or set. Don't forget that you need to include the “SearchAlt” parameter in the page navigator links (when there are multiple pages of search results).
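The selection logic and the page navigator detail can be sketched in plain PHP. Everything here is hypothetical: AltClassID and SearchAlt are the names suggested above, $request stands in for the GET or POST parameters, the class ID 45 in the usage example is made up, and the link format assumes the usual /(offset)/N view parameter of content/search. In the template itself, you would express the same choice in the fetch function's hash.

```php
<?php

/**
 * Hypothetical sketch: choose the class IDs for the search fetch.
 * $classIDs and $altClassIDs stand in for the ClassID and AltClassID
 * arrays from search.ini; $request stands in for GET/POST parameters.
 */
function searchClassIDs(array $request, array $classIDs, array $altClassIDs): array
{
    return empty($request['SearchAlt']) ? $classIDs : $altClassIDs;
}

/**
 * Hypothetical sketch of a page navigator link that keeps SearchAlt,
 * assuming content/search takes the offset as the (offset) view parameter.
 */
function pagerLink(array $request, int $offset): string
{
    $query = ['SearchText' => $request['SearchText'] ?? ''];
    if (!empty($request['SearchAlt'])) {
        // Without this, later result pages would fall back to ClassID.
        $query['SearchAlt'] = 1;
    }
    return '/content/search/(offset)/' . $offset . '?' . http_build_query($query);
}
```

For example, with ClassID set to 2 and AltClassID to 2 and 45, searchClassIDs(['SearchAlt' => '1'], [2], [2, 45]) returns [2, 45], and pagerLink(['SearchText' => 'tax form', 'SearchAlt' => '1'], 10) returns /content/search/(offset)/10?SearchText=tax+form&SearchAlt=1.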

This article covered the two main issues to consider with helper child nodes: showing or hiding their full views, and including them in or excluding them from search results. The solutions outlined here are flexible enough to suit many sites and can be extended or modified as needed.

If you have any questions or comments, feel free to contact me. As stated previously, the extensions discussed in this article can be downloaded from http://www.seeds.no/eng/ez-publish/downloads.

I would like to thank Sergey Puschin, Balazs Halasy and Peter Keung for reading drafts of this article and giving me valuable feedback.