How to Import and Export RSS Feeds

RSS (Really Simple Syndication) is an XML web standard used to distribute information about a website's content to other websites and Internet users. This involves the process shown below.

An overview of how RSS is used

RSS feeds are usually accessed and processed automatically at set time intervals, such as each hour or each day. In this way, they can deliver a constant stream of changing and up-to-date information.

With eZ Publish, you can import RSS feeds from other websites and include them on your eZ Publish site. You can also create RSS feeds of any of your site content – such as articles, blog entries, and forum entries – to be included on other sites, accessible to RSS readers, and more.

Getting the most out of RSS

There is a large number of RSS feeds available via the Internet. However, the fact that RSS is technically possible does not automatically mean it will be valuable for your business, clients, or website visitors. Some useful things to do include:

  • Asking website visitors what other websites they find useful (then checking whether these have RSS feeds worth importing to your site)
  • Checking similar or competitor websites to see how they use RSS
  • Thinking about what evidence there would be that RSS feeds are useful for your business and visitors. For example, consider concrete metrics such as leads, hits, and in-bound and out-bound clicks.

If it is not clear to you why and how particular RSS imports or exports are of value, then chances are it will not be clear to your clients and visitors either. In a nutshell, it is worth considering the “why” questions before getting on to the “how to” questions.

Article prerequisites

Note that this article assumes that you have administrator-like permissions in order to access the Setup tab in the Administration Interface. For information about the general layout of the Administration Interface, see this article or the eZ Publish Content Management Basics book. You should also have some basic experience with eZ Publish templates, configuration files, and the object-oriented content model.

This article uses examples with eZ Publish version 4.0, but is also relevant for earlier versions.

RSS imports

It is possible to configure eZ Publish to parse RSS feeds on every page load and present the content of those feeds to website visitors. However, in an RSS context at least, this has a number of disadvantages because it results in the following:

  • A high load on your web server
  • Slower response time for website visitors
  • Increased web traffic for the RSS file from its originating website

eZ Publish offers a neat and efficient solution: it imports RSS items, creating a new eZ Publish object for each item. For example, BBC News RSS items could be imported as custom eZ Publish "BBC News RSS Item" objects. In addition to avoiding the problems noted above, this also means standard eZ Publish functionality can be used for:

  • Fetching and displaying RSS items
  • Searching RSS items
  • Caching pages with RSS content

This method of handling RSS import is shown in the diagram below.

Overview: eZ Publish RSS import

As indicated by Step 1 in the diagram above, the RSS import process is initiated by an eZ Publish cronjob. This is a script run at set intervals to handle eZ Publish functions like cleaning up drafts, sending e-mail notifications, completing workflows, and importing RSS feeds. Cronjobs are described in more detail later in this article.

RSS exports

In eZ Publish, RSS exports work by presenting selected website content in an RSS feed. An overview of this is shown in the following diagram.

Overview: eZ Publish RSS export

The broad process for importing RSS feeds involves:

  1. Selecting an RSS feed to import
  2. Configuring eZ Publish to run a PHP cronjob script so that eZ Publish can periodically collect RSS information from an external URL
  3. Creating a content class whose objects will hold the RSS items
  4. Configuring eZ Publish RSS import features to automatically generate these new objects from an RSS feed
  5. Creating a custom template(s) to display the imported RSS items effectively

Selecting an RSS feed to import

The first step is to select an RSS feed to be imported into eZ Publish. If available, RSS feeds are usually indicated by descriptive links or images similar to the RSS logo shown below.

RSS logo

You might find useful RSS feeds by doing a web search for “+RSS +(topic of interest)”, or by searching feed directory sites like www.rssfeeds.com or Syndic8.

This article uses a tennis RSS feed from BBC Sport. The URL of this RSS feed is http://newsrss.bbc.co.uk/rss/sportonline_world_edition/tennis/rss.xml.

Viewed in a web browser, the main part of this feed appears as follows.

Example RSS from BBC Sport

Basically, an RSS feed contains general information about the feed as well as the individual RSS items, as shown below.

Example RSS sections

The format of an RSS feed conforms to the RSS specification (for more information, see Wikipedia: RSS, RSS 2.0 Specification at Harvard Law, or the RSS Advisory Board). The main data elements in an RSS feed item include a title, link, description, and publication date. RSS feeds can also include elements such as:

  • Copyright information
  • Feed image data
  • Author name and email address
  • A link to comment pages for each content item

Note that eZ Publish RSS importing does not handle elements that use XML namespaces (such as <content:encoded> and <dc:creator>). This is because these elements are an extension of the RSS 2.0 specification, not a part of it. These elements appear in many news feeds (such as the BBC RSS feed used in this guide) and in some blog system feeds. Basically, if an element has a colon in its XML tag name, it cannot be imported. (For background information about namespaces, see this article on extending RSS 2.0 and creating custom attributes.)
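To see in advance which elements of a feed fall under this restriction, you can inspect the feed with a short script. The following Python sketch is only an illustration and is not part of eZ Publish; the sample feed and the function name classify_item_elements are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for this illustration.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Tennis Feed</title>
    <item>
      <title>Example match report</title>
      <link>http://example.com/tennis/1</link>
      <description>Short summary.</description>
      <dc:creator>A. Writer</dc:creator>
    </item>
  </channel>
</rss>"""


def classify_item_elements(feed_xml):
    """Split the first item's child elements into plain and namespaced names."""
    item = ET.fromstring(feed_xml).find("channel/item")
    plain, namespaced = [], []
    for child in item:
        # ElementTree expands a prefixed tag such as dc:creator to
        # "{namespace-uri}creator", so a leading brace marks a namespaced element.
        if child.tag.startswith("{"):
            namespaced.append(child.tag.split("}", 1)[1])
        else:
            plain.append(child.tag)
    return plain, namespaced


plain, namespaced = classify_item_elements(SAMPLE_FEED)
print("Plain RSS 2.0 elements:", plain)
print("Namespaced elements:", namespaced)
```

Elements reported in the second list are the ones you should not plan to map to class attributes.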

RSS feed copyright, attribution and terms of use

RSS feeds typically have copyright and terms of use information associated with them. This may be indicated in the feed itself (for example, in <copyright> tags), and/or on the site where you found the RSS feed. Pay close attention to this information. Looking at the "BBC RSS Feed Standard License Terms" associated with the example feed we are using, some of the terms of use include:

  • Crediting the BBC as the source of content
  • Linking back to the BBC site and content
  • A requirement that use is non-commercial
  • A requirement that the BBC trademark is not used on your site

If you plan to export your own RSS feeds (described later in this article), it is a good idea to specify your RSS feed's terms of use too. You can use standard licensing schemes like Creative Commons or seek legal advice for stricter needs.

As explained earlier, cronjobs are a way to automatically run scripts on a server, and enable some of eZ Publish's most useful features such as RSS importing.

There are two parts to configuring these automated functions:

  • Configuring eZ Publish cronjob settings
  • Configuring your server's cronjob settings

These steps and the overall cronjob process are shown in the following diagram:

eZ Publish cronjob overview

Configuring eZ Publish cronjob settings

The core eZ Publish script for executing cronjob activities is runcronjobs.php in the root eZ Publish folder. The runcronjobs.php script can be run with parameters, in order to run different sets of eZ Publish cronjob activities, according to how often they should be run. For example, in the broader scope of eZ Publish, you may decide that workflows and notifications will be run every three hours (using a command such as runcronjobs.php frequent), and RSS importing once a day (using a command such as runcronjobs.php rssimport).

Different sets of cronjob activities are called "cronjob parts", and are set in cronjob.ini. Examples of cronjob parts are shown below:

[CronjobPart-infrequent]
Scripts[]=basket_cleanup.php
Scripts[]=linkcheck.php

[CronjobPart-frequent]
Scripts[]=notification.php
Scripts[]=workflow.php

To configure a cronjob part specifically for RSS importing, create a settings override file /settings/override/cronjob.ini.append.php with the following content:

[CronjobPart-rssimport]
Scripts[]=rssimport.php

This creates a cronjob part called "rssimport" that executes the script for importing RSS items (which we will configure shortly). With this done, we can give the runcronjobs.php script a parameter that tells it to only run the cronjobs defined under the “rssimport” cronjob part: runcronjobs.php rssimport.

Remember to clear your site's INI cache for the INI changes to take effect (see the Cache window on the right of your site's Administration Interface). For more information on runcronjobs.php parameters, see the documentation on the cronjob script.

Configuring cronjob settings on your web server

This section shows how to create a cronjob using cPanel. Cronjobs can also be created and configured from your server's command line (do a web search for “crontab”, or see the eZ Publish documentation on cronjobs).

Many web hosting providers offer cPanel (or a similar web-based interface) as a way to manage and configure the hosting environment.

The URL for cPanel may be something like http://www.(yourwebsitedomain).com/cpanel.

Once you are logged in, click on the Cron jobs item, which will bring up a portal page similar to the screenshot below.

cPanel cronjob page

We will walk you through the "Standard" method of setting up a cronjob. The details of a cronjob can then be specified as shown below.

cPanel cronjob details

There are two elements:

  1. The “Command to run”, which specifies what the server should do
  2. The frequency with which that command should be run – as specified in minutes, hours, days, and/or months

There will be one set of these entries for each cronjob that has been configured for your web hosting service. If there are any existing entries, it will probably be best to leave these and add a new one; check with an expert if you are unsure. You can also use multiple entries to run different eZ Publish cronjob tasks (if you have defined multiple cronjob parts) at different times.

The full text of the “Command to run” used here is shown below, with brief explanations of each part.

Cronjob command breakdown

Expect this command to be slightly different for your site’s server. Translated into human terms, it says “change to the root directory of the eZ Publish installation, and using PHP (located at the path specified), run the runcronjobs.php script (with some parameters)”.

Next, configure the time and frequency information for the cronjob. As processing cronjobs places extra load on your server and can impact your site’s performance, give some thought to the details chosen. Check the site from which you are sourcing your RSS feed – it might have some information on how frequently its RSS feeds are updated. Note also that some RSS feeds have a <ttl> (time to live) element that can give information about how often it is useful to run an RSS import cronjob.
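If you want to check whether a feed declares a <ttl> value, a few lines of script are enough. The Python sketch below is an illustration only; the sample feeds and the function name feed_ttl_minutes are assumptions made for this example. In RSS 2.0, the <ttl> value is expressed in minutes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds, made up for this illustration.
SAMPLE_WITH_TTL = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <ttl>60</ttl>
  </channel>
</rss>"""

SAMPLE_WITHOUT_TTL = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
  </channel>
</rss>"""


def feed_ttl_minutes(feed_xml):
    """Return the channel's <ttl> in minutes, or None if absent or not a number."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return None
    text = (channel.findtext("ttl") or "").strip()
    return int(text) if text.isdigit() else None


print(feed_ttl_minutes(SAMPLE_WITH_TTL))
print(feed_ttl_minutes(SAMPLE_WITHOUT_TTL))
```

A feed with a <ttl> of 60, for instance, suggests that running the import more often than once an hour is unlikely to bring in anything new.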

It is logical to wonder what happens if you import RSS items too frequently – in addition to the load on your server, will you end up with multiple copies of each blog or news item? The answer is no. This is because eZ Publish skips those that already exist as imported eZ Publish objects.
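The idea of skipping items that have already been imported can be sketched as follows. This is a conceptual Python illustration, not the code eZ Publish actually runs; in particular, identifying an item by a hash of its link is an assumption made for this example, and the names item_key and import_new_items are hypothetical.

```python
import hashlib


def item_key(link):
    """Derive a stable identifier for an item from its link (an assumption)."""
    return hashlib.md5(link.encode("utf-8")).hexdigest()


def import_new_items(feed_items, imported_keys):
    """Return only the items not seen before, and remember their keys."""
    created = []
    for item in feed_items:
        key = item_key(item["link"])
        if key in imported_keys:
            continue  # already imported on an earlier run, so skip it
        imported_keys.add(key)
        created.append(item)
    return created


# Two consecutive runs over the same feed content.
feed = [
    {"title": "Match report", "link": "http://example.com/tennis/1"},
    {"title": "Player interview", "link": "http://example.com/tennis/2"},
]
seen = set()
print(len(import_new_items(feed, seen)))  # first run: both items are new
print(len(import_new_items(feed, seen)))  # second run: nothing new to create
```

However often the import runs, each distinct item results in one object; the extra runs only cost server time.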

When you are done setting the frequency of the cronjob, click the Save Crontab button at the bottom of the page -- and you are done setting up the cronjob!

We will create a new class to contain the imported RSS items. It is possible to load RSS data into any eZ Publish object class, but creating a new custom class provides the following benefits:

  • The meaningful organization of eZ objects (that is, RSS items are of the distinct RSS items class)
  • Easier preparation of template files and overrides to display the imported RSS items
  • Easier handling of any user / user group permissions for RSS items (if required)

Note that you do not need to create a new RSS item class for every new RSS feed you import. You might have, for example, a class for RSS news items and another for RSS blog items.

At this point it is useful to examine the contents of the RSS feed you want to import. This can be done by saving it to your computer and then examining the file with a text editor or using “View – Page Source” in your web browser.

The following illustration has been formatted and edited slightly to make the structure clearer. It shows the basic structure of the news items in the BBC Tennis RSS feed. There is a certain amount of header information, as shown below, but the main area to pay attention to is the content of the actual news items. This is because it will directly influence how the new class should be structured.

Example RSS elements

Looking at the RSS item above, the following elements are useful to import:

  • Title (the RSS feed title, which will be used to identify the source of the RSS item)
  • Title (the title of the individual RSS item)
  • Description
  • Link (to the originating page)
  • pubDate (publication date)

Creating a new class

First, click the Setup tab in the Administration Interface. Then, click the Classes link in the left menu. In the Class groups window, click the "Content" class group, then click the New class button.

This will bring up the Class Edit Interface, as shown below:

Class edit interface

To add each attribute, select the attribute type from the dropdown list at the bottom of the page, then click the adjacent Add attribute button. Rather than repeating each individual step to create the class in this article, the final class information is summarized below.

Name | Identifier | Attribute Type | Default Value | Flags
Title | title | Text line | Empty | Searchable
Description | description | Text block | (not applicable) | Searchable
Link | link | URL | (not applicable) | (none)
Source | rss_source | Text line | Empty | (none)
Publication Date & Time | pub_date_time | Text line | Empty | (none)

Once these attributes have been entered, click the OK button to save the new class.

Creating a new location to contain the RSS objects

Now that we have created the class for the imported RSS objects, we need a location at which to place the objects.

First, click the Content structure tab in the Administration Interface. Next, navigate to the location where you want to place a container for the “RSS Item” objects and create a "Folder" object (although any other container object will do, depending on your needs).

Once the folder has been created, it is important to set the sorting order for its sub-items. This will make it possible to present, for example, an embedded view of the five most recently imported RSS items. At the bottom of the Sub items window, set the sort method to use the published dates in descending order:

Setting the sort method

Now we can return to the Setup tab and click the RSS link in the left menu. Then, click the New import button.

New import button

The RSS Import interface will be displayed.

RSS Import interface

Assign a name for this import, and enter the source URL for the RSS feed. Then, click the Update button. If you have entered a valid feed URL, eZ Publish will reload the same interface with some more fields in order for you to specify the destination for the RSS items that are to be imported.

RSS Import interface more configuration

Click the Browse button adjacent to the Destination path input field, then find the folder you previously created.

Then, in the Class dropdown list, select the "RSS Item" class you created, and click the Set button.

The interface will reload again, with more fields to specify the mapping of RSS item elements from the source feed to object attributes of the destination class.

RSS mapping of elements to attributes

The Title, Description, Link, Source, and Publication Date & Time fields are the attributes of the "RSS Item" we previously created. The task now is to select, from the dropdown lists beside each of the attributes, the elements from the XML file that should map to the attributes, as shown below:

RSS example mapping
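Conceptually, the mapping pairs each class attribute identifier with one element of the feed. The Python sketch below illustrates that idea using the attribute identifiers of the "RSS Item" class; it is not how eZ Publish performs the import internally, and the sample feed and the function name map_items are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Tennis Feed</title>
    <item>
      <title>Example match report</title>
      <link>http://example.com/tennis/1</link>
      <description>Short summary.</description>
      <pubDate>Mon, 07 Jan 2008 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

# Class attribute identifier -> element of each RSS item.
ITEM_MAPPING = {
    "title": "title",
    "description": "description",
    "link": "link",
    "pub_date_time": "pubDate",
}


def map_items(feed_xml):
    """Build one dictionary of attribute values per RSS item."""
    channel = ET.fromstring(feed_xml).find("channel")
    # The feed's own title identifies the source of every item.
    source = channel.findtext("title", default="")
    objects = []
    for item in channel.findall("item"):
        attributes = {
            identifier: item.findtext(element, default="")
            for identifier, element in ITEM_MAPPING.items()
        }
        attributes["rss_source"] = source
        objects.append(attributes)
    return objects


for obj in map_items(SAMPLE_FEED):
    print(obj)
```

Note that the publication date is kept as plain text here, matching the "Text line" type chosen for the pub_date_time attribute.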

The other object attributes listed relate to publish and modification dates, which can be ignored; eZ Publish will assign these attributes to be the dates that the items are actually imported. (Note that eZ Publish does not currently support importing the dates specified in the RSS feeds into the Published object attribute.)

Finally, mark the Active checkbox and click the OK button to finish setting up the RSS import for this feed.

RSS import confirmed

The result

After the previous steps have been completed, and with a cronjob having run, the folder that was previously created will now contain "RSS Item" objects:

"RSS Item" objects

Now that the RSS import has been configured and runs properly, we must display the new objects on our site. Here, we will assume that you want to display the newest items in a sidebar, showing the titles, short descriptions, and with links to the original articles:

Display of imported RSS feed

If you have other uses in mind, remember that these are now normal eZ Publish objects, meaning that you can use all of the usual eZ Publish template and display logic available.

Line view template for "RSS Item" nodes

First, we need to configure the override rule, then we can create the template to display our imported news items.

Editing the override rule configuration file (override.ini)

In order to instruct eZ Publish to use our to-be-created override template, we must edit the override.ini.append.php file for the public siteaccess, located at /settings/siteaccess/public_site/override.ini.append.php.

(Replace “public_site” with the name of your public siteaccess.)

Add the following entry to the override file:

[line_rss_item]
Source=node/view/line.tpl
MatchFile=line/rss_item.tpl
Subdir=templates
Match[class_identifier]=rss_item

Creating the line view template

The next step is to create the template file specified in the new override entry. The template file will need to be located in a design directory corresponding to the override entry. For sites that use the Website Interface, the location in which to place the template would be /extension/ezwebin/design/ezwebin/override/templates/line/rss_item.tpl.

(Alternatively, you could create a separate design extension to hold all of your custom templates. This is beyond the scope of this article.)

The code for our example line view template for the "RSS Item" object is as follows:

{* RSS Item - Line view *}

<div class="content-view-line">
   <div class="class-rss-item">

       <h4><a href="{$node.data_map.link.content|wash()}" target="_blank">{$node.data_map.title.content|wash()}</a></h4>
      {if $node.data_map.description.has_content}
      <div class="attribute-short">
       {attribute_view_gui attribute=$node.data_map.description} 
 ({attribute_view_gui attribute=$node.data_map.rss_source})
       </div>
      {/if}

   </div>
</div>

Although it is left out of the template above, it is also possible to add the date and time the RSS item was published on the source website, by displaying the value of our "Publication Date & Time" attribute. This would be achieved by adding an extra line, probably after the line that ends with ".rss_source})":

{attribute_view_gui attribute=$node.data_map.pub_date_time}

Embedding the RSS items in a page

A simple way to add the RSS feed to the site is to include its folder as an embedded object on, say, the site’s homepage. The following description assumes that the homepage is an eZ Publish "Frontpage" object.

First, log in to either the Administration Interface or Website Interface and open the Object Edit Interface for the frontpage. In the right column section, click the Insert object icon.

Embed button

In the modal dialog that appears, click the Add existing button, then select the folder that contains your imported RSS items. Assign the "Class" property and the limit for the number of RSS items to show, as per your needs.

Embed properties

Finally, click the OK button to close the modal dialog, then click the Send for publishing button to save the changes. You have now successfully completed the RSS import steps! The feed items will display as shown at the top of this page, and the list will automatically be updated whenever new items are imported from the source site.

The explanations in this article focus on importing only one feed to eZ Publish; it is of course possible to import multiple feeds.

Items from different RSS feeds can be imported to either the same or different locations in the eZ Publish content structure. For example, consider sourcing multiple tennis news feeds from, say, the BBC as well as American and Australian news sources. You could aggregate all items in one eZ Publish location, so that they are all displayed in the same combined list on your site. Or, you can create multiple folders and import the different feeds to different locations, displaying them separately. (Technically, you could also use eZ Publish template logic to display items together from different locations.)

It is also important to note that eZ Publish is not currently able to handle different RSS import tasks at different times. For example, suppose you want to import the following RSS feeds:

  1. BBC Tennis News (which updates, say, every 3 hours)
  2. ABC Australia Tennis News (which updates, say, every hour)
  3. CNN Tennis News (which updates, say, every 6 hours)

Although you can run different eZ Publish cronjob scripts at different times, there is currently no way to run separate RSS import tasks for each of these RSS feeds. eZ Publish will attempt to import every RSS feed every time its runcronjobs.php script is called by a cronjob on your web server. Therefore, in the example above, you would need to pick a cronjob frequency that runs at least some of the RSS imports more or less frequently than they are refreshed. Functionally this does not matter (any items already imported will be ignored), but it will mean some extra unnecessary work for your server for each additional feed you import.
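To get a rough feel for this trade-off, you can estimate how many import runs per day cannot find anything new for each feed. The Python sketch below is a simplified model built on the example update intervals above; the function name redundant_polls_per_day is hypothetical, and the model assumes whole-hour intervals and perfectly regular updates.

```python
def redundant_polls_per_day(feed_intervals_hours, cron_interval_hours):
    """For each feed, estimate the daily import runs that find no new items."""
    runs_per_day = 24 // cron_interval_hours
    result = {}
    for name, interval in feed_intervals_hours.items():
        # A feed cannot deliver new items more often than it is refreshed.
        useful_runs = min(runs_per_day, 24 // interval)
        result[name] = runs_per_day - useful_runs
    return result


feeds = {
    "BBC Tennis News": 3,
    "ABC Australia Tennis News": 1,
    "CNN Tennis News": 6,
}

# Compare an hourly cronjob with one that runs every three hours.
print(redundant_polls_per_day(feeds, 1))
print(redundant_polls_per_day(feeds, 3))
```

In this model, an hourly cronjob keeps every feed current at the cost of many empty runs for the slower feeds, while a three-hourly cronjob reduces the empty runs but leaves the hourly feed up to three hours behind.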

Over time, unless you manually delete them, the imported RSS items could potentially number in the thousands. Having a large number of RSS objects may eventually become a burden on your server. There are posts on the eZ Publish forums that discuss ways to archive older RSS items, such as one titled Best way to remove old RSS files.

With the eZ Publish RSS export feature, you can make selected parts of your site's content available to other websites and Internet users via RSS. Then, whenever you add or modify content, your RSS feed is automatically updated and thus the changes are syndicated to all relevant parties. This is simpler to set up than the import procedure and can be done completely in the Setup tab of the Administration Interface.

Conceptually, RSS export works by displaying selected eZ Publish content in an RSS format instead of using the standard eZ Publish templates. This means that, in several ways, eZ Publish RSS feeds are like other eZ Publish content:

  • RSS feeds are available at a certain URL within an eZ Publish site
  • eZ Publish caching functions operate for RSS feed exports
  • It is possible to use the eZ Publish permission system to control access to RSS feeds

Example procedure

To create an RSS feed, first click the RSS link in the Setup tab. Then, click the New export button in the RSS exports window.

New export button

The RSS Export interface will be displayed, as shown below.

RSS Export interface

The fields for the new RSS feed are explained in the following table.

Field | Explanation | Example
Name | The name of the RSS feed to create / export. | “eZ Tennis Club News”
Description | A brief description of the RSS feed that will be included in the XML file (and therefore may be used by people who import or use the RSS feed). | “eZ Tennis Club News, including tournaments, training events, game tips, and social events news”
Site URL | The base URL for the RSS feed; see the in-context help for more information. | “http://www.eztennisclub.com/index.php”
Image | An image or logo to represent the source of the RSS feed. For example, BBC Tennis news uses a BBC Sport logo. (This is being made available for people to use in conjunction with the RSS items being exported, subject to any terms of use you specify.) | An image in your site's media library
RSS version | The RSS version for the export (either RSS 1.0 or RSS 2.0). | “2.0”
Number of objects | The number of eZ Publish content objects to include in the RSS feed at a time. Note that the objects are ordered based on the sort order specified for sub-items at the source location. | “20”
Active | When marked, this sets the RSS feed as being live and accessible. In the unmarked state, the RSS feed is not accessible. | (Checked - True)
Main node only | When marked, this means that secondary locations for objects at the source location (to be defined shortly) are not included in the feed. | (Checked - True)
Access URL | The path where the RSS feed will be publicly accessed. | “ez_tennis_club_news”

The bottom part of the RSS Export interface specifies the eZ Publish site content to be included in the RSS feed:

Source location information

The fields are explained in the following table. Basically, the sub-items of the location(s) you specify will be included in the RSS feed.

Field | Explanation | Example
Source path | The node whose sub-items are to be included in the RSS feed. | “/Home/eZ Tennis Club News”
Subnodes | Whether or not to include content objects from sub-nodes (that is, more than one level below) of the source node. | (Unchecked – false)
Class | The class whose objects are to be included in the RSS feed. Sub-items that are of other classes will not be included. | (Article)
Title | This sets the attribute of the eZ Publish object that will be used to provide each RSS item's Title element. (Note that the attributes available to assign as either the Title or Description elements are populated when the adjacent Set button is clicked.) | (Title)
Description | This sets the attribute of the eZ Publish object that will be used to provide each RSS item's Description element. | (Summary)

It is possible to add multiple subtree locations as sources for the RSS feed. To add another source location, click the Add source button.

When you have finished configuring the RSS export, click the OK button at the bottom of the interface. The new RSS export will then be listed, as shown below.

RSS export confirmed

Completing an RSS export

With the steps above completed, the RSS export will be available on your website and will be automatically updated whenever you add or modify content at the source location(s). You can check this by going to the URL for the RSS feed, such as http://www.eztennisclub.com/index.php/user_siteaccess/rss/feed/ez_tennis_club_news.
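Besides opening the feed in a browser, you can check it with a short script. The Python sketch below builds the feed URL from the example values used in this article and summarizes a feed's contents. It is an illustration only: the function names export_feed_url and summarize_feed and the sample feed are made up for this example, and the sketch assumes an RSS 2.0 export (an RSS 1.0 feed has a different structure).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an exported RSS 2.0 feed, made up for this illustration.
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <title>eZ Tennis Club News</title>
    <item><title>Summer tournament dates</title></item>
    <item><title>New training sessions</title></item>
  </channel>
</rss>"""


def export_feed_url(site_url, siteaccess, access_url):
    """Join the site URL, siteaccess name, and access URL into the feed address."""
    return "/".join([site_url.rstrip("/"), siteaccess, "rss", "feed", access_url])


def summarize_feed(feed_xml):
    """Return the channel title and the list of item titles."""
    channel = ET.fromstring(feed_xml).find("channel")
    titles = [item.findtext("title", default="") for item in channel.findall("item")]
    return channel.findtext("title", default=""), titles


url = export_feed_url(
    "http://www.eztennisclub.com/index.php", "user_siteaccess", "ez_tennis_club_news"
)
print(url)

# Against a live site, the XML could be downloaded first, for example with
# urllib.request.urlopen(url).read(); here the sample string is used instead.
feed_title, item_titles = summarize_feed(SAMPLE_EXPORT)
print(feed_title, item_titles)
```

If the list of item titles is empty, check that the export is marked as Active and that the source location contains objects of the selected class.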

However, having the RSS feed available is only one part of the exercise; you will want to do some things to encourage and organize the use of the RSS feed. For example:

  • Tell people the RSS feed is available, with information on your website about the feed and the URL where it can be accessed
  • Provide information about how often it is useful for people to check the RSS feed for new content (for example, daily if new items are added daily)
  • Provide information on copyright and terms of use (see the earlier comments from the RSS import explanations)
  • Consider listing the RSS feed on RSS directory sites
  • Identify website statistics to monitor in order to assess the performance of your RSS feed