Import RSS as literal HTML


michael depetrillo

Tuesday 15 July 2008 10:14:42 am

Hello everyone

I need to import my RSS feed as literal HTML.

I changed the setEZXMLAttribute method in cronjobs/rssimport.php.

The RSS feed imports OK, but when I go to the front end I do not see all the HTML tags.

If I go into the back end with the editor enabled, I still do not see all the HTML tags.

If I go into the back end with the editor disabled, I see all the HTML. I can then hit save, and both the editor and the front end display the correct HTML.

What piece am I missing here?

The feed I am working with is http://www.cnbc.com/id/20040302/rssCmp/97305/device/rss/rss.xml

function setEZXMLAttribute( $attribute, $attributeValue, $link = false )
{
    //include_once( 'kernel/classes/datatypes/ezxmltext/handlers/input/ezsimplifiedxmlinputparser.php' );

    $contentObjectID = $attribute->attribute( "contentobject_id" );

    // echo $attributeValue ."\n";

    // ADDED FOR LP
    $contentClassID = $attribute->attribute('contentclassattribute_id');
    if ($contentClassID == 206) {

        $inputData = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
        $inputData .= "<section xmlns:image=\"http://ez.no/namespaces/ezpublish3/image/\"\n";
        $inputData .= "         xmlns:xhtml=\"http://ez.no/namespaces/ezpublish3/xhtml/\"\n";
        $inputData .= "         xmlns:custom=\"http://ez.no/namespaces/ezpublish3/custom/\">\n";
        $inputData .= "<paragraph>\n<literal class=\"html\">";
        $inputData .= strip_tags($attributeValue, "<span><a><p><h1><h2><h3><h4><h5><ul><li><br><table><tr><td><th><tbody><tfoot><hr><img><embed><object>");
        $inputData .= "</literal></paragraph>";
        $inputData .= "</section>";

        $domString = $inputData;

    // END ADDED FOR LP
    } else {

        $parser = new eZSimplifiedXMLInputParser( $contentObjectID, false, 0, false );

        $attributeValue = str_replace( "\r", '', $attributeValue );
        $attributeValue = str_replace( "\n", '', $attributeValue );
        $attributeValue = str_replace( "\t", ' ', $attributeValue );

        $document = $parser->process( $attributeValue );
        if ( !is_object( $document ) )
        {
            $cli = eZCLI::instance();
            $cli->output( 'Error in xml parsing' );
            return;
        }
        $domString = eZXMLTextType::domString( $document );
    }

    // echo $domString;

    $attribute->setAttribute( 'data_text', $domString );
    $attribute->store();
}

Guillaume Kulakowski

Tuesday 15 July 2008 1:37:46 pm

Hello Michael,

I use eZ for a planet: http://planet.fedora-fr.org.

For that, I store the RSS content in a Text block.
To get valid XHTML content, I use Tidy and a cleaner parser.

You can take inspiration from my code:
http://trac.llaumgui.com/browser/ez_publish/myutils/trunk/cronjobs/planet.php (look at setEZTXTAttribute)

My blog : http://www.llaumgui.com (not in eZ Publish ;-))
eZC on RHEL : http://blog.famillecollet.com/pages/Config-en
eZC on Fedora : just "yum install php-channel-ezc"

michael depetrillo

Thursday 17 July 2008 12:12:37 pm

What does the disabled editor do to the HTML before it saves it to a DOM document?

Or I could ask:

What does the editor do to the HTML from the DOM document before it displays it?
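
One possible explanation, offered as an assumption about the ezxmltext storage format rather than something confirmed in this thread: the content of a literal tag is expected to be stored as escaped character data, not as raw child elements. If so, saving through the disabled editor would escape the markup on the way in, while the import code above writes the tags unescaped. A minimal sketch of that idea follows. The helper name buildLiteralHtmlXml and the sample markup are hypothetical, and the sketch only needs PHP's DOM extension, not eZ Publish.

```php
<?php
// Hypothetical helper (not part of eZ Publish): builds an ezxmltext-style
// string holding one literal HTML block, with the markup escaped so that
// it is stored as text rather than as nested XML elements.
function buildLiteralHtmlXml( $html, $allowedTags = '<span><a><p><br><img>' )
{
    // Same filtering step as in the import code: drop tags not in the list.
    $clean = strip_tags( $html, $allowedTags );

    // Assumption: the literal content must be escaped character data.
    // This turns & < > and quotes into entities.
    $escaped = htmlspecialchars( $clean, ENT_QUOTES, 'UTF-8' );

    $xml  = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    $xml .= "<section xmlns:image=\"http://ez.no/namespaces/ezpublish3/image/\"\n";
    $xml .= "         xmlns:xhtml=\"http://ez.no/namespaces/ezpublish3/xhtml/\"\n";
    $xml .= "         xmlns:custom=\"http://ez.no/namespaces/ezpublish3/custom/\">\n";
    $xml .= "<paragraph><literal class=\"html\">" . $escaped . "</literal></paragraph>";
    $xml .= "</section>";

    return $xml;
}

// Sample feed fragment (made up for illustration). Note the unclosed <br>
// and the bare & in the URL, both of which would break raw XML.
$html = '<p>Stocks <a href="http://example.com/?a=1&b=2">rose</a><br>today</p>';
$xml  = buildLiteralHtmlXml( $html );

// Check that the result is well-formed and that the HTML survives as text.
$dom     = new DOMDocument();
$ok      = $dom->loadXML( $xml );
$literal = $dom->getElementsByTagName( 'literal' )->item( 0 );

echo $ok ? "well-formed\n" : "not well-formed\n";
echo $literal->textContent, "\n";
```

In the import function, the equivalent change would be to wrap the strip_tags() result in htmlspecialchars() before appending it to $inputData. Whether that alone fixes the front-end display is untested here.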
