How to import HTML into an eZXMLObject?

Rainer Krauss

Thursday 25 June 2009 1:39:13 am

Hi,

I'm importing content into my eZ Publish installation.

Some of that content is an article, and the content subset for the body is formatted as XHTML. When I try to import that into an eZXMLObject, the routine Luke describes at
http://serwatka.net/blog/ezxmltext_how_to_store_and_ouput_your_content
does not work properly.
$parser->process works fine, but eZXMLTextType::domString sometimes gives me PHP fatal errors. It works for XHTML tags such as b and i, but not for font, img, p, ...

Do you have experience with this and can you shed some light on how I can import my data successfully? Is there another way to import the data? Do I need to do further data conversions (e.g. p tags can be imported when I replace them with paragraph tags)? Or is there something else I have not yet thought of?

Best wishes,
Rainer

Heath

Thursday 25 June 2009 1:50:43 am

We've done this before and published an example:
http://svn.projects.ez.no/bcimportcsv/trunk/extension/bcimportcsv/bin/bccsvjoomlacontenttablehtmlimport.php

The key here is to replace those tags. Our example transforms variable html into valid ezxml (including replacing img and a tags with content object embeds).
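
As a rough illustration of the tag replacement described above (this is not the code from the linked script), a minimal PHP sketch might look like the following. The function name htmlTagsToEzXml is hypothetical, and it only covers two mappings mentioned in this thread (p to paragraph, pre to literal). It simply unwraps font tags, on the assumption that they have no ezxml counterpart. Attributes on the renamed tags are dropped.

```php
<?php
// Hypothetical helper, not part of eZ Publish: renames a few XHTML tags
// to the ezxml tag names mentioned in this thread.
function htmlTagsToEzXml($html)
{
    // Assumed mapping; extend it to match the tags your content uses.
    $map = array(
        'p'   => 'paragraph',
        'pre' => 'literal',
    );

    foreach ($map as $from => $to) {
        // Opening tags, any case; attributes are discarded.
        $html = preg_replace('/<' . $from . '(\s[^>]*)?>/i', '<' . $to . '>', $html);
        // Closing tags.
        $html = preg_replace('/<\/' . $from . '\s*>/i', '</' . $to . '>', $html);
    }

    // Assumption: font has no ezxml equivalent, so keep the text and drop the tag.
    $html = preg_replace('/<\/?font(\s[^>]*)?>/i', '', $html);

    return $html;
}
```

Regular expressions are enough for predictable markup like this; for messy real-world HTML a DOM-based transformation is likely to be more robust.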

Cheers,
Heath

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

André R.

Thursday 25 June 2009 5:36:34 am

Another idea is to use the HTML parser in Online Editor (5.0), since it already supports quite a lot of (X)HTML. I have not had time to test it, though, so I don't have any code examples other than the one in ezoe.
See: eZOEXmlInput::validateInput() in http://svn.projects.ez.no/ezoe/trunk/ezoe/ezxmltext/handlers/input/ezoexmlinput.php

It will not handle images though, as those are embed tags in ezxml. You'll first need to import the image into eZ Publish and then add an id to the img tag in the form "eZObject_<object_id>".
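
To illustrate the image step, here is a small PHP sketch that rewrites img tags into the "eZObject_<object_id>" form once the images have been imported. The function name tagImagesWithObjectIds and the $objectIdsBySrc lookup (image src => content object ID) are hypothetical; building that lookup during the image import is left to your script.

```php
<?php
// Hypothetical helper: $objectIdsBySrc maps an image's src value to the ID of
// the content object created for it during import (an assumption of this sketch).
function tagImagesWithObjectIds($html, array $objectIdsBySrc)
{
    return preg_replace_callback(
        '/<img\b[^>]*\bsrc\s*=\s*"([^"]*)"[^>]*>/i',
        function ($match) use ($objectIdsBySrc) {
            $src = $match[1];
            if (!isset($objectIdsBySrc[$src])) {
                // Image not imported yet: leave the tag untouched.
                return $match[0];
            }
            // Replace the whole tag with the id form described above.
            return '<img id="eZObject_' . (int) $objectIdsBySrc[$src] . '" />';
        },
        $html
    );
}
```

For example, tagImagesWithObjectIds('<img src="a.png" />', array('a.png' => 123)) would turn the tag into <img id="eZObject_123" />.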

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Rainer Krauss

Thursday 02 July 2009 12:13:41 am

Thank you, Heath and Andre.

Is there an overview of which HTML tags eZ Publish accepts, please?

André R.

Thursday 02 July 2009 12:59:11 am

The normal xml handler is documented here:
http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags
As of 4.1, it will also accept <h[1-6]> in input.

The xml handler in OE will accept the html variants of the tags there, where:
literal -> <pre>
anchor -> <a name="">
embed (image) -> <img id="eZObject_<object_id>" />
In addition, the <u>, <sup> and <sub> tags are mapped to custom tags if enabled.

It is not documented, since it was not meant for external imports. So at the moment, enable the 'code' button in ezoe.ini to take a look at what kind of XHTML it uses internally (or use Firebug or a similar point-and-click HTML debugger).


Rainer Krauss

Monday 06 July 2009 2:25:59 am

Thank you André.

Say, is the parser case-sensitive? Does it mind if the text to parse contains a paragraph tag starting with <P instead of <p?

Best wishes,
Rainer

Rainer Krauss

Monday 06 July 2009 2:42:07 am

...it's not the parser that's being selective on case, but XHTML by definition requires all tags to be lower case, unlike HTML.

I thus made all HTML tags in the text I parse lowercase using the following PHP function found here: http://www.codingforums.com/archive/index.php/t-108303.html

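// Presumably meant as a callback for preg_replace_callback(); $Matches[1] should hold one tag.
function lowerCaseHTML($Matches) {
    // Tag with attributes: lowercase everything up to and including the last
    // attribute name; only the last attribute's value keeps its case.
    if (preg_match("/<([^>]+)(\s\w+)=([^>]+)>/i", $Matches[1], $NewMatch)) {
        return "<" . strtolower($NewMatch[1]) . strtolower($NewMatch[2]) . "=" . $NewMatch[3] . ">";
    } else {
        // Tag without attributes: lowercase it entirely.
        return strtolower($Matches[1]);
    }
}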
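
Note that the function above also lowercases the values of every attribute except the last one. A possible alternative, sketched here under my own assumptions (the name lowercaseTagAndAttributeNames is hypothetical), lowercases only tag names and the names of quoted attributes, leaving values and text content alone. It assumes no ">" appears inside an attribute value.

```php
<?php
// Hypothetical alternative: lowercase tag and attribute names only.
function lowercaseTagAndAttributeNames($html)
{
    return preg_replace_callback(
        '/<\/?[a-zA-Z][^>]*>/',
        function ($tag) {
            // Lowercase the tag name (with its leading "<" or "</").
            $out = preg_replace_callback(
                '/^<\/?[a-zA-Z][a-zA-Z0-9]*/',
                function ($m) { return strtolower($m[0]); },
                $tag[0]
            );
            // Lowercase each attribute name that is followed by a quoted value.
            // The value is consumed by the same match, so its case is preserved.
            return preg_replace_callback(
                '/(\s)([a-zA-Z][\w:-]*)(\s*=\s*(?:"[^"]*"|\'[^\']*\'))/',
                function ($m) { return $m[1] . strtolower($m[2]) . $m[3]; },
                $out
            );
        },
        $html
    );
}
```

Unquoted attribute values are left as they are, so their attribute names are not lowercased by this sketch.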

André R.

Monday 06 July 2009 2:44:56 am

It's not case-sensitive when it comes to tag and attribute names, but it is for tag text content and attribute values (as you would expect :) ).

