Importing Data

Deane Barker

Thursday 10 February 2005 10:05:26 am

I have looked for an answer to this, and I have found snippets here and there, but nothing comprehensive. I have a promise for anyone who helps me answer this -- read to the bottom of the post.

I need to import several hundred records from a flat, "traditional" database table to eZ publish. Consider a normal database table with these columns:

FirstName
LastName
Title
Bio

For each record in this table, I need to create an object in eZ from a custom class -- Person. Then I need to transfer the contents of each cell to attributes on that object and store it. For now, let's say all the datatypes are simple text datatypes.

So my object looks like this:

Person
-> FirstName
-> LastName
-> Title
-> Bio

Some assumptions:

-- All objects will have the same node assignment (they'll be in the same folder in the tree)
-- All objects will have the same user as their owner
-- This is a one-time import
-- I want to run this script from the command line

Please know that I have looked for answers. I have the book, and there's a section starting on page 136 that covers the code. You can download this code as a script called "create.php". Sadly, this script keeps dying when it tries to instantiate the user. I've fiddled with it as much as possible, but it will not return a user from the "fetch" method, so the whole thing comes apart there.

(What bothers me is that I don't need the user. From what I can tell, I just need the user ID, which I have as an integer. Why do I need to go to all the trouble of instantiating the user just to get their ID when I have to provide the ID to instantiate the user in the first place? I have what I need already.)

I found another thread on these boards here:

http://ez.no/community/forum/developer/inserting_article_through_code_problem

This person is having a different problem with the same task. The script in that thread included some code for adding a folder, but I couldn't get that to work either.

I've been working at this for three hours now, and I'm no closer than I was.

This strikes me as (1) a pretty common operation, and (2) a barrier to entry for new users. I'm sure some people have turned away from eZ because they didn't think they could get their data in easily. (They're right, it turns out.)

I believe in what this guy wrote here:

http://forum.ezpub.co.uk/showthread.php?p=326#post326

"Are those who know how to use eZ publish 3 are too busy to write usefully use case documentation, provide example source code for custom solutions ... It just seems that most people don't contribute back entire solutions which implement the same key changes that just about every user must figure out, implement, and over come in order to configure eZ publish for use in a production web site."

I believe in community, and I believe in giving back when someone helps you.

So, my offer is this: if someone helps me do what I'm trying to do above, I will write an import script and documentation. I will document the crap out of the code and make it utterly, crystal clear to newbies how they can get their data out of Access or some other flat database table and quickly and easily turn it into objects in eZ. I'm a pretty decent writer, and I have no doubt that if I can understand it, I can get other people to understand it too.

Will someone please take me up on this?

Deane

Deane Barker

Thursday 10 February 2005 12:42:46 pm

I finally figured out that you need to set "debug-output" to "true" when instantiating the eZScript object, like this:

$script =& eZScript::instance(
array( 'debug-message' => '',
'debug-output' => true,
'use-session' => true,
'use-modules' => true,
'use-extensions' => true ));

When I did this, it finally told me the actual error:

Fatal error: eZINI: Undefined group: 'FileSettings' in [path]lib\ezutils\classes\ezdebug.php on line 427

So the problem I'm having with the code has nothing to do with instantiating the user object -- the actual error is far above that line.

Of course, I still don't know how to fix it, but there you go.

Kristian Hole

Thursday 10 February 2005 4:40:08 pm

Hi

Always good when people contribute their code :)

>(What bothers me is that I don't need the user. From what I can tell, I just need the user ID, which I have as an integer. Why do I need to go to all the trouble of instantiating the user just to get their ID when I have to provide the ID to instantiate the user in the first place? I have what I need already.)

In a plain install (or most cases) you can use the userID 14, which is the default administrator.

The function importRSSItem in cronjobs/rssimport.php has some of the basic functionality. It can be used as an example of how to create a new object, fill its attributes, and publish it.

Hope this can be of some help.

Kristian

http://ez.no/ez_publish/documenta...tricks/show_which_templates_are_used
http://ez.no/doc/ez_publish/techn...te_operators/miscellaneous/attribute

Andrew Kelly

Tuesday 25 July 2006 7:36:57 am

Hi Deane,

If you ever did write that commented code, I'd really (REALLY) like to have a look at it.
I mean really.

Andy

Deane Barker

Tuesday 25 July 2006 8:12:46 am

We never did write this. After all these months, I don't remember how we ended up getting around the problem.

I have brought up with David Dempsey the lack of a really good import process for eZ. In fact, the U.S. business lost a big deal over here solely on the lack of import. The customer went with Cascade Server because it had a direct import from static HTML.

We're finding, more and more, that deals are often closed or lost based on data migration costs. (Example: we were working on a deal with a newspaper. They had 4,500 articles in differing formats to get into the system. The cost was prohibitive.)

For a big implementation, data migration can easily be the largest expense in a project (example: one client we're pitching right now has 29,000 static HTML pages to bring in).

It can be done, of course, but for us, it's always ended up being a very ugly hack.

Deane

Andrew Kelly

Wednesday 26 July 2006 1:00:56 am

Hi Deane,

thank you very much for your response.

Frustrating, innit? There aren't many "lukewarm" areas in eZ at all; it's either great or horrid, no middle ground.
Oh, well...

Andy

Betsy Gamrat

Sunday 30 July 2006 10:42:01 am

Hi,

Importing objects into eZ publish is really pretty straightforward.

I start with the code associated with this book http://ez.no/products/books/learning_ez_publish_3, which has (or had) some scripts online.

I changed the code to use attribute mapping with an array to map the CSV fields to the attributes and indicate the type of data for storage.

The code listed here is intended to be added to the code from the book; it isn't the full listing.

My answer to the issue of importing the data is that the PHP import script available as supplementary materials for the book is excellent. I am submitting my modifications to the book's code, because they simplify the code.

I have used this basic approach to import thousands of objects into eZ, very successfully.

<b>Map array definition</b>

/* 

$attribute_map 
      indexed by the name of the attribute
      index indicates the index of the data array that contains the data for import
      type indicates the type of data which will be stored
        if you are unsure how the data is stored (what type), 
          use MySQL to view sample data with the type in question
*/

$attribute_map = array(
        'ID Number'=> array ( 'index'=>0,'type'=>'data_int'),
        'First Name' => array ( 'index'=>1,'type'=>'data_text'),
        'Last Name' => array ( 'index'=>2,'type'=>'data_text'),
        'Composite'  => array ('index'=>'composite','type'=>'data_text'),
        'Date of birth' => array ('index'=>'date','type'=>'data_int'),
        'Price' => array('index'=>4,'type'=>'data_float'),
        'Comments'=>array('index'=>'comments','type'=>'data_text'));

<b>Extraction of CSV data and placement into an array</b>

$fp = fopen('data.txt', 'r');
while (($data = fgetcsv($fp, 1000, "\t")) !== false)  /* "\t" must be double-quoted to be a tab */
{
        /* composite is created from imported data */
        $data['composite']=$data[0] . '-' . $data[2];       

        /* comments is an XML_block, and must be wrapped properly on import */
        $data['comments']=XML_wrap($data[5]);

        /* if the data comes in as a string (for example 5/21/2006), this will convert it */
        $data['date']=make_date($data[3]);

<b>Use of the attribute map array to populate the content object</b>

        foreach (array_keys($contentObjectAttributes) as $key)
        {
                $contentObjectAttribute =& $contentObjectAttributes[$key];
                $contentClassAttribute =& $contentObjectAttribute->contentClassAttribute();
                $attributeName = $contentClassAttribute->attribute('name');
                /* skip class attributes that have no entry in the map */
                if (!isset($attribute_map[$attributeName]))
                        continue;
                /* 'type' is the storage field, 'index' is the key into the $data row */
                $attributeType = $attribute_map[$attributeName]['type'];
                $attributeData = trim($data[$attribute_map[$attributeName]['index']]);
                if ($attributeType != 'data_image')  // This code doesn't handle images
                {
                        $contentObjectAttribute->setAttribute($attributeType,$attributeData);
                        $contentObjectAttribute->store();
                }
        }
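
The read-and-map steps above can be tried without an eZ publish install. The following is only a sketch under stated assumptions: resolve_row() is a hypothetical helper (it is not in the book's code), the sample row is made up, and an in-memory stream stands in for data.txt. It stops at the point where the real script would call setAttribute() and store().

```php
<?php
// Cut-down version of the map above: attribute name => source key and storage field.
$attribute_map = array(
    'First Name' => array('index' => 1, 'type' => 'data_text'),
    'Last Name'  => array('index' => 2, 'type' => 'data_text'),
    'Composite'  => array('index' => 'composite', 'type' => 'data_text'),
    'Price'      => array('index' => 4, 'type' => 'data_float'),
);

// Hypothetical helper: resolve one parsed row against the map.
// Returns attribute name => array(storage field, trimmed value).
function resolve_row(array $attribute_map, array $data)
{
    $resolved = array();
    foreach ($attribute_map as $name => $spec) {
        if (!array_key_exists($spec['index'], $data)) {
            continue; // key missing in this row, so there is nothing to store
        }
        $resolved[$name] = array($spec['type'], trim($data[$spec['index']]));
    }
    return $resolved;
}

// In-memory stand-in for data.txt so the sketch runs on its own.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "7\tDeane\tBarker\t5/21/2006\t19.95\n");
rewind($fp);

$rows = array();
// "\t" in double quotes is a real tab; '\t' in single quotes is a backslash and a t.
while (($data = fgetcsv($fp, 0, "\t", '"', '\\')) !== false) {
    // computed key, built from two source columns as in the post above
    $data['composite'] = $data[0] . '-' . $data[2];
    $rows[] = resolve_row($attribute_map, $data);
}
fclose($fp);

print_r($rows[0]);
```

Keeping the mapping step separate from the eZ calls makes it possible to check the parsed values before anything is written to the content tree.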

<b>Supporting functions</b>

function XML_wrap($text)
{
return '<?xml version="1.0" encoding="UTF-8"?>' ."\n".
                '<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/"'."\n".
                '    xmlns:xhtml="http://ez.no/namespaces/ezpublish3/xhtml/"'."\n".
                '    xmlns:custom="http://ez.no/namespaces/ezpublish3/custom/"><paragraph>'.
                $text."</paragraph>\n</section>\n";
}
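
XML_wrap() inserts $text as-is, so an ampersand or angle bracket in the source data would produce XML that is not well-formed. A possible variant, assuming the imported text is plain text rather than ready-made eZ XML markup (XML_wrap_escaped is a hypothetical name, not part of the book's code):

```php
<?php
// Same wrapper as XML_wrap(), but the text is escaped first.
// Assumption: $text is plain text, so &, < and > should be stored literally.
function XML_wrap_escaped($text)
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
        '<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/"' . "\n" .
        '    xmlns:xhtml="http://ez.no/namespaces/ezpublish3/xhtml/"' . "\n" .
        '    xmlns:custom="http://ez.no/namespaces/ezpublish3/custom/"><paragraph>' .
        $escaped . "</paragraph>\n</section>\n";
}

$xml = XML_wrap_escaped('R&D <team>');
echo $xml;
```

If the source text already contains markup that should survive the import, escaping it like this would be the wrong choice.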

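function make_date ($text)
{
        global $cli;

        /* replace spaces with dashes before parsing */
        $text=strtr($text,' ','-');
        /* strtotime() returns false on failure (-1 before PHP 5.1) */
        $timestamp = strtotime($text);
        if ($timestamp === false || $timestamp === -1)
                $cli->output("Invalid date string ($text)",true);
        return $timestamp;
}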
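
A standalone sketch of the same date conversion, for trying out input formats before a real import. It assumes a current PHP version, where strtotime() reports failure with false; parse_import_date is a hypothetical name, and it drops the space-to-dash step used in make_date().

```php
<?php
// Convert a date string such as "5/21/2006" to a Unix timestamp.
// Returns null when the string cannot be parsed.
function parse_import_date($text)
{
    $timestamp = strtotime(trim($text));
    return ($timestamp === false) ? null : $timestamp;
}

// Fix the timezone so the result does not depend on server settings.
date_default_timezone_set('UTC');

$ok  = parse_import_date('5/21/2006');   // slashes are read as month/day/year
$bad = parse_import_date('not a date');

echo date('Y-m-d', $ok), "\n";
var_dump($bad);
```

Returning null instead of a bogus timestamp lets the calling loop decide whether to skip the record or stop the import.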

A final note on large document collections. Large quantities of text in eZ can create extremely large search indexes. In addition, translating them from different formats can be difficult. Consider using the Mussen search engine and the editor.

Xavier Dutoit

Monday 31 July 2006 7:18:22 am

Hi,

Have a look at the import section on the contrib:
http://ez.no/community/contribs/import_export

This one is rather complete:
http://ez.no/community/contribs/import_export/import_xml_data

As I wrote, I've hacked it to cope with more datatypes and to run from the shell. Felipe has those changes; I'm going to ask him if he has worked on making it more useful than a simple hack.

X+

http://www.sydesy.com

Felipe Jaramillo

Monday 31 July 2006 10:12:35 am

Thanks for bringing this up.

We are also desperate for solid data import.

Please see our thread about funding development for the CSV import extension:

http://ez.no/community/forum/developer/funding_work_for_data_import_extension_csv

Regards,

Felipe

Felipe Jaramillo
eZ Certified Extension Developer
http://www.aplyca.com | Bogotá, Colombia

Andrew Kelly

Wednesday 02 August 2006 4:49:02 am

Betsy,

thank you so much for your detailed response, it's very much appreciated.

When you refer to the "import script available as supplementary materials
for the book", are you speaking about the file "Chapter04/create.php"?

Andy

Betsy Gamrat

Sunday 06 August 2006 1:29:34 pm

Andy,

Yes, Chapter_04/create.php has created all sorts of data for me. I have modified it, and used it, over and over and over again.

You might also want to check this thread

http://ez.no/community/forum/developer/importing_update_entry_if_it_already_exists/re_importing_update_entry_if_it_already_exists__6

Good luck,

Betsy

Felipe Jaramillo

Wednesday 16 August 2006 4:36:09 pm

We have been working on the importXMLData that was modified by Xavier and sent to us.

His modifications manage to import from the CLI, and we have tested this now with thousands of objects. The funny thing is we couldn't make it work on our Linux servers, only on Windows for now.

We are trying to find a clean way to import foreign keys and relations. remote_id would be a reasonable place to store what we call a RemotePrimaryKey (the primary key to which the other content's foreign keys refer). We later realized that the remote_id is stored using a timestamp, so it is not possible to fetch it later on using a foreign key.

On the other hand we have also managed to import Enhanced Object Relation node_id's.

We are still missing the <b>ezimage</b> attribute. See http://ez.no/community/forum/developer/importing_an_image_attribute_from_php

I'll keep you all posted as we move forward.

Regards,

Felipe

Kristian Hole

Thursday 17 August 2006 8:57:40 am

Felipe,

The remote id field is available to store external relations (typically when importing from somewhere). You can use the remote id for your purpose. The reason it's set to a timestamp is probably just to make it unique.
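
One way to make that workable is to derive the remote id from the source table and primary key, so the same value can be rebuilt later from a foreign key. This is only a sketch: the "import:" prefix, make_remote_id() and the sample rows are made up for illustration, and writing the value to the object or fetching by it in eZ publish is not shown.

```php
<?php
// Build a stable remote id from the source table name and primary key.
// The same inputs always give the same string, unlike a timestamp.
function make_remote_id($table, $primaryKey)
{
    return 'import:' . $table . ':' . $primaryKey;
}

// Hypothetical source row: a person pointing at a department by foreign key.
$people = array(
    array('id' => 1, 'name' => 'Deane', 'department_id' => 10),
);

$links = array();
foreach ($people as $person) {
    $ownId     = make_remote_id('person', $person['id']);
    // The foreign key rebuilds the remote id of the related object.
    $relatedId = make_remote_id('department', $person['department_id']);
    $links[$ownId] = $relatedId;
    echo $ownId, ' -> ', $relatedId, "\n";
}
```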

Kristian

Xavier Dutoit

Friday 18 August 2006 12:32:41 am

Felipe,

I've been using the import on Linux and it works on my side.

Could you describe the problem?

X+

http://www.sydesy.com

K259

Sunday 10 September 2006 11:31:52 pm

eZ needs a real import tool.

Kristof Coomans

Monday 11 September 2006 1:50:14 am

@K259: feel free to contribute one ;-)

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Norman Leutner

Monday 11 September 2006 2:29:04 am

I think there are already some useful import and export contributions,
which can be used as an example to create your own for your specific needs...

Best regards

Norman Leutner

____________________________________________________________
eZ Publish Platinum Partner - http://www.all2e.com
http://ez.no/partners/worldwide_partners/all2e_gmbh