How to speed up import of 1M objects?


zurgutt -

Wednesday 22 December 2010 3:38:19 pm

I have to migrate a lot of content from one eZ installation to another (4.0 -> 4.3). It is not a straight upgrade: there are custom scripts to convert objects to new classes, etc.

The problem is that there are nearly a million objects. The export runs at a reasonable speed, but the import/publish operations are slow and by my estimates would take days to finish.

I can dedicate a server to this operation and tune it specifically. It is a reasonably fast box with Xeon [email protected] and 12 GB of RAM.

Can you suggest any specific tune-ups or tricks to temporarily speed up insert/publish operations for the duration of the import?

Certified eZ developer looking for projects.
zurgutt at gg.ee

Jérôme Vieilledent

Wednesday 22 December 2010 10:03:20 pm

Hi Zurgutt

SQLIImport tunes some performance settings for imports, such as:

  • View cache deactivation (only for the script)
  • Delayed indexing

Once the import process is over, a cleanup cronjob runs to clear the cache and trigger indexing.
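
The pattern behind both settings is to skip per-object housekeeping during the import and do it once at the end. Below is a minimal Python sketch of that idea, not SQLIImport's actual code. `DeferredIndexer`, `publish_fn` and `index_batch_fn` are hypothetical names standing in for whatever your import script really calls.

```python
from typing import Callable, Iterable, List


class DeferredIndexer:
    """Collects object IDs during an import and indexes them in batches afterwards."""

    def __init__(self, index_batch_fn: Callable[[List[int]], None], batch_size: int = 500):
        self._index_batch_fn = index_batch_fn  # hypothetical: indexes a list of object IDs
        self._batch_size = batch_size
        self._pending: List[int] = []

    def mark(self, object_id: int) -> None:
        # Cheap during the import: only remember the ID.
        self._pending.append(object_id)

    def flush(self) -> int:
        # Run after the import: index everything that was queued, batch by batch.
        done = 0
        while self._pending:
            batch = self._pending[: self._batch_size]
            self._pending = self._pending[self._batch_size:]
            self._index_batch_fn(batch)
            done += len(batch)
        return done


def run_import(rows: Iterable[dict], publish_fn: Callable[[dict], int],
               indexer: DeferredIndexer) -> int:
    """Publishes every row, deferring indexing; returns how many objects were indexed."""
    for row in rows:
        object_id = publish_fn(row)  # hypothetical: publishes one object, returns its ID
        indexer.mark(object_id)
    return indexer.flush()
```

The same shape would apply to cache clearing: record what needs clearing while importing, then clear once when the import is finished.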

If you're not using this extension, maybe you should consider it. You could do your transformations in your import handler :)

Ivo Lukac

Thursday 23 December 2010 4:41:51 am

I second everything Jérôme wrote, with a few additional notes:

1. The most important thing is to spread the nodes over many parent nodes. We have had a lot of bad experience importing thousands of objects under the same node, because a single publish gets a bit slower with every new sibling. I didn't have time to investigate why that is; maybe it can be avoided somehow...

2. To reduce the time of a single publish, try temporarily hacking the "publish" operation definition in kernel/content/operation_definition.php and removing every method that is not crucial, like:
post_publish, remove-temporary-drafts, create-notification, register-search-object, generate-object-view-cache, clear-object-view-cache, pre_publish.
Maybe even some others. You need to know exactly what you are doing, of course. Try different hacks with a couple of thousand objects and measure the average time of a single publish...
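
Point 1 can be sketched as follows: instead of publishing every object under one parent, derive a chain of intermediate containers from the object's ID so that no parent collects more than a bounded number of children. This is an illustrative Python sketch and not eZ Publish API. `bucket_path` and its parameters are assumed names, and the container nodes themselves would still have to be created by your import script.

```python
from typing import List


def bucket_path(object_id: int, children_per_bucket: int = 1000, levels: int = 2) -> List[int]:
    """Returns container indexes, top level first, under which object_id should be placed.

    Each leaf container receives at most children_per_bucket objects, and every
    container below the top level has at most children_per_bucket sub-containers.
    """
    if object_id < 0 or children_per_bucket < 1 or levels < 1:
        raise ValueError("object_id must be >= 0, children_per_bucket and levels >= 1")
    index = object_id // children_per_bucket  # which leaf container, counted globally
    path = []
    for level in range(levels):
        if level == levels - 1:
            path.append(index)  # the top level absorbs whatever is left
        else:
            path.append(index % children_per_bucket)
            index //= children_per_bucket
    path.reverse()
    return path
```

With the defaults, sequential IDs from 0 to 999,999 all land under top-level container 0, spread over 1,000 leaf containers of 1,000 objects each.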

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

gilles guirand

Thursday 23 December 2010 1:22:55 pm

I agree,

@Ivo: When you say "hack", you mean executing a specific static PHP method and/or unsetting some INI values before importing the data, I guess :) ?

--
Gilles Guirand
eZ Community Board Member
http://twitter.com/gandbox
http://www.gandbox.fr

Ivo Lukac

Tuesday 28 December 2010 3:18:31 am

"

I agree,

@Ivo: When you say "hack", you mean executing a specific static PHP method and/or unsetting some INI values before importing the data, I guess :) ?

"

No, by "hack" I mean going to kernel/content/operation_definition.php and commenting out some parts of the publish method :) Temporarily, just for the import.
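
Whichever parts you comment out, it is worth timing a sample of a couple of thousand objects before and after the change. Here is a minimal Python sketch of such a measurement. `publish_fn` is a hypothetical stand-in for your import script's publish call. Reporting one average per chunk, rather than one overall figure, also shows whether publishing slows down as siblings accumulate.

```python
import time
from typing import Callable, Iterable, List


def chunk_averages(items: Iterable, publish_fn: Callable[[object], None],
                   chunk_size: int = 1000) -> List[float]:
    """Returns the mean wall-clock seconds per publish for each consecutive chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    averages = []
    count = 0
    started = time.perf_counter()
    for item in items:
        publish_fn(item)
        count += 1
        if count == chunk_size:
            averages.append((time.perf_counter() - started) / count)
            count = 0
            started = time.perf_counter()
    if count:
        # Trailing partial chunk.
        averages.append((time.perf_counter() - started) / count)
    return averages
```

If the averages climb steadily from chunk to chunk, the sibling effect from point 1 is probably the dominant cost. If they stay flat but high, trimming the publish operation is the more promising change.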


Ivo Lukac

Tuesday 28 December 2010 5:43:44 am

Additionally, it could pay off performance-wise to hack out some features (e.g. browserecent), but generally I think it should be possible to disable those through INI settings.

