How to build an update CLI script without memory problems?


Hakim Bouras

Moderated by: Nicolas Pastorino

Tuesday 05 May 2009 2:07:20 am

Hi,

I wrote a small CLI script that loops over 225 user objects to populate a new attribute that I added.

When I run the script (CLI mode), it gets through almost half of the user objects and then stops with the following error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 16 bytes) in (... the PHP file mentioned changes each time I rerun the script).

I tried adding an eZContentObject::clearCache() call every 50 objects, but the problem remained the same.

Any idea what I am doing wrong in my script (or is it simply not possible to run such a script within a 16M memory limit)?

Thanks for your help,
Hakim

$parentNodeID = 5;

$users =& eZContentObjectTreeNode::subTreeByNodeId(
    array( 'ClassFilterType'  => 'include',
           'ClassFilterArray' => array( 'user' ),
           'SortBy'           => array( 'published', false ) ),
    $parentNodeID );

$i = 0;
foreach ( $users as $user )
{
    $dataMap =& $user->attribute( 'data_map' );

    // Retrieving the email stored in the user_account attribute
    $user_account = $dataMap['user_account']->content();
    $old_email = $user_account->attribute( 'email' );

    // Setting the value of the new email attribute
    $new_email = $dataMap['email'];
    $new_email->setAttribute( 'data_text', $old_email );
    $new_email->sync();

    // Clearing the cache every 50 objects
    $i = $i + 1;
    if ( $i == 50 )
    {
        eZContentObject::clearCache();
        $i = 0;
    }
}

Ivo Lukac

Tuesday 05 May 2009 2:16:31 am

Hi Hakim,

You can set the memory limit at execution time like this:

php -d memory_limit=128M yourscript.php
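
The same thing can also be done from inside the script itself, assuming your PHP configuration allows raising the limit at runtime:

// Raise the memory limit for this run only; no php.ini change is needed
ini_set( 'memory_limit', '128M' );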

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Hakim Bouras

Tuesday 05 May 2009 2:29:00 am

Thank you Ivo for your help, I am able to run it now.

Any idea why the script "suddenly" runs out of memory?

(I thought that since I am using the same objects during each iteration, there is no reason why the script would suddenly need more memory.)

Hakim

Łukasz Serwatka

Tuesday 05 May 2009 3:55:16 am

Hi,

You are fetching ALL user objects from the DB without a limit. I would rebuild your script to fetch e.g. 25 users at a time using limit and offset (see the sketch below). That should do the trick.
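
A minimal sketch of that approach, reusing the $parentNodeID and filter from the original script (the batch size of 25 is just an example):

$offset = 0;
$limit  = 25;
do
{
    $users = eZContentObjectTreeNode::subTreeByNodeId(
        array( 'ClassFilterType'  => 'include',
               'ClassFilterArray' => array( 'user' ),
               'SortBy'           => array( 'published', false ),
               'Limit'            => $limit,
               'Offset'           => $offset ),
        $parentNodeID );
    if ( !is_array( $users ) )
        $users = array();

    foreach ( $users as $user )
    {
        // ... copy the email into the new attribute, as in the original loop ...
    }

    // Drop the in-memory object cache before fetching the next batch
    eZContentObject::clearCache();

    $offset += $limit;
} while ( count( $users ) == $limit );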

Personal website -> http://serwatka.net
Blog (about eZ Publish) -> http://serwatka.net/blog

Hakim Bouras

Tuesday 05 May 2009 4:32:09 am

Hi Lukasz,

But if the problem is the size of the fetch, why do I get the error inside the loop (after many iterations) and not right after the fetch call?

Would that mean that the fetch did not explicitly load all of the user data, and that each time I read the "data_map" of one user (in the loop) some more info is stored in memory?

Hakim

Sergiy Pushchin

Friday 08 May 2009 4:34:33 am

Yes, you are right: for large fetches eZ Publish does not load the data_map of each object during the tree node fetch, and only loads it the first time you ask for it.

Hakim Bouras

Friday 08 May 2009 4:58:11 am

Thank you for your explanation Sergiy, it makes the problem and Lukasz's recommendation much clearer now.

Carlos Revillo

Friday 08 May 2009 11:58:57 am

I ran into the same problem a few days ago. Raising the memory_limit for the script could do the trick, but it's not the best solution.

My situation is like yours: we have to export some user data to CSV files. We started with 128M, but as more users registered on the site the problems started. Now we have more than 65,000 registered users, and even if you set memory_limit to 1GB the memory gets exhausted.

And it seems that the problem is not really an eZ one but a PHP problem with memory management.

I suggest you do this: add a

$cli->output( memory_get_usage() );

at the end of every loop iteration. You will see how this value increases at every step. So if you expect a lot more users on your site in the coming months, you'll probably hit the problem again.

Even if you unset all the variables used, and no matter if you change your foreach to for ( $i = 0; $i < count( $users ); ... ), memory will get exhausted.

In an ideal world, PHP would free memory at every step, but it seems that it's not doing so here.

My workaround for this was to build a kind of batch process.

I have two scripts, script1.php and script2.php.

script1 is the one executed by cron. It is something like:

// get the total number of users

do
{
    // stuff here
} while ( $offset < total number of users );

In every loop iteration, script1 queries a record from a dummy table. At the first step that record is 0.

script1 gets that value and passes it to script2 via an exec() call, something like:

exec( "php script2.php --offset=$offset --limit=100" );

script2 gets the offset parameter and does the subTreeByNodeId() fetch.
When the offset is 0, it opens a file to write the results; when it is not 0, it opens the file for appending.

When script2 ends, it updates the dummy table and sets the value to value + 100.
script1 reads this value and calls script2 again, until the value in the table is bigger than the total number of users we have. A rough sketch of the two scripts is below.
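
A rough PHP sketch of this driver/worker split (the export_progress table, its current_offset column and the --offset/--limit option names are hypothetical placeholders, not part of eZ Publish):

// script1.php -- the cron driver
$db = eZDB::instance();

// Count the users once up front (node 5 is the Users top node, as in the original script)
$total = eZContentObjectTreeNode::subTreeCountByNodeId(
    array( 'ClassFilterType'  => 'include',
           'ClassFilterArray' => array( 'user' ) ),
    5 );

$limit = 100;
do
{
    // Read the current offset from the dummy table (script2 updates it)
    $rows   = $db->arrayQuery( 'SELECT current_offset FROM export_progress LIMIT 1' );
    $offset = (int) $rows[0]['current_offset'];

    // Run the worker for this batch in a fresh PHP process, so its memory
    // is released when the process exits
    exec( "php script2.php --offset=$offset --limit=$limit" );
} while ( $offset < $total );

// script2.php would read --offset/--limit, run the subTreeByNodeId() fetch
// with those values, write (offset == 0) or append (offset > 0) the CSV file,
// and finally store offset + limit back into export_progress.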

Hope it helps.

Remigijus Kiminas

Monday 11 May 2009 12:07:05 pm

Hi,

Why complicate everything so much? :)

Just add the code below inside your foreach loop, every 25 nodes for example.

unset( $GLOBALS['eZContentObjectContentObjectCache'] );
unset( $GLOBALS['eZContentObjectDataMapCache'] );
unset( $GLOBALS['eZContentObjectVersionCache'] );

And no hard coding :)

---------------------------------------------
Remigijus Kiminas

Carlos Revillo

Monday 01 June 2009 11:49:10 am

Sorry, but this is not working for me. I just tried the following piece of code; it would be nice if anyone could do the same and share the results.

$nodes = eZContentObjectTreeNode::subTreeByNodeId( array("Limit" => 100) , 2 );    
for ( $i = 0; $i < count( $nodes ); $i++ )
{
    $data = $nodes[$i]->dataMap();
    echo memory_get_usage() . "<br />";    
    unset( $data );
    unset( $GLOBALS['eZContentObjectContentObjectCache'] );
    unset( $GLOBALS['eZContentObjectDataMapCache'] );
    unset( $GLOBALS['eZContentObjectVersionCache'] );
}

As you can see, I'm "unsetting" everything in each loop, but that call to echo memory_get_usage() shows something like:

7211636
7214944
7237280
7268092
7328240
7373416
7390356
7457684
7500012
7516584
...

Memory usage increases every time. It's easy to see that if the limit is big enough, memory will get exhausted...

Bartek Modzelewski

Monday 01 June 2009 12:42:06 pm

Hi,

I'm not a specialist in this, but I think PHP is just not perfect and memory leaks happen very often. I have worked on many huge imports and this problem always came back. I remember in one project that setting even 1GB as the memory limit did not help for a daily data import; the script loop stopped after updating about 2200 objects.

Other PHP applications have the same problem. I recently saw a very interesting solution in the Magento webshop: the product import was driven by AJAX, so that each product was imported in a single script execution. The import was slow and blocked the browser for some time, but it eliminated the memory problem.

Another solution used in eZ Publish was a CLI script that breaks the process/import after e.g. every 1000 objects and continues with the next execution. This needs some additional work, but for dedicated imports it can be very useful (irreplaceable).

Baobaz
http://www.baobaz.com

Gaetano Giunta

Tuesday 02 June 2009 1:22:29 am

"PHP is just not perfect and memory leaks happen very often"

Sorry, but this is just not true.

"Memory leaks" happen because
- variables are still referenced somewhere and thus not destroyed
- there are circular variable references
- there is an aggressive caching strategy

All of this is part of the application in question, not of the engine itself.

The upside is that it is extremely easy in PHP to get a dump of the complete list of variables in memory. You can split your code up into some looping construct and, every 10, 100 or 1000 iterations, dump the memory contents to a file. After running the script you can compare the dumps and find the runaway variable.

I did this once and found out that I was simply appending a value to an array that I forgot to reset at each iteration. Easy to miss, and it would only give problems when run 100,000 times in a row...
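
A rough sketch of that technique, looping over $users as in the scripts above (the dump file name and the interval are arbitrary; the idea is simply to snapshot the size of every global variable periodically and diff the snapshots afterwards):

$i = 0;
foreach ( $users as $user )
{
    // ... the actual work on $user ...

    if ( ++$i % 1000 == 0 )
    {
        $report = "--- iteration $i, memory_get_usage: " . memory_get_usage() . "\n";
        foreach ( $GLOBALS as $name => $value )
        {
            // print_r() copes with recursive structures; the length of its
            // output is a crude but useful proxy for how big a variable is
            $report .= $name . ' => ' . strlen( print_r( $value, true ) ) . "\n";
        }
        file_put_contents( 'memory_dump.txt', $report, FILE_APPEND );
    }
}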

Principal Consultant International Business
Member of the Community Project Board

Remigijus Kiminas

Tuesday 02 June 2009 9:56:29 am

Hi,

I did a little bit of grepping and found one more variable. Try adding this one to your code as well; I don't know whether it will help.

$GLOBALS['eZContentClassAttributeCache']

Anyway, I never had memory leak problems with eZ Publish...

---------------------------------------------
Remigijus Kiminas

Stéphane Couzinier

Tuesday 02 June 2009 10:02:09 am

Don't forget to clear the user cache:

unset( $GLOBALS["eZUserObject_".$userID] );

On eZ 4.x, before the end of the foreach, add:

eZContentObject::clearCache();
unset( $GLOBALS["eZUserObject_".$userID] );

http://www.kouz-cooking.fr

Carlos Revillo

Tuesday 02 June 2009 10:16:11 am

It still doesn't work for me.

I modified the script above to go like this:

$userId = eZUser::currentUser()->id();
$nodes = eZContentObjectTreeNode::subTreeByNodeId( array("Limit" => 100) , 2 );    
for ( $i = 0; $i < count( $nodes ); $i++ )
{
   $data = $nodes[$i]->dataMap();
   echo memory_get_usage() . "<br />";    
   unset( $data );
   unset( $GLOBALS['eZContentObjectContentObjectCache'] );
   unset( $GLOBALS['eZContentObjectDataMapCache'] );
   unset( $GLOBALS['eZContentObjectVersionCache'] );
   unset( $GLOBALS['eZContentClassAttributeCache'] );
   eZContentObject::clearCache();
   unset( $GLOBALS["eZUserObject_".$userId] );
}

and the output looks like:

4143568
4144848
4148140
4171200
4202448
4261052
4307916
4325192
4391336
4434664
4451236
...

Stéphane Couzinier

Wednesday 03 June 2009 3:12:56 am

If you need to extract a lot of data, this will work:

<?php 
$userId = eZUser::currentUser()->id();
$limit = 1000;
$offset = 0;
do
{
    $nodes = &eZContentObjectTreeNode::subTreeByNodeId( array("Limit" => $limit,'Offset',$offset) , 2 );

    foreach ( $nodes as $node )
    {
        $data = &$node->dataMap();
        eZContentObject::clearCache();
        unset( $GLOBALS['eZContentObjectContentObjectCache'] );
        unset( $GLOBALS['eZContentObjectDataMapCache'] );
        unset( $GLOBALS['eZContentObjectVersionCache'] );
        unset( $GLOBALS['eZContentClassAttributeCache'] );
        unset( $GLOBALS["eZUserObject_".$node->ContentObjectID] );
        unset( $data );
    }

    $offset += $limit;

    echo memory_get_usage() . "\n";
} while ( !empty( $nodes ) );

?>

mem usage
59751392
59814408
59946280
59950608
59946168
59952120
59957768
59936880

http://www.kouz-cooking.fr

Carlos Revillo

Monday 08 June 2009 10:55:39 am

Hi Stéphane. Your code really works! Thanks a lot.

Luis Micunco

Tuesday 17 November 2009 1:31:41 am

Just a fix in Stéphane's code:

$nodes = &eZContentObjectTreeNode::subTreeByNodeId( array("Limit" => $limit,'Offset',$offset) , 2 );

should be

$nodes = &eZContentObjectTreeNode::subTreeByNodeId( array("Limit" => $limit,'Offset' => $offset) , 2 );

Bertrand Dunogier

Tuesday 17 November 2009 2:49:56 am

By the way, you don't need to run both unset( $GLOBALS['eZContentObject*'] ) AND eZContentObject::clearCache().

eZContentObject::clearCache() does exactly the same thing: it clears these global vars, except for eZContentClassAttributeCache and eZUserCache. Since 4.2.0, you can use eZContentClassAttribute::clearCache() to clear $GLOBALS['eZContentClassAttributeCache'].

For both eZContentObject::clearCache() and eZContentClassAttribute::clearCache(), you can provide an object ID or an array of object IDs, and the in-memory cache will only be cleared for those objects (see the sketch below).

This will make the code a bit more portable, especially since it is most likely that the in-memory cache will _finally_ be refactored in a later version.
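
For instance, something along these lines ($processedObjectIDs being a hypothetical array of the object IDs handled in the current batch, not a variable from the scripts above):

// Clear the in-memory object cache only for the objects we just processed
eZContentObject::clearCache( $processedObjectIDs );

// Since 4.2.0 the class attribute cache can be cleared separately as well
eZContentClassAttribute::clearCache();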

Bertrand Dunogier
eZ Systems Engineering, Lyon
http://twitter.com/bdunogier
http://gplus.to/BertrandDunogier

Kristof Coomans

Wednesday 18 November 2009 3:16:58 am

Interesting topic, as I faced this problem myself several times.

What about introducing a specialized implementation of PHP's Iterator interface that hides all the logic of fetching with offset and limit and clearing caches?

The example below shows how the names of all folders under node 2 are displayed. After each batch of 100 nodes, the iterator takes care of clearing the in-memory caches and fetching the next 100 nodes from the database.

$params = array( 'ClassFilterType' => 'include', 'ClassFilterArray' => array( 'folder' ) );

$nodeList = new eZContentObjectTreeNodeSubTreeIterator( 2, $params, 100 );

foreach ($nodeList as $node)
{
    echo $node->attribute('name');
}
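
The iterator class itself is not shown above and is not part of eZ Publish; a possible sketch of such an implementation, matching the constructor signature used in the example, might look like this:

class eZContentObjectTreeNodeSubTreeIterator implements Iterator
{
    private $parentNodeID;
    private $params;
    private $batchSize;
    private $offset = 0;    // offset of the current batch within the subtree
    private $position = 0;  // index inside the current batch
    private $batch = array();

    public function __construct( $parentNodeID, $params, $batchSize = 100 )
    {
        $this->parentNodeID = $parentNodeID;
        $this->params = $params;
        $this->batchSize = $batchSize;
    }

    public function rewind()
    {
        $this->offset = 0;
        $this->fetchBatch();
    }

    public function valid()
    {
        return isset( $this->batch[$this->position] );
    }

    public function current()
    {
        return $this->batch[$this->position];
    }

    public function key()
    {
        return $this->offset + $this->position;
    }

    public function next()
    {
        $this->position++;
        // When the current batch is exhausted and it was a full batch,
        // there may be more nodes: clear caches and fetch the next batch
        if ( $this->position >= count( $this->batch ) &&
             count( $this->batch ) == $this->batchSize )
        {
            $this->offset += $this->batchSize;
            $this->fetchBatch();
        }
    }

    private function fetchBatch()
    {
        // Drop the in-memory content object cache before loading more nodes
        eZContentObject::clearCache();

        $params = array_merge( $this->params,
                               array( 'Limit'  => $this->batchSize,
                                      'Offset' => $this->offset ) );
        $nodes = eZContentObjectTreeNode::subTreeByNodeId( $params, $this->parentNodeID );
        $this->batch = is_array( $nodes ) ? $nodes : array();
        $this->position = 0;
    }
}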

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org
