Subtree fetch and performance impact

laurent le cadet

Wednesday 04 February 2009 5:41:06 am

Hi,

I need to fetch a whole subtree made of several folders that themselves contain folders, which in turn contain products.

Folder 1 (starting node for the fetch)
- Folder 1.1
-- Folder 1.1.1
--- Product A
--- Product B
--- Product C
--- Product ...
-- Folder 1.1.2
--- Product A
--- Product B
--- Product C
--- Product ...
- Folder 1.2
...

There are at least 10 "first level" folders and 42 "second level" folders.
This structure should not change.

The site can grow quickly, which means that I will need to fetch thousands of products.
I only want to fetch products owned by the current user.
Here is the code I'm using:

{def $products = fetch( 'content', 'tree', hash(
         'parent_node_id', $root_node.node_id,
         'offset', $view_parameters.offset,
         'attribute_filter', array( 'and',
             array( 'owner', '=', $current_user_id ),
             array( 'class_identifier', '=', 'product' ) ),
         'sort_by', array( 'name', true() ),
         'depth', 3,
         'limit', $page_limit ) )
     $products_counter = fetch( 'content', 'tree', hash(
         'parent_node_id', $root_node.node_id,
         'attribute_filter', array( 'and',
             array( 'owner', '=', $current_user_id ),
             array( 'class_identifier', '=', 'product' ) ),
         'depth', 3 ) )
     $products_count = $products_counter|count
}

As we can see, products will always be stored at depth 3.
This works fine at the moment, but given the future number of products I'm not sure this method will stay efficient.

Could someone tell me if it will become "dangerous" or if there is a different approach?

Best regards.

Laurent

Denitsa M.

Wednesday 04 February 2009 6:55:48 am

Hello, Laurent,

We did encounter the memory problem with large sets of objects while trying to export objects in one of our projects. Your example code is in templates, but the problem also persists in PHP scripts when we attempt to fetch subtrees of thousands of objects. Even after script optimization it is not possible to fetch more than 1000-1300 objects at once before the script dies with an out-of-memory error (this is with a PHP memory limit of 256MB, and also 256MB for the CLI), probably because the memory is not freed until the script has finished. Unsetting variables does not help here: unset only removes the value in your script, while the memory the variables consumed is not actually released.

This is why we decided to create a new, lighter class to use for fetches of this size instead of the standard eZContentObjectTreeNode class. It produces much smaller objects, which contain only basic information and are organized differently from the standard objects used by the CMS. The latest test with this new class allowed us to fetch more than 5000 objects at once in PHP with memory usage at around half of the allowed amount.

Currently this is only used from PHP scripts to suit our immediate needs, but in time it will probably be developed for template usage as well.

So, in a few words, the problem with your fetch will eventually appear when the number of products becomes too large. In such cases the tree count function works fine, but the tree fetch makes your script fall into a long loop and eventually leaves you with a blank page and a PHP memory allocation error. As far as I know, the memory shortage cannot be avoided by script optimization in PHP or in templates; the only way is to fetch using limit and offset, which in our case does not serve us well.

Best regards,
Denitsa

Iguana IT - http://www.iguanait.com

André R.

Wednesday 04 February 2009 8:08:32 am

It should help if you set load_data_map to false, but if you load the attributes (using $child.data_map.*) you will, as said above, run out of memory, simply because eZ Publish caches the eZContentObject and eZContentObjectAttribute objects to improve performance.

More on clearing PHP memory in PHP scripts (not possible in templates):
http://issues.ez.no/IssueView.php?Id=13006&activeItem=1

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Stéphane Couzinier

Wednesday 04 February 2009 2:25:41 pm

Do you really need to fetch the whole database?

Yes, it's dangerous to do this.
Maybe you can do a foreach and fetch the nodes with offset and limit.

http://www.kouz-cooking.fr

laurent le cadet

Wednesday 04 February 2009 11:31:23 pm

Thanks for replying.
As I really need to fetch a very large number of nodes, I have to find another strategy.

Laurent

Stéphane Couzinier

Thursday 05 February 2009 7:04:55 am

For $products_counter, why don't you use a tree_count?

http://www.kouz-cooking.fr

laurent le cadet

Thursday 05 February 2009 9:31:04 am

That's what I did and the code is a bit smarter now.
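
A minimal sketch of what the counting part could look like with tree_count, reusing the variable names from the first post. This is an illustration under those assumptions, not the actual revised code:

```
{* Count the matching products in SQL instead of loading every node
   and piping the result through |count. Variable names
   ($root_node, $current_user_id) are taken from the first post. *}
{def $products_count = fetch( 'content', 'tree_count', hash(
         'parent_node_id', $root_node.node_id,
         'attribute_filter', array( 'and',
             array( 'owner', '=', $current_user_id ),
             array( 'class_identifier', '=', 'product' ) ),
         'depth', 3 ) )}
```

With this, only the paginated 'tree' fetch (with offset and limit) has to create node objects; the total for the pager comes from the count query.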

Jérôme Vieilledent

Friday 06 February 2009 7:53:24 am

"It should help if you set load_data_map to false"

Sorry for asking, but what is the load_data_map parameter you're talking about? I can't find it in the reference doc.
Anyway, there is a way to fetch just a result set, by setting as_object to false in your fetch. The only problem I have with this technique is that you don't get the real url_alias, only the path_identification_string ("_" delimiters).
Is there a way to get the real url_alias without loading the attributes?
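
For reference, a minimal sketch of the as_object technique described above, reusing the variable names from the first post. It assumes each returned row is a plain array whose keys include name and path_identification_string, so check the keys on your own installation before relying on them:

```
{* With as_object set to false the fetch returns plain rows,
   not eZContentObjectTreeNode objects, so no attributes are loaded.
   The row keys used below are assumptions to verify locally. *}
{def $rows = fetch( 'content', 'tree', hash(
         'parent_node_id', $root_node.node_id,
         'attribute_filter', array( 'and',
             array( 'owner', '=', $current_user_id ),
             array( 'class_identifier', '=', 'product' ) ),
         'depth', 3,
         'offset', $view_parameters.offset,
         'limit', $page_limit,
         'as_object', false() ) )}
{foreach $rows as $row}
    {$row.name|wash} ({$row.path_identification_string|wash})
{/foreach}
```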

Thx

André R.

Friday 06 February 2009 1:35:59 pm

load_data_map was added in 3.10.
Before that you could not control the fact that 'list' and 'tree' fetches in templates pre-fetch node attributes to save SQL calls. That is fine if you fetch fewer than, say, ~100 nodes, but above that it causes several performance issues:
1. The SQL generated to fetch all attributes gets really big and really slow, especially on a MySQL server under heavy load.
2. Attribute objects are cached to avoid re-fetching them, so you run out of memory if you don't clear the cache, which is something you can't do in templates; and once you have fetched too many nodes it is already too late.

So if you want the node object, with access to url_alias and some of the other things you don't have when you use the as_object parameter, use load_data_map.

We are considering setting it to false by default in 4.1, for the reasons above, and since we have reduced some of the overhead when you call *.data_map.* when it is turned off.
And yes, it should be documented. In the meantime you can also use it in PHP on eZContentObjectTreeNode::subTree[ByNodeID], where it is available as 'LoadDataMap' and disabled by default.
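
As an illustration, the fetch from the first post could be written with load_data_map turned off along these lines. This is a sketch that assumes eZ Publish 3.10 or later and the variable names from the first post:

```
{* Fetch node objects without pre-fetching their attributes.
   url_alias and name remain available on each node. *}
{def $products = fetch( 'content', 'tree', hash(
         'parent_node_id', $root_node.node_id,
         'attribute_filter', array( 'and',
             array( 'owner', '=', $current_user_id ),
             array( 'class_identifier', '=', 'product' ) ),
         'sort_by', array( 'name', true() ),
         'depth', 3,
         'offset', $view_parameters.offset,
         'limit', $page_limit,
         'load_data_map', false() ) )}
{foreach $products as $product}
    <a href={$product.url_alias|ezurl}>{$product.name|wash}</a>
{/foreach}
```

Reading $product.data_map inside the loop would still load the attributes for that node, so the saving only holds as long as the template sticks to node-level properties.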


Jérôme Vieilledent

Saturday 07 February 2009 6:48:11 am

Thanks André.

So if I understand correctly, I can do a fetch with as_object set to true and load_data_map set to false to be sure to get url_alias and the like without fetching the attributes? Will it save resources compared to a regular fetch?
