
Problem with really BIG solr indices


Xavier Serna

Thursday 19 March 2009 4:52:13 am

Hi all,

Let me explain a problem we have encountered with eZ Find 2.0.0 (and also with previous versions).

Background: we have indexed about 30k eZContentObjects in the Solr engine, along with 226k external XML files. This produces about 5 GB of index data.

We currently start the Solr engine with this command:

/usr/bin/java -server -Xmx600m -Xms600m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -jar start.jar

Our main problem is that every time a commit is executed in the Solr engine, all index files (remember, 5 GB of data) are regenerated from scratch: the data folder grows to about 10 GB, then the old files are removed and only the new ones remain.
As we have delayed indexing enabled, this is not a critical problem when publishing content, but it is when deleting: every time we remove an object, the system freezes until the indices are regenerated.
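For context, Solr's XML update handler distinguishes two messages: `<commit/>` makes pending adds and deletes visible while keeping the existing index segments, whereas `<optimize/>` merges every segment into one, rewriting the whole index. Because the merged copy is written before the old segments are deleted, an optimize of a 5 GB index needs roughly another 5 GB free, which matches the growth to about 10 GB described above. The sketch below builds either request; the URL is an assumption (the default Solr example address), and eZ Find's bundled Solr may listen elsewhere.

```python
import urllib.request

# Assumption: default Solr example address; adjust host, port and path to your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(command: str, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build a POST for Solr's XML update handler.

    "commit"   -> make pending adds/deletes visible, existing segments are kept
    "optimize" -> merge all segments into one, rewriting the whole index
    """
    if command not in ("commit", "optimize"):
        raise ValueError("command must be 'commit' or 'optimize'")
    body = f"<{command}/>".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 3600.0) -> int:
    """Send the request and return the HTTP status code."""
    # Optimizing a multi-GB index can take a long time, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

After a delete, `send(build_update_request("commit"))` only makes the change visible; the expensive `optimize` can then be deferred to a quiet period.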

Is anyone out there with a similar scenario who can guide us?

Thanks for reading!
Xavier

--
Xavier Serna
eZ Publish Certified Developer
Departament de Software
Microblau S.L. - http://www.microblau.net
+34 937 466 205

Ali Nebi

Thursday 09 April 2009 6:50:18 am

Hi,

We just ran some tests with eZ Find 2 and found the same problem. The Solr indexes took 650 GB, which is really big. The same indexes built with eZ Find 1 and its bundled Solr take 9.5 GB.

Why does this happen, and how can we solve it?

Thanks in advance!

Iguana Information Technologies, SL - http://www.iguanait.com

Nicolas Pastorino

Friday 10 April 2009 12:32:55 am

@Xavier :
Any feedback on your issue? Did the proposed solution (disabling the OptimizeOnCommit directive and setting up a daily 'optimize' workflow) work?
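A daily 'optimize' workflow could be as simple as a scheduled script that posts `<optimize/>` to Solr, but since an optimize temporarily needs about as much free disk as the index already occupies, it is worth checking that first. The sketch below is a hypothetical illustration: the URL, the index path and the "free space >= current index size" rule are assumptions, not eZ Find settings.

```python
import os
import shutil
import sys
import urllib.request

# Assumptions: adjust both values to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"
INDEX_DIR = "/path/to/solr/data/index"  # hypothetical path


def directory_size(path: str) -> int:
    """Total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_to_optimize(index_dir: str) -> bool:
    # Heuristic: the merged copy is written before the old segments are
    # removed, so roughly the current index size must be free on that disk.
    return shutil.disk_usage(index_dir).free >= directory_size(index_dir)


def optimize(url: str = SOLR_UPDATE_URL, timeout: float = 4 * 3600) -> int:
    """Post <optimize/> to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def main() -> int:
    """Entry point for a nightly job; returns a process exit code."""
    if not os.path.isdir(INDEX_DIR) or not has_room_to_optimize(INDEX_DIR):
        print("Index missing or not enough free disk space; skipping optimize.", file=sys.stderr)
        return 1
    print(f"optimize finished with HTTP {optimize()}")
    return 0
```

A cron entry would then call `main()` once a night, outside peak hours, so that commits during the day stay cheap.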

@Ali :
This index size is very surprising. Did the indexed content base grow a lot between your eZ Find 1.x usage and eZ Find 2.0? Are external elements indexed too (through the DataImportHandler Solr extension, for instance)? Are websites crawled?

Best regards,

--
Nicolas Pastorino
Director Community - eZ
Member of the Community Project Board

eZ Publish Community on twitter: http://twitter.com/ezcommunity

t : http://twitter.com/jeanvoye
G+ : http://plus.tl/jeanvoye

Ali Nebi

Wednesday 15 April 2009 8:20:04 am

Sorry for my late reply.

We use the same database for the tests, and its data does not change. We also don't index any external elements.

We are continuing to test this. On another test server the data dir was smaller than on the server where it reached 650 GB, but it is still big: 14 GB with only 40% of the data indexed.

Regards, Ali Nebi!


Xavier Serna

Thursday 16 April 2009 12:50:04 am

Hi Nico,

Many thanks for your proposed solution; it seems to work fine now that OptimizeOnCommit is disabled.
One detail: updatesearchindexsolr.php forces an optimize on each commit (every 1000 objects), ignoring the setting in the ini file. I believe this should be changed, because reindexing all the XML files takes more than 4 hours.
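The change being asked for can be sketched as a batching loop that still commits every 1000 objects but optimizes at most once, at the end, and only when the OptimizeOnCommit-style flag allows it. This is a hypothetical Python illustration of the logic, not the actual PHP in updatesearchindexsolr.php; the function and parameter names are invented.

```python
from typing import Callable, Iterable


def reindex(
    object_ids: Iterable[int],
    index_object: Callable[[int], None],
    commit: Callable[[], None],
    optimize: Callable[[], None],
    optimize_on_commit: bool,
    batch_size: int = 1000,
) -> None:
    """Index objects in batches, committing after each full batch.

    Instead of optimizing after every batch, optimize once at the very end,
    and only if the (hypothetical) optimize_on_commit flag is enabled.
    """
    pending = 0
    for object_id in object_ids:
        index_object(object_id)
        pending += 1
        if pending == batch_size:
            commit()
            pending = 0
    if pending:
        commit()  # flush the last, partial batch
    if optimize_on_commit:
        optimize()
```

With the flag disabled, a full reindex only pays for cheap commits, and the optimize can be left to the daily job.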

thanks!


Ali Nebi

Monday 01 June 2009 4:56:57 am

Hi,

After spending more time testing eZ Find 2, we found out why the Solr indices were so big.

First, we needed to set useFork to false. The real problem was explained here by Denitsa M.:

http://ez.no/developer/forum/extensions/ez_find/ezfind2_indexing_speed_incredibly_low_er

When indexing reaches objects that have relation list attribute(s), it loops between these objects and the indices grow bigger and bigger. When we made these attributes non-searchable, indexing a 2 GB database was much faster and the indices were only a few hundred MB.
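The failure mode described above is the classic cycle problem when following relations: if object A relates to B and B back to A, a traversal without a record of already-visited objects indexes them again and again. The sketch below is a generic illustration, not eZ Find's indexing code; the relation map and the `follow_relations` switch (standing in for making the attributes non-searchable) are assumptions.

```python
from typing import Dict, List, Set


def objects_to_index(
    start: int,
    relations: Dict[int, List[int]],
    follow_relations: bool = True,
) -> List[int]:
    """Return each object reachable from start exactly once, in visit order."""
    order: List[int] = []
    seen: Set[int] = set()
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # without this check, A -> B -> A would be indexed forever
        seen.add(obj)
        order.append(obj)
        if follow_relations:
            # Push in reverse so related objects are visited in their listed order.
            stack.extend(reversed(relations.get(obj, [])))
    return order
```

For the cyclic map `{1: [2], 2: [1, 3], 3: [1]}`, starting at object 1 yields each object once, and with `follow_relations=False` only object 1 itself is indexed.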

Regards, Ali Nebi!
