Reporting - how to get custom data out of ezpub database?


chris mol

Saturday 01 March 2008 7:49:17 pm

I work for a small company that is considering using ezpub as our enterprise content management tool. It has all the CMS functionality we need and more. However, we have some concerns about ezpub's OO database and about extracting data for reporting.

My company's business is to book events online. We book about 50,000 events per year, with a user base of 20,000 users that belong to 35,000 organizations that then roll up into about 50 clients. We store a lot of data, probably close to 500,000 records.

We are in talks with a local web shop to customize ezpub as our web scheduling application.

Our daily business depends heavily on the IT department's ability to deliver reports covering all the data points listed above. We are concerned that nearly all our custom data points will be stored in what amounts to four or five tables in ezpub (ezcontentclass, ezcontentobject, etc.).

Can anyone provide input on extracting data from the ezpub DB for custom reporting (BI suites like Pentaho, MS, etc.)? Is it even possible? If so, does it take a lot of effort to write ETLs that separate my data from the ezpub object data?

This is a huge issue for us and I would appreciate any input from users who have experience using the ezpub database for custom reporting.

Thanks.

Felix Laate

Sunday 02 March 2008 2:12:55 am

Hi Chris,

So, the ezp database is kind of abstract, BUT there are excellent ways to make a proxy that produces the output you need.

Say, for example, that you plan to use Pentaho. It supports many data sources, among them XML-based ones. You could then quite easily make a "view" with, e.g., the layout module that produces the XML you need. It then works pretty much like any feed.
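To illustrate the idea, here is a minimal sketch of the kind of flat XML such a "view" could emit for an XML-capable reporting tool. It is written in Python only to show the shape of the feed; in eZ Publish the output would come from a template rendered through the layout module. The element and field names (`events`, `event`, `title`, `organization`, `client`, `date`) and the `events_to_xml` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list; a real feed would use whatever the content classes define.
FIELDS = ("title", "organization", "client", "date")


def events_to_xml(events):
    """Serialize event dicts into flat XML: one <event> per record,
    one child element per field."""
    root = ET.Element("events")
    for event in events:
        item = ET.SubElement(root, "event", id=str(event["id"]))
        for field in FIELDS:
            # ElementTree escapes characters such as & and < automatically.
            ET.SubElement(item, field).text = str(event[field])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(events_to_xml([
        {"id": 1, "title": "Kickoff", "organization": "Org A",
         "client": "Client X", "date": "2008-03-01"},
    ]))
```

Keeping one element per record and one child per field makes it straightforward to map the feed onto rows and columns in the reporting tool.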

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Piotrek Karaś

Sunday 02 March 2008 3:46:30 am

That is all possible and more: you can extend eZ Publish to handle data through the eZ API rather than through the presentation layer. Also, the content model is not that difficult once you've learned how eZ Publish handles content, and then you can pull directly from the DB in any way that suits you.
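As a hedged sketch of pulling directly from the DB: with attribute-per-row storage, the core of an ETL job is pivoting many attribute rows into one flat record per object. The two tables below are a hypothetical, heavily simplified stand-in for the generic eZ Publish content tables (the real ones also track versions, translations and several typed data columns), and `flatten_objects` is an illustrative helper, not part of eZ Publish.

```python
import sqlite3

# Hypothetical, simplified stand-in for the generic content tables:
# one row per object, one row per (object, attribute) pair.
SCHEMA = """
CREATE TABLE content_object (id INTEGER PRIMARY KEY, class_identifier TEXT, name TEXT);
CREATE TABLE content_attribute (object_id INTEGER, identifier TEXT, value TEXT);
"""


def flatten_objects(conn, class_identifier, fields):
    """Pivot attribute rows into one dict per object of the given class.

    Fields an object has no row for stay None; attribute identifiers
    not listed in `fields` are ignored.
    """
    rows = conn.execute(
        """
        SELECT o.id, a.identifier, a.value
        FROM content_object AS o
        LEFT JOIN content_attribute AS a ON a.object_id = o.id
        WHERE o.class_identifier = ?
        ORDER BY o.id
        """,
        (class_identifier,),
    )
    records = {}
    for object_id, identifier, value in rows:
        record = records.setdefault(object_id, {"id": object_id, **dict.fromkeys(fields)})
        if identifier in fields:
            record[identifier] = value
    return list(records.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO content_object VALUES (1, 'event', 'Kickoff')")
    conn.executemany(
        "INSERT INTO content_attribute VALUES (?, ?, ?)",
        [(1, "date", "2008-03-01"), (1, "organization", "Org A")],
    )
    print(flatten_objects(conn, "event", ["date", "organization"]))
```

Objects missing an attribute simply get `None` for that field, which is usually what a reporting tool expects for an empty column.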

One question, though, to Felix and other experienced developers: should one try to manage those 500,000 records with that model? Would you go for that? Or would you choose some sort of integration or extension with dedicated DB tables and interfaces? This looks like a data management project rather than a content management project to me.

What do you think?

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu

Felix Laate

Monday 03 March 2008 12:49:31 pm

Hi again,

>> should those 500000 records be attempted to be managed with that model? Would you go
>> for that? Or would you choose some sort of integration or extension with dedicated DB
>> tables and interfaces? Looks like data management rather than content management
>> project to me.

Obviously, with that amount of data, a joint solution (ezp CMS + separate database) would be a good one. My suggestion (an XML view of the CMS data) is not that efficient, of course, but on the other hand it's quite easy to set up.

If you want more control and a more efficient solution, then I would opt for a separate extension based on the API.
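One way to picture the joint solution is a set of dedicated tables that hold the booking data as plain columns, with only an optional `content_object_id` linking a row back to a CMS object. The schema, table names and the `bookings_per_client` report below are hypothetical, sketched with SQLite for brevity; in a real project these tables would belong to the custom extension.

```python
import sqlite3

# Hypothetical dedicated tables kept next to (not inside) the CMS tables.
SCHEMA = """
CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE organization (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(id),
    name TEXT NOT NULL
);
CREATE TABLE booking (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL REFERENCES organization(id),
    content_object_id INTEGER,   -- optional link back to a CMS object
    event_date TEXT NOT NULL     -- ISO 8601 date string (YYYY-MM-DD)
);
CREATE INDEX booking_event_date ON booking (event_date);
"""


def bookings_per_client(conn, year):
    """Return (client name, booking count) pairs for one calendar year."""
    return conn.execute(
        """
        SELECT c.name, COUNT(b.id)
        FROM client AS c
        JOIN organization AS o ON o.client_id = c.id
        JOIN booking AS b ON b.organization_id = o.id
        WHERE b.event_date >= ? AND b.event_date < ?
        GROUP BY c.id, c.name
        ORDER BY c.name
        """,
        (f"{year}-01-01", f"{year + 1}-01-01"),
    ).fetchall()
```

The year filter uses a half-open date range rather than a function applied to the column, so the index on `event_date` stays usable.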

Anyhow, I think the ezp approach is a good one for projects like this, where you have the classic needs of a CMS combined with the need to provide access to and from just about any kind of database system. You need flexibility most of all, and that's, IMHO, what ezp is all about.

Felix


Björn Dieding@xrow.de

Monday 03 March 2008 4:50:25 pm

Hi,

The main problem with storing 500,000 records in the database is the content object tree table. Due to its architecture and design, it can't deliver certain fetches very efficiently (path LIKE 'mytree/%'). A better model has already been developed for the eZ Components.
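To make the `'mytree/%'` pattern concrete, here is a small tree stored two ways: as a materialized path, where a subtree fetch is a prefix `LIKE` on the path string, and, for contrast, as a nested set, where it is a range comparison on `lft`. The tables and helpers are hypothetical and are neither the eZ Publish tree table nor the eZ Components model; whether the `LIKE` prefix match can use the path index depends on the database engine and collation.

```python
import sqlite3


def build_trees():
    """Store the tree root(1) -> {a(2) -> {a1(4)}, b(3)} in both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE path_tree (id INTEGER PRIMARY KEY, path_string TEXT NOT NULL);
        CREATE INDEX path_tree_path ON path_tree (path_string);
        CREATE TABLE nested_tree (id INTEGER PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
        CREATE INDEX nested_tree_lft ON nested_tree (lft);
    """)
    # Materialized path: each node stores the ids of all its ancestors.
    conn.executemany("INSERT INTO path_tree VALUES (?, ?)",
                     [(1, "/1/"), (2, "/1/2/"), (3, "/1/3/"), (4, "/1/2/4/")])
    # Nested set: each node's (lft, rgt) interval encloses its descendants.
    conn.executemany("INSERT INTO nested_tree VALUES (?, ?, ?)",
                     [(1, 1, 8), (2, 2, 5), (4, 3, 4), (3, 6, 7)])
    return conn


def subtree_by_path(conn, node_id):
    """Materialized path: every descendant's path starts with the node's path."""
    (path,) = conn.execute(
        "SELECT path_string FROM path_tree WHERE id = ?", (node_id,)).fetchone()
    rows = conn.execute(
        "SELECT id FROM path_tree WHERE path_string LIKE ? ORDER BY id", (path + "%",))
    return [row[0] for row in rows]


def subtree_by_nested_set(conn, node_id):
    """Nested set: descendants have lft between the node's lft and rgt."""
    rows = conn.execute(
        """
        SELECT child.id
        FROM nested_tree AS parent
        JOIN nested_tree AS child ON child.lft BETWEEN parent.lft AND parent.rgt
        WHERE parent.id = ?
        ORDER BY child.id
        """,
        (node_id,),
    )
    return [row[0] for row in rows]
```

The trade-off is on the write side: inserting or moving a node in a nested set means renumbering the `lft`/`rgt` values of many other rows, while with a materialized path, inserting a leaf writes a single row and moving a node rewrites only the paths inside the moved subtree.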

So your keys to success are:
* Only store the bare minimum of data in the content-object-related tables
* Get good hardware for the DB, buy a lot of RAM, and tune your DB (maybe your model will handle the load just by doing this)
* Be an early adopter of technology from the eZ Components.

I'd say get a good eZ partner and build a proof of concept with them. Since your model isn't complex, proving that it won't break shouldn't be expensive.

For reporting, I would let the reporting tool access the DB directly and create its reports.
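If the reporting tool reads the DB directly, one option is to give it a flat, read-only view instead of the raw attribute rows. Below is a minimal sketch using conditional aggregation, again on a hypothetical, simplified stand-in for the content tables. The view name `report_event` and the attribute identifiers are illustrative, and against the real eZ Publish tables such a view would presumably also have to select only the current version and language of each object.

```python
import sqlite3

# Hypothetical, simplified attribute-per-row layout (no versions, no languages).
SCHEMA = """
CREATE TABLE content_object (id INTEGER PRIMARY KEY, class_identifier TEXT, name TEXT);
CREATE TABLE content_attribute (object_id INTEGER, identifier TEXT, value TEXT);

-- Flat view for the reporting tool: one row per event object,
-- one column per attribute, built with conditional aggregation.
CREATE VIEW report_event AS
SELECT o.id AS object_id,
       o.name AS title,
       MAX(CASE WHEN a.identifier = 'date' THEN a.value END) AS event_date,
       MAX(CASE WHEN a.identifier = 'organization' THEN a.value END) AS organization
FROM content_object AS o
LEFT JOIN content_attribute AS a ON a.object_id = o.id
WHERE o.class_identifier = 'event'
GROUP BY o.id, o.name;
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO content_object VALUES (1, 'event', 'Kickoff')")
    conn.execute("INSERT INTO content_attribute VALUES (1, 'date', '2008-03-01')")
    print(conn.execute("SELECT * FROM report_event").fetchall())
```

The BI suite then queries `report_event` like any ordinary table, and adding a reported attribute means adding one `MAX(CASE ...)` column to the view.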

If anyone knows better, prove me wrong :-)

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/
