New image system

Jan Borsodi

Wednesday 15 October 2003 9:28:42 am

After some system benchmarking, we saw the need to change the image datatype and the image system. This, combined with other known problems with the current system, has led us to the following proposal.

First, the goals of the new system are as follows:
- Performance
- Flexibility
- Simplicity
- Better filenames

Performance:
The current solution creates too many PHP objects and generates too many SQL queries. To make images faster, all image data and variations should be stored in the content object attribute as a serialized XML field.
When asking for a variation you will get an array of data instead of an image object. The array will contain all the data that the template engine needs.

The two image tables will be obsolete but the system will be backwards compatible.
When an image datatype is used, it will check whether it has serialized image data; if not, it will fetch all variations and image data and serialize them. An upgrade script will also be available for upgrading all attributes with images.
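
To illustrate the check-then-serialize step described above, here is a minimal sketch in Python (not the actual eZ Publish PHP code). The `data_text` field, the `ensure_serialized` function and the `fetch_legacy` callback are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

def ensure_serialized(attribute, fetch_legacy):
    """Return the attribute's image XML, building it from legacy rows on first use.

    `attribute` is a plain dict standing in for a content object attribute;
    `fetch_legacy(attribute_id)` is a hypothetical callback that returns
    (original_data, list_of_variation_data) read from the old image tables.
    """
    if attribute.get("data_text"):
        # Already upgraded: no extra queries or objects are needed.
        return attribute["data_text"]
    original, variations = fetch_legacy(attribute["id"])
    root = ET.Element("image", {k: str(v) for k, v in original.items()})
    for variation in variations:
        ET.SubElement(root, "var", {k: str(v) for k, v in variation.items()})
    attribute["data_text"] = ET.tostring(root, encoding="unicode")
    return attribute["data_text"]
```

The legacy tables are only read once per attribute; every later request is served from the serialized field.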

The XML structure will look something like this:
<image xml_version="1.0" name="original" filename="" original_filename=""
mime_type="" width="500" height="255" alt_text="" >
<var name="small"
reference_name="original"
filename="var/storage/dsfgsdfgsdfg/"
original_filename="Doc.jpeg"
mime_type=""
width="500"
height="255" />
<var name="small"
reference_name="original"
filename="var/storage/dsfgsdfgsdfg/"
original_filename="Doc.jpeg"
mime_type=""
width="500"
height="255" />
</image>
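
As a rough illustration of how such a structure could be turned into the plain arrays mentioned earlier, here is a small Python sketch. The sample values (file names, MIME type, sizes) and the `load_aliases` function are made up for the example and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data following the structure shown above.
SAMPLE = """<image xml_version="1.0" name="original" filename="ferrari.jpg"
       original_filename="Doc.jpeg" mime_type="image/jpeg"
       width="500" height="255" alt_text="">
  <var name="small" reference_name="original" filename="ferrari_small.jpg"
       original_filename="Doc.jpeg" mime_type="image/jpeg"
       width="100" height="51" />
</image>"""

def load_aliases(xml_text):
    """Return {alias name: dict of attributes} for the original and each variation."""
    root = ET.fromstring(xml_text)
    aliases = {root.get("name", "original"): dict(root.attrib)}
    for var in root.findall("var"):
        aliases[var.get("name")] = dict(var.attrib)
    # Convert numeric fields so a template layer could use them directly.
    for data in aliases.values():
        for key in ("width", "height"):
            if data.get(key):
                data[key] = int(data[key])
    return aliases

aliases = load_aliases(SAMPLE)
```

One parse of the stored field yields every variation, so no per-variation query or object is required.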

Flexibility:
The current variation system only allows you to define sizes. If you want specific filters, you have to specify them globally in the ImageMagick setting.
The new system should allow you to define aliases, each of which has a number of filters to be run. Image size is a normal filter and can be omitted.
Configuration of the image handlers should also be better. Instead of having ImageMagick and ImageGD (semi) hardcoded in the system, you should be able to define new conversion/filter handlers when needed.
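
One way to picture pluggable handlers is a small registry, sketched here in Python. The registry, the `register_handler` decorator and the handler name are assumptions made for illustration; `convert`, `-resize` and `-colorspace` are regular ImageMagick command-line names, but the sketch only builds an argument list and runs nothing.

```python
from typing import Callable, Dict, List

# Hypothetical registry: handler name -> function building a command line.
HANDLERS: Dict[str, Callable[[str, str, List[List[str]]], List[str]]] = {}

def register_handler(name):
    """Decorator that makes a conversion handler available under `name`."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator

@register_handler("imagemagick")
def imagemagick_command(src, dst, filters):
    # Each filter is a list of command-line arguments, e.g. ["-resize", "100x100"].
    args = [arg for flt in filters for arg in flt]
    return ["convert", src, *args, dst]

def build_command(handler_name, src, dst, filters):
    """Look up a handler by name and let it build the conversion command."""
    if handler_name not in HANDLERS:
        raise ValueError(f"no image handler registered as {handler_name!r}")
    return HANDLERS[handler_name](src, dst, filters)
```

Adding another conversion program then means registering one more function rather than changing the core code.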

Simplicity:
The conversion and filtering code needs to have its complexity reduced. The datatype also needs to be simpler to work with.

Better filenames:
The name of the stored image should reflect the content path, for instance if an image object is created in the path media/cars and named ferrari the image should be named var/storage/image/media/cars/1234/ferrari.jpg (The number is the attribute id).
Search engines will love this new scheme.
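
The naming scheme can be sketched as a small Python function. The function name and its parameters are hypothetical; only the resulting path layout comes from the description above.

```python
import posixpath

def image_storage_path(content_path, attribute_id, name, suffix,
                       root="var/storage/image"):
    """Build <root>/<content path>/<attribute id>/<name>.<suffix>."""
    return posixpath.join(root, content_path.strip("/"),
                          str(attribute_id), f"{name}.{suffix}")

print(image_storage_path("media/cars", 1234, "ferrari", "jpg"))
# var/storage/image/media/cars/1234/ferrari.jpg
```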

However, we currently see two issues with this system.

1. Removal of content objects.
When a content object is removed from the system, all images of all versions need to be removed as well.
To avoid having to go through all versions of an object, we should store information on all created images
in an SQL table. For each object there should be a list of image filenames.
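
A minimal sketch of such a bookkeeping table, written in Python with SQLite purely for illustration. The table name `image_file` and its columns are assumptions, not the schema that was eventually used.

```python
import sqlite3

def create_schema(db):
    # Hypothetical table: one row per image file created for an object.
    db.execute(
        "CREATE TABLE IF NOT EXISTS image_file ("
        " object_id INTEGER NOT NULL,"
        " filepath TEXT NOT NULL,"
        " UNIQUE (object_id, filepath))"
    )

def record_image(db, object_id, filepath):
    """Remember that `filepath` belongs to `object_id` (duplicates are ignored)."""
    db.execute(
        "INSERT OR IGNORE INTO image_file (object_id, filepath) VALUES (?, ?)",
        (object_id, filepath),
    )

def files_to_remove(db, object_id):
    """Return every recorded file of the object and forget them in the table."""
    rows = db.execute(
        "SELECT filepath FROM image_file WHERE object_id = ? ORDER BY filepath",
        (object_id,),
    ).fetchall()
    db.execute("DELETE FROM image_file WHERE object_id = ?", (object_id,))
    return [row[0] for row in rows]
```

On removal of an object, a single query gives the full list of files to delete, without loading any of its versions.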

2. Renaming and moving of content objects.
When a content object is renamed or moved, the storage path cannot be renamed. If it is renamed, the old URL will no longer work.
Instead the image should be copied to the new name and path.

More details on the new system will come.

--
Amos

Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq

Björn

Wednesday 15 October 2003 11:43:19 pm

It would be sooo cool if we could write all binary data of the original files into the database. A copy of the file should also be written to the var dir when the file is requested on the server.

From my point of view that will have the following advantages.

- easier backups
- easier implementation of a large site that runs on more than one server

What do you think?

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Bård Farstad

Wednesday 15 October 2003 11:57:46 pm

I think this can be a good feature. It does not really solve the multiple servers problem since you would need to have a local file of the image to use. Then we have the same problem with cache coherence that we have with all files, i.e. make sure that every server has a valid cache. We have a solution for this which works but it's not perfect.

I'm not sure how this would be in terms of performance though. This SQL table would be huge.

Any experience with storing images/files in SQL tables? ( My general view is that this is not good, but for backups it's really great )

We would also need to implement this in binaryfile and the media datatype as well for it to solve backups.

--bård

Documentation: http://ez.no/doc

Marco Zinn

Thursday 16 October 2003 1:51:48 pm

Just my 2 pence:
- I cannot complain about the performance. For us, the image conversion itself (ImageMagick) takes a few seconds when a variation needs to be created for the first time, which is way too much. After that, everything is fine (no production site stats yet).
- Backward compatibility is fine, but don't provide it for too long (see MS-DOS). It would just crowd the code and not perform too well. Provide _good_ upgrade scripts!
- I'm not sure how you will handle more image conversion systems. How many are there anyway? You should concentrate on fixing the bugs in the current use of ImageMagick and GD for now ;)
- What about the reference image concept? I discussed this with Bard and I would love to remove the need for reference images completely. I know this is handy when you have large images, but we (and quite a few other users) will not have large images.
- Take care with the ">" character in the ImageMagick scaling operation (ScaleLargeThanOriginal). Avoid it by letting eZ decide whether scaling should be done or not.

About the better filenames: I like the idea.
About your 2 issues:
1. Why don't you just go through all versions of the object when it is deleted? There can't be sooo many. Storing the image filenames of an object in a separate table means duplicate storage (another possible point of failure).
2. I don't like the idea of copying images (or files) in the filesystem just because the object is moved. What about using symlinks?

Marco
http://www.hyperroad-design.com

Jan Borsodi

Wednesday 22 October 2003 4:27:19 am

>- I cannot complain about the performance. For us, the image
> conversion itself (ImageMagick) takes some seconds, which
> is way too much, when the variation needs to be created the
> first time. After that, everything is fine (no production site
> stats yet)

The performance issue we talked about arises once the variations have been created and their data must be fetched from the database. In itself it's not a huge problem, but when you want to do a multi-fetch for all attributes of all related objects you get lots of extra queries and object creations.

Regarding the long time it takes to create variations, how about a switch to define which variations should be created when publishing?

> - Backward compatibility is fine, but don't provide it too long
> (see MS-DOS). It would just crowd the code and not perform
> too well. Provide _good_ upgrade scripts!

Backwards compatibility will be maintained in the templates only; that means you can do content.small.full_path and still get the path to the image. All settings and existing code will be changed.
Also, the system will work with the old image tables: it will read in the original information and create a new entry from it. All variations will be removed. The old image file will also be moved to use the new naming scheme.
And yes, upgrade scripts will be available.

> - I'm not sure, how you will handle more image conversion
> systems. How many are there anyway? You should
> concentrate on fixing the bugs in the current use of
> ImageMagick and GD for now

The reason for the current bugs was the design of the image system; we've now redone it from scratch with a sounder design. An extra benefit is that it's easier to add new image conversion/filter programs.

> What about the reference image concept? I discussed with
> Bard about this and I would love to remove the need for
> reference images completely. I know, this is handy, when you
> have large images, but we (and quite some other users) will
> not have large images.

There's no longer any explicit reference image. Instead you define one or more Image Aliases; each alias has a name, an optional reference to another Image Alias, and a list of filters (image size is a filter).
If you wanted, you could just have the original Image Alias (which is always present, no need to define it).
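
To make the alias idea concrete, here is a rough Python sketch of how a chain of references could be resolved. The alias names, the filter strings and the `resolve_chain` function are hypothetical and do not reflect the real configuration syntax.

```python
# Hypothetical alias definitions: name -> optional reference and a filter list.
ALIASES = {
    "medium": {"reference": None, "filters": ["scale 200x200"]},
    "small": {"reference": "medium", "filters": ["scale 100x100", "grayscale"]},
}

def resolve_chain(name, aliases=ALIASES):
    """Return the alias names to process, from 'original' down to `name`."""
    chain = []
    seen = set()
    while name is not None and name != "original":
        if name in seen:
            raise ValueError(f"circular alias reference at {name!r}")
        if name not in aliases:
            raise KeyError(f"unknown image alias {name!r}")
        seen.add(name)
        chain.append(name)
        name = aliases[name].get("reference")
    # The original alias is always present and never needs to be defined.
    chain.append("original")
    return list(reversed(chain))
```

An alias without a reference is built straight from the original, so a setup with only `original` needs no definitions at all.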

> - Take care about the ">" character in ImageMagick scaling
> operation (ScaleLargeThanOriginal). Avoid that by letting ez
> decide, if scaling should be done or not.

We will see if there is a way around this. The problem occurred when the > was escaped and the path to the conversion program was also quoted (because of spaces); the PHP system() function then did not execute the program correctly (at least on Windows).
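
One possible way around it, sketched in Python as an assumption rather than the chosen fix: compute the target size in the application so the geometry string never needs the ">" flag. The function names are made up for this example.

```python
def fit_within(width, height, max_width, max_height):
    """Return the (width, height) to request, never larger than the original."""
    # A factor capped at 1.0 means small images are left at their own size.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

def geometry_argument(width, height, max_width, max_height):
    """Build a plain WIDTHxHEIGHT geometry string without any '>' character."""
    new_width, new_height = fit_within(width, height, max_width, max_height)
    return f"{new_width}x{new_height}"
```

Because the decision is made before the conversion program is called, no special character has to survive shell quoting.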

> About the better filenames: I like the idea.

> About your 2 issues:
> 1. Why don't you just go through all versions of the object,
> when it is deleted? Can't be sooo many. Storing the image
> filenames of an object in a separate table means duplicate
> storage (another possible point of failure).

Perhaps, we're not 100% sure what we will do here.

> 2. I don't like the idea of copying images (or files) in the
> filesystem, just because the object is moved. What about
> doing symlinks?

Good idea, at least on Linux/Unix we can do this.
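
A minimal Python sketch of that approach, assuming a fallback to copying where symlinks are not available. The function name is hypothetical.

```python
import os
import shutil

def expose_under_new_path(existing, new_path):
    """Make `existing` reachable at `new_path`; return 'symlink' or 'copy'."""
    os.makedirs(os.path.dirname(new_path) or ".", exist_ok=True)
    try:
        # Preferred: no duplicated data, and the old path keeps working.
        os.symlink(os.path.abspath(existing), new_path)
        return "symlink"
    except (OSError, NotImplementedError, AttributeError):
        # Platforms or filesystems without symlink support: copy instead.
        shutil.copy2(existing, new_path)
        return "copy"
```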

Jan Borsodi

Thursday 23 October 2003 12:04:54 pm

The first version of the new image system is now implemented and committed. You can check it out from trunk at revision 3774.
Note: Remember the SQL updates.

1.
A separate table was needed. The class for it is small and simple, so there is little chance of mistakes. Without it the code for removal would be quite complex.
2.
Symlinks are now supported (at least on Unix). That, together with avoiding the need for a reference image, will reduce disk usage significantly.

The things that are missing:
a) ImageGD support
b) Removing old variations (table and file) when found (just for the compatibility check)
c) Storing the XML internally with the same charset as the site (for performance, we do the same in ezxmltext).
d) Support for defining the original image alias, so that you can convert the original image to a predefined size or format (reducing disk usage again).

Some would-be-nice features that are not implemented yet:
i) User-chosen filtering when editing an image, for instance to rotate an image 90 degrees (should be simple to implement with the new system).
ii) Possibility to send parameters to the filters. The parameters should be defined when the image is edited. This is useful when used with the crop filter.

Jan Borsodi

Tuesday 11 November 2003 6:46:38 am

Some updates:

1.
The image system no longer uses symbolic links but hard links instead. There were several problems with symbolic links (e.g. moving them), and the code to deal with them would be too complex and would certainly have bugs.
2.
The 'original' alias can now be defined. This means you can, for instance, set the original to be a specific format or to be automatically scaled down to a certain size.
3.
Analyzers for images have been added. These are pieces of PHP code that read the image file and extract information.
Currently, extraction of GIF data and of EXIF data from JPEG and TIFF (requires PHP built with --enable-exif) is supported.
4.
Possibility to allow/disallow filters for specific formats. For instance, images with MIME type image/gif and is_animated set to 1 will now have all geometry filters disabled.
In plain English, this means that it is now possible to upload animated GIFs to image datatypes. You can even run filters on them (at least some, e.g. grayscale).
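
The allow/disallow rule from point 4 could look roughly like this in Python. The rule format, the set of geometry filter names and the `allowed_filters` function are assumptions for illustration, not the real configuration.

```python
# Hypothetical set of filters that change the image geometry.
GEOMETRY_FILTERS = {"scale", "crop"}

# Hypothetical rules: when every "match" entry equals the analyzed image
# information, the filters listed under "disallow" are skipped.
RULES = [
    {"match": {"mime_type": "image/gif", "is_animated": 1},
     "disallow": GEOMETRY_FILTERS},
]

def allowed_filters(image_info, requested, rules=RULES):
    """Return the requested filters that may run on this image, in order."""
    blocked = set()
    for rule in rules:
        if all(image_info.get(key) == value
               for key, value in rule["match"].items()):
            blocked |= rule["disallow"]
    return [name for name in requested if name not in blocked]
```

For an animated GIF, a request for scale plus grayscale would then run grayscale only, while a static GIF keeps both filters.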