
Document Management


Tony Wood

Tuesday 13 May 2003 8:04:02 am

I'd like to see integration with the documents used in a company's workflow, so that OpenOffice and MS Word binary documents, etc., could be stored. This would extend the capabilities of eZ to a document management solution.
This could be implemented via the DE engine. Maybe the documents at the backend could be stored in a Subversion-type system, or maybe even Subversion itself?

My dream end goal would be to store the entire document in XML inside eZ and then produce the OpenOffice, PDF, MS Word document upon demand. That way you get true collaboration.

I think some sort of desktop client is needed so documents can be dragged and dropped into the system; maybe this will be a feature of the DE? It needs to run on KDE/Gnome and Windows.

Tony Wood : twitter.com/tonywood
Vision with Technology
Experts in eZ Publish consulting & development

Power to the Editor!

Free eZ Training : http://www.VisionWT.com/training
eZ Future Podcast : http://www.VisionWT.com/eZ-Future

Paul Borgermans

Tuesday 13 May 2003 12:18:45 pm

Tony,

Quite a lot of a DMS can be implemented using a minimum of effort with workflows (You figured that out already I presume) and some programming.

What needs to be added to the ezp core are ad hoc triggers (e.g. to be used in templates) for launching a certain workflow (think "archive this"). Also, triggers/workflows should discriminate between classes, not only sections. Sections are supposed to be agnostic towards classes (the current workflow system forces you to group objects of the same class in sections if you want to do this).

IMHO the status of objects should also be extended or made extensible with more status values which can then be used in workflow processes.

Things like drag and drop are possible even with a web browser. I've seen things like that at least with IE. With Mozilla it's possible too, albeit with some XUL programming.

Your XML dreams are also mine, but before that, eZ publish should be extended with support for more XML doctypes. The rest is not in the realm of eZ publish, but rather on the user side (resistance to change) and the lack of WYSIWY(G/M) XML editors. LyX has the most potential in this area with its support for math (formulas), bibliographic references, etc., which currently are not within the scope of browser-based editors (xopus, bitflux, ...).

When SOAP is implemented for storing/changing/.. content objects, you may want to write add-ins for some office products to "integrate" the CMS (CMS=DMS++) with them.

Just my 0.02 €

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Tony Wood

Tuesday 13 May 2003 3:11:52 pm

>>DMS
You're right on the DMS... The trick is getting people to use it, and for that you need a desktop client and a binary diff, so people can click to open, make changes and click to save. These are the basics. You then either integrate a meta tag retrieval system into the client app or pop up a browser window to get the info.

>>Triggers
Yes, I'll back that.

>>Workflow
For workflow, I found it good but limiting. A graphical workflow front end and the ability to link workflows would be improvements here.

>>XML
I notice you mention LyX here and not OOo. Have you had a bad experience with OOo?

>>SOAP
The PEAR classes can be used with your own operators now; we are looking at a WSDL operator... It's early days, but it looks straightforward... (fingers crossed)


Paul Borgermans

Wednesday 14 May 2003 2:15:18 am

OOo: I must admit I did not look at OOo for a few months before a minute ago.

The beta for OO 1.1 lists docbook import/export, as well as flat xml export. Oh la la. I'm gonna take a good look at it.

Paul


Tony Wood

Wednesday 14 May 2003 2:34:47 am

Unzip a .sxw file and you will find all the XML files within it. OOo stores its files natively in XML.
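
A minimal sketch of that inspection, assuming Python and its standard `zipfile` module rather than the PHP that eZ publish itself uses. The function names and the tiny stand-in archive (including its `Pictures/` entry) are hypothetical and only there so the example runs without a real document:

```python
import io
import zipfile


def list_xml_parts(sxw_bytes: bytes) -> list[str]:
    """Return the names of the XML files stored inside an .sxw archive."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".xml"))


def read_content_xml(sxw_bytes: bytes) -> str:
    """Return content.xml, the part that holds the document text."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        return archive.read("content.xml").decode("utf-8")


def make_sample_sxw() -> bytes:
    """Build a tiny stand-in archive (not a valid document) for the sketch."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("meta.xml", "<office:document-meta/>")
        archive.writestr("Pictures/logo.png", b"not-a-real-image")
    return buffer.getvalue()
```

For a real file, the bytes would come from something like `open("report.sxw", "rb").read()`, where `report.sxw` is a placeholder name.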


Ekkehard Dörre

Tuesday 30 March 2004 9:03:11 am

Hi,
While looking for the new version (1.1.1) of OOo, I found the following at http://development.openoffice.org/index.html, under
"Have you integrated OpenOffice.org into your solution?"

OfficeIntegration into CMS
http://www.icoya.com/produkt/module/index_html/tocarticle_view#1

It is under the ZPL; download:
http://www.icoya.de/support/download_area/zope/CMFOODocument

Problem: It is in Python. But it is a nice way to publish content and have real DMS. Any comments?

Greetings, ekke

http://www.coolscreen.de - Over 40 years of certified eZ Publish know-how: http://www.cjw-network.com
CJW Newsletter: http://projects.ez.no/cjw_newsletter - http://cjw-network.com/en/ez-publ...w-newsletter-multi-channel-marketing

Tony Wood

Thursday 01 April 2004 10:07:25 am

Thanks for the heads up Ekke.

Looks like this may be moving closer to reality, just requires a little work.

Thanks

Tony


Ekkehard Dörre

Thursday 13 May 2004 4:28:24 pm

Hi, while looking, I found very interesting new projects on typo3:

Plans for a Workflow Engine
http://typo3.org/projects/workflow-engine/

Digital Asset Management
http://typo3.org/projects/digital-asset-management/

And plans for a Projectmanager
http://typo3.org/projects/projectmanager/

And a General Office Displayer
[...] Displays a Word or Excel file from Microsoft Office 2003 if saved in the new XML format. Additionally it supports Open Office Writer documents.[...]

http://typo3.org/extensions/repository/search/rlmp_officeimport/

Time to sleep,
ekke


Ekkehard Dörre

Thursday 13 May 2004 4:38:14 pm

and in Python for older Word documents (<2003):

http://www.icoya.com/produkt/wordxml/

... and after making ez able to read Word / OOo documents, it can write Word / OOo documents, too ;-)

ekke


Ekkehard Dörre

Friday 14 May 2004 7:11:56 am

http://typo3.org/extensions/repository/search/rlmp_officeimport/

It works in Typo3; I'm impressed.

Greetings ekke


Ekkehard Dörre

Tuesday 18 May 2004 4:43:41 am

My vision:

db folder = Folder class in ez
server folder = Folder on the server under var/....

Publishing:
A publisher can write a page (e.g. an article) in OOo Writer, Calc, or Impress, or in Word, Excel, or PowerPoint 2003, put it via WebDAV into a db folder with rights control, and publish it.

System work while publishing:
In the system the file is parsed; images, Flash, sounds, etc. are saved in the ez server folder structure, and a db folder in Media is automatically created with the document name and the image.

In the page document (e.g. article) the object id is added automatically.

Editing:
The editor fetches an OOo Writer, Calc, or Impress file, or a Word, Excel, or PowerPoint 2003 file, from the db folder via WebDAV and edits it on the desktop.

System work while editing:
Ez generates an OOo Writer, Calc, or Impress file, or a Word, Excel, or PowerPoint 2003 file, which can be downloaded via WebDAV. The article status is draft.

What if another editor tries to edit while the document is being edited?
Via browser, a "currently a draft by another editor" message is shown;
via WebDAV, same procedure.

What if the document was edited via browser?

No problem: the OOo Writer, Calc, or Impress file, or the Word, Excel, or PowerPoint 2003 file, is generated fresh.

What if the Word document is very large?
Via an ini file you can say: Ez, please make a new article from all content after a header 2 and before the next header 2 or the end of the file.
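
The splitting rule can be sketched roughly as follows, in Python for illustration. The flat list of `(kind, text)` pairs, the `Section` class, and the `split_at_heading` name are all assumptions; the thread does not define how parsed content is represented:

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    paragraphs: list[str] = field(default_factory=list)


def split_at_heading(blocks: list[tuple[str, str]], level: int = 2) -> list[Section]:
    """Start a new section at every heading of the given level.

    `blocks` is a flat list of (kind, text) pairs, where kind is
    "paragraph" or "heading<N>". Content before the first matching heading
    goes into an untitled section; every other block, including deeper
    headings, stays in the current section as plain text.
    """
    marker = f"heading{level}"
    sections: list[Section] = []
    current = Section(title="")
    for kind, text in blocks:
        if kind == marker:
            # Close the running section only if it actually holds something.
            if current.title or current.paragraphs:
                sections.append(current)
            current = Section(title=text)
        else:
            current.paragraphs.append(text)
    if current.title or current.paragraphs:
        sections.append(current)
    return sections
```

Each returned `Section` would then become one article; the heading level would come from the ini setting described above.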

Diff function:
In Typo3 you can diff 2 document versions.

Comments: For workflow you need comment functions.
ooo writer:


<text:p text:style-name="Standard">Hello Woarld.
<office:annotation office:create-date="2004-05-18">
<text:p/>
<text:p>---- 18.05.2004, 13:36 ----</text:p>
<text:p>typo in world!</text:p>
</office:annotation></text:p>

ez translation in xmltext field e.g.:


Hello Woarld.
<comment create-date="2004-05-18" by="chief editor Jim">
typo in world! Thomas, please take earth in whole document.
</comment>
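
A rough sketch of that translation, in Python with the standard `xml.etree.ElementTree` module. The `office` namespace URI is the one used in OOo 1.x files; the `text` namespace URI is assumed to follow the same pattern. The function name is hypothetical, and the `by` attribute is passed in by the caller because the annotation in the example carries no author:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"  # assumed OOo 1.x text namespace


def annotation_to_comment(paragraph_xml: str, author: str = "") -> Tuple[str, Optional[ET.Element]]:
    """Split a text:p into its leading text and an eZ-style <comment> element."""
    paragraph = ET.fromstring(paragraph_xml)
    annotation = paragraph.find(f"{{{OFFICE}}}annotation")
    body = (paragraph.text or "").strip()
    if annotation is None:
        return body, None

    comment = ET.Element("comment")
    comment.set("create-date", annotation.get(f"{{{OFFICE}}}create-date", ""))
    if author:
        comment.set("by", author)
    # Keep non-empty annotation lines, skipping the "---- date ----" separator.
    lines = [(p.text or "").strip() for p in annotation.findall(f"{{{TEXT}}}p")]
    comment.text = " ".join(line for line in lines if line and not line.startswith("----"))
    return body, comment


SAMPLE = (
    f'<text:p xmlns:text="{TEXT}" xmlns:office="{OFFICE}" text:style-name="Standard">'
    "Hello Woarld."
    '<office:annotation office:create-date="2004-05-18">'
    "<text:p/><text:p>---- 18.05.2004, 13:36 ----</text:p>"
    "<text:p>typo in world!</text:p>"
    "</office:annotation></text:p>"
)
```

Calling `annotation_to_comment(SAMPLE, author="chief editor Jim")` yields the paragraph text and a `comment` element that can be serialized with `ET.tostring(comment, encoding="unicode")`.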

What do you think?

Greetings, ekke


Tony Wood

Wednesday 19 May 2004 2:32:58 am

Ekke,

>>XML
I think it is vital to have a standard XML format for documents. This could either borrow from the OpenOffice team or use the eZ publish format. I would prefer the former, as it is widely used by OOo and now also by KOffice.

>>oowriter
On a technical implementation matter: we did some tests with OOWriter a while back and reviewed using SVN as a file system for storing these files. The problem we found is that oowriter makes multiple writes to the .sxw zip bundle. This would cause a problem in eZ publish, as each save would produce multiple versions while oowriter updates each XML file within the .sxw file in turn. If this could be resolved so that eZ waits for all oowriter updates, this system would be great.
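
One possible way to make the system wait is a quiet-period debounce: record every write and only create a version once a file has been left alone for a while. This is a swapped-in technique for illustration, not something tested in the thread; the sketch is in Python, and the class name, method names, and two-second default are all assumptions:

```python
class SaveDebouncer:
    """Collapse a burst of writes to one document into a single new version."""

    def __init__(self, quiet_seconds: float = 2.0) -> None:
        self.quiet_seconds = quiet_seconds
        self._last_write: dict[str, float] = {}

    def record_write(self, path: str, now: float) -> None:
        """Note that `path` was written at time `now` (in seconds)."""
        self._last_write[path] = now

    def ready_to_version(self, now: float) -> list[str]:
        """Return paths that have been quiet long enough, and forget them."""
        ready = sorted(
            path
            for path, stamp in self._last_write.items()
            if now - stamp >= self.quiet_seconds
        )
        for path in ready:
            del self._last_write[path]
        return ready
```

The caller would feed in timestamps from something like `time.monotonic()` and poll `ready_to_version` periodically; only the paths it returns would trigger a new version.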

>>features
All the features you describe would be a great boon, as they would turn eZ into a real information repository.

-- tony
http://www.visionwt.com


Ekkehard Dörre

Saturday 19 June 2004 3:45:47 am

Hi Tony,

there is an additional posting in
http://ez.no/community/forum/suggestions/please_ensure_online_editor_work_with_nonmicrosoft_software
to bring both together.

Greetings, ekke


Ekkehard Dörre

Tuesday 29 June 2004 10:56:06 am

@ Tony

What do you think is the best way to get oooxml out of ez?

The first step is to get the content.xml. Is this possible via layout/set/ooo, or is the pdf way via admin better?

Greetings, ekke


Paul Forsyth

Tuesday 29 June 2004 11:15:24 am

You could write a routine to import the xml into a standard text field. We would probably need to use XSLT to present the document, and have some method of generating XSLT from the settings in the office document. But this must have been solved by others? I've not looked.

You mention pdfs, different layout views. How would you like to see the document? Inline, external, able to regenerate and pipe to your office program? What are the best ways of dealing with this?

paul

--
http://www.visionwt.com

Paul Forsyth

Tuesday 29 June 2004 11:30:02 am

This looks promising:

http://hoopajoo.net/projects/soffice2html.html

It's a very simple converter, not quite what we want, but its simplicity really shows how easy it will be to import OO docs. I just ran it and it works well. Reminds me of running latex2html scripts :)

The difficult part will be exporting to rebuild the original doc exactly. Producing a basic OO file will be simple I think, and could probably use the same production system as the pdf system does.

paul

--
http://www.visionwt.com

Ekkehard Dörre

Tuesday 29 June 2004 1:47:54 pm

Importing content.xml is no problem: unzip, then convert only the content.xml into ezxml. The pictures are converted with ImageMagick into their own standard image class (5) and placed into the article class (for example) by object id.

This already works here on a local machine.

My way:
Inside ez everything is ezxml. For a new document, ezxml goes via a template or stylesheet (for style, margins, fonts, etc.) into oooxml.

My idea for export is:

Take the pdf class and rewrite it to make sxw files, then create them on demand.
Generating OpenOffice files with PHP is no problem; it works fine:

http://phpdocwriter.sourceforge.net/

From phpdocwriter:
<i>
PHP DocWriter is a set of PHP classes that generates simple StarOffice/OpenOffice.org documents.
It builds the document following the file format specification and doesn't need any StarOffice/OpenOffice.org installation.

At the moment this class supports several things like:

* Page styles
* Paragraph styles
* Page breaks
* Text styles
* Page headers and footers
* Textboxes
* Images
* Tables
* Drawings
* Meta-information of the document (title, author, etc)
</i>
... and this works only with OpenOffice on the server:
<i>
* Automatic conversion of created documents to other formats like MS Word, PDF, RTF, StarWriter, LaTeX, XHTML, HTML, etc.
</i>

But isn't there an easier way,
such as transforming the XML in ez into OpenOffice XML?

Paul wrote:
<i>
The difficult part will be exporting to rebuild the original doc exactly. Producing a basic OO file will be simple I think, and could probably use the same production system as the pdf system does.</i>

I think it isn't necessary to rebuild it exactly; just create a new document. We have the images and content.xml, and generate the rest automatically. It is like the way ez works for the web:
the content is raw with some tags; the output is styled html, pdf and sxw.
Styles are made by designers (as with LaTeX: writers write and LaTeX does the rest).

Let me try to explain my idea again:
ez is the central point for content (publish once, ...)
inside only ezxml

when importing ooo:
ooo content.xml stripped and converted into ezxml in ezxmlfield
images into image class (5)
take the object ids from 5 and put them into the right place in the xmlfield
The rest of the information is lost.

exporting:

click "export this folder"
inside is class with ezxmlfield
images in class 5
metadata in new class with:
Title
Author
Subject
KeyWords

that's all. SXW is ready.
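
The final packaging step can be sketched like this, in Python with the standard `zipfile` module. It only zips whatever parts the caller has already generated; the function name is hypothetical, and the other parts a real OpenOffice.org package expects (a manifest, for example) are deliberately left to the caller, so the result of this sketch alone is not guaranteed to open in OOo:

```python
import io
import zipfile


def build_sxw(parts: dict[str, bytes]) -> bytes:
    """Zip the given parts (archive name -> bytes) into one in-memory package."""
    if "content.xml" not in parts:
        raise ValueError("content.xml is required")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

An export-on-demand handler would collect `content.xml` from the ezxml field, `meta.xml` from the metadata class, and the image files, then send the returned bytes as the download.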

I think in about 3 weeks I can send you a cleaned-up import script for testing. It takes this long because it has lower priority here and happens after normal work.
It works for sxw and MSword2003xml, but for the latter the images are not tested.

And by the way, regarding the online editor: it should be easy to place images in ooo left, right, or center and read this out for ez.
Greetings ekke


Paul Forsyth

Tuesday 29 June 2004 2:41:43 pm

I didn't realise you were so close to releasing!

I have a couple of concerns.

I'm not convinced ezxml will sufficiently capture the information from the content.xml file. However, as you are close to finishing, you may have found it OK. I wouldn't want to lose too much information. At some point down the line I would want to capture style information too.

As you are using phpdocwriter you are building a new document from scratch. I like this approach because it gets this project off the ground, plus it also follows very closely the way the pdf mechanism works. Pretty handy for developing :)

From what I can see of phpdocwriter it looks too basic at the moment. I don't think I can add style information easily from an imported sxw. But it may improve. After all, it is an interface to the oo sdk.

A new class would be sufficient to hold every piece of information. Related objects can hold your meta information quite easily. Images can even go in as related objects.

Let us know when you have a working version :) Will you put it on pubsvn?

paul

--
http://www.visionwt.com

Ekkehard Dörre

Tuesday 29 June 2004 3:44:50 pm

Paul wrote:

<i>I didn't realise you were so close to releasing!</i>
It is very dirty and needs a lot of work to be readable for others, but it is only the import script. I used your sample import file, etc.

<i>
I'm not convinced ezxml will sufficiently capture the information from the content.xml file.</i>
I think this is enough:


	'tagWraps.' => array (
		'heading1' => '<header level="1"> | </header>',
		'heading2' => '<header level="2"> | </header>',
		'heading3' => '<header level="3"> | </header>',
		'heading4' => '<header level="4"> | </header>',
		'heading5' => '<header level="5"> | </header>',
		'heading6' => '<header level="6"> | </header>',
		//'heading7' => '<header level="7"> | </header>',
		'paragraph' => '<paragraph> | </paragraph>',

		'bold' => '<strong> | </strong>',
		'italic' => '<emphasize> | </emphasize>',
		'underlined' => '<custom name="underlined"> | </custom>',
		'unorderedlist' => '<ul> | </ul>',
		'listitem' => '<li> | </li>',
		'superscript' => '<custom name="sup"> | </custom>',
		'subscript' => '<custom name="sub"> | </custom>',
		'preformatted' => '<custom name="pre"> | </custom>',
		'indented' => '<custom name="blockquot"> | </custom>',
		//'firstLineIndent' => '<paragraph> | </paragraph>',
		'firstLineIndent' => '|',
	),

And tables and images.
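
How such a mapping might be applied can be sketched as follows, in Python for illustration. It assumes each entry means "opening tag | closing tag", split at the bar; that reading, the `wrap` function name, and the fallback for unmapped styles are assumptions, and only a few entries from the mapping are copied here:

```python
from xml.sax.saxutils import escape

# A few entries copied from the mapping above; "open | close" is split at the bar.
TAG_WRAPS = {
    "heading1": '<header level="1"> | </header>',
    "paragraph": "<paragraph> | </paragraph>",
    "bold": "<strong> | </strong>",
    "italic": "<emphasize> | </emphasize>",
    "firstLineIndent": "|",
}


def wrap(style: str, text: str, wraps: dict[str, str] = TAG_WRAPS) -> str:
    """Wrap XML-escaped text in the tags configured for `style`.

    Unknown styles fall back to the bare text, so unmapped formatting is dropped.
    """
    opening, _, closing = wraps.get(style, "|").partition("|")
    return f"{opening.strip()}{escape(text)}{closing.strip()}"
```

With this, a bold run maps to a `strong` element and a `firstLineIndent` paragraph passes through without any wrapper.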
<i>
However, as you are close to finishing, you may have found it OK. I wouldn't want to lose too much information. At some point down the line I would want to capture style information too.</i>

The styles inside the tags and in the style part of content.xml are gone.
The ooo import is only for content; layout is done by ez pub.
So we can change the content both inside ez and inside ooo.

But I think your way is possible too.

In ez only the ezxml is used for working. The other, static oooxml parts can be saved anywhere.
I have all the paths to the .xml files in the database, so they can be opened and saved in the database too.

<i>As you are using phpdocwriter you are building a new document from scratch. I like this approach because it gets this project off the ground, plus it also follows very closely the way the pdf mechanism works. pretty handy for developing :)</i>

That's the easiest way, I think. I found phpdocwriter 3 days ago.
Once a simple class (native ez) is working, it should be possible to expand it. But for a first step, it is enough work.
<i>A new class would be sufficient to hold every piece of information. Related objects can hold your meta information quite easily. Images can even go in as related objects.</i>
Yep. metaoffice ;-)
e.g. meta.xml


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-meta PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN"
 "office.dtd">
<office:document-meta xmlns:office="http://openoffice.org/2000/office"
 xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:meta="http://openoffice.org/2000/meta" office:version="1.0">
 <office:meta>
  <meta:generator>ezpub</meta:generator><!-- 3.4.1 -->
  <!-- static -->
  <dc:title>My Page</dc:title> <!-- out of class metaoffice -->
  <meta:initial-creator>Ekke</meta:initial-creator> <!-- out of class metaoffice -->
  <meta:creation-date>2004-03-23T17:29:00</meta:creation-date><!-- out of ez -->
  <dc:date>2004-06-19T12:28:39</dc:date><!-- out of ez -->
  <meta:print-date>2004-03-23T16:47:00</meta:print-date><!-- out of ez -->
  <dc:language>en-US</dc:language><!-- out of ez -->
  <meta:editing-cycles>8</meta:editing-cycles><!-- out of ez -->
  <meta:editing-duration>PT0S</meta:editing-duration><!-- out of ez -->

  <meta:document-statistic meta:table-count="2" meta:image-count="8"
 meta:object-count="1" meta:page-count="12" meta:paragraph-count="310"
 meta:word-count="1113" meta:character-count="8718"/><!-- out of ez -->
 </office:meta>
</office:document-meta>
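
Generating a file like this from stored metadata could look roughly like the following, in Python with the standard `xml.etree.ElementTree` module. The namespace URIs are the ones in the example above; the function names are hypothetical, only a handful of fields are written, and the DOCTYPE declaration is left out:

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "http://openoffice.org/2000/office",
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "http://openoffice.org/2000/meta",
}
# Make the serializer emit the same prefixes as the example file.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, name: str) -> str:
    """Build the {namespace}name form that ElementTree uses for qualified names."""
    return f"{{{NS[prefix]}}}{name}"


def build_meta_xml(title: str, creator: str, created: str, language: str = "en-US") -> str:
    """Return a small meta.xml document (without DOCTYPE) as a string."""
    root = ET.Element(q("office", "document-meta"), {q("office", "version"): "1.0"})
    meta = ET.SubElement(root, q("office", "meta"))
    ET.SubElement(meta, q("meta", "generator")).text = "ezpub"
    ET.SubElement(meta, q("dc", "title")).text = title
    ET.SubElement(meta, q("meta", "initial-creator")).text = creator
    ET.SubElement(meta, q("meta", "creation-date")).text = created
    ET.SubElement(meta, q("dc", "language")).text = language
    return ET.tostring(root, encoding="unicode")
```

The title and creator would come from the metaoffice class, and the dates and language from ez itself, as the comments in the example indicate.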

e.g. settings.xml

2 ways: out of ez via template (like pdf now)
or stored from imported document.

<i>Will you put it on pubsvn?</i>
Yes.

Greetings, ekke


Ekkehard Dörre

Tuesday 29 June 2004 3:59:07 pm

Yes, the PDF writer way is a good way: first with ezxml, later with mixed ez- and oooxml or pure oooxml.

We have http://example.com/content/ooo/82 for one page and can make whole books via admin.

Then they can be transformed on the client into docbook or latex, if anybody wants.

I like the ooo xml more and more.

to be continued...

Greetings, ekke
