Integration of Topic Maps (light) in eZ Publish

Felix Laate

Monday 14 May 2007 3:42:28 pm

Hi all!

I'm currently exploring the possibility of integrating the Topic Maps way of thinking with eZ Publish. I envision it as an extension:

1) the extension should be able to handle an indefinite number of categories (AKA topics)

2) the user should be able to add/edit/delete/reorder the categories

3) the categories and their corresponding keywords should be stored in a flat file

4) searches should be done against this file the AJAX way, e.g. with the Knuth-Morris-Pratt algorithm

5) searches should be done automatically via a datatype in the admin interface, resulting in a list of proposed categories which the user can revise if needed

6) relations between the working object and the category objects should be handled by eZ Publish as normal
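
As a rough illustration of points 3 and 4, here is a minimal Python sketch (not eZ Publish code) of a Knuth-Morris-Pratt search run against categories and keywords read from a flat file. The file layout (`category: keyword, keyword`) and all function names are assumptions made up for this example.

```python
def build_failure_table(pattern):
    """Return the KMP failure table: for each prefix of pattern, the length
    of its longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def parse_categories(flat_text):
    """Parse lines of the assumed form 'category: keyword, keyword'."""
    categories = {}
    for line in flat_text.splitlines():
        if ":" not in line:
            continue
        name, keywords = line.split(":", 1)
        categories[name.strip()] = [
            kw.strip().lower() for kw in keywords.split(",") if kw.strip()
        ]
    return categories


def propose_categories(content, categories):
    """Return the categories that have at least one keyword in content."""
    lowered = content.lower()
    return [
        name
        for name, keywords in categories.items()
        if any(kmp_find(lowered, kw) != -1 for kw in keywords)
    ]


# Hypothetical flat-file content.
FLAT_FILE = """cars: bmw, skoda, automobile
boats: ferry, sailing
"""

if __name__ == "__main__":
    cats = parse_categories(FLAT_FILE)
    print(propose_categories("The new Skoda is a fine automobile", cats))
```

Note that this matches substrings, not whole words, so a short keyword can match inside a longer word; a real implementation would probably need word-boundary handling.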

Has anyone done anything like this before? Or are there any comments, proposals or whatever?

:-) Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Xavier Dutoit

Tuesday 15 May 2007 11:37:56 pm

Hi,

What I usually do for topic-like things is use the enhanced object relation (most of it has been included in 3.9).

That's basically a layer allowing easy selection of related objects (based on type or location) or creation of new ones. It's designed so that new selection interfaces can be added, so you could easily add an AJAX interface for the selection.

X+

http://www.sydesy.com

Felix Laate

Wednesday 16 May 2007 12:21:27 am

Hi Xavier!

So you think it would be possible to combine the enhanced object relation with the concept I described, allowing for an automatic preselection of certain categories through an AJAX lookup?

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Xavier Dutoit

Wednesday 16 May 2007 5:32:40 am

Hi,

Yes, have a look at the template: in the datatype edit you could add a new type "felix ajax", and in the object edit you can create your own case with your own xajax lookup.

X+

http://www.sydesy.com

Paul Wilson

Wednesday 16 May 2007 5:01:54 pm

Hi Felix,

I did a little tinkering a couple of months ago, but had to stop for some other urgent work. Some thoughts follow in case it helps your efforts or triggers a useful idea. I realise it's slightly different to what you're thinking of.

I was:

- looking to use the object relation approach mentioned by Xavier.

- allow hierarchies of topics that would inherit membership (i.e. select "blue car", a child of "car", and the object is therefore also in "car", like the behaviour in the EZ users section). In this way users can give multiple pieces of information with a single click (okay, perhaps a few more for navigating to see the category too).

- generate/manage multiple hierarchies of topics as a content structure within EZ (perhaps as a different root section, like users, media, content, etc.). Multiple hierarchies so that multiple classification schemes were possible (e.g. location, topic, action required). The other potential benefit of using an EZ content structure seemed to be that it was easily adaptable and had most/all ez fetch and other functions available.

- adapt the content tree structure menu from the admin interface to present this hierarchy/tree, except with check-boxes instead of icons (like the one that appears on the issues.ez.no "report an issue" page).
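
To make the inherited-membership idea concrete, here is a small Python sketch (not EZ code). The topic names and the `PARENT` mapping are hypothetical; the point is only that one selection can be expanded to include every ancestor topic.

```python
# Hypothetical topic tree: each topic maps to its parent (None for a root).
PARENT = {
    "car": None,
    "blue car": "car",
    "location": None,
    "Tasmania": "location",
}


def with_ancestors(selected, parent):
    """Return the selected topics plus every ancestor of each one."""
    result = set()
    for topic in selected:
        # Walk up the tree; stop at a root or at a topic already collected
        # (its ancestors were collected at the same time).
        while topic is not None and topic not in result:
            result.add(topic)
            topic = parent.get(topic)
    return result


if __name__ == "__main__":
    print(sorted(with_ancestors(["blue car"], PARENT)))
```

Keeping one parent map per hierarchy would allow several classification schemes (location, topic, action required) to be expanded independently.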

There's an extension that does some of the object relation / categorisation work. I think it was "ez keyword nodes".

None of these efforts were directed towards AJAX - just thinking in terms of useful functions / efficiency of user interaction with an EZ site.

There might be some broader usability issues to consider too. For example, with keywords, there's no real information about the keywords: if we, as content creator and reader, both understand the same thing by a keyword, then the system works. But in many circumstances a tag/keyword is too short and not meaningful. It might be useful to add a one-line description to tags (e.g. displayed in an adjacent panel) so that users can confirm an object belongs in that category.

I hope this helps.
Regards
- Paul

Xavier Dutoit

Thursday 17 May 2007 1:56:34 am

Hi,

I like the tree idea, plus automatically selecting the parent category (car).

However, for me that's an interface thing for the edit (add the parent when you select a child), and everything is stored as relations.

As for how to manage the tree, using a "topic" content class and the structure of nodes will do it. The edit template fetches the tree.

X+

http://www.sydesy.com

Kristof Coomans

Friday 18 May 2007 1:02:25 am

Regarding the object relation ideas: I don't think you necessarily have to add an object relation to the parent automatically. Take a look at the component type map on http://projects.ez.no/types. It uses an extended attribute filter to get all reverse related objects of a specific node (the component type) AND its children. This extended attribute filter will be publicly released soon.
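
To illustrate the idea in general terms (this is a Python sketch, not the extended attribute filter itself, and the data structures are hypothetical): relations are stored only to the most specific topic, and the parent's articles are collected at query time by walking the topic's subtree. Walking all descendants, not just direct children, is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical data: topic -> child topics, and article -> directly related topics.
CHILDREN = {"Car": ["BMW", "Toyota"], "BMW": [], "Toyota": []}
RELATIONS = {
    "bmw-review": ["BMW"],
    "toyota-engine": ["Toyota"],
    "road-safety": ["Car"],
}


def subtree(topic, children):
    """Return the topic followed by all of its descendants."""
    found = [topic]
    for child in children.get(topic, []):
        found.extend(subtree(child, children))
    return found


def reverse_related(topic, children, relations):
    """Return, sorted, the articles related to the topic or any descendant."""
    # Invert article -> topics into topic -> articles.
    reverse = defaultdict(set)
    for article, topics in relations.items():
        for related_topic in topics:
            reverse[related_topic].add(article)
    articles = set()
    for node in subtree(topic, children):
        articles |= reverse[node]
    return sorted(articles)


if __name__ == "__main__":
    print(reverse_related("Car", CHILDREN, RELATIONS))
```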

By the way, this is a very interesting topic and I'm looking forward to hearing any further ideas on this subject.

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Xavier Dutoit

Friday 18 May 2007 4:30:23 am

Who's coming to Norway? Could we add that as one of the topics on Saturday?

X+

http://www.sydesy.com

Felix Laate

Friday 18 May 2007 7:53:06 am

Hi all!

Thank you for your informative answers :-)

The system I intend to create has two parts, as mentioned:

1) assignment to a topic based on the text. Let's say the text goes "BMW is quite a nice automobile, but not as nice as Skoda". The system should then evaluate each word against a database. The words "BMW", "automobile" and "Skoda" should result in the topic "cars" being assigned.

2) then, based on the topic assignment, the proposed topics for the text (or article or whatever content object) should be related to that object.
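
A minimal Python sketch of the first part, under the assumption that the database is a simple word-to-topic lookup (the `KEYWORD_TOPICS` table and the function name are made up for illustration). It returns the matched words alongside each topic, so that the user reviewing the proposal can see why a topic was suggested.

```python
import re

# Hypothetical keyword database: lowercased word -> topic.
KEYWORD_TOPICS = {
    "bmw": "cars",
    "skoda": "cars",
    "automobile": "cars",
    "ferry": "boats",
}


def suggest_topics(text, keyword_topics):
    """Return {topic: [matched words]} for every word found in the database."""
    suggestions = {}
    for word in re.findall(r"\w+", text.lower()):
        topic = keyword_topics.get(word)
        if topic is None:
            continue
        matched = suggestions.setdefault(topic, [])
        if word not in matched:
            matched.append(word)
    return suggestions


if __name__ == "__main__":
    sample = "BMW is quite a nice automobile, but not as nice as Skoda"
    print(suggest_topics(sample, KEYWORD_TOPICS))
```

Exact word lookup like this will miss plurals and inflected forms; stemming or a larger keyword list would be needed in practice.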

You all give good answers for the second part; now I need to figure out the first and then make an extension to test it out :-)

Norway? I am in Norway. Wazzup?

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Paul Wilson

Friday 18 May 2007 12:50:38 pm

Hi Everyone,

(I'm in Tasmania, Australia, about as far away from Norway as possible, so online is my best chance for sharing ideas at the moment.) Felix, the ideas in your second (missing?) post sound more sophisticated, so it might be useful to briefly explain some of my EZ/other work...

A lot of my work has been directed towards developing useful knowledge for action in complex and dynamic situations (eg industry analysis, understanding national education/training needs, regional business/industry adoption of e-commerce / internet technologies). One of the fundamental problems in these situations is making sense out of the vast amount of information available. One strategy is unstructured text analysis - as used by intelligence agencies and data-mining/marketing companies (or see text analysis here: http://en.wikipedia.org/wiki/Text_mining).

On this theme, I started investigating text analysis using GATE (see www.gate.ac.uk), which is a "General Architecture for Text Engineering". This analyses documents (e.g. imagine web / ez content) to automatically draw out concepts and relationships (e.g. People, Locations, Time, Money, and more). It can do this from/to a MySQL database or generate topic-classified XML of content. One automated pathway for using GATE is via ECLIPSE (that name ring any bells?)

I have experimented with some (basic and non-ez) PHP code to analyse and present the GATE analysis for my industry/other work. I got to a proof-of-concept stage with a semi-automated process going from web/mysql through GATE text analysis engine and back to mysql/web presentation.

I guess a key issue is what to do for presentation/user interaction with this rich categorisation of topics. I've been looking at using treemaps (eg http://www.cs.umd.edu/hcil/treemap-history/), and Self Organising Maps / clustering or other visual means (eg see various methods/code here http://www.publicwhip.org.uk/mpsee.php, http://iv.slis.indiana.edu/sw/ )

While such (potentially) industrial-strength approaches may seem like overkill for EZ sites in themselves, I suggest it is useful to look towards the Web 2.0 possibilities of EZ. For me this means going beyond EZ’s core Web 1.0 content-presentation capabilities, and starting to think about rich user interaction and value-adding/information-sensemaking services.

For example, in Tasmania at the moment there's a lot of community concern about a "world scale" (ie very very big) pulp mill project. I used some of these techniques to analyse 780 public documents (around 4500 pages) and present this as a topic / treemap. It's not online, but it helps reveal themes and relationships people just don't see reading through content page by page ... which is why I produced it.

So, bringing all of this back to the current thread, what does this mean? Well, I wanted to share some of my thinking, and in relation to Felix's post (18/05/2007 4:53 pm - missing from ez site?) on how to do step (1), I note:

- It is pretty heavy processing work, but GATE does some of this type of analysis, and/or,
- Even if GATE is overkill, its methods of grammar/keyword analysis may be useful for you (it uses some flat text files to match locations, for example).

BTW, the attribute relation method Kristof describes sounds excellent. It means tagging is possible on multiple dimensions (around a datatype/attribute) rather than using relation to parent, which only allows one. Makes a lot of sense.

Apologies for length. Hope this helps.

Regards
- Paul

Xavier Dutoit

Saturday 19 May 2007 12:12:02 am

About related objects vs hierarchical relations.

Not sure we understood properly. In my mind, it has to be related objects, not hierarchical (enhanced object relation does that).

What we discussed was about hierarchical topics:

- Car (topic)
-- BMW (topic)
-- Toyota (topic)

This is hierarchical. Assuming topic is a regular ez class, that's the parent children relations between BMW and Car and that's good.

When you create a new article about the latest Toyota engine (can we drop this car example soon, I don't know anything about cars ;), you place it wherever it has to be (e.g. under the June issue) and you create relations between it and the topics.

If I got him right, K said that you don't need a relation between the article and Car, as you can infer it from its relation between the article and Toyota.

Otherwise, I've been using such a classification system for years and it works well. The key problem is that authors don't bother checking the right topics, or create a new one when it already exists...

I'm curious about the result for automatic classification.

X+

http://www.sydesy.com

Paul Borgermans

Saturday 19 May 2007 1:15:33 am

Just a few quick comments on this

- automatic classification: this is one of the main topics all major search engine vendors are looking at, and it is not easy at all. It is a subject that conceptually dates back a while: taxonomies (single or multiple). It becomes even more difficult when your topics/categories are also unknown.

- manual classification (with object relations): that's currently a path used in an application here: the categories can be multiple and hierarchical for a knowledge base application. That inherently is the classification that Xavier describes

- semi-manual: with similarity analysis and metadata (like keywords or a full-blown manual classification in parallel). This can serve as a kind of dynamic topic map: when looking at a certain article, give me similar articles (similarity in content), but ordered along the keywords or hierarchical topic maps.

The similarity issue has been tackled by a student who developed such methods for his master's thesis, based on Lucene. Not sure if we can use it (copyright/license). If not, a more sophisticated "more like this" along facets (categories, keywords) can be used inside the search engine.
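
For readers unfamiliar with content similarity, here is a deliberately simple Python sketch of a "more like this" ranking using cosine similarity over raw word counts. It is not the Lucene-based method mentioned above, it ignores facets and keywords, and the sample articles are invented.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercased word occurrences in a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(target_id, articles):
    """Return the ids of the other articles, most similar first."""
    target = term_counts(articles[target_id])
    scored = [
        (cosine_similarity(target, term_counts(body)), other_id)
        for other_id, body in articles.items()
        if other_id != target_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other_id for _score, other_id in scored]


# Invented sample articles.
ARTICLES = {
    "a": "toyota engine review",
    "b": "new toyota engine launched",
    "c": "ferry timetable for summer",
}

if __name__ == "__main__":
    print(more_like_this("a", ARTICLES))
```

A search engine would normally weight terms (for example by how rare they are) instead of using raw counts; the sketch only shows the shape of the ranking step.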

This is exactly the topic I'll talk about during the summer conference and community developer day in Skien.

Regards
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Felix Laate

Saturday 19 May 2007 10:17:02 am

Hi again,

As Paul says, automatic classification is difficult indeed. That is why I favor the semi-manual approach, where the system suggests topics and the user can then approve or remove these and add other ones manually.

The Lucene approach you mentioned could perhaps be bought and made into some kind of open source. As the task of classification is so difficult, it would certainly be favorable to have an open source project that could be adapted to the many languages, cultures and other needs.

The system would have to be very flexible indeed.

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Xavier Dutoit

Sunday 20 May 2007 2:45:19 am

Hi Felix,

As you are in Norway, try to pop in to Skien on Saturday so you can see what voodoo magic Paul has done. I will discuss advanced object relations (attribute level, multiple locations, keyword), present some useful contribs to make it work properly and give some real case examples that have been live for a few years.

I'm sure we'll find the time to brainstorm about the classification thing you mention, and we might find a way to make it real?

Well, it assumes I can sort out the details of my trip, find some information written in a language I can understand (Norwegian doesn't qualify), and find the exchange rate between euro and seals' teeth (my Swedish wife told me that's the local currency up north ;)...

X+

http://www.sydesy.com

Paul Borgermans

Sunday 20 May 2007 7:14:50 am

@Xavier

The voodoo magic is really Solr magic.

@Felix

It will be open source, don't worry

Hope to see you in Skien

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Paul Forsyth

Wednesday 23 May 2007 1:14:09 am

This is an interesting topic.

We are performing an upgrade of a site which currently has a complex topic map structure based on a mixture of relations, hierarchy and keywords. It works well but is complex to administer and extend. So, part of the upgrade is to use Solr to replace some of the mix.

I'm at the conference from Wednesday onwards, so perhaps we can schedule a small meeting about this?

P