Can't save article: Invalid UNICODE character sequence found


Erik Ziesler

Monday 02 May 2005 10:37:07 am

eZ publish can't save an article due to "Invalid UNICODE character sequence". What is causing this problem, and how do I solve it?

eZ publish debug: http://www.infotorg.net/ez_debug.htm

kracker (the)

Monday 02 May 2005 7:36:00 pm

I wonder....

How would one test this breakdown?

How would you break down the content to find the specific character which eZ publish is having qualms with?

For Erik, I could see how you might try to submit smaller chunks of the content until you found the "block" of text which contains the "Invalid Unicode character sequence".
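The chunk-by-chunk idea can also be done offline. Below is a minimal Python sketch. It assumes you can capture the raw bytes the browser actually sent, for example by saving the submitted text to a file. Instead of bisecting by hand, it uses Python's strict UTF-8 decoder, whose `UnicodeDecodeError` reports the exact offset of each bad byte. `find_invalid_utf8` is a hypothetical helper, not part of eZ publish or PostgreSQL. PostgreSQL 7.3's own validator may not flag exactly the same bytes.

```python
def find_invalid_utf8(data: bytes, context: int = 10):
    """Return a list of (offset, bad_bytes, nearby_bytes) for each invalid UTF-8 sequence.

    Hypothetical helper for illustration; it relies on Python's strict UTF-8
    decoder, not on PostgreSQL's validator.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decoding the remainder succeeds once no bad bytes are left.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we just decoded.
            start, end = pos + err.start, pos + err.end
            nearby = data[max(0, start - context):end + context]
            problems.append((start, data[start:end], nearby))
            pos = end  # resume scanning just past the bad bytes


    return problems


if __name__ == "__main__":
    # Simulate text that was submitted as ISO-8859-1 instead of UTF-8.
    sample = "Ære være årelange forsøk".encode("iso-8859-1")
    for offset, bad, nearby in find_invalid_utf8(sample):
        print(f"offset {offset}: {bad!r} near {nearby!r}")
```

For example, "Ære være" encoded as ISO-8859-1 fails at offsets 0 and 5 ('Æ' and 'æ'). The same text encoded as UTF-8 comes back with no problems.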

For others trying to replicate the problem, it becomes more complicated. I'm in the USA and don't deal with Unicode very often, so I'm not that familiar with it... I just don't know how that would work.

But even if I don't know, I can try to break down the error message.

From the eZ debug information, it looks like PostgreSQL (the DB) is kicking the error.

It kicks the error while performing the UPDATE query on this value ...

data_text='<?xml version="1.0" encoding="UTF-8"?>
<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/"
xmlns:xhtml="http://ez.no/namespaces/ezpublish3/xhtml/"
xmlns:custom="http://ez.no/namespaces/ezpublish3/custom/">
<paragraph>Dette er ingressen. Den største skriftstykket kommer nedenfor. Ære være årelange forsøk på å utvikle den perfekte CMS-en.</paragraph>
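</section>'

(The Norwegian paragraph reads roughly: "This is the lead paragraph. The largest piece of writing comes below. Glory be to years-long attempts at developing the perfect CMS.")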

Now, this is just a guess, but is this a new problem with non-English content, or a problem that suddenly appeared with a brand-new configuration?

I don't know what I'm talking about, but I would guess that PostgreSQL may need some configuration to support non-English characters, or something along those lines ...

It's not an answer but it's an idea,

//kracker

Can I kick it ? - A Tribe Called Quest

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

kracker (the)

Monday 02 May 2005 7:52:11 pm

A little more looking, and it seems this error is well known in PostgreSQL circles ...

general:
http://www.google.com/search?num=50&hl=en&lr=&safe=off&c2coff=1&q=ERROR%3A+Invalid+UNICODE+character+sequence+found+&btnG=Search

very close: http://www.issociate.de/board/post/135979/Unicode_problem_inserting_records_-_Invalid_UNICODE_character_sequence_found_(0xfc7269).html
http://www.issociate.de/board/post/2862/Pb_with_the_French_accentuated_characters.html

It could simply be a character-encoding configuration mismatch somewhere along the chain: input text -> browser -> server -> eZ publish -> database ...

If it's not configuration (and it might not be, in light of certain evidence), then I would say you're running PostgreSQL 7.3.4 and not the suggested 7.3.6 (from the 2nd issociate.de link above) ...
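The chain above can be simulated in a few lines of Python. This is an illustrative sketch, not eZ publish code: each hop is a hypothetical (sender charset, receiver charset) pair. When the charsets match, text passes through untouched. A Latin-1 sender feeding a UTF-8 receiver raises `UnicodeDecodeError`, which is roughly the Python analogue of PostgreSQL's "Invalid UNICODE character sequence". The reverse mismatch silently produces mojibake instead.

```python
from typing import List, Tuple


def pass_through_chain(text: str, hops: List[Tuple[str, str]]) -> str:
    """Simulate text crossing browser -> server -> eZ publish -> database.

    Each hop is (charset the sender encodes with, charset the receiver decodes with).
    Hypothetical model for illustration only.
    """
    for sender_charset, receiver_charset in hops:
        # A mismatch either raises UnicodeDecodeError or silently garbles the text.
        text = text.encode(sender_charset).decode(receiver_charset)
    return text


if __name__ == "__main__":
    consistent = [("utf-8", "utf-8")] * 3
    print(pass_through_chain("Rå og Rø", consistent))            # unchanged
    print(pass_through_chain("Rå", [("utf-8", "iso-8859-1")]))   # mojibake: RÃ¥
    try:
        pass_through_chain("Rå", [("iso-8859-1", "utf-8")])
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
```

As a side note, the bytes in the first issociate.de thread's title (0xfc7269) decode as ISO-8859-1 to "üri". That is consistent with a Latin-1 hop somewhere in that poster's chain, though it is only a guess about their setup.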

//kracker

Grouch - Once Upon A Rhyme


kracker (the)

Monday 02 May 2005 8:19:06 pm

Well,

It was fun! I hope you all enjoyed it more than I did ;)

It was a good way to eat my mini-pizza dinner and take a few swings at an odd little bugger of an issue that has been backgrounding in my head for a little while.

I think the ideas brought up really take a few solid steps in the right direction.

I think it will take a little more testing to be certain, but once we are, a plan of action / resolution should become clear fairly quickly.

//kracker
cheaper than paying for support ....

2pac__tupac : there_u_go

2pac__tupac : still ballin


kracker (the)

Tuesday 03 May 2005 7:10:58 pm

Back to the lab again ...

I also wanted to note that Erik's PostgreSQL DB is (and has been confirmed to be) encoded in UNICODE.

//kracker

anticon : family values : making love to your disk drive

anticon : family values : games (molemen feat. sebutone)


Erik Ziesler

Wednesday 04 May 2005 9:30:36 am

I think it might be related to PostgreSQL 7.3.4. I have found that it accepts the sequence 'æøå', but not 'æ ø å'. It won't accept the letters 'æ', 'ø' or 'å', or the sequences 'Rå' and 'Rø'.

Erik Ziesler

Wednesday 04 May 2005 4:05:55 pm

I thought it had something to do with PostgreSQL..., and it did have something to do with the database, but the real problem was that the character encoding was not uniform. The setting in site.ini that specifies the database character encoding was also empty. When I changed all the character encoding settings (site.ini, template.ini, i18n.ini) to utf-8, I was able to save the article I previously couldn't. Because UTF-8 doesn't work with the PDF output, I made a new PostgreSQL database with LATIN10, installed the new eZ publish 3.5.2 and changed all encoding settings to iso-8859-15. Unfortunately, I ran into another problem with the new eZ publish installation, which I might post about later.
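The fix came down to making every charset setting agree. Below is a sketch of that check in Python. It assumes you have copied the values out of site.ini, template.ini and i18n.ini by hand. The dictionary keys are just labels, not real eZ publish setting names, and the sample values are hypothetical. Python's `codecs.lookup` normalizes spellings such as "UTF8" and "utf-8" so they compare equal. An empty value, like the empty database charset in site.ini, is reported too. PostgreSQL's own encoding names (UNICODE, LATIN10, ...) are not necessarily Python codec names and may need mapping first.

```python
import codecs
from typing import Dict, Optional


def check_charsets(settings: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Report empty, unknown, or disagreeing charset settings.

    The first non-empty, recognised setting is used as the reference value.
    Hypothetical helper for illustration only.
    """
    problems: Dict[str, str] = {}
    canonical: Dict[str, str] = {}
    for name, value in settings.items():
        if not value:
            problems[name] = "empty"
            continue
        try:
            # Normalise aliases, e.g. "UTF8" and "utf-8" both become "utf-8".
            canonical[name] = codecs.lookup(value).name
        except LookupError:
            problems[name] = f"unknown charset {value!r}"
    if canonical:
        expected = next(iter(canonical.values()))
        for name, value in canonical.items():
            if value != expected:
                problems[name] = f"{value} (expected {expected})"
    return problems


if __name__ == "__main__":
    before = {"site.ini database": "", "i18n.ini": "iso-8859-1", "template.ini": "utf-8"}
    after = {"site.ini database": "UTF8", "i18n.ini": "utf-8", "template.ini": "utf-8"}
    print(check_charsets(before))
    print(check_charsets(after))
```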

Thanks, kracker, for putting me on the trail, or rather for pointing out two specific, probable causes.
