ezfind: problem with special characters in PDF files

Romain Bremaud

Tuesday 07 December 2010 1:31:21 am

Hello everybody,

I use the following code to index my PDF files:

http://share.ez.no/learn/ez-publish/indexing-multiple-binary-file-types/%28page%29/3

This script uses the xpdf library: http://www.foolabs.com/xpdf/download.html

The problem is that when I run the following command: php updatesearchindexsolr.php -s <admin siteaccess> the PDF files are indexed, but the special characters disappear and are replaced by white space.

But if I do the same conversion directly from the command line: pdftotext example.pdf example.txt it works.
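One possible cause of this kind of difference (not confirmed in this thread) is that the output encoding pdftotext picks when called from a script does not match what the indexer expects. A minimal sketch, assuming pdftotext is on the PATH: it passes the documented `-enc` option explicitly and reads the text from stdout (`-` as the output file) instead of relying on defaults. The function names `build_pdftotext_command` and `extract_text` are hypothetical and are not part of eZ Find or the linked indexing script.

```python
import subprocess


def build_pdftotext_command(pdf_path, encoding="UTF-8"):
    """Build the argument list for pdftotext (hypothetical helper).

    "-enc" selects the output text encoding; "-" as the text file
    name makes pdftotext write the extracted text to stdout.
    """
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def extract_text(pdf_path, encoding="UTF-8"):
    """Run pdftotext and decode its output with the same encoding.

    Assumes the pdftotext binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_pdftotext_command(pdf_path, encoding),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Decode with the encoding we asked pdftotext to produce, so
    # accented characters are not misread on the way to the indexer.
    return result.stdout.decode(encoding, errors="replace")
```

Calling `extract_text("example.pdf")` and comparing the result with the `example.txt` produced on the command line would show whether the characters are already lost at the extraction step or later in the indexing pipeline.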

I have not managed to identify why it doesn't work...

Thanks in advance.

Romain Bremaud
Les clefs du net

Ivo Lukac

Tuesday 07 December 2010 1:52:55 am

Hi Romain,

I would recommend using http://projects.ez.no/eztika, as in my experience it deals with special characters and non-Latin alphabets much better than xpdf.

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Romain Bremaud

Tuesday 07 December 2010 6:05:28 am

Thanks for your help. It works with eztika :)

It was hard to configure because I work in a Windows environment, but it works now ^^

Thanks

Romain Bremaud
Les clefs du net
