Friday 11 September 2009 12:32:31 pm
Hello,
eztika is not too robust with respect to Asian character sets, but it should be fine with others. For PDF in general, the best option is to use the xpdf tools. You need to create a wrapper script for xpdf's pdftotext utility. This is what I use (locally called ezpdftotext):
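#!/bin/sh
# Log the PDF metadata (optional, handy while setting things up)
/opt/local/bin/pdfinfo "$1" >> /tmp/ezpdftotext.log
# Write the extracted text to stdout as UTF-8; "$1" is quoted so paths with spaces work
/opt/local/bin/pdftotext -enc "UTF-8" "$1" -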
The pdfinfo line is only used for logging and can be removed once the configuration is working.
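If you would rather keep the logging around but switched off by default, a variant along these lines might do. This is only a sketch: the variable names EZPDF_DEBUG, EZPDF_LOG, PDFTOTEXT and PDFINFO are made up for illustration (they are not eZ Publish or xpdf settings), and the /opt/local/bin paths are just the ones from my machine.

```shell
#!/bin/sh
# Hypothetical variant of the wrapper: logging is opt-in via EZPDF_DEBUG.
# PDFTOTEXT, PDFINFO, EZPDF_LOG and EZPDF_DEBUG are illustrative names only.
PDFTOTEXT="${PDFTOTEXT:-/opt/local/bin/pdftotext}"
PDFINFO="${PDFINFO:-/opt/local/bin/pdfinfo}"
EZPDF_LOG="${EZPDF_LOG:-/tmp/ezpdftotext.log}"

ezpdftotext() {
    # Refuse to run without a readable input file.
    if [ ! -r "$1" ]; then
        echo "ezpdftotext: cannot read '$1'" >&2
        return 1
    fi
    # Append the PDF metadata to the log only when debugging is switched on.
    if [ -n "${EZPDF_DEBUG:-}" ]; then
        "$PDFINFO" "$1" >> "$EZPDF_LOG" 2>&1
    fi
    # Write the extracted text to stdout as UTF-8.
    "$PDFTOTEXT" -enc UTF-8 "$1" -
}

# Behave like the original wrapper when called with a file argument.
if [ "$#" -gt 0 ]; then
    ezpdftotext "$1"
fi
```

Running it as `EZPDF_DEBUG=1 ezpdftotext some.pdf` would then append the pdfinfo output to the log, while a plain call only extracts the text.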
So, all things considered: use eztika for everything except PDF, for which you should use xpdf. Expect eztika to improve in the future; it is also getting into Solr (and when that is stable enough, eZ Find will use it instead of the binary file wrappers).
Cheers Paul
eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans