eZFind - indexing errors

Author	Message
Fabien Mas	Friday 11 September 2009 2:32:02 am Hi, I get a lot of errors when I index my site: **** Warning: Fonts with Subtype = /TrueType should be embedded. But Arial-ItalicMT is not embedded. After that, my object is not indexed (not only the file datatype, the whole object is skipped). How can I solve it? Thx
Paul Borgermans	Friday 11 September 2009 2:58:06 am What do you use for conversion of binary files? It seems you use pstotext, which is the default setting (but far from the best). Paul eZ Publish, eZ Find, Solr expert consulting and training http://twitter.com/paulborgermans
Fabien Mas	Friday 11 September 2009 5:01:22 am Hi Paul, Indeed, I am using pstotext. Which one do you advise me to use? Thx for your help :) Fabien
Vincent Tabary	Friday 11 September 2009 5:31:52 am Hi all, That would be interesting for me too :) I installed pstotext because eZ Find asked for it, but I do not know of any other software for that. Vinz http://vincent.tabary.me
Fabien Mas	Friday 11 September 2009 5:40:38 am I have activated the eztika extension, but I have some trouble with it too:
    Exception in thread "main" org.apache.tika.exception.TikaException: Unable to extract PDF content
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:58)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:108)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:80)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:111)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Paul Borgermans	Friday 11 September 2009 12:32:31 pm Hello, eztika is not too robust with respect to Asian character sets, but should be fine with others. For PDF in general, the best option is the xpdf tools. You need to create a wrapper script for xpdf's pdftotext utility. This is what I use (locally called ezpdftotext):
    #!/bin/sh
    # Log the PDF metadata, then write the extracted text to stdout.
    /opt/local/bin/pdfinfo "$1" >> /tmp/ezpdftotext.log
    /opt/local/bin/pdftotext -enc "UTF-8" "$1" -
The pdfinfo line is only used for logging and can be removed once the configuration works. So all considered: use eztika for everything except PDF, for which you should use xpdf. Expect eztika to improve in the future; it is also getting into Solr (and when stable enough, eZ Find will use that instead of the binary file wrappers). Cheers Paul
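A slightly more defensive variant of the wrapper in the post above could look like the sketch below. The function name extract_pdf_text and the PDFTOTEXT override variable are illustrative assumptions, not part of eZ Find or xpdf; only the -enc option and the trailing - (write to stdout) are taken from the post.

```shell
#!/bin/sh
# Sketch only: extract_pdf_text and PDFTOTEXT are hypothetical names.
# PDFTOTEXT lets you point at a specific binary, e.g. /opt/local/bin/pdftotext.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

extract_pdf_text() {
    # Refuse unreadable input early, so the caller sees a clear message
    # instead of a converter error further down the line.
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }

    # Quoting "$1" keeps file names containing spaces in one argument.
    # The trailing "-" sends the extracted text to stdout.
    "$PDFTOTEXT" -enc UTF-8 "$1" -
}
```

A wrapper script built on this would end with a call such as extract_pdf_text "$1", so that it can be invoked with the file name as its only argument, like the ezpdftotext script above.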
Fabien Mas	Monday 14 September 2009 7:59:32 am Hi Paul, I have created my own parser using xpdf and I get no errors now. I log the generated text and it is fine. But I have a new problem ;) With the default search engine it works well, but with eZ Find activated no word from my file is indexed (even though xpdf works well). When I search for a word, I get no results. Is there something specific to do for eZ Find? Thx, Fabien
Fabien Mas	Thursday 17 September 2009 1:41:30 am I got it :) It was the page breaks that caused trouble in the XML generated by Solr, so now I use this command and it works fine:
    pdftotext -enc "UTF-8" -eol unix -nopgbrk "$1" -
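Some background on why the page breaks matter: pdftotext separates pages with a form feed character (octal 014), and XML 1.0 does not allow control characters other than tab, newline and carriage return. If other stray control characters ever show up in extracted text, a filter along these lines could be added to the wrapper. This is a sketch, and the function name strip_xml_invalid is a hypothetical one chosen for illustration.

```shell
#!/bin/sh
# Sketch only: strip_xml_invalid is a hypothetical helper name.
# Reads text on stdin and deletes the ASCII control characters that
# XML 1.0 forbids, keeping tab (011), newline (012) and carriage return (015).
strip_xml_invalid() {
    # LC_ALL=C makes tr work on bytes, so UTF-8 text passes through untouched.
    LC_ALL=C tr -d '\000-\010\013\014\016-\037'
}
```

It would be used as the last stage of the conversion, for example pdftotext -enc "UTF-8" -eol unix "$1" - | strip_xml_invalid. With -nopgbrk already in place the form feeds are gone at the source, so the filter only acts as a safety net.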