
PDF Indexing


Betsy Gamrat

Friday 29 December 2006 7:42:48 pm

Hi,

I followed the directions on this page: http://ez.no/ezpublish/documentation/configuration/optimization/speeding_up_acrobat_pdf_document_indexing_ and was able to index PDFs with no trouble.

I wanted to install ezpdftotext and pdftotext on the server in /usr/local/bin, so I could access them from all the sites on the server, but I can't get it to work.

I checked the server error logs, and the eZ logs, and they weren't helpful.

I ran the commands with just straight PHP, using passthru, and everything was okay.

Any ideas?

Thank you in advance,

Betsy

kracker (the)

Friday 29 December 2006 11:37:25 pm

Betsy,

That documentation entry looks rather dated despite the fresh notes (comments).

Have you read this article?
<i>http://ez.no/layout/set/printarticle/community/articles/indexing_multiple_binary_file_types</i>

I did and was then sent down this path.
<i>http://ez.no/community/forum/developer/binary_file_search_index_creation_debugging_3_7_4
http://ezpedia.org/wiki/en/ez/references
http://ezpedia.org/wiki/en/ez/solution_building_php_cli_for_ez_publish_command_line_scripts
http://ezpedia.org/wiki/en/ez/debugging
http://ezpedia.org/wiki/en/ez/tips_for_working_with_ez_publish_cli_scripts
</i>

eZ search is simple to set up, yet one can run into PHP reference problems (segfaults) without a custom-patched php-cli binary.

<i>I got by with a little help from my friends.</i>

Cheers,
<i>//kracker

eminem : don,t call me' marshall</i>

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

Betsy Gamrat

Saturday 30 December 2006 7:03:24 am

Kracker,

Thank you, I have a lot of information to review.

My real question is: why will <b>ezpdftotext</b> run out of the site's local directory, but not out of <i>/usr/local/bin</i>? The indexing works great as described on the rather dated post, except that I can't make it available to the rest of the server.

Since I got <b>ezpdftotext</b> to run under PHP, and execute the scripts in all the locations, I was wondering if eZ had some security settings that prevented execution of scripts outside the local directory. I did check the server settings, and since the code ran okay outside of eZ, I am assuming the settings are alright.

My goal is to construct a robust infrastructure that will allow extremely efficient deployment of eZ sites. :)

Paul Borgermans

Saturday 30 December 2006 7:24:08 am

Hi Betsy

I do exactly this (putting it in /usr/local/bin to share among different web sites), so it really looks like a permission (or maybe a path) problem. I would add some hard-coded debug statements and increase the log level (error_reporting in php.ini) if there is not enough in the debug output.
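
A minimal sketch of how the permission and path checks could be done from a shell. The helper names check_tool and dir_in_path are made up for illustration and are not part of eZ Publish. The checks only mean something when run as the account the web server uses, which may differ from your login account.

```shell
#!/bin/sh
# Hypothetical helper: report whether a tool exists and is executable
# by the user running this script.
check_tool() {
    tool="$1"
    if [ ! -e "$tool" ]; then
        echo "missing: $tool"
        return 1
    fi
    if [ ! -x "$tool" ]; then
        echo "not executable: $tool"
        return 1
    fi
    echo "ok: $tool"
}

# Hypothetical helper: report whether a directory is listed in PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) echo "in PATH: $1" ;;
        *) echo "not in PATH: $1"; return 1 ;;
    esac
}
```

For the setup discussed here, that would mean calling check_tool on /usr/local/bin/ezpdftotext and /usr/local/bin/pdftotext, and dir_in_path on /usr/local/bin.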

Good luck identifying the problem
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Betsy Gamrat

Saturday 30 December 2006 11:04:11 am

Hi,

I tried one more time, before calling for help ... and it worked. <b>:)</b>

After all is said and done, these are the key components:

A <b>custom class</b> (I called mine 'File - Indexed') that has a file attribute and sets the 'Is Searchable' flag to true for that attribute.

<b>/usr/local/bin</b> has these files

-rwxr-xr-x 3 root root 62 Dec 30 12:51 <b>ezpdftotext*</b>
-rwxr-xr-x 3 root root 1135987 Dec 24 08:34 <b>pdftotext*</b>

<b>ezpdftotext</b>

#!/bin/sh
#ezpdftotext script
<b>/usr/local/bin/pdftotext</b> "$1" -

<b>override/binaryfile.ini.append.php</b>

<?php /* #?ini charset="utf8"?

[PDFHandlerSettings]
TextExtractionTool=<b>/usr/local/bin/ezpdftotext</b>

*/ ?>

I think the full path in the ini file is probably unnecessary, because /usr/local/bin is in the PATH anyway.

I uploaded a PDF file, and it worked.
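
For anyone repeating this, the wrapper can also be exercised by hand before uploading through eZ. The following is only a sketch: smoke_test_extractor is a hypothetical helper name, and it assumes the extraction tool takes the file as its first argument and writes the text to standard output, as the ezpdftotext script above does.

```shell
#!/bin/sh
# Hypothetical helper: run a text-extraction tool on one file and
# report whether it exited cleanly and wrote anything to stdout.
smoke_test_extractor() {
    tool="$1"
    file="$2"
    # Capture stdout; a non-zero exit status counts as a failure.
    if ! text=$("$tool" "$file" 2>/dev/null); then
        echo "failed: $tool exited with an error"
        return 1
    fi
    if [ -z "$text" ]; then
        echo "empty: $tool produced no text"
        return 1
    fi
    echo "ok: $tool produced text"
}
```

With a sample PDF on hand, the call would be smoke_test_extractor /usr/local/bin/ezpdftotext followed by the path to that PDF.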

Thanks for the support - it really helped.

Betsy