Using IFilters for indexing binary files

Jonathan Cutting

Thursday 20 January 2005 1:08:34 pm

Perhaps someone has already covered this, but I found no mention of it in the documentation. For those of you working on Windows, a convenient way to index binary files is the IFilter mechanism used by Microsoft Indexing Service.

The Microsoft Platform SDK has an executable in the bin directory called FiltDump.exe. It takes the name of a file as an argument and uses the registered IFilter, if any, to print the file's text content to stdout.

For example, the command

filtdump -b test.doc

will dump the contents of test.doc to stdout using the IFilter registered for .doc files. The -b switch turns off error messages and other extraneous information. Note that Indexing Service must be installed but it need not be running for this to work.
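The same call can be made from a script, which is handy for checking what the filter returns before wiring it into anything else. The following is a minimal sketch, assuming filtdump is on the search path and writes the extracted text to stdout as described above. The function name extract_text and the UTF-8 decoding are assumptions for illustration, not something the SDK documents.

```python
import subprocess
import sys

def extract_text(path, tool=("filtdump", "-b")):
    """Run the extraction tool on `path` and return what it wrote to stdout."""
    # Arguments are passed as a list, so paths with spaces need no extra quoting.
    result = subprocess.run([*tool, path], capture_output=True, check=True)
    # The output encoding is a guess; undecodable bytes are replaced, not fatal.
    return result.stdout.decode("utf-8", errors="replace")

if __name__ == "__main__" and len(sys.argv) > 1:
    print(extract_text(sys.argv[1]))
```

If the tool cannot be found, subprocess.run raises FileNotFoundError, and a non-zero exit code raises CalledProcessError because of check=True.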

IFilters for HTML, Word, Excel, Visio, PowerPoint, and plain text are available from Microsoft. An IFilter for PDF is available from Adobe. Others, including StarOffice/OpenOffice and DWG, are available commercially.

Now, I've tried to implement this in eZ Publish 3.5.0 (Windows installer version), but without success. I've overridden binaryfile.ini, cleared all caches, rebuilt the search index manually with the --clean option, and marked the binary file attributes as searchable in the classes of interest. Still no luck.

My binaryfile.ini overrides:

[HandlerSettings]
MetaDataExtractor[application/pdf]=IFilter
MetaDataExtractor[application/msword]=IFilter

[IFilterHandlerSettings]
TextExtractionTool=filtdump -b

I've tried placing filtdump.exe in a number of different locations, including the eZ Publish root and a directory on the system search path. I have no evidence that it's being executed at all. I've also tried making it run a batch script:

@ECHO OFF
ECHO %1
filtdump -b %1

Still no luck.
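Since ECHO output is easy to lose when the caller captures stdout, one way to get evidence of execution is a wrapper that writes each call to a log file before running the tool. This is only a sketch: the name log_and_run and the log location in the temp directory are hypothetical, and whether eZ Publish would accept such a wrapper as TextExtractionTool is untested here.

```python
import datetime
import os
import subprocess
import sys
import tempfile

# Hypothetical log location; the temp directory is usually writable by services.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "filtdump_calls.log")

def log_and_run(args, tool=("filtdump", "-b"), log_path=DEFAULT_LOG):
    """Append the call to a log file, then run the tool and return its exit code."""
    # Logging happens first, so an entry exists even if the tool cannot be started.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{datetime.datetime.now().isoformat()} {list(args)}\n")
    # stdout is inherited, so the tool's text still reaches whoever called us.
    return subprocess.run([*tool, *args]).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(log_and_run(sys.argv[1:]))
```

An empty log after reindexing would suggest the handler never invokes the tool at all. Entries in the log but no indexed text would point at the search path or at the filter itself.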

Can someone please help me understand what needs to be done to make this work? Where should filtdump.exe be located? Do Apache or PHP need to be configured any differently? Again, I'm using the basic Windows installer for 3.5.0 - nothing special.

Jonathan
