Has anybody managed to get strigi to index pdf files?
I have been playing around with strigi, the file indexing program, but have not been able to index .pdf files. From my research it seems that this should be possible, as strigi uses pdftotext to convert .pdf files to text for indexing. However, even for .pdf files that I have created from simple text files, indexing has not been successful.
I have tried this in slackware-current and slackware64, using both the supplied strigi-0.6.4 package and the latest strigi-0.6.5. (In slackware64 I added a symlink, 'ln -s /usr/lib64/gcj-4.3.3-9/libjvm.so /usr/lib64/libjvm.so', to enable java, and then edited ~/.kde/share/config/nepomukserverrc to change the soprano backend to sesame2, the same as in slackware-current.)
If I try 'xmlindexer <some_pdf_file>' I do not see the expected text output and no .pdf files are shown when 'strigiclient' is used to search for a known text string.
Yet strigi is happily indexing openoffice .odt files and MS Word .doc files.
I am curious as to whether anybody has this working, and if so, were any tweaks required?
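To rule out the converter itself, a check along these lines might help. It is only a sketch: it assumes pdftotext is installed and that strigi does rely on it as described above, and the function name pdf_has_text is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether pdftotext can extract any text from a PDF.
# Returns 0 if some text came out, 1 if nothing did (or the conversion failed),
# and 2 if pdftotext is not installed.
pdf_has_text() {
    pdf="$1"
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found in PATH" >&2
        return 2
    fi
    # "-" as the output file sends the extracted text to stdout.
    text=$(pdftotext "$pdf" - 2>/dev/null) || return 1
    # Strip all whitespace (including form feeds) before testing for emptiness.
    if [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
        return 0
    fi
    return 1
}
```

Usage would be something like 'pdf_has_text some_file.pdf && echo "pdftotext sees text"'. If this succeeds but xmlindexer still shows nothing for the same file, the problem is more likely in how strigi was built than in the PDF.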
Strigi will be using some library to read PDFs. If that library is missing on the system at build time, the compile won't fail, but PDF functionality will be disabled.
Recompile from source and read the docs, and you will find which libs to install. Try this:
which xmlindexer gives you the exact path of the binary (the same works for strigiclient). Then
ldd /path/to/xmlindexer shows you the libraries it uses.
ldd /path/to/xmlindexer | grep "not found" shows you the missing ones.
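The ldd check above can be wrapped in a small shell function so it is easy to repeat for several binaries. This is a rough sketch; the function name missing_libs is made up, and it only reports libraries the binary was linked against but cannot find at run time.

```shell
#!/bin/sh
# Hypothetical helper: print the shared libraries that ldd cannot resolve
# for a command found in PATH. Prints nothing if all of them resolve.
# Returns 2 if the command itself is not in PATH, 0 otherwise.
missing_libs() {
    bin=$(command -v "$1") || { echo "$1: not in PATH" >&2; return 2; }
    # ldd marks unresolved dependencies with "not found".
    ldd "$bin" 2>/dev/null | grep "not found" || true
}
```

For example, 'for b in xmlindexer strigiclient; do missing_libs "$b"; done' would list unresolved libraries for both tools. Empty output means every linked library was found.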
Thanks for the reply. I have checked with ldd and have not found any missing libraries for strigiclient.
This is not a showstopper for me, but it could be a very useful tool. Perhaps when Sebastian Trüg has finished fixing k3b for KDE4, then this will advance a little further.
ldd will only show what the binary was linked against, not what it *could* have linked against. If there's a compile option to include some other libraries, that'll show up in the install or readme files of the source (I'd hope). I don't have everything to hand to check it myself right now, or I would :) Maybe later on.