Question

pufferfish

Asked: 2009-07-02 05:42:53 +0800 CST

Indexing PDF files on Ubuntu

772

I'm looking for a solution on Ubuntu that indexes PDF (and possibly PS) files so I can search them later.

The criteria would be:

Compatibility: Text extraction often varies depending on which software created the PDF. Some PDFs can also be "locked", which I guess one should respect.
Search functionality: wildcards, regexes, and "fuzzy" matching.
Speed of search

In my case I want to index a folder of academic journal articles, hence the requirement that it works consistently regardless of what software created the PDF. I'm already using a reference manager so would rather not replace that.

For example, a good front end for Beagle plus a plugin that lets it index PDFs would be perfect.

3 Answers

wzzrd · Answer 1 · 2009-07-03T22:51:58+08:00

Best Answer

Tracker does the same thing as Beagle and Strigi, but unlike Beagle it is written in pure C (Beagle is a Mono application). It is reportedly a lot faster than Beagle, though I haven't benchmarked it myself.

I can't find you a link to Tracker, but I'm sure it's in the default Ubuntu repositories.

2

sleske · Answer 2 · 2009-07-02T06:44:45+08:00

Lucene does full-text indexing of text extracted from PDF, HTML, Microsoft Word, and OpenDocument files. It's just a library, but several applications and CMSs use it, or you could use it as the base for your own solution.

It is free software (Apache license).
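To give a feel for what "your own solution" involves, here is a minimal sketch of the idea behind a full-text index. It is plain Python and not Lucene's API. It assumes the `pdftotext` command-line tool (from poppler-utils) is installed for text extraction, and every function name below is hypothetical, chosen only for illustration. Wildcards use shell-style patterns, and "fuzzy" matching here just means close spellings as judged by `difflib`.

```python
import re
import subprocess
from collections import defaultdict
from difflib import get_close_matches
from fnmatch import fnmatchcase
from pathlib import Path


def extract_text(pdf_path):
    """Return the text of a PDF via the pdftotext CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, pattern, fuzzy=False):
    """Find documents by exact word, shell-style wildcard, or close spelling."""
    pattern = pattern.lower()
    if fuzzy:
        # Words whose similarity ratio to the pattern is at least 0.8.
        words = get_close_matches(pattern, list(index), n=5, cutoff=0.8)
    else:
        # A pattern without wildcards matches only the exact word.
        words = [w for w in index if fnmatchcase(w, pattern)]
    hits = set()
    for word in words:
        hits |= index[word]
    return hits


def index_folder(folder):
    """Index every PDF directly inside a folder (hypothetical helper)."""
    docs = {p.name: extract_text(p) for p in Path(folder).glob("*.pdf")}
    return build_index(docs)
```

With this, `search(index_folder("articles"), "regres*")` would return the names of the PDFs in a hypothetical `articles` folder that contain any word starting with "regres". A real indexer such as Lucene adds ranking, phrase queries, and an on-disk index, none of which this sketch attempts.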

Edit:

If you are looking for something with a frontend, you might consider Beagle or Strigi:

Beagle

Strigi

1

MercerKernel · Answer 3 · 2009-07-03T18:18:33+08:00

I use Google Desktop for searching on Linux. Not free, but it's the best I've found.

0
