It's been suggested that we use Sphinx for our site's search, since it would run on the same server that hosts our web server (nginx) and our database (MySQL).
Since not all of our pages are generated from the database, it's also been suggested that we run a crawler over the site, store each page's URL and content in MySQL, and have Sphinx index that.
Does anyone know of an open source spider that can store to MySQL out of the box?
Thanks.
I think Sphider is what you're looking for. We had OK results with it before. It can also index PDF and DOC files, which is very useful.
http://www.sphider.eu/
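
If you end up rolling your own instead, the crawl-and-store step described in the question could look roughly like the sketch below. This is only an illustration under some assumptions: the `pages` table, the `PageParser` class and the `crawl` function are made-up names, and SQLite stands in for MySQL so the example runs with nothing but the Python standard library. It has no error handling, robots.txt support or rate limiting.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and <a href> links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download a page and return it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, conn, fetch=fetch, max_pages=100):
    """Breadth-first crawl of one host; stores (url, body) rows in `pages`."""
    # Hypothetical schema. For MySQL it would need adapting (for example
    # AUTO_INCREMENT on id and a VARCHAR url column for the unique key).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "id INTEGER PRIMARY KEY, url TEXT UNIQUE, body TEXT)"
    )
    host = urlparse(start_url).netloc
    queue, seen, stored = deque([start_url]), {start_url}, 0
    while queue and stored < max_pages:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(fetch(url))
        # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s"
        # and go through a cursor rather than conn.execute().
        conn.execute(
            "REPLACE INTO pages (url, body) VALUES (?, ?)",
            (url, " ".join(parser.text)),
        )
        stored += 1
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    conn.commit()
    return stored
```

You would call it with something like `crawl("http://example.com/", sqlite3.connect("pages.db"))`. The integer `id` column is there because Sphinx expects a unique integer document ID as the first column of its source query, so a table shaped like this should be straightforward to point an SQL source at.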