Ian

Asked: 2010-02-23 07:14:23 +0800 CST

Site crawler/spider that tosses results into mysql

It's been suggested that we use MySQL for our site's search, since it would run on the same server that hosts our web server (nginx) and our database (MySQL).

Not all of our pages are generated from the database, so it's also been suggested that we run a crawler over the site, store each page's URL and content in MySQL, and have Sphinx index that.

Does anyone know of an open source spider that can store its results in MySQL out of the box?

Thanks.

1 Answer

Best Answer

konung
2010-03-18T07:54:55+08:00
I think Sphider is what you're looking for. We've had OK results with it before. It can also index PDF and DOC files, which is very useful.

http://www.sphider.eu/
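
For comparison, the crawl-and-store step described in the question can be sketched in a few lines of Python. This is a rough illustration only, not how Sphider works internally. The `pages` table, the `crawl` and `fetch_http` functions, and the `PageParser` class are all made-up names. The sketch uses the standard library's `sqlite3` as a stand-in so it runs anywhere; with a MySQL DB-API driver you would most likely pass `placeholder="%s"`. Sphinx's SQL sources expect an integer document ID, so a real table would probably also want an auto-increment `id` column.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch_http(url):
    """Default fetcher: download a page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, conn, fetch=fetch_http, max_pages=100, placeholder="?"):
    """Breadth-first crawl of a single host.

    Stores one (url, body) row per page through a DB-API connection and
    returns the number of pages stored. The schema is hypothetical.
    """
    host = urlparse(start_url).netloc
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url VARCHAR(255) PRIMARY KEY, body TEXT)"
    )
    sql = "REPLACE INTO pages (url, body) VALUES ({0}, {0})".format(placeholder)
    queue, seen = deque([start_url]), {start_url}
    stored = 0
    while queue and stored < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        cur.execute(sql, (url, " ".join(parser.text)))
        stored += 1
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            # Stay on the starting host and visit each URL once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    conn.commit()
    return stored
```

Calling `crawl("http://www.example.com/", sqlite3.connect("site.db"))` would fill the `pages` table for that (placeholder) host. The sketch ignores robots.txt, non-HTML content such as PDFs, and character sets other than UTF-8, which is a good part of why an existing tool is usually the easier route.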