The company I work for produces software that takes in gigabytes of data a day and outputs (lesser amounts of) gigabytes a day. We have a significant infrastructure for managing data flow, and by significant I mean significantly bad. When we require a new type of data in our software, we write a script that pulls the data from an HTTP or FTP source, drop it on a server, and go. There are frequent complaints of over- or under-utilization across most of our servers, but we have no good way to balance the resources spent retrieving and writing out data.
Is there a software package that is designed to manage data transfer externally (retrieval) as well as passing it around from server-to-server inside our network?
I'm looking for something that offers:
- A common base for simplifying the design of download scripts.
- A logging/monitoring framework for knowing what's going on, what may not have been retrieved, etc. (Bonus points if the monitoring aspect can integrate with Nagios.)
- A single location for viewing the status of download tasks, etc. (i.e., a dashboard)
- An implementation meant to be run across multiple servers, with ultimately what amounts to multiple server download slaves.
- Simplicity (relatively speaking). Again, we're trying to get off of a nightmare infrastructure.
The OASIS group published a standard language for defining "business processes" called BPEL (Business Process Execution Language). We have used an expensive commercial implementation for a few years for the very reasons you listed: maintainability, scalability, extensibility, verifiability. Many projects have taken these tasks on, and some open-source versions are listed at freshmeat.net. We mere mortals call this "workflow software." One package that caught my eye was Orchestra, as it is both LGPL-licensed and comes with commercial support options. Given that this is a critical business process, you may want some level of support, or at least want to contribute any changes back to the community.
Fortunately for you, many, many others have blazed this trail. It's just a matter of following it in a way that is politically workable in your organization. I recommend building a private prototype and demonstrating it to get senior management to buy in to a larger, business-relevant test. Get the boss to buy in fairly early and you might be able to change this core business practice.
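For the private prototype, something as small as the following Python sketch might be enough to show the idea of a common base for download scripts with logging and a Nagios-friendly status. This is not BPEL and has nothing to do with Orchestra; `DownloadTask` and `nagios_status` are names I made up for illustration, and the fetchers are stand-ins for your real HTTP/FTP scripts. The only outside convention it relies on is the usual Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2

log = logging.getLogger("downloads")


@dataclass
class DownloadTask:
    """One retrieval job (hypothetical); `fetch` is any callable returning bytes."""
    name: str
    fetch: Callable[[], bytes]
    retries: int = 2        # extra attempts after the first failure
    attempts: int = 0       # attempts actually made
    size: int = -1          # bytes retrieved, -1 until a fetch succeeds
    error: str = ""         # last error message, empty on success

    def run(self) -> bool:
        """Try the fetch up to `retries + 1` times and record the outcome."""
        for attempt in range(1, self.retries + 2):
            self.attempts = attempt
            try:
                data = self.fetch()
            except Exception as exc:  # a fetch can fail in many ways
                self.error = str(exc)
                log.warning("%s: attempt %d failed: %s", self.name, attempt, exc)
                continue
            self.size = len(data)
            self.error = ""
            log.info("%s: retrieved %d bytes", self.name, self.size)
            return True
        return False


def nagios_status(tasks: List[DownloadTask]) -> Tuple[int, str]:
    """Summarise finished tasks as (exit_code, one-line message)."""
    failed = [t.name for t in tasks if t.size < 0]
    retried = [t.name for t in tasks if t.size >= 0 and t.attempts > 1]
    if failed:
        return CRITICAL, "CRITICAL - not retrieved: " + ", ".join(failed)
    if retried:
        return WARNING, "WARNING - needed retries: " + ", ".join(retried)
    return OK, "OK - %d downloads retrieved" % len(tasks)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def broken_feed() -> bytes:
        # Stand-in for a source that is down.
        raise IOError("connection refused")

    tasks = [
        DownloadTask("prices", lambda: b"some payload"),
        DownloadTask("weather", broken_feed, retries=1),
    ]
    for task in tasks:
        task.run()
    code, message = nagios_status(tasks)
    print(message)
    raise SystemExit(code)
```

Run from cron on each download server, the script's one-line message and exit code are what a Nagios check would pick up. A real workflow engine gives you scheduling, distribution across servers, and a dashboard on top of this; the sketch only shows how little structure the individual scripts need to share before that becomes possible.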