I work part time for a small private school. The 24-node computer lab kept having hardware failures (mostly drives and cooling fans), so I turned it into a Linux-based thin client network. Although the workstations now boot from the network, most still have working hard drives. They also use only a fraction of their computing power to run an X server.
I'm looking for ways to put these computing resources to good use. Each workstation has a 40 GB hard drive, a Pentium 4 processor, and 256 MB of RAM.
I've considered:
- Installing a fault-tolerant distributed file system on each of the workstations. This would make use of both the hard drive space and the computing resources of each workstation, and continued hardware failures would have minimal impact.
- Removing the hard drives and putting them in a couple of file servers, then running a distributed computing client on the workstations to take advantage of free CPU cycles. OK, though I'm sure I could find a place for a few more file servers, I'll admit I don't really have an application in mind for a distributed processing environment.
If you think the first idea has merit, I'd be interested in any information you can give on the various distributed file systems available. I did a bit of searching but couldn't find one that really fit the situation. I'm looking for redundancy and fault tolerance, but it also needs to support user- and group-level access restrictions.
Any other suggestions would be appreciated as well.
As Kevin said, pull the drives; keeping them powered up is a waste when you could buy that amount of storage again in a year for the cost of the power saved. For that matter, unless there is a particularly compelling reason to spend the money, your best bet is to leave the machines running as thin clients only.
Set them to suspend as soon after use as is reasonable. It's cool to have extra computing power, but the cost of keeping machines powered 24x7 adds up quickly, especially on what I assume are somewhat older desktop machines. If you run some kind of distributed computing project on them, you will increase your power usage significantly and hasten the demise of hardware that's already near death.
It feels like a waste of cycles, but you don't want something running in the background making the UI sluggish for your users. When nobody is using the machines, they should be powered down, whether that means suspend or all the way off. Power is expensive.
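To put a rough number on "expensive", here's a back-of-the-envelope sketch. The 100 W per machine and $0.12/kWh figures are assumptions I picked for illustration, not measurements from your lab; substitute your actual draw and local rate.

```python
# Rough monthly cost: lab left on 24x7 vs. suspended outside school hours.
# WATTS_PER_MACHINE and RATE_PER_KWH are assumed values for illustration only.

MACHINES = 24
WATTS_PER_MACHINE = 100   # assumed average draw for a Pentium 4 desktop
RATE_PER_KWH = 0.12       # assumed electricity price in $/kWh

def monthly_cost(hours_per_day, days_per_month):
    kwh = MACHINES * WATTS_PER_MACHINE / 1000 * hours_per_day * days_per_month
    return kwh * RATE_PER_KWH

always_on = monthly_cost(24, 30)    # running around the clock
class_time = monthly_cost(8, 22)    # suspended outside of class hours

print(f"24x7:        ${always_on:.2f}/month")
print(f"Class hours: ${class_time:.2f}/month")
print(f"Savings:     ${always_on - class_time:.2f}/month")
```

With those assumed numbers the difference works out to roughly $150 a month for a lab this size, before you even count the drives.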
Alternatively, you could petition for budget to distribute the existing machines into classrooms, and then purchase dedicated thin-client machines for the lab. This would probably put your existing overpowered hardware to better use than anything else I can think of.
I wouldn't store confidential data on the hard disk drives of these PCs. Physical access implies access to the data stored within. Unless you use some kind of encrypted storage, you run the risk that students with physical access to the machines could access data stored there, logical access control mechanisms aside.
You could run iSCSI targets on the machines, I suppose, and use them as some kind of RAID. Without a dedicated network for iSCSI, though, you're going to have reliability issues and throughput will be variable.
I suppose you could install something like Hadoop on the computers. If your school has any programming classes a project on distributed systems might be something worthwhile.
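If you go that route, the classic classroom exercise is a word count written in the Hadoop Streaming style, where the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value pairs to stdout. A minimal sketch (the file name and the map/reduce command-line switch are my own invention for the example):

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style.

Run as `wordcount.py map` for the mapper or `wordcount.py reduce` for the
reducer; Hadoop sorts the mapper output by key before it reaches the reducer.
"""
import sys

def mapper():
    # Emit one "word<TAB>1" pair for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    # Keys arrive sorted, so a word's total is complete when the key changes.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Students can test the whole pipeline without a cluster, e.g. `cat sometext | python3 wordcount.py map | sort | python3 wordcount.py reduce`, which also makes it obvious what the framework is doing for them.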
Chuck 'em. Hard drives draw about 10 watts of power each, and if this lab is like any other school lab, the systems are left on at all times, correct?
10 watts / 1000 = 0.01 kW x 24 hours x 30 days = 7.2 kWh/month per drive x 24 drives = 172.8 kWh/month
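If you want to sanity-check that figure or turn it into dollars, the same arithmetic fits in a few lines of Python (the $0.12/kWh rate is my assumption, not your actual tariff):

```python
# Energy used by the drives alone, mirroring the arithmetic above.
DRIVES = 24
WATTS_PER_DRIVE = 10
RATE_PER_KWH = 0.12  # assumed rate; plug in the school's real one

kwh_per_drive = WATTS_PER_DRIVE / 1000 * 24 * 30   # 7.2 kWh/month
kwh_total = kwh_per_drive * DRIVES                 # 172.8 kWh/month
print(f"{kwh_total:.1f} kWh/month, roughly ${kwh_total * RATE_PER_KWH:.2f}/month")
```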
Call it a green lab, save some dough; whatever your motivation, just chuck them.