I would like to know your strategies for what to do when a disk fails on one of the Hadoop servers.
Let's say I have multiple (>15) Hadoop slave servers and one namenode, and one of the six disks on a slave stops working; the disks are connected via SAS. I don't care about retrieving data from this disk; I'm asking about general strategies for keeping the cluster running.
What do you do?
We deployed Hadoop. You can specify a replication factor for files, i.e. how many times each file gets replicated. Hadoop's single point of failure is the namenode. If you are worried about disks going out, increase the replication factor to 3 or more.
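If it helps, here is a rough sketch of where that lives, assuming a standard install (the /data path below is just a placeholder):

    # Default replication for new files is the dfs.replication property in
    # conf/hdfs-site.xml (3 is the stock default).

    # Raise the replication factor of files that already exist, e.g. everything
    # under /data; -w waits until the target replication is actually reached.
    hadoop fs -setrep -R -w 3 /data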
Then if a disk goes bad, it's very simple: throw it out and reformat. Hadoop will adjust automatically. In fact, as soon as the namenode notices the disk's blocks are missing, it will start re-replicating them to maintain the replication factor.
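A minimal sketch of that disk-swap routine, assuming the stock scripts and default config locations (adjust paths to your install):

    # After swapping the disk: remove (or fix) the failed directory listed in
    # dfs.data.dir in conf/hdfs-site.xml, then restart the datanode on that slave.
    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh start datanode

    # Watch re-replication progress: fsck reports under-replicated/missing blocks,
    # dfsadmin -report shows per-datanode capacity and any dead nodes.
    hadoop fsck /
    hadoop dfsadmin -report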
I am not sure why you have such a large bounty. You said you don't care about retrieving the data, and Hadoop's only single point of failure is the namenode; all other nodes are expendable.
You mentioned this system was inherited (possibly not up to date) and that the load shoots up, which suggests a possible infinite loop. Does this bug report describe your situation?
https://issues.apache.org/jira/browse/HDFS-466
If so, it's been reported as fixed in the latest HDFS 0.21.0 (just released last week):
http://hadoop.apache.org/hdfs/docs/current/releasenotes.html
Disclaimer: To my disappointment I have yet to have the need to use Hadoop/HDFS :)