I have a situation where I'd like to run Hadoop spread across 2 clusters. The first cluster (ClusterA) is normal and all nodes are publicly accessible. The second cluster (ClusterB) is behind a NAT.
Nodes in ClusterA will be running both Mapred and HDFS, while nodes in ClusterB will be running Mapred without HDFS and will not be allowed to run Reduce Tasks. The master node (jobtracker, namenode, secondary namenode) will be in ClusterA.
My question is: if I start the ClusterB TaskTrackers independently, without using bin/start-all.sh from the JobTracker, will this setup work? TaskTrackers in ClusterB will open their own command-and-control (heartbeat) connection to the JobTracker and should receive MapTask assignments over that connection. HDFS will live entirely in ClusterA, so all nodes should be able to read blocks fine.
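For reference, a minimal sketch of the ClusterB side under these assumptions (the hostname and port are placeholders): point each ClusterB TaskTracker at the ClusterA master in its mapred-site.xml, and set its reduce slots to zero so the JobTracker only offers it map tasks.

```xml
<!-- mapred-site.xml on each ClusterB node (sketch; hostname is a placeholder) -->
<configuration>
  <!-- Heartbeat/C&C endpoint: the JobTracker in ClusterA -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.clusterA.example.com:9001</value>
  </property>
  <!-- Zero reduce slots: this node will never be assigned Reduce tasks -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>0</value>
  </property>
</configuration>
```

With that in place, each ClusterB node can start its daemon locally with `bin/hadoop-daemon.sh start tasktracker` rather than being launched from the master, and it will open the outbound heartbeat connection itself.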
The only issue I can think of is Reduce tasks running in ClusterA attempting to get intermediate data stored on ClusterB nodes. Is this a push or a pull operation? Are there any other scenarios where the NAT will cause problems?
The answer: the shuffle is a pull operation. Reduce tasks fetch map output from the TaskTrackers that ran the maps, so reducers in ClusterA need inbound access through the NAT to the ClusterB nodes. Otherwise the setup appears to work.
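Concretely, each reducer pulls intermediate data with HTTP requests against the small HTTP server embedded in the TaskTracker that ran the map, so that port on every ClusterB node must be reachable from ClusterA. A sketch of the relevant setting, assuming you forward the default port through the NAT:

```xml
<!-- mapred-site.xml: the shuffle port reducers must be able to reach -->
<property>
  <name>mapred.task.tracker.http.address</name>
  <!-- Default is 0.0.0.0:50060; whatever port you choose here must be
       forwarded through the NAT for every ClusterB node -->
  <value>0.0.0.0:50060</value>
</property>
```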