Question

Jim Dennis

Asked: 2012-08-29 10:33:08 +0800 CST

Implications of Multiple JobTracker nodes in a Hadoop cluster?

I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration files (conventionally under /etc/hadoop/conf/, at least for the Cloudera Distribution of Hadoop (CDH)). Can we define multiple JobTrackers in mapred-site.xml? Something like:

<configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt01.mydomain.not:8021</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt02.mydomain.not:8021</value>
   </property>
...
</configuration>

Or is there some other allowed syntax for this?

What are the implications of doing this? Does each JobTracker get information about the load on each TaskTracker node? In other words, can the two JobTrackers coordinate their scheduling across the TT nodes based only on the gossip information from the TTs, or would they need to talk to one another?

Is this documented anywhere?

1 Answer

Chakri · Answer 1 · 2012-08-31T12:58:40+08:00

Best Answer

Multiple JobTrackers can be useful in a multi-cluster architecture, so that the cluster-level load can be distributed between the JobTrackers.

In a single cluster, the following could become issues:

(a) If multiple JobTracker servers will share an HDFS cluster, each must have a different mapred.system.dir, or the JobTrackers will delete each other's job files.

(b) The admin scripts (start-all and stop-all) will become an issue, unless each JobTracker gets a different port.
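
As a rough illustration of points (a) and (b), here is a sketch of what two separate mapred-site.xml files might look like, one per JobTracker. The host names are taken from the question. The mapred.system.dir paths and the second port (8022) are made-up values for illustration only, and this has not been tested on a real cluster.

```xml
<!-- mapred-site.xml used by the first JobTracker (values are illustrative) -->
<configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt01.mydomain.not:8021</value>
   </property>
   <property>
     <!-- hypothetical HDFS path; must not be shared with the other JobTracker -->
     <name>mapred.system.dir</name>
     <value>/mapred/system-jt01</value>
   </property>
</configuration>

<!-- mapred-site.xml used by the second JobTracker (values are illustrative) -->
<configuration>
   <property>
     <!-- assumed different port, per point (b) -->
     <name>mapred.job.tracker</name>
     <value>jt02.mydomain.not:8022</value>
   </property>
   <property>
     <!-- hypothetical HDFS path, distinct from the first JobTracker's -->
     <name>mapred.system.dir</name>
     <value>/mapred/system-jt02</value>
   </property>
</configuration>
```

The only point of the sketch is that the two files differ in both mapred.job.tracker and mapred.system.dir, rather than listing the same property twice in one file.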
