I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration files (under /etc/hadoop/conf/, at least for the Cloudera Distribution of Hadoop (CDH)). Can we define multiple JobTrackers in mapred-site.xml? Something like:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>jt01.mydomain.not:8021</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>jt02.mydomain.not:8021</value>
</property>
...
</configuration>
Or is there some other allowed syntax for this?
What are the implications of doing this? Does each JobTracker get information about the load on each TaskTracker node? In other words, can the two JobTrackers coordinate their scheduling across the TT nodes based only on the gossip information from the TTs, or would they need to talk to one another?
Is this documented anywhere?
Multiple JobTrackers can be useful in a multi-cluster architecture, where the cluster-level load can be distributed between the JobTrackers.
In a single cluster, the following could become issues:
(a) If multiple JobTracker servers will share an HDFS cluster, each must have a different mapred.system.dir, or the JobTrackers will delete each other's job files.
(b) The admin scripts (start-all and stop-all) will become an issue unless each JobTracker gets a different port.
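As a rough sketch of point (a), assuming two JobTrackers sharing one HDFS cluster: rather than listing mapred.job.tracker twice in one file, each group of nodes would carry its own mapred-site.xml that names a single JobTracker and a system directory no other JobTracker uses. The hostname is the one from the question; the directory path is made up purely for illustration.

```xml
<!-- mapred-site.xml for the nodes served by jt01 (hypothetical layout) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt01.mydomain.not:8021</value>
  </property>
  <property>
    <!-- Assumed path. The copy of this file used with jt02 would point
         somewhere else, for example /mapred/system-jt02, so the two
         JobTrackers do not delete each other's job files. -->
    <name>mapred.system.dir</name>
    <value>/mapred/system-jt01</value>
  </property>
</configuration>
```

Note that with a layout like this the configuration files are no longer identical on every node, which is part of why the admin scripts in (b) get awkward.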