As I am setting up Hadoop, one question keeps popping into my mind, but I can't find the answer.
Which Hadoop configuration files need to be copied to which nodes? For example, I'm making changes to the following files:
hadoop-env.sh, core-site.xml, mapred-site.xml, hdfs-site.xml, masters, slaves
Do I need to copy these files to ALL my Hadoop nodes (which is kind of a pain if I update one file)? Do only certain files need to be copied? Or do I only need to make the changes on my master nodes?
Can't seem to find the answer anywhere, so I wanted to ask here. (Up to this point, I have been mirroring all the files across every node, but that seems inefficient. My setup does work.)
In terms of what reads which files:
hadoop-env.sh: Everything
core-site.xml: Everything
hdfs-site.xml: HDFS (NameNode, SecondaryNameNode, DataNode)
mapred-site.xml: MapReduce (JobTracker, TaskTracker)
masters and slaves: I don't think that these are read by the applications directly, but are used by the management scripts instead.

I would, however, suggest that you set up a deployment system so you can easily distribute all these files to all nodes, instead of trying to figure out what needs what. This could just be a script which calls ssh with public key authentication, or it could be something like Puppet or Chef.
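
To give an idea of the "just a script" option, here is a minimal sketch in Python. It is not an official Hadoop tool. It assumes the conf directory sits at the same path on every node, that scp is installed, and that key-based SSH login already works. It also assumes the slaves file lists one hostname per line, with blank lines and # comments ignored. The function names and the CONF_FILES list are made up for illustration.

```python
import subprocess
from pathlib import Path

# The files named in the question; adjust to whatever you actually edit.
CONF_FILES = [
    "hadoop-env.sh", "core-site.xml", "hdfs-site.xml",
    "mapred-site.xml", "masters", "slaves",
]


def parse_hosts(text):
    """Return hostnames from slaves-style text (assumed: one host per line)."""
    hosts = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            hosts.append(name)
    return hosts


def build_scp_commands(hosts, conf_dir, files=CONF_FILES):
    """Build one scp command per host that copies every config file."""
    sources = [f"{conf_dir}/{name}" for name in files]
    # BatchMode=yes makes scp fail instead of prompting for a password,
    # so a host without a working key shows up as an error.
    return [
        ["scp", "-o", "BatchMode=yes", *sources, f"{host}:{conf_dir}/"]
        for host in hosts
    ]


def push_configs(conf_dir, dry_run=True):
    """Copy the config files to every host listed in conf_dir/slaves."""
    hosts = parse_hosts(Path(conf_dir, "slaves").read_text())
    for cmd in build_scp_commands(hosts, conf_dir):
        if dry_run:
            print(" ".join(cmd))  # only show what would be run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
```

Calling push_configs with the path to your conf directory prints the scp commands without running them; pass dry_run=False once the output looks right. Once the cluster grows beyond a handful of nodes, Puppet or Chef will probably be less painful than maintaining this by hand.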