Have a HDFS/Hadoop cluster setup and am looking into tuning.
I wonder if changing the default HDFS replication factor (default:3) to something bigger will improve mapper performance, at the obvious expense of increasing disk storage used?
My reasoning being that if the data is already replicated to more nodes, mapper jobs can be run on more nodes in parallel without any data streaming/copying?
Anyone got any opinions?