I need to set the block size of a file when I load it into HDFS, to some value lower than the cluster's default block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks.
I've done this before within a Hadoop workload using the org.apache.hadoop.fs.FileSystem.create() function, but is there a way to do it from the command line?
You can do this by passing -Ddfs.block.size=&lt;size in bytes&gt; to your hadoop fs command. For example:
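Something along these lines (a minimal sketch; the file name and destination path are placeholders, and 1048576 bytes is 1 MB):

```
# Copy a local file into HDFS with a 1 MB block size instead of the cluster default.
hadoop fs -Ddfs.block.size=1048576 -put mylargefile.dat /user/me/mylargefile.dat

# Check the block size the file was actually written with (%o prints block size in bytes).
hadoop fs -stat %o /user/me/mylargefile.dat
```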
As you can see, the block size changes to whatever you specify on the command line (in my case, the default is 64 MB, but I'm dropping it down to 1 MB here).
NOTE FOR HADOOP 0.21: There's an issue in 0.21 where you have to use -D dfs.blocksize instead of -D dfs.block.size.
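So on 0.21 the equivalent command would presumably look like this (same placeholder file and path as above):

```
# Hadoop 0.21: the property name is dfs.blocksize rather than dfs.block.size.
hadoop fs -D dfs.blocksize=1048576 -put mylargefile.dat /user/me/mylargefile.dat
```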