I need to set the block size of a file when I load it into HDFS, to some value lower than the cluster's default block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks.
I've done this before within a Hadoop workload using the org.apache.hadoop.fs.FileSystem.create() function, but is there a way to do it from the command line?
You can do this by passing -Ddfs.block.size=&lt;size in bytes&gt; to your hadoop fs command. For example:
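Something along these lines (a minimal sketch; the file name and destination path are placeholders, and 1048576 bytes is 1 MB):

```
# Copy a local file into HDFS with a 1 MB block size instead of the cluster default.
hadoop fs -Ddfs.block.size=1048576 -put mylargefile.dat /user/me/mylargefile.dat

# Check the block size the file was actually written with (%o prints block size in bytes).
hadoop fs -stat %o /user/me/mylargefile.dat
```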
As you can see, the block size changes to whatever you specify on the command line (in my case, the default is 64 MB, but I'm dropping it down to 1 MB here).
NOTE FOR HADOOP 0.21: There's an issue in 0.21 where you have to use -D dfs.blocksize instead of -D dfs.block.size.
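So on 0.21 the equivalent command would presumably look like this (same placeholder file and path as above):

```
# Hadoop 0.21: the property name is dfs.blocksize rather than dfs.block.size.
hadoop fs -D dfs.blocksize=1048576 -put mylargefile.dat /user/me/mylargefile.dat
```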