I'm still new to Hadoop, and this time I'm trying to process a 106 GB file.
I used `-copyFromLocal` to copy that big file into HDFS, but because the file is so large I have to wait a long time with no idea how far the copy has progressed.
Is there any way to show the current copy status with this command?
Thank you guys in advance for your help!
`copyFromLocal` cannot display copy progress. As a workaround, open another shell and run:
`$ watch hadoop fs -ls <filenameyouarecopying>`
This shows the file and its current size, refreshed every 2 seconds (the `watch` default).
It is also possible to track how much of the local file has been read by using the `pv` command and piping the file's contents to `hdfs dfs` on stdin (the `-` source tells `-put` to read from stdin):
`pv mylargefile.txt | hdfs dfs -put - /path/to/file/on/hdfs/mylargefile.txt`
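If `pv` is not installed, GNU `dd` (coreutils 8.24 or later) can report a running byte count on stderr with `status=progress`. The helper below is a minimal sketch, assuming bash; the function name `progress_reader` is hypothetical. It streams a local file to stdout, preferring `pv` and falling back to `dd`, so its output can be piped into `hdfs dfs -put -` the same way as the `pv` command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a local file to stdout while showing
# progress on stderr. Prefers pv; falls back to GNU dd's status=progress.
progress_reader() {
  local file=$1
  if command -v pv >/dev/null 2>&1; then
    # pv knows the file size, so it can show a percentage and an ETA.
    pv "$file"
  else
    # GNU dd reports bytes copied and throughput, but no percentage.
    dd if="$file" bs=4M status=progress
  fi
}
```

For example: `progress_reader mylargefile.txt | hdfs dfs -put - /path/to/file/on/hdfs/mylargefile.txt`. Because `dd` is not told the total size, its display has no percentage or ETA, only the amount copied so far and the rate.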
There doesn't seem to be a verbose option for any of the copy commands (`copyFromLocal`, `copyToLocal`, `get`, `put`). Your best bet is probably to check the size of the file at its destination on HDFS to gauge its progress.
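Checking the destination size can be automated. The script below is a minimal sketch, assuming bash, GNU `stat`, and an `hdfs` client on the PATH; the function names `percent_copied` and `watch_copy` are hypothetical. It compares the local file size with the size HDFS reports through `hdfs dfs -stat %b`. In Hadoop 2.x and later the shell normally writes to a temporary `<name>._COPYING_` file and renames it when the copy finishes, so the sketch checks that name first. The size HDFS reports for a file that is still being written usually lags behind and tends to grow in block-sized steps, so treat the percentage as an estimate.

```shell
#!/usr/bin/env bash
# Sketch: estimate copy progress by comparing the local file size with
# the size HDFS currently reports for the destination.

# Print what percentage of total_bytes has arrived (integer, rounded down).
percent_copied() {
  local total_bytes=$1 copied_bytes=$2
  if [ "$total_bytes" -eq 0 ]; then
    echo 100
  else
    echo $(( copied_bytes * 100 / total_bytes ))
  fi
}

# Poll the HDFS destination until its size matches the local file.
# Usage: watch_copy LOCAL_FILE HDFS_PATH [INTERVAL_SECONDS]
watch_copy() {
  local local_file=$1 hdfs_path=$2 interval=${3:-10}
  local total copied
  total=$(stat -c %s "$local_file")   # GNU stat; on BSD/macOS: stat -f %z
  while true; do
    # Try the temporary name first, then the final name, else assume 0.
    copied=$(hdfs dfs -stat %b "${hdfs_path}._COPYING_" 2>/dev/null ||
             hdfs dfs -stat %b "$hdfs_path" 2>/dev/null ||
             echo 0)
    echo "$(percent_copied "$total" "$copied")% ($copied of $total bytes)"
    if [ "$copied" -ge "$total" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

For example, `watch_copy mylargefile.txt /path/to/file/on/hdfs/mylargefile.txt 30` prints an estimate every 30 seconds until the sizes match. It runs in a separate shell from the copy, so stopping it with Ctrl-C does not affect the copy itself.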
You can run the copy as a background process with `nohup ... &`. `nohup` keeps the process running even after you log out of the server. Whenever you like, you can check on it with `hadoop fs -ls`.
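A minimal sketch of the background approach, assuming bash; the function name `start_background_copy` and the default log file name `copy.log` are hypothetical. Output and errors go to a log file so the job no longer depends on the terminal, and `$!` gives the process ID so you can later check whether the copy is still running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start an HDFS copy that keeps running after logout.
# Usage: start_background_copy LOCAL_FILE HDFS_PATH [LOG_FILE]
start_background_copy() {
  local local_file=$1 hdfs_path=$2 log=${3:-copy.log}
  # nohup makes the command ignore SIGHUP; & runs it in the background.
  nohup hadoop fs -copyFromLocal "$local_file" "$hdfs_path" >"$log" 2>&1 &
  echo $!   # PID of the background copy
}
```

For example, run `start_background_copy mylargefile.txt /path/to/file/on/hdfs/mylargefile.txt`, log out if you need to, and later run `hadoop fs -ls /path/to/file/on/hdfs/` (or `tail copy.log`) to see how far it has gotten. `kill -0 <pid>` succeeds only while the process is still alive.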