We have a Hadoop cluster (Ambari platform, HDP version 2.6.4),
and we ran two checks to find out whether we have under-replicated blocks.
The first check was:
su hdfs
hdfs fsck /
which gives the following results:
Total size: 17653549013347 B (Total open files size: 854433698229 B)
Total dirs: 843714
Total files: 11752836
Total symlinks: 0 (Files currently being written: 16)
Total blocks (validated): 11792203 (avg. block size 1497052 B) (Total open file blocks (not validated): 6381)
Minimally replicated blocks: 11792203 (100.00001 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
As the output above shows, fsck reports 0 under-replicated blocks.
However, when we run the second check:
hdfs dfsadmin -report
we get:
Configured Capacity: 141275429535744 (128.49 TB)
Present Capacity: 140886991802565 (128.14 TB)
DFS Remaining: 84748655941292 (77.08 TB)
DFS Used: 56138335861273 (51.06 TB)
DFS Used%: 39.85%
Under replicated blocks: 4212067
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Here the number of under-replicated blocks is 4212067.
Which of the two is the correct under-replicated block count,
and why do hdfs fsck / and hdfs dfsadmin -report give different results?
BTW, Ambari shows approximately the same results as hdfs dfsadmin -report.
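
To put the two figures side by side from the same run, here is a minimal sketch of how the counts could be pulled out of each command's output. The helper names fsck_under_replicated and report_under_replicated are hypothetical, and the parsing assumes the output lines look like the ones pasted above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: each reads command output on stdin and prints
# the under-replicated block count found on the matching line.

# Parses a line like "Under-replicated blocks: 0 (0.0 %)" from `hdfs fsck /`.
fsck_under_replicated() {
  awk -F: '/Under-replicated blocks/ { split($2, a, " "); print a[1]; exit }'
}

# Parses a line like "Under replicated blocks: 4212067" from `hdfs dfsadmin -report`.
report_under_replicated() {
  awk -F: '/Under replicated blocks/ { split($2, a, " "); print a[1]; exit }'
}

# Usage on the cluster, as the hdfs user (assumes both commands are on PATH):
#   fsck_count=$(hdfs fsck / 2>/dev/null | fsck_under_replicated)
#   report_count=$(hdfs dfsadmin -report 2>/dev/null | report_under_replicated)
#   echo "fsck: $fsck_count  dfsadmin: $report_count"
```

Running both back to back at least rules out that the difference comes from the two checks being taken at different times.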