From my understanding, using 20-30 TB disks with HDFS presents some challenges, but these can be managed with proper configuration.
It is possible, but it requires careful consideration of block size, rebuild times, data distribution, metadata management, and performance.
Performance: Large disks can lead to longer seek times and potentially impact performance, especially for workloads that require frequent random access.
Based on the above, can we use 20-30 TB disks in our new data node machines?
Note: we intend to install 16 data node machines from scratch on Dell hardware, each containing 12 non-RAID disks of ~22 TB each.
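For scale, here is a rough sketch of what that layout adds up to. The node, disk, and size figures come from the note above; the replication factor of 3 is the HDFS default for dfs.replication and may not match the actual cluster, and the 20% headroom is a purely hypothetical placeholder.

```python
# Figures taken from the question.
NODES = 16
DISKS_PER_NODE = 12
DISK_TB = 22

# Assumptions, not stated in the question.
REPLICATION = 3          # HDFS default for dfs.replication
HEADROOM = 0.20          # hypothetical share kept free for non-DFS data and growth

raw_tb = NODES * DISKS_PER_NODE * DISK_TB          # total raw capacity
per_node_tb = DISKS_PER_NODE * DISK_TB             # raw capacity of one data node
logical_tb = raw_tb * (1 - HEADROOM) / REPLICATION # space left for unique data

print(f"Raw cluster capacity : {raw_tb} TB")
print(f"Raw capacity per node: {per_node_tb} TB")
print(f"Usable logical data  : {logical_tb:.1f} TB")
```

The per-node figure matters as much as the total: if a full node is lost, HDFS may have to re-replicate up to that much data across the remaining nodes.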
Disclaimer: OK, I'll get my hands dirty and summarize everything.
Summary: Modern HDDs typically run out of IOPS long before they run out of capacity, so you can build an enormous storage pool. The only question is whether you will have enough IOPS to power your configuration. The 4 TB and 22 TB 7,200 rpm SATA / NL-SAS drives mentioned are both limited to maybe ~80 IOPS each; capacity is irrelevant. Is 80 IOPS per physical drive enough to fulfill your requirements? We don't know, so do your math!
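One way to do that math is sketched below. The ~80 IOPS figure is the rough estimate given above, the node and disk counts come from the question, and required_iops is a hypothetical placeholder you would replace with a number measured from your own workload.

```python
IOPS_PER_DRIVE = 80      # rough estimate for a 7,200 rpm SATA / NL-SAS drive
NODES = 16               # from the question
DISKS_PER_NODE = 12      # from the question


def iops_per_tb(capacity_tb, iops=IOPS_PER_DRIVE):
    """Random IOPS available per TB stored on a single drive."""
    return iops / capacity_tb


# Same drive mechanics, very different IOPS density.
density_4tb = iops_per_tb(4)
density_22tb = iops_per_tb(22)

# Upper bound on random IOPS across all spindles in the cluster.
cluster_iops = NODES * DISKS_PER_NODE * IOPS_PER_DRIVE

required_iops = 10_000   # hypothetical workload requirement, replace with your own
enough = cluster_iops >= required_iops

print(f"IOPS per TB on a 4 TB drive : {density_4tb:.1f}")
print(f"IOPS per TB on a 22 TB drive: {density_22tb:.1f}")
print(f"Aggregate cluster IOPS      : {cluster_iops}")
print(f"Meets requirement           : {enough}")
```

The aggregate number is an optimistic ceiling: it assumes the load is spread evenly across every spindle, which real workloads rarely achieve.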
Hint: Parity RAID configs have a write penalty on top, so be aware!
https://theithollow.com/2012/03/21/understanding-raid-penalty/
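To illustrate the write penalty, here is a minimal sketch using the commonly quoted per-level penalty values. The 50% write mix and the 12-drive array are assumptions for illustration only; the non-RAID layout described in the question would not pay this penalty on its data disks.

```python
# Commonly quoted backend I/Os per frontend write, by RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(raw_iops, write_fraction, level):
    """Frontend IOPS an array can sustain for a given read/write mix."""
    penalty = WRITE_PENALTY[level]
    read_fraction = 1 - write_fraction
    return raw_iops / (read_fraction + write_fraction * penalty)


raw = 12 * 80            # hypothetical array: 12 drives at ~80 IOPS each
write_fraction = 0.5     # assumed mix: half reads, half writes

for level in ("raid0", "raid10", "raid5", "raid6"):
    print(f"{level:>6}: {effective_iops(raw, write_fraction, level):.0f} IOPS")
```

With this mix, the parity levels lose well over half of the raw IOPS, which is why the penalty has to be part of the sizing.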
Warning: SMR drives are performance killers (they basically behave like tapes) and are mostly unusable for anything except sequential writes (think CCTV workloads and maybe tape emulation). You absolutely need "classic" CMR HDDs here. See:
https://www.pitsdatarecovery.co.uk/blog/cmr-vs-smr/