We will have a machine at work that, at peak performance, should be able to push 50 ("write heads") x 75GB of data per hour. That's a peak of roughly 1100MB/s of sustained write speed. To get that off the machine, two 10 gigabit lines are required. My question is what kind of server + technology can handle/store such a data flow?
Currently we use ZFS for data storage, although write speed has never been an issue (we are nowhere near these speeds). Would ZFS (ZFS on Linux) be an option? We also need to store a lot of data; the "IT guide" suggests somewhere between 50 and 75 TB in total. So it probably can't be all SSDs unless we want to offer up our first-born child.
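For reference, a quick back-of-the-envelope conversion of those figures (a sketch, assuming decimal units, 1 GB = 1000 MB):

```python
# Quick check of the figures above (decimal units: 1 GB = 1000 MB).
peak_gb_per_hour = 50 * 75                       # 50 "write heads" x 75 GB/hour
peak_mb_per_s = peak_gb_per_hour * 1000 / 3600   # ~1042 MB/s
print(f"peak write rate: ~{peak_mb_per_s:.0f} MB/s")

# One 10 gigabit line carries at most 1250 MB/s raw; usable NFS/TCP throughput
# is lower, which is why two bonded lines are planned.
line_raw_mb_per_s = 10_000 / 8
print(f"one 10Gb line (raw): {line_raw_mb_per_s:.0f} MB/s")
print(f"lines needed at line rate: {peak_mb_per_s / line_raw_mb_per_s:.2f}")
```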
Some additions based on the excellent replies:
- the maximum of 50x75GB/hour occurs only during peak, which lasts less than 24h (most likely <6h)
- we don't expect this to happen soon; most likely we will run at 5-10x75GB/hour (see the quick conversion after this list)
- it's a pre-alpha machine, but the requirements should still be met (even though a lot of question marks are in play)
- we would use NFS as the connection from the machine to the server
- layout: generating machine -> storage (this one) -> (safe raid 6) -> compute cluster
- so read speed is not essential, but it would be nice to be able to use the data from the compute cluster (though this is completely optional)
- most likely the data will be large files (not many small ones)
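A rough conversion of the expected rates and the peak-window volume (a sketch, assuming decimal units and a 6-hour peak window as noted above):

```python
# Rough conversion of the expected (non-peak) rates and the peak-window volume
# (decimal units; a 6 h peak window is assumed from the note above).
gb_per_head_hour = 75

for heads in (5, 10):                    # the "most likely" steady-state range
    mb_per_s = heads * gb_per_head_hour * 1000 / 3600
    print(f"{heads:2d} heads -> {heads * gb_per_head_hour:4d} GB/h ~= {mb_per_s:.0f} MB/s")

# Data landed during one <6 h peak window at the 50-head maximum:
peak_tb = 50 * gb_per_head_hour * 6 / 1000
print(f"one 6 h peak window: ~{peak_tb:.1f} TB of the 50-75 TB total")
```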
For such extreme write speeds, I advise against ZFS, BTRFS or any CoW filesystem. I would use XFS, which is extremely efficient with large/streaming transfers.
A lot of information is missing (how do you plan to access the data? is read speed important? are you going to write in large chunks? etc.) to give you specific advice, but some general advice:
Absolutely... ZFS on Linux is a possibility if architected correctly. There are many cases of poor ZFS design, but done well, your requirements can be met.
So the main determinant will be how you're connecting to this data storage system. Is it NFS? CIFS? How are the clients connecting to the storage? Or is the processing, etc. done on the storage system?
Fill in some more details and we can see if we can help.
For instance, if this is NFS with synchronous mounts, then it's definitely possible to scale ZFS on Linux to meet the write performance needs while still maintaining the long-term storage capacity requirement. Is the data compressible? How is each client connected? Gigabit Ethernet?
Edit:
Okay, I'll bite:
Here's a spec that's roughly $17k-$23k and fits in a 2U rack space.
This setup would provide you with 80TB of usable space using either hardware RAID6 or ZFS RAIDZ2.
Since the focus is NFS-based performance (assuming synchronous writes), we can absorb all of those writes easily with the P3608 NVMe drives (striped SLOG). They can accommodate 3GB/s of sequential writes and have a high enough endurance rating to handle the workload you've described continuously. The drives can easily be overprovisioned to add some protection under a SLOG use case.
With the NFS workload, the writes will be coalesced and flushed to spinning disk. Under Linux, we would tune this to flush every 15-30 seconds. The spinning disks can handle this and may benefit even more if the data is compressible.
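As a rough sketch of why this works, assuming the ~1042 MB/s peak from the question, a 15-30 second flush interval (on ZFS on Linux this roughly maps to the zfs_txg_timeout tunable) and the ~3 GB/s P3608 figure above:

```python
# Back-of-the-envelope check of the SLOG + flush design. Assumptions: ~1042 MB/s
# incoming peak (from the question) and a 15-30 s flush interval.
incoming_mb_per_s = 50 * 75 * 1000 / 3600        # ~1042 MB/s at peak

for flush_interval_s in (15, 30):
    buffered_gb = incoming_mb_per_s * flush_interval_s / 1000
    print(f"{flush_interval_s:2d} s between flushes -> ~{buffered_gb:.0f} GB buffered per interval")

# A striped P3608 SLOG rated ~3000 MB/s sequential absorbs the synchronous write
# stream with roughly 3x headroom; the spinning RAIDZ2/RAID6 pool only has to
# sustain the same ~1.05 GB/s average when each flush hits the disks.
slog_mb_per_s = 3000
print(f"SLOG headroom: {slog_mb_per_s / incoming_mb_per_s:.1f}x")
```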
The server can be expanded further: it has 4 open PCIe slots plus an additional port for dual-port 10GbE FLR adapters, so you have networking flexibility.
25Gbps Ethernet is already borderline-mainstream, while PCIe-based NVMe will lap up that traffic easily.
For reference, I recently built a small 'log capture' solution using four regular dual-Xeon servers (HPE DL380 Gen9s in this case), each with 6 x NVMe drives. I used IP over InfiniBand, but those 25/40Gbps NICs would do the same, and we're capturing up to 8GBps per server - it works a treat.
Basically it's not cheap, but it's very doable these days.
Doesn't sound like a big deal. Our local hardware supplier has this as a standard product - apparently it can push 1400MB/s sustained in CCTV recording mode, which should be harder than your peak requirements.
(The link is to the default 12GB config, but they note that 20x4TB is also an option. No personal experience with this particular server model.)
Sequential writes at 1100MB/s are not an issue with modern hardware. Anecdotally, my home setup with 8x5900 RPM laptop drives, 2x15000 RPM drives and 2x7200 RPM drives sustains 300 MB/s with a 16GB one-off payload.
The network is 10GbE with fiber cables, 9000 MTU on Ethernet, and the application layer is Samba 3.0. The storage is configured as RAID 50, striped across three 4-drive RAID 5 volumes. The controller is an LSI MegaRAID SAS 9271-8i with up to 6Gb/s per port (I also have an additional, slower port multiplier).
Talk to any seasoned sysadmin and they should be able to tell you exactly which controller(s) and drives would meet your requirements.
I think you can try any 12Gb/s controller and configure two mirrored stripes of eight 7200 RPM drives each (almost any drive should do). Start 3-4 TCP connections to saturate the link, and if a single pair of 10GbE cards can't handle it, use four cards.
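A rough sanity check of that layout, assuming a typical 150-200 MB/s sequential write per 7200 RPM drive (the actual figures depend on the drives):

```python
# Rough numbers for two mirrored stripes of eight 7200 RPM drives (16 drives total),
# assuming a typical 150-200 MB/s sequential write per drive.
stripe_width = 8                                  # writes land on 8 data-bearing drives
low, high = (stripe_width * w for w in (150, 200))
print(f"array sequential write: ~{low}-{high} MB/s")   # ~1200-1600 MB/s

# One 10GbE port tops out at 1250 MB/s raw (less after TCP/NFS overhead), so a
# single connection over a single port is marginal for ~1042 MB/s; a few parallel
# TCP connections spread over a bonded pair of ports saturate the path.
required_mb_per_s = 50 * 75 * 1000 / 3600
print(f"required: ~{required_mb_per_s:.0f} MB/s vs 1250 MB/s raw per 10GbE port")
```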
Something of a tangent, but consider using InfiniBand instead of dual 10GbE links. You can get 56Gbps InfiniBand cards quite cheaply, or 100Gbps ones for not much more, and on Linux it is easy to use NFS with RDMA over IB, which gives you extremely low latency and near-theoretical line-speed throughput (if your underlying storage can handle it). You don't need a switch, just two InfiniBand cards and a direct-attach copper cable (or an InfiniBand fiber cable if you need longer distances).
A single-port Mellanox 56Gbps card (PCIe 3.0 x8) like the MCB191A-FCAT costs less than 700 bucks, and a 2-meter copper direct-attach cable is about 80 dollars.
Performance will generally blow 10GbE out of the water in all use cases. There are no downsides, unless you need to access the server from lots of different clients that can't all use InfiniBand (and even then, Mellanox's switches can bridge 10GbE and 40GbE to IB, but that is a bit more of an investment, of course).
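For scale, the nominal line rates against the ~1042 MB/s peak requirement (raw figures only; real NFS or NFS-over-RDMA throughput lands somewhat below these):

```python
# Nominal line rates vs. the ~1042 MB/s peak requirement (raw bit rate / 8;
# real NFS or NFS-over-RDMA throughput is somewhat lower than these numbers).
required_mb_per_s = 50 * 75 * 1000 / 3600

links = {
    "1 x 10GbE":            10_000 / 8,
    "2 x 10GbE (bonded)":   20_000 / 8,
    "56Gb FDR InfiniBand":  56_000 / 8,
    "100Gb EDR InfiniBand": 100_000 / 8,
}
for name, raw_mb_per_s in links.items():
    print(f"{name:22s} {raw_mb_per_s:7.0f} MB/s raw "
          f"({raw_mb_per_s / required_mb_per_s:4.1f}x the requirement)")
```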
Doing this with ZFS is possible; however, consider using FreeBSD, as it has the faster network stack. This would possibly allow 100 Gbit on a single machine.
1100 MB/s sounds like a lot, but you can realistically achieve this using only regular hard drives. You say you need 75 TB of space, so you could use 24 x 8 TB hard drives in mirror pairs. This would give you 12x the write speed of a single drive and 24x the read speed. Since these drives have write speeds above 100 MB/s, this should easily handle the bandwidth. Make extra sure not to get SMR drives, as those have hugely slower write speeds.
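The arithmetic behind that layout, as a quick sketch assuming ~150 MB/s sequential write per drive (a conservative figure for CMR 7200 RPM disks) and decimal terabytes:

```python
# 24 x 8 TB drives as 12 two-way mirror vdevs, assuming ~150 MB/s sequential
# write per drive (a conservative figure for CMR 7200 RPM disks).
drives, drive_tb, per_drive_mb_s = 24, 8, 150

mirror_vdevs = drives // 2
usable_tb = mirror_vdevs * drive_tb              # each pair stores one copy of the data
agg_write_mb_s = mirror_vdevs * per_drive_mb_s   # writes hit every mirror pair in parallel
agg_read_mb_s = drives * per_drive_mb_s          # reads can be served from both halves

print(f"usable capacity: ~{usable_tb} TB    (target: 50-75 TB)")     # ~96 TB
print(f"aggregate write: ~{agg_write_mb_s} MB/s (target: ~1100)")    # ~1800 MB/s
print(f"aggregate read:  ~{agg_read_mb_s} MB/s")                     # ~3600 MB/s
```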
ZFS creates checksums for every block, and this is implemented single-threaded. As such, you should have a CPU with a reasonably fast clock rate so that checksumming doesn't become a bottleneck.
However, the exact implementation hugely depends on further details.
We have pegged a 10G NIC dumping data to a Gluster cluster over its FUSE client. It takes a little tuning, but you wouldn't believe the performance it can achieve since 3.0.