We are developing and deploying a data warehouse based on Oracle 11g EE with partitioning on RHEL5 64bit for a client.
The total storage required would be around 4TB of usable space.
I have been reading about balanced hardware configurations for data warehouses, and the consistent advice is that storage throughput is essential to performance: storage should be spec'd for sequential throughput rather than capacity, and everything should be striped and mirrored.
Our customer is very keen on using their SAN (NetApp) for a number of reasons, e.g. it is centrally managed and backed up, and they have already spent a lot of money on it.
It seems to me that even a relatively small number of local disks could have better throughput than a SAN, e.g.:
16 x 10k RPM 600GB SFF disks = 9600 GB raw = 4800 GB usable space using RAID-10
If each disk can produce 60MB/s of sequential throughput, then counting each of the 8 mirrored pairs as one stripe member, the aggregate throughput is 8 x 60 = 480 MB/s
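As a rough sanity check of the arithmetic above, here is a minimal Python sketch. The function name `raid10_estimate` is made up for illustration, and the 60MB/s per-disk figure is my own estimate, not a measurement. It counts each mirrored pair as a single stripe member, which is the conservative reading; reads could in principle be served from both halves of a mirror.

```python
# Back-of-envelope RAID-10 sizing for the local-disk option.
# All inputs are estimates from the post, not measured values.

def raid10_estimate(disks: int, disk_gb: int, disk_mb_per_s: int) -> dict:
    """Return raw/usable capacity and a conservative streaming estimate."""
    if disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")
    mirrored_pairs = disks // 2
    return {
        "raw_gb": disks * disk_gb,
        # Half the raw space is consumed by the mirror copies.
        "usable_gb": mirrored_pairs * disk_gb,
        # Conservative: each mirrored pair counts as one stripe member.
        "throughput_mb_per_s": mirrored_pairs * disk_mb_per_s,
    }

estimate = raid10_estimate(disks=16, disk_gb=600, disk_mb_per_s=60)
print(estimate)
```

Changing `disk_mb_per_s` shows how sensitive the total is to the per-disk assumption.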
My question: Is it possible to get >400MB/s sequential throughput to a single Oracle database host connected to a SAN?
I realize I'll need a 4Gbps or faster connection to the SAN. I see no theoretical reason why a SAN couldn't deliver this speed given enough disks.
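To put a number on the link requirement, here is a small Python sketch of the link budget. It assumes 8b/10b line encoding (8 payload bits per 10 line bits), which is my understanding of how 4Gb Fibre Channel works, and it ignores framing overhead, multipathing and contention from other hosts. The function names are hypothetical.

```python
# Rough link budget for a single host-to-SAN connection.
# ASSUMPTION: 8b/10b encoding, so only 8 of every 10 line bits carry data.
# Framing overhead and contention from other hosts are ignored.

def usable_mb_per_s(link_gbps: int) -> int:
    """Approximate payload bandwidth of one link in MB/s."""
    raw_mb_per_s = link_gbps * 1000 // 8   # bits to bytes
    return raw_mb_per_s * 8 // 10          # 8b/10b encoding overhead

def links_needed(target_mb_per_s: int, link_gbps: int) -> int:
    """Number of links needed to carry the target rate (rounded up)."""
    per_link = usable_mb_per_s(link_gbps)
    return -(-target_mb_per_s // per_link)  # ceiling division

print(usable_mb_per_s(4))     # payload MB/s of one 4Gbps link
print(links_needed(400, 4))   # links for exactly 400 MB/s
print(links_needed(480, 4))   # links to match the local-disk estimate
```

Under these assumptions a single 4Gbps link tops out at about the 400MB/s target, so there is no headroom; matching the 480MB/s local-disk estimate would take a second link or a faster one.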
Can a SAN deliver data at "data warehouse" rates in practice (at least 400MB/s)? Does anyone see these speeds in the real world? Perhaps there is some limit that I am not aware of that prevents seeing these kinds of rates to a single host via a SAN.
We have lots of Oracle and Red Hat experience but we are not that familiar with SANs. We are a small company and don't have one internally.
You might find that theoretical SAN throughput bears little or no relation to actual SAN throughput when the SAN is centrally hosted and already serves many clients within an enterprise. The SAN is likely managed by a third party, and they will have their own SLAs to deliver on (perhaps focussed on availability rather than throughput).
My advice would be to request current typical throughput figures from the team that runs the SAN and ask what throughput you might expect for your application/database. If you get a cautious response, you should go down the route of specifying a throughput-based SLA for the SAN side of your project.
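If you are given a test LUN, you can also check the quoted figures yourself. The following is a minimal Python sketch, not a proper benchmark: the function name `sequential_read_mb_per_s` is made up for illustration, and it simply times a single-threaded sequential read of one large file. The OS page cache will inflate the result unless the file is much larger than RAM or the cache is dropped first, and Oracle's own I/O pattern (parallel query, direct path reads) may behave differently.

```python
import time

def sequential_read_mb_per_s(path: str, block_size: int = 1024 * 1024) -> float:
    """Read the file front to back in fixed-size blocks and return MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total_bytes / 1_000_000 / elapsed
```

Call it with the path of a large file on the SAN-backed filesystem and compare the result, ideally at different times of day, against whatever figure ends up in the SLA.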