I have read in various places (e.g. here and here) that NFS' I/O performance does not scale, while Lustre's does, and that Lustre can deliver better I/O rates in general.
There seem to be various architectural differences between the two, but I can't identify what exactly gives Lustre the bump in speed or scalability. I guess my question is somewhat conceptual: what is the key difference from NFS, or the key feature, that allows Lustre to scale and deliver faster I/O rates?
With NFS, every client talks to a single server for a given file system, so the overall performance is limited by what that one server can deliver. Adding more servers does not help.
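As a rough illustration of that single-server ceiling, here is a toy model in Python. The function name and all the numbers are made up for the example, and it ignores network limits, metadata traffic and caching. The `striped=True` case stands for a file system that spreads its data over all of its servers.

```python
def aggregate_throughput(total_demand, per_server_capacity, num_servers, striped):
    """Toy model: MB/s delivered to all clients combined (hypothetical numbers)."""
    # Without striping, all data for the file system comes from one server,
    # so any additional servers cannot contribute.
    usable_servers = num_servers if striped else 1
    return min(total_demand, usable_servers * per_server_capacity)


if __name__ == "__main__":
    demand = 4000    # MB/s the clients would like to read in total (assumed)
    capacity = 1000  # MB/s one server can deliver (assumed)
    for n in (1, 2, 4, 8):
        single = aggregate_throughput(demand, capacity, n, striped=False)
        spread = aggregate_throughput(demand, capacity, n, striped=True)
        print(f"{n} server(s): single-server {single} MB/s, striped {spread} MB/s")
```

In this model the single-server figure stays at the capacity of one server no matter how many servers exist, while the striped figure grows with the server count until the clients' demand is met.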
Lustre splits (stripes) the data across several servers: a client asks one server (the metadata server) where the data lives, and the data itself is then sent by one or more other servers (the object storage servers). So adding more servers does help, which is why "Lustre scales". This is an important bit from your first link: