Historically, the common wisdom has been that web server performance (a high-volatility workload with relatively short per-transaction lifetimes) is much more a function of available memory than of core count. The OS process scheduler employs considerably deeper magic than simply round-robining the most CPU-intensive processes among the available processors; rather than trying to second-guess the scheduler, your better bet is to ensure you have enough RAM to keep more worker processes alive than your expected concurrent-request load, and let the scheduler decide how to give them cycles in a timely manner.
The answer is that it depends on the application you're serving and, in particular, on its runtime.
If you're using Python or Ruby, you will probably want one process per logical core, since the interpreter's global lock prevents a single process from running application code on more than one core at a time – unless your application spends most of its time in native code that releases the lock and can use multiple cores on its own.
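As a sketch of that rule of thumb, the worker count can be derived from the logical core count. The helper below is a hypothetical illustration, not part of uWSGI's API:

```python
import os

def suggested_worker_count(native_heavy: bool = False) -> int:
    """Rough sizing heuristic: one process per logical core for
    interpreter-lock-bound Python/Ruby apps; a single process if
    the workload can already spread across cores on its own."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return 1 if native_heavy else cores
```

In practice you would feed this number into your server's `processes` setting rather than hard-coding it per machine.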
If you're using Go or another language whose runtime can run code concurrently on multiple cores, then a single process is enough.
As for whether you need a deep or shallow queue (the socket listen queue length): it needs to be at least processes × threads, and longer depending on what sits in front of uWSGI.
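Putting those numbers together, a uWSGI configuration might look like the following. `processes`, `threads`, and `listen` are real uWSGI options; the specific values are illustrative:

```ini
[uwsgi]
; one worker per logical core (e.g. a 4-core box)
processes = 4
threads = 2
; listen queue must be at least processes x threads (here 8);
; give extra headroom for whatever proxy sits in front
listen = 128
```

Note that on Linux the effective queue depth is also capped by the kernel's `net.core.somaxconn` setting, so a larger `listen` value may require raising that limit as well.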