I'm not quite sure if there's a term for the architecture I'm looking to set up, but I'll try to explain.
I have an API and clients calling it with a session ID. The session ID is associated with local session info stored in a PHP session in Memcache (or fetched from a MySQL database if expired). Depending on the session info, the user may or may not get access to the API: a valid login alone is not enough, the client also needs a license associated with that particular client.
I would like to cache requests such that two requests with the same parameters (but from different clients, i.e. with different session IDs) are cached as one request, while the session info check is still performed for every request.
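To make the goal concrete, here is a minimal Python sketch of the kind of cache key I have in mind: it is built from the request path and parameters only, so the session ID never influences it. The parameter name `sid` and the helper `cache_key` are placeholders for illustration, not part of my actual API.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit

SESSION_PARAM = "sid"  # hypothetical name of the session ID parameter


def cache_key(url: str) -> str:
    """Return a key that ignores the session ID and the parameter order."""
    parts = urlsplit(url)
    # Drop the session ID so that all clients share one cache entry.
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != SESSION_PARAM
    ]
    # Sort the remaining parameters so that their order does not matter.
    canonical = parts.path + "?" + urlencode(sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this, `/api/items?sid=abc&type=a` and `/api/items?type=a&sid=xyz` map to the same key, while a request with different parameters gets its own entry.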
I am considering two setups. In the former case (Varnish with a separate auth check):
     ________                 _____________       _________        ________
    |        |  Request w.   |     SSL     |     |         |      | Apache |
    | Client |---------------| termination |-----| Varnish |------| + PHP  |
    |        |      SID      |             |     |         |      |  API   |
     ‾‾‾‾‾‾‾‾                 ‾‾‾‾‾‾‾‾‾‾‾‾‾       ‾‾‾‾|‾‾‾‾        ‾‾‾‾‾‾‾‾
                                                      | SID
                                                  ____|_____
                                                 | Validate |
                                                 |   SID    |
                                                 |          |
                                                  ‾‾‾‾‾‾‾‾‾‾
In other words:
- Client sends request with SID.
- SSL is terminated in front of Varnish, because Varnish does not terminate SSL itself.
- Varnish separates the SID from the request and forwards it to a validation backend.
- If the SID is valid, Varnish forwards the remaining parameters to the API backend; if not, Varnish returns the error response from the validation backend.
- Varnish returns (and caches) the API response without SID.
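The steps above can be sketched as follows. This is a plain Python simulation of the flow, not Varnish configuration; `validate_sid`, `fetch_from_api`, and the in-memory dictionary are stand-ins for the validation backend, the API backend, and Varnish's cache, and are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


class SharedCacheProxy:
    """Validates every request, but shares cached API responses across sessions."""

    def __init__(
        self,
        validate_sid: Callable[[str], bool],
        fetch_from_api: Callable[[str], Response],
    ) -> None:
        self._validate_sid = validate_sid      # stand-in for the validation backend
        self._fetch_from_api = fetch_from_api  # stand-in for the Apache + PHP API
        self._cache: Dict[str, Response] = {}  # stand-in for Varnish's cache

    def handle(self, sid: str, params: str) -> Response:
        # The SID is checked on every request; the result is never cached.
        if not self._validate_sid(sid):
            return (403, "invalid session or license")
        # The cache is keyed on the parameters only, never on the SID.
        if params in self._cache:
            return self._cache[params]
        response = self._fetch_from_api(params)
        # Only successful responses are stored in this sketch.
        if response[0] == 200:
            self._cache[params] = response
        return response
```

Two clients with different valid SIDs and identical parameters would trigger a single call to the API backend here, while a client with an invalid SID gets the error response even when the content is already cached.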
Or the latter case (Varnish after an auth check):
     ________                 __________                     _________        ________
    |        |  Request w.   | Validate |  Request without  |         |      | Apache |
    | Client |---------------|   SID    |-------------------| Varnish |------| + PHP  |
    |        |      SID      |          |        SID        |         |      |  API   |
     ‾‾‾‾‾‾‾‾                 ‾‾‾‾‾‾‾‾‾‾                     ‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾
In other words:
- Client sends request with SID.
- A separate process (I'm currently thinking Node.js) validates the SID.
- If the SID is valid, the request parameters are separated from it and forwarded to Varnish and the API backend; if it is invalid, an error response is returned.
- The response from Varnish is returned.
I have no problems implementing the latter solution, but I have two concerns. First, I am wary of using Node.js as a reverse proxy, since some of the requests are for files of 200-500 MB. Second, I would like to avoid reimplementing in Node.js the session validation I already have in Apache/PHP. (Reusing the existing PHP validation in the latter solution is not an option, given the number of concurrent requests the API receives.)
I'm sure the former option is also considered the "better" option, but the posts I have found all seem to claim that responses are cached on a per-user basis, which is not what I want.
Any input on which solution to go for and how to implement the former would be appreciated.
What you want to achieve is doable.
This article describes why you would want that and hints at what you need. This slide deck will give you a more hands-on idea.
As I said, it is doable, but it is by no means a walk in the park unless you know what you are doing.
Hope that this helps!