I'm working on a gRPC application that will be served behind HAProxy using the http mode. If the server application immediately (ie. before sending any responses) aborts the response streaming call with specific error, then the client application will receive CANCELLED
error instead of the one that was sent. The error details will be Received RST_STREAM with error code 8. I'm working with HAProxy 2.3.2 and grpc 1.34.0.
For each of such requests, HAProxy log entry have SD--
flags set in the session state at disconnection field, eg.
<134>Jan 9 18:09:39 1a8328663d74 haproxy[8]: 172.28.0.4:41698 [09/Jan/2021:18:09:39.346] grpc-server-fe grpc-server-fe/grpc-server-be 0/0/0/-1/+0 -1 +115 - - SD-- 1/1/0/0/0 0/0 "POST http://proxy:6000/service.Service/StreamStream HTTP/2.0"
In the HAProxy documentation those flags are defined as follows:
- S: the TCP session was unexpectedly aborted by the server, or the server explicitly refused it.
- D: the session was in the DATA phase.
Additionally:
SD The connection to the server died with an error during the data transfer. This usually means that haproxy has received an RST from the server or an ICMP message from an intermediate equipment while exchanging data with the server. This can be caused by a server crash or by a network issue on an intermediate equipment.
Knowing that, and having in mind that the HTTP connection was opened with an intention of data streaming, as a workaround I have tried sending an empty gRPC message before raising the error. The solution have helped partially - the error codes could have been received by the client for the most requests, but the issue still happened from time to time.
As a next step I have inspected the network traffic using wireshark. Following is a trace of the HTTP response served by the gRPC server on an event of immediate call abortion:
HyperText Transfer Protocol 2
Stream: SETTINGS, Stream ID: 0, Length 0
Length: 0
Type: SETTINGS (4)
Flags: 0x01
0... .... .... .... .... .... .... .... = Reserved: 0x0
.000 0000 0000 0000 0000 0000 0000 0000 = Stream Identifier: 0
Stream: HEADERS, Stream ID: 1, Length 88, 200 OK
Length: 88
Type: HEADERS (1)
Flags: 0x05
.... ...1 = End Stream: True
.... .1.. = End Headers: True
.... 0... = Padded: False
..0. .... = Priority: False
00.0 ..0. = Unused: 0x00
0... .... .... .... .... .... .... .... = Reserved: 0x0
.000 0000 0000 0000 0000 0000 0000 0001 = Stream Identifier: 1
[Pad Length: 0]
Header Block Fragment: 88400c636f6e74656e742d74797065106170706c69636174...
[Header Length: 120]
[Header Count: 4]
Header: :status: 200 OK
Header: content-type: application/grpc
Header: grpc-status: 7
Header: grpc-message: Details sent by the server
Stream: RST_STREAM, Stream ID: 1, Length 4
Length: 4
Type: RST_STREAM (3)
Flags: 0x00
0... .... .... .... .... .... .... .... = Reserved: 0x0
.000 0000 0000 0000 0000 0000 0000 0001 = Stream Identifier: 1
Error: NO_ERROR (0)
So the server sends response headers with details on error with End Stream
and End Headers
flags set. And then closes the stream with NO_ERROR
code. According to the answer provided in https://stackoverflow.com/questions/55511528/should-grpc-server-side-half-closing-implicitly-terminate-the-client/55522312 everything is ok at this stage. I have also briefly reviewed the RFC 7540 and couldn't find anything that is off in the terms of HTTP/2 protocol.
Quoted gRPC server HTTP response is followed by HAProxy originating TCP ACK
, and next, HAProxy dispatches its response to the client.
HyperText Transfer Protocol 2
Stream: HEADERS, Stream ID: 1, Length 75, 200 OK
Length: 75
Type: HEADERS (1)
Flags: 0x05
.... ...1 = End Stream: True
.... .1.. = End Headers: True
.... 0... = Padded: False
..0. .... = Priority: False
00.0 ..0. = Unused: 0x00
0... .... .... .... .... .... .... .... = Reserved: 0x0
.000 0000 0000 0000 0000 0000 0000 0001 = Stream Identifier: 1
[Pad Length: 0]
Header Block Fragment: 885f106170706c69636174696f6e2f67727063000b677270...
[Header Length: 120]
[Header Count: 4]
Header: :status: 200 OK
Header: content-type: application/grpc
Header: grpc-status: 7
Header: grpc-message: Details sent by the server
Stream: RST_STREAM, Stream ID: 1, Length 4
Length: 4
Type: RST_STREAM (3)
Flags: 0x00
0000 0000 = Unused: 0x00
0... .... .... .... .... .... .... .... = Reserved: 0x0
.000 0000 0000 0000 0000 0000 0000 0001 = Stream Identifier: 1
Error: CANCEL (8)
It can be seen that the flags and all the contents of the HEADERS
frame are in place, so the error details are passed to the client, but the code of RST_STREAM
has changed to CANCEL
. In effect the client recieves all the expected data, but after that it receives unexpected RST_STREAM(CANCEL)
that is being mapped on gRPC CANCELLED
error, as specified in the gRPC documentation.
In the course of further investigation, I have referred to HAProxy source code. I have discovered that the code is set in the h2_do_shutr
function of mux_h2.c
(experiments with custom HAProxy builds have proven that it is indeed this place). The involved code branch has the following comment:
a final response was already provided, we don't want this stream anymore. This may happen when the server responds before the end of an upload and closes quickly (redirect, deny, ...)
So those are the details I have managed to gather on the issue. I'm not entirely sure whether the problem lies in the gRPC core (being too messy in the terms of HTTP2 streams handling) or HAProxy (being too careless while rewriting RST_STREAM
codes). The final question is, how can I tune the configuration of HAProxy and gRPC core server to work correctly on an event of immediate call abortion. The minimal HAProxy configuration that reproduces the issue is as follows:
global
log stdout local0
listen grpc-server-fe
bind *:6000 proto h2
mode http
log global
option logasap
option httplog
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
server grpc-server-be server:6000 proto h2
I have prepared a repository with minimal example containing simple python client & server. It also contains docker-compose
with network environment including configured HAProxy.