I can't get a multi-primary, multi-network setup to work with locality failover (or locality load balancing, for that matter). The endpoints are registered fine. The istio-system
namespace is labeled with network information, and each node is labeled with zone and region information. When I check the /clusters
page on the client's Envoy admin interface, the zone and region are set correctly for each endpoint.
The issue seems to be that the control plane isn't assigning priority to the endpoints. However, according to one (possibly stale) source, this should work automatically, provided that I've created a DestinationRule (which I have). I've also created a VirtualService for good measure.
$ istioctl proxy-config endpoints -n client client-6889f68cbc-z5jb6 --cluster "outbound|80||server.server.svc.cluster.local" -o json | jq '.[0].hostStatuses[] | del(.stats)'
{
  "address": {
    "socketAddress": {
      "address": "10.244.1.25",
      "portValue": 80
    }
  },
  "healthStatus": {
    "edsHealthStatus": "HEALTHY"
  },
  "weight": 1,
  "locality": {
    "region": "region2",
    "zone": "zone2"
  }
}
{
  "address": {
    "socketAddress": {
      "address": "172.18.254.1",
      "portValue": 15443
    }
  },
  "healthStatus": {
    "edsHealthStatus": "HEALTHY"
  },
  "weight": 3,
  "locality": {
    "region": "region1",
    "zone": "zone1"
  }
}
My setup is two Kubernetes 1.20.2 clusters running locally using KinD + MetalLB, with Istio operator v1.9.1. Each cluster is configured to occupy a different region and zone.
Istio VS and DR
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: server
  namespace: server
spec:
  host: server
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 10
        maxRequestsPerConnection: 10
    loadBalancer:
      localityLbSetting:
        enabled: true
      simple: ROUND_ROBIN
    outlierDetection:
      baseEjectionTime: 1m
      consecutive5xxErrors: 1
      interval: 1s
      maxEjectionPercent: 51
      minHealthPercent: 0
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: server
  namespace: server
spec:
  hosts:
  - server
  http:
  - route:
    - destination:
        host: server
Kiali View
As you can see from the Kiali dashboard, the DR and VS are both active. Both clusters are routable. But traffic is flowing to both equally, where it ought to be flowing only to one. I've also tried specifying distribute and failover explicitly in my DR spec with no success.
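For reference, an explicit failover rule of the kind mentioned above might look roughly like the fragment below. This is only a sketch: the direction (region2 failing over to region1) is an assumption based on the local endpoint being in region2, and the fragment replaces only the loadBalancer part of the DestinationRule's trafficPolicy. The outlierDetection block from the DestinationRule still has to be present for failover to take effect.

```yaml
# Sketch of spec.trafficPolicy.loadBalancer with explicit failover.
# The from/to direction is an assumption, not the exact rule that was tried.
loadBalancer:
  simple: ROUND_ROBIN
  localityLbSetting:
    enabled: true
    failover:
    - from: region2   # assumed to be the client's own region
      to: region1     # region to fail over to when region2 is unhealthy
```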
This is a bug in Istio 1.9.1 when running in a bare-metal environment. As a workaround, the client workload must have a Service attached to it. When a Service is defined, the proxy's locality is pulled from the first service instance. When no Service is defined, the cloud metadata provider is used to assign locality to the proxy instances (the sidecar itself queries the metadata server), and on bare metal there is no metadata server to query, so the proxy ends up without a locality and the endpoints get no priorities.
See:
https://github.com/istio/istio/blob/bf5dd51386f4d78b20dd1f9c14f09b562a6ecd6e/pilot/pkg/xds/ads.go#L584-L600
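A minimal sketch of the workaround, assuming the client pods carry an app: client label and that port 80 is acceptable as a placeholder (both are assumptions; adjust them to match the actual Deployment). The Service does not need to receive any traffic; it only has to select the client pods so that the control plane can derive the proxy's locality from a service instance.

```yaml
# Hypothetical Service for the client workload; the selector and port are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: client
  namespace: client
spec:
  selector:
    app: client        # assumed pod label; must match the client Deployment's pod template
  ports:
  - name: http
    port: 80           # placeholder port; the client need not actually listen on it
    targetPort: 80
```

After applying it, restarting the client pod and re-running the istioctl proxy-config endpoints command from the question should show whether the remote endpoints now carry a non-zero priority.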