We are running an on-premises Service Fabric cluster (5.4.145.9494), but it has some odd quirks. Whenever we run an application (especially one that contains replicas), we notice that the services fail to start most of the time. Inside SF the error message isn't very descriptive (unhealthy partition ...), but the event logs make it apparent that the service cannot start because the port it has chosen is already in use by another process (anything from an svchost process to wininit, basically any application).
In this case the developers DON'T assign a port themselves, so SF has to pick one. In our setup we assigned both ephemeral ports and application ports as per https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-manifest and we tried both options (application ports inside the ephemeral range and outside it), since the documentation is quite confusing: it describes application ports as a subset of the ephemeral ports, while the examples show they are not. Another odd thing: since the ephemeral ports configuration changes the dynamic port range of Windows itself, anything we change here also changes the port range of ANY other application running on Windows.
On top of that, SF does not seem to try another port once it notices the port is already in use, so the problem doesn't fix itself. A short snippet from the event log:
transport 35d3ce77c0 failed to bind on 0.0.0.0:49160, error = 0x80072740, port 49160 already held by process 204
In this case process 204 is spoolsv.exe, but again it can be any process.
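The error code 0x80072740 corresponds to WSAEADDRINUSE (10048), i.e. the bind itself is rejected. To confirm from the node that a given port really is taken, a quick probe along these lines can help. This is only a minimal sketch in Python, not anything Service Fabric provides; `port_is_free` is a hypothetical helper name, and it merely attempts the same kind of TCP bind on 0.0.0.0 that the transport fails on.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper: it only tries a plain bind and releases the
    socket again, so the result is a snapshot, not a reservation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # On Windows this is typically WSAEADDRINUSE (10048).
            return False
        return True


if __name__ == "__main__":
    # 49160 is the port from the event log line above.
    port = 49160
    print(port, "free" if port_is_free(port) else "in use")
```

Running it on the affected node just before deploying shows whether the port is already held before SF tries to use it.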
At this moment the config for the node is set to:
<NodeType Name="NodeType0">
  <Endpoints>
    <ClientConnectionEndpoint Port="19000" />
    <LeaseDriverEndpoint Port="19002" />
    <ClusterConnectionEndpoint Port="19001" />
    <HttpGatewayEndpoint Port="19080" Protocol="http" />
    <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
    <ServiceConnectionEndpoint Port="19003" />
    <ApplicationEndpoints StartPort="49152" EndPort="50000" />
    <EphemeralEndpoints StartPort="49152" EndPort="65534" />
  </Endpoints>
</NodeType>
But as stated before, we already tried putting the ApplicationEndpoints on its own range, which doesn't fix it ;-).
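To make the overlap between the two ranges explicit, here is a minimal sketch in Python that reads the `StartPort`/`EndPort` attributes and reports the shared interval. The helper names (`port_range`, `overlap`, `check`) are hypothetical, and the embedded XML is a namespace-free fragment with only the two relevant elements; a full cluster manifest normally declares an XML namespace, so the lookups would need adjusting there.

```python
import xml.etree.ElementTree as ET

# Assumption: a stripped-down, namespace-free copy of the Endpoints element.
ENDPOINTS_XML = """
<Endpoints>
  <ApplicationEndpoints StartPort="49152" EndPort="50000" />
  <EphemeralEndpoints StartPort="49152" EndPort="65534" />
</Endpoints>
"""


def port_range(endpoints, tag):
    """Return (start, end) for the child element named `tag`."""
    node = endpoints.find(tag)
    return int(node.get("StartPort")), int(node.get("EndPort"))


def overlap(a, b):
    """Return the inclusive interval shared by ranges a and b, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def check(xml_text):
    """Report the overlap between application and ephemeral port ranges."""
    endpoints = ET.fromstring(xml_text)
    app = port_range(endpoints, "ApplicationEndpoints")
    eph = port_range(endpoints, "EphemeralEndpoints")
    return overlap(app, eph)


if __name__ == "__main__":
    print(check(ENDPOINTS_XML))
```

With the config shown above, the whole application range lies inside the ephemeral range; with separate ranges, `check` returns `None`.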
Any help would be very welcome ;-)