This post is part of the series Microsoft Orleans - Problems & Solutions.
Problem
Upon startup, one of the silos would be marked as alive and the other would be marked as dead. This behavior would flip-flop between the two, but the one consistent outcome was that one silo was alive and one was dead.
FATAL EXCEPTION from Orleans.Runtime.MembershipService.MembershipTableManager. Context: I have been told I am dead, so this silo will stop! I should be Dead according to membership table.
Configuration
- Two silos hosted in Kubernetes.
- No CPU limits in the Kubernetes deployment config files (so no aggressive throttling was being applied).
kind: Pod
metadata:
  ...
  labels:
    orleans/clusterId: my-cluster
    orleans/serviceId: my-service-1
  ...
--------------------------------------
kind: Pod
metadata:
  ...
  labels:
    orleans/clusterId: my-cluster
    orleans/serviceId: my-service-2
  ...
Solution
The solution lies in the Kubernetes deployment config files: the serviceId label must have the same value across both deployment configs.
kind: Pod
metadata:
  ...
  labels:
    app: my-app
    orleans/clusterId: my-cluster
    orleans/serviceId: my-service
  ...
--------------------------------------
kind: Pod
metadata:
  ...
  labels:
    app: my-app
    orleans/clusterId: my-cluster
    orleans/serviceId: my-service
  ...
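For completeness, the labels are usually only half of the picture. If the silos use the Microsoft.Orleans.Hosting.Kubernetes integration (UseKubernetesHosting), the silo reads its identity from environment variables that are typically projected from these labels via the Kubernetes downward API. The sketch below is an assumption about how the container spec might look (the container name my-app and the exact env block are illustrative, and the variable names follow the pattern shown in the Orleans Kubernetes hosting documentation):

kind: Pod
...
spec:
  containers:
    - name: my-app
      ...
      env:
        # The silo derives ServiceId / ClusterId from these variables,
        # so both pods must project the same orleans/serviceId value.
        - name: ORLEANS_SERVICE_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['orleans/serviceId']
        - name: ORLEANS_CLUSTER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['orleans/clusterId']
        # Pod identity used by the Kubernetes hosting integration.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP

After applying the change, a quick way to confirm that both pods carry the same labels is kubectl get pods --show-labels.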
Explanation
If you think of it hierarchically, a service is a superset of one or more clusters. A service is not bound to a process, a physical machine, or even a data center.
To keep the cluster healthy, the silos ping each other to check that they are alive. Whichever silo started first could not reach the other one, because the config placed each silo in a different service. Due to this misconfiguration, the silos also ended up in different clusters, even though both clusters had the same name, my-cluster.
This led the first silo to suspect the second one, and then to mark it as dead.
If you found this article helpful, please share it in your favorite forums.