# ci: validate pods and systemd-networkd restart for PRs #1909
Changes from 14 commits.
```diff
@@ -17,6 +17,10 @@ do
   echo "Node internal ip: $node_ip"
   privileged_pod=$(kubectl get pods -n kube-system -l app=privileged-daemonset -o wide | grep "$node_name" | awk '{print $1}')
   echo "privileged pod : $privileged_pod"
+  if [ "$privileged_pod" == '' ]; then
+    kubectl describe daemonset privileged-daemonset -n kube-system
+    exit 1
+  fi
   while ! [ -s "azure_endpoints.json" ]
   do
     echo "trying to get the azure_endpoints"
```

Review thread on the `exit 1` line:

- **Reviewer:** Did we encounter such a case during testing? Can we add the status of the privileged pod deployment then?
- **Author:** Sure, I'll add that. Here you can see this run just got stuck in the loop; I had to manually cancel it: https://dev.azure.com/msazure/One/_build/results?buildId=71680459&view=logs&j=4ea62961-c456-50ab-e773-f15fbc744993&t=6637b73f-d7ef-5d5e-d4d4-eb0bbec757cb&s=c689f5d8-16f1-5a52-95fe-f6a4e6a9e7fe
- **Reviewer:** Sorry, I meant that if we fail to get the privileged pod, it would be good to have the status of the daemonset before exiting. As I see from the pipeline run, ideally we should be waiting for the deployment to be complete before proceeding, I think.
- **Author:** Ah, makes sense. I made the switch.
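The reviewer's concern was an unbounded loop that hung until the pipeline was cancelled manually. One general pattern for that is a bounded retry; here is a minimal self-contained sketch, where `get_privileged_pod` is a hypothetical stub standing in for the real `kubectl get pods | grep | awk` pipeline (in this demo it succeeds on the third call):

```shell
#!/usr/bin/env bash
# Bounded retry sketch: fail fast instead of looping forever.
# get_privileged_pod is a stub for the real kubectl lookup; it records its
# call count in a temp file because the caller invokes it in a subshell,
# and starts returning a (hypothetical) pod name on the third call.
count_file=$(mktemp)
echo 0 > "$count_file"

get_privileged_pod() {
  local n
  n=$(( $(cat "$count_file") + 1 ))
  echo "$n" > "$count_file"
  if [ "$n" -ge 3 ]; then
    echo "privileged-daemonset-abc12"   # hypothetical pod name
  fi
}

max_attempts=5
attempt=0
pod=""
while [ -z "$pod" ]; do
  attempt=$((attempt + 1))
  if [ "$attempt" -gt "$max_attempts" ]; then
    echo "Error: privileged pod not found after $max_attempts attempts" >&2
    exit 1
  fi
  pod=$(get_privileged_pod)
  # The real pipeline would `sleep 10` here between attempts.
done

echo "found privileged pod: $pod after $attempt attempts"
rm -f "$count_file"
```

The merged PR reportedly took the reviewer's other suggestion instead (waiting for the daemonset rollout to complete before looking up the pod), but bounding any polling loop keeps CI from hanging silently.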
```diff
@@ -34,6 +38,16 @@ do
     sleep 10
   done

+  cns_pod=$(kubectl get pod -l k8s-app=azure-cns -n kube-system -o wide | grep "$node_name" | awk '{print $1}')
+  echo "azure-cns pod : $cns_pod"
+
+  while ! [ -s "cns_endpoints.json" ]
+  do
+    echo "trying to get the cns_endpoints"
+    kubectl exec -it "$cns_pod" -n kube-system -- curl localhost:10090/debug/ipaddresses -d '{"IPConfigStateFilter":["Assigned"]}' > cns_endpoints.json
+    sleep 10
+  done
+
   total_pods=$(kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$node_name",status.phase=Running --output json)

   echo "Checking if there are any pods with no ips"
```
```diff
@@ -60,7 +74,7 @@ do
   echo "Number of azure endpoint ips : $num_of_azure_endpoint_ips"

   if [ "$num_of_pod_ips" != "$num_of_azure_endpoint_ips" ]; then
-    printf "Error: Number of pods in running state is less than total ips in the azure ednpoint file" >&2
+    printf "Error: Number of pods in running state is less than total ips in the azure endpoint file" >&2
     exit 1
   fi
```
```diff
@@ -92,7 +106,25 @@ do
     fi
   done

+  num_of_cns_endpoints=$(cat cns_endpoints.json | jq -r '[.IPConfigurationStatus | .[] | select(.IPAddress != null)] | length')
+  cns_endpoint_ips=$(cat cns_endpoints.json | jq -r '(.IPConfigurationStatus | .[] | select(.IPAddress != null) | .IPAddress)')
+  echo "Number of cns endpoints: $num_of_cns_endpoints"
+
+  if [ "$num_of_pod_ips" != "$num_of_cns_endpoints" ]; then
+    printf "Error: Number of pods in running state is less than total ips in the cns endpoint file" >&2
+    exit 1
+  fi
+
+  for ip in "${pod_ips[@]}"
+  do
+    find_in_array "$cns_endpoint_ips" "$ip" "cns_endpoints.json"
+    if [[ $? -eq 1 ]]; then
+      printf "Error: %s Not found in the cns_endpoints.json" "$ip" >&2
+      exit 1
+    fi
+  done
+
   # We are restarting systemd-networkd and checking that connectivity works after the restart. For more details: https://github.com/cilium/cilium/issues/18706
   kubectl exec -i "$privileged_pod" -n kube-system -- bash -c "chroot /host /bin/bash -c 'systemctl restart systemd-networkd'"
-  rm -rf cilium_endpoints.json azure_endpoints.json
+  rm -rf cilium_endpoints.json azure_endpoints.json cns_endpoints.json
 done
```
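The validation loop above calls a `find_in_array` helper that is defined elsewhere in the script. As an assumption (not the script's actual definition), such a helper might do an exact whole-line match of one IP against a newline-separated list, returning 1 on a miss:

```shell
# Hedged sketch of a find_in_array helper: the name and signature mirror the
# call sites in the diff above, but this body is a guess, not the real code.
find_in_array() {
  local haystack="$1" needle="$2" source_file="$3"
  if printf '%s\n' "$haystack" | grep -qxF -- "$needle"; then
    return 0
  fi
  echo "IP $needle not found in $source_file" >&2
  return 1
}

# Example with made-up endpoint IPs:
endpoint_ips=$'10.241.0.4\n10.241.0.5'
find_in_array "$endpoint_ips" "10.241.0.4" "cns_endpoints.json" && echo "10.241.0.4 present"
find_in_array "$endpoint_ips" "10.241.0.9" "cns_endpoints.json" || echo "10.241.0.9 absent"
```

`grep -qxF` matches the whole line as a fixed string, so `10.241.0.4` cannot accidentally match a longer IP such as `10.241.0.45`.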
New file: Cilium ClusterRole manifest (95 lines added):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium
rules:
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - discovery.k8s.io
    resources:
      - endpointslices
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - namespaces
      - services
      - pods
      - endpoints
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - list
      - watch
      # This is used when validating policies in preflight. This will need to stay
      # until we figure out how to avoid "get" inside the preflight, and then
      # should be removed ideally.
      - get
  - apiGroups:
      - cilium.io
    resources:
      - ciliumbgploadbalancerippools
      - ciliumbgppeeringpolicies
      - ciliumclusterwideenvoyconfigs
      - ciliumclusterwidenetworkpolicies
      - ciliumegressgatewaypolicies
      - ciliumegressnatpolicies
      - ciliumendpoints
      - ciliumendpointslices
      - ciliumenvoyconfigs
      - ciliumidentities
      - ciliumlocalredirectpolicies
      - ciliumnetworkpolicies
      - ciliumnodes
    verbs:
      - list
      - watch
  - apiGroups:
      - cilium.io
    resources:
      - ciliumidentities
      - ciliumendpoints
      - ciliumnodes
    verbs:
      - create
  - apiGroups:
      - cilium.io
    resources:
      - ciliumendpoints
    verbs:
      - delete
      - get
  - apiGroups:
      - cilium.io
    resources:
      - ciliumnodes
      - ciliumnodes/status
    verbs:
      - get
      - update
  - apiGroups:
      - cilium.io
    resources:
      - ciliumnetworkpolicies/status
      - ciliumclusterwidenetworkpolicies/status
      - ciliumendpoints/status
      - ciliumendpoints
    verbs:
      - patch
```
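Once the role is applied and bound to a service account, the grants can be spot-checked against a live cluster with `kubectl auth can-i`. A sketch, assuming the binding targets a `cilium` service account in `kube-system` (this is cluster-dependent and not runnable offline):

```shell
# Spot-check the RBAC grants from the cilium service account's point of view.
# Assumes the ClusterRole and a matching ClusterRoleBinding are applied.
kubectl auth can-i watch ciliumendpoints.cilium.io \
  --as=system:serviceaccount:kube-system:cilium    # should print "yes"
kubectl auth can-i delete pods \
  --as=system:serviceaccount:kube-system:cilium    # should print "no",
                                                   # absent other bindings
```

The `resource.group` form (`ciliumendpoints.cilium.io`) disambiguates the CRD from any same-named resource in another API group.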
New file: Cilium ClusterRoleBinding manifest (12 lines added):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium
subjects:
  - kind: ServiceAccount
    name: "cilium"
    namespace: kube-system
```
Review comment on the new manifests:

- **Reviewer:** `kubectl apply` works on directories. If these have ordering constraints, name the files with a priority prefix.
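A sketch of that suggestion: when given a directory, `kubectl apply -f` visits the files in lexical order, so numeric prefixes encode the ordering. The file names below are hypothetical:

```shell
#!/usr/bin/env bash
# Demonstrate lexical ordering with numeric prefixes: the ServiceAccount
# sorts before the ClusterRole, which sorts before the ClusterRoleBinding
# that references both.
manifest_dir=$(mktemp -d)
touch "$manifest_dir/00-cilium-serviceaccount.yaml" \
      "$manifest_dir/01-cilium-clusterrole.yaml" \
      "$manifest_dir/02-cilium-clusterrolebinding.yaml"

# ls sorts lexically, mirroring the order kubectl would visit the files:
apply_order=$(ls "$manifest_dir")
printf '%s\n' "$apply_order"
# A real pipeline would then run: kubectl apply -f "$manifest_dir"
rm -rf "$manifest_dir"
```

This keeps a single `kubectl apply -f <dir>` step working even when one manifest depends on another already existing.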