Could not Get External IP for Load Balancer on Azure AKS

I used Kubernetes service on Google Cloud Platform and it was a great service. I also wrote one blog, Running Spark on Kubernetes, on this area. Recently I used Azure Kubernetes Service (AKS) for a different project and run into some issues. One of major annoying issues was that I could not get external IP for load balancer on AKS. This blog discusses the process I identified the issue and solution for this problem.

I used the example from Microsoft, Use Azure Kubernetes Service with Kafka on HDInsight, for my testing. The source code can be accessed at https://github.com/Blackmist/Kafka-AKS-Test. The example is pretty simple and straight forward and the most import part is file kafka-aks-test.yaml. Here is the content of the file.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: kafka-aks-test
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: kafka-aks-test
    spec:
      containers:
        - name: kafka-aks-test
          image: microsoft/kafka-aks-test:v1
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 250m
            limits:
              cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-aks-test
spec:
  type: LoadBalancer
  ports:
    - port: 80
  selector:
    app: kafka-aks-test

We can see the Service is using LoadBalancer. So it should automatically get an External IP for my load balancer of the service. Unfortunately, I can not get this external IP and was stuck in Pending stage forever.

[root@ Kafka-AKS-Test]# kubectl get service kafka-aks-test --watch
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kafka-aks-test   LoadBalancer   192.168.130.97   <pending>     80:32656/TCP   10s

To make the debugging process simpler, I used the following two lines of commands to create a NGIX service. This is a nice and quick way to find out whether AKS is working or not.

kubectl run my-nginx --image=nginx --replicas=1 --port=80
kubectl expose deployment my-nginx --port=80 --type=LoadBalancer

Got the same issue. For AKS service, a good way to find out what’s going on in the service is to use kubectl describe service command. Here is output from this command.

[root@ AKS-Test]# kubectl describe service my-nginx
Name:                     my-nginx
Namespace:                default
Labels:                   run=my-nginx
Annotations:              <none>
Selector:                 run=my-nginx
Type:                     LoadBalancer
IP:                       <pending>
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31478/TCP
Endpoints:                10.2.5.70:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                      Age               From                Message
  ----     ------                      ----              ----                -------
  Warning  CreatingLoadBalancerFailed  2m (x3 over 3m)   service-controller  Error creating 
  load balancer (will retry): failed to ensure load balancer for service default/my-nginx: 
  [ensure(default/my-nginx): lb(kubernetes) - failed to ensure host in pool: "network.InterfacesClient#CreateOrUpdate: 
  Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service 
  returned an error. Status=403 Code=\"LinkedAuthorizationFailed\" Message=\"The client 
  '11b7e54a-e1bc-4092-af66-b014c11d9b87' with object id '11b7e54a-e1bc-4092-af66-b014c11d9b87' 
  has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope 
  '/subscriptions/763d9895-8916-4d35-8b43-d51b52642cef/resourceGroups/MC_exa-dev01-ue1-aksc2-
  vnet2-rg_exa-aksc2_eastus/providers/Microsoft.Network/networkInterfaces/aks-agentpool-40875261-nic-0'; 
  however, it does not have permission to perform action 'Microsoft.Network/virtualNetworks/
  subnets/join/action' on the linked scope(s) '/subscriptions/763d9895-8916-4d35-8b43-d51b52642cef/
  resourceGroups/exa-dev01-ue1-vnet2-rg/providers/Microsoft.Network/virtualNetworks/exa-dev01-ue1-vnet2/
  subnets/snet-aks2'.\"", ensure(default/my-nginx): lb(kubernetes) - failed to ensure host 
  in pool: "network.InterfacesClient#CreateOrUpdate: Failure responding to request: StatusCode=403 
  -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"LinkedAuthorizationFailed\" 
  Message=\"The client '11b7e54a-e1bc-4092-af66-b014c11d9b87' with object id '11b7e54a-e1bc-4092-af66-b014c11d9b87' 
  has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope
  . . . . 

It seems this is a common issue and many people run into the similar issue. Checked out the issue site for Github and found out one issue related to my problem, Azure AKS CreatingLoadBalancerFailed on AKS cluster with advanced networking. One of recommendations was to add AKS’ Service Principal (SP) to the subnet or VNet as Contributor. Did not work on me. Tried to add the SP as Owner. It didn’t work.

If running command kubectl get all –all-namespaces, it provides everything related to Kubernetes on AKS.

[root@ ~]# kubectl get all --all-namespaces
NAMESPACE     NAME                                                                  READY     STATUS             RESTARTS   AGE
kube-system   pod/addon-http-application-routing-default-http-backend-66c97fw842d   1/1       Running            1          2d
kube-system   pod/addon-http-application-routing-external-dns-c547864b7-r7zts       1/1       Running            1          5d
kube-system   pod/addon-http-application-routing-nginx-ingress-controller-642qfcp   0/1       CrashLoopBackOff   4          1m
kube-system   pod/azureproxy-79c5db744-7ndvk                                        1/1       Running            4          5d
kube-system   pod/heapster-55f855b47-q5jtf                                          2/2       Running            0          2d
kube-system   pod/kube-dns-v20-7c556f89c5-5ngp5                                     3/3       Running            0          5d
kube-system   pod/kube-dns-v20-7c556f89c5-djf7d                                     3/3       Running            3          2d
kube-system   pod/kube-proxy-dpt28                                                  1/1       Running            2          5d
kube-system   pod/kube-proxy-jq8hx                                                  1/1       Running            1          5d
kube-system   pod/kube-proxy-v4xc5                                                  1/1       Running            0          5d
kube-system   pod/kube-svc-redirect-77kj4                                           1/1       Running            2          5d
kube-system   pod/kube-svc-redirect-j9545                                           1/1       Running            1          5d
kube-system   pod/kube-svc-redirect-kvh2r                                           1/1       Running            0          5d
kube-system   pod/kubernetes-dashboard-546f987686-ws5nm                             1/1       Running            0          2d
kube-system   pod/omsagent-4xn72                                                    1/1       Running            2          5d
kube-system   pod/omsagent-fbjsp                                                    1/1       Running            1          5d
kube-system   pod/omsagent-pvfrt                                                    1/1       Running            0          5d
kube-system   pod/tiller-deploy-7ccf99cd64-tstvl                                    1/1       Running            1          23h
kube-system   pod/tunnelfront-55bbb6b96c-nhlbk                                      1/1       Running            0          5d

NAMESPACE     NAME                                                          TYPE           CLUSTER-IP        EXTERNAL-IP   PORT(S)                      AGE
default       service/kubernetes                                            ClusterIP      192.168.0.1       <none>        443/TCP                      1d
kube-system   service/addon-http-application-routing-default-http-backend   ClusterIP      192.168.89.103    <none>        80/TCP                       5d
kube-system   service/addon-http-application-routing-nginx-ingress          LoadBalancer   192.168.205.83    <pending>     80:32704/TCP,443:32663/TCP   5d
kube-system   service/heapster                                              ClusterIP      192.168.2.201     <none>        80/TCP                       5d
kube-system   service/kube-dns                                              ClusterIP      192.168.0.10      <none>        53/UDP,53/TCP                5d
kube-system   service/kubernetes-dashboard                                  ClusterIP      192.168.150.149   <none>        80/TCP                       5d
kube-system   service/tiller-deploy                                         ClusterIP      192.168.34.240    <none>        44134/TCP                    23h

NAMESPACE     NAME                                     DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.extensions/kube-proxy          3         3         3         3            3           beta.kubernetes.io/os=linux   5d
kube-system   daemonset.extensions/kube-svc-redirect   3         3         3         3            3           beta.kubernetes.io/os=linux   5d
kube-system   daemonset.extensions/omsagent            3         3         3         3            3           beta.kubernetes.io/os=linux   5d

NAMESPACE     NAME                                                                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.extensions/addon-http-application-routing-default-http-backend       1         1         1            1           5d
kube-system   deployment.extensions/addon-http-application-routing-external-dns               1         1         1            1           5d
kube-system   deployment.extensions/addon-http-application-routing-nginx-ingress-controller   1         1         1            0           5d
kube-system   deployment.extensions/azureproxy                                                1         1         1            1           5d
kube-system   deployment.extensions/heapster                                                  1         1         1            1           5d
kube-system   deployment.extensions/kube-dns-v20                                              2         2         2            2           5d
kube-system   deployment.extensions/kubernetes-dashboard                                      1         1         1            1           5d
kube-system   deployment.extensions/tiller-deploy                                             1         1         1            1           23h
kube-system   deployment.extensions/tunnelfront                                               1         1         1            1           5d

NAMESPACE     NAME                                                                                       DESIRED   CURRENT   READY     AGE
kube-system   replicaset.extensions/addon-http-application-routing-default-http-backend-66c97f5dc7       1         1         1         5d
kube-system   replicaset.extensions/addon-http-application-routing-external-dns-c547864b7                1         1         1         5d
kube-system   replicaset.extensions/addon-http-application-routing-nginx-ingress-controller-6449fd79f9   1         1         0         5d
kube-system   replicaset.extensions/azureproxy-79c5db744                                                 1         1         1         5d
kube-system   replicaset.extensions/heapster-55f855b47                                                   1         1         1         5d
kube-system   replicaset.extensions/heapster-56c6f9566f                                                  0         0         0         5d
kube-system   replicaset.extensions/kube-dns-v20-7c556f89c5                                              2         2         2         5d
kube-system   replicaset.extensions/kubernetes-dashboard-546f987686                                      1         1         1         5d
kube-system   replicaset.extensions/tiller-deploy-7ccf99cd64                                             1         1         1         23h
kube-system   replicaset.extensions/tunnelfront-55bbb6b96c                                               1         1         1         5d

NAMESPACE     NAME                               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/kube-proxy          3         3         3         3            3           beta.kubernetes.io/os=linux   5d
kube-system   daemonset.apps/kube-svc-redirect   3         3         3         3            3           beta.kubernetes.io/os=linux   5d
kube-system   daemonset.apps/omsagent            3         3         3         3            3           beta.kubernetes.io/os=linux   5d

NAMESPACE     NAME                                                                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/addon-http-application-routing-default-http-backend       1         1         1            1           5d
kube-system   deployment.apps/addon-http-application-routing-external-dns               1         1         1            1           5d
kube-system   deployment.apps/addon-http-application-routing-nginx-ingress-controller   1         1         1            0           5d
kube-system   deployment.apps/azureproxy                                                1         1         1            1           5d
kube-system   deployment.apps/heapster                                                  1         1         1            1           5d
kube-system   deployment.apps/kube-dns-v20                                              2         2         2            2           5d
kube-system   deployment.apps/kubernetes-dashboard                                      1         1         1            1           5d
kube-system   deployment.apps/tiller-deploy                                             1         1         1            1           23h
kube-system   deployment.apps/tunnelfront                                               1         1         1            1           5d

NAMESPACE     NAME                                                                                 DESIRED   CURRENT   READY     AGE
kube-system   replicaset.apps/addon-http-application-routing-default-http-backend-66c97f5dc7       1         1         1         5d
kube-system   replicaset.apps/addon-http-application-routing-external-dns-c547864b7                1         1         1         5d
kube-system   replicaset.apps/addon-http-application-routing-nginx-ingress-controller-6449fd79f9   1         1         0         5d
kube-system   replicaset.apps/azureproxy-79c5db744                                                 1         1         1         5d
kube-system   replicaset.apps/heapster-55f855b47                                                   1         1         1         5d
kube-system   replicaset.apps/heapster-56c6f9566f                                                  0         0         0         5d
kube-system   replicaset.apps/kube-dns-v20-7c556f89c5                                              2         2         2         5d
kube-system   replicaset.apps/kubernetes-dashboard-546f987686                                      1         1         1         5d
kube-system   replicaset.apps/tiller-deploy-7ccf99cd64                                             1         1         1         23h
kube-system   replicaset.apps/tunnelfront-55bbb6b96c                                               1         1         1         5d

Pay attention more on the pod that has CrashLoopBackOff error. I saw this CrashLoopBackOff thing restarted over 1000 times within 5 days in our first AKS cluster. This is one Pod that is used internally by AKS before we can deploy anything else.

I opened a ticket with Microsoft and got Microsoft Support to work with me. After a very long conference call and even completely reinstalled AKS cluster, we finally figured out the way to get around this issue. The key is to give correct permission for AKS Service Principal.

There is one drawback when deploying AKS with Azure UI. You can not specify the name of Service Principal and SP is automatically created with the name like . For us, we have installed and uninstall AKS multiple times, so we have a few SP names. It is confusing to decide which one is the one we really care. Finding out the correct SP name is a challenge task. Anyway, the followings are the steps to add correct permission to AKS Service Principal.

1. Get Client ID
Run the following command to get client id.

[root@ AKS-Test]# az aks show -n exa-aksc2 -g exa-dev01-ue1-aksc2-vnet2-rg | grep clientId
"clientId": "27ae6273-9706-4156-b546-607279623990"

2. Get SP Name
Click Azure Active Directory, then click App registrations. Change dropdown from My Apps to All apps. Then input the clientId. It should show the SP name as screen below.

3. Set Correct Permission for the SP
At the time when AKS creates the cluster, it creates a SP showing above. Then grant Contributor role to the SP. This is the problem as certain operations require OWNER permissions. So need to add Owner role to the SP. All the resources used by AKS cluster are under MC_* resource group. In our case, it is MC_exa-dev01-ue1-aksc2-vnet2-rg_exa-aksc2_eastus.

Click Resource Group, then MC_exa-dev01-ue1-aksc2-vnet2-rg_exa-aksc2_eastus. Click Access Control (IAM), then click + Add.

After this change, our issue was gone. Here is the result from describe service. No error this time.

[root@exa-dev01-ue1-kfclient1-vm Kafka-AKS-Test]# kubectl describe service my-nginx
Name:                     my-nginx
Namespace:                default
Labels:                   run=my-nginx
Annotations:              <none>
Selector:                 run=my-nginx
Type:                     LoadBalancer
IP:                       10.242.237.5
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  32026/TCP
Endpoints:                10.2.10.70:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  8s    service-controller  Ensuring load balancer

The deployment also looks good.
Sample output:

[root@exa-dev01-ue1-kfclient1-vm Kafka-AKS-Test]# kubectl describe deployment my-nginx
Name: my-nginx
Namespace: default
CreationTimestamp: Thu, 14 Jun 2018 15:03:23 +0000
Labels: run=my-nginx
Annotations: deployment.kubernetes.io/revision=1
Selector: run=my-nginx
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: run=my-nginx
Containers:
my-nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment:
Mounts:
Volumes:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets:
NewReplicaSet: my-nginx-9d5677d94 (1/1 replicas created)
Events:
[/code]

For more information about our issue, you can check it out at https://github.com/Azure/AKS/issues/427.

Advertisements