
In my previous article, I explained how to implement a canary deployment strategy with OpenShift Service Mesh. In this article we will go through the same example—this time using Argo Rollouts.

Argo Rollouts is a Kubernetes controller and set of CRDs that bring advanced deployment capabilities to Kubernetes, such as blue/green, canary, canary analysis, experimentation, and progressive delivery. In this demo, we are going to use its canary capabilities.

You can also check this other article for a blue/green deployment strategy with Argo Rollouts.

In the next steps, we will see a real example of how to install, deploy, and manage the life cycle of cloud native applications with a canary deployment strategy using Argo Rollouts.

Let's start with some theory. After that, we will have a hands-on example.

Canary deployment

Canary deployment is a strategy where the operator releases a new version of the application to a small percentage of the production traffic. The users behind this small percentage can test the new version and provide feedback. If the new version works well, the operator increases the percentage until all the traffic uses the new version. Unlike blue/green, canary deployments are more gradual, and failures have limited impact.

Shop application

We are going to use very simple applications to test canary deployment. We have created two Quarkus applications, Products and Discounts. Figure 1 shows the shop applications.

Figure 1: The two Quarkus shop applications, Products and Discounts.

Products calls Discounts to get each product's discount and exposes an API that lists the products with their discounts.

Shop Canary

To achieve canary deployment with cloud native applications using Argo Rollouts, we have designed this architecture that you can see in Figure 2.

Figure 2: Architecture designed to achieve canary deployment with cloud native applications.

OpenShift components (online):

  • Routes and Services declared with the suffix -online
  • Routes mapped only to the online services
  • Services mapped to the rollout

In blue/green deployment we always have an offline service to test the version that is not yet in production. With canary deployment we do not need it, because the new version is progressively rolled out to production.

We have defined an active or online service called products-umbrella-online. The end user always uses products-umbrella-online. When a new version is deployed, Argo Rollouts creates a new revision (ReplicaSet). The number of replicas in the new release increases based on the configured steps, and the number of replicas in the old release decreases by the same amount. We have configured a pause duration between each step. To learn more, see the Argo Rollouts documentation.
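Conceptually, the rollout and its online service fit together like this. This is a simplified sketch with assumed names, labels, and ports, not the demo's exact manifests (those live in the Helm charts):

```yaml
# Sketch only; names, labels, image, and port are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: products
spec:
  replicas: 4
  selector:
    matchLabels:
      app: products
  template:
    metadata:
      labels:
        app: products
    spec:
      containers:
        - name: products
          image: products:v1.0.1   # placeholder image reference
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause:
            duration: 30s
        - setWeight: 50
        - pause:
            duration: 30s
---
# The online Service selects the rollout's pods; the OpenShift Route
# points only at this Service.
apiVersion: v1
kind: Service
metadata:
  name: products-umbrella-online
spec:
  selector:
    app: products
  ports:
    - port: 8080
      targetPort: 8080
```

Because the Service selects pods from both the stable and the canary ReplicaSet, traffic is shared between versions as the replica counts change.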

Shop Umbrella Helm Chart

One of the best ways to package cloud native applications is Helm, and in canary deployment it makes even more sense. We have created a chart for each application that knows nothing about canary deployment. We then pack everything together in an umbrella Helm chart, shown in Figure 3.

Figure 3: Shop Umbrella Helm chart.

In the Shop Umbrella Chart, we use the same charts several times as Helm dependencies, but with different names.

We have packaged both applications in one chart, but we could have a different umbrella chart per application.
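Helm's dependency alias is what lets an umbrella chart reuse one chart under different names. A sketch of what the umbrella Chart.yaml might contain (chart versions and repository paths here are assumptions):

```yaml
apiVersion: v2
name: quarkus-helm-umbrella
version: 1.0.0
dependencies:
  # The same quarkus-base chart is pulled in twice under different aliases,
  # so each application gets its own values section
  # (products-blue, discounts-blue).
  - name: quarkus-base
    alias: products-blue
    version: "0.1.0"                       # assumed version
    repository: "file://../quarkus-base"   # assumed local path
  - name: quarkus-base
    alias: discounts-blue
    version: "0.1.0"
    repository: "file://../quarkus-base"
```

Each alias then becomes a top-level key in values.yaml, which is why the values file edited later in this demo has products-blue and discounts-blue sections.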

Demo

Prerequisites:

We have a GitHub repository for this demo. As part of the demo, you will have to make some changes and commits, so you must fork the repository and clone it locally.

git clone https://github.com/your_user/cloud-native-deployment-strategies

If we want to have a cloud native deployment, we cannot forget CI/CD. Red Hat OpenShift GitOps will help us with this.

Install OpenShift GitOps

Go to the folder where you have cloned your forked repository and create a new branch, canary:

git checkout -b canary
git push origin canary

Log in to OpenShift as a cluster admin and install the OpenShift GitOps operator with the following command. This may take a few minutes.

oc apply -f gitops/gitops-operator.yaml

Once OpenShift GitOps is installed, an instance of Argo CD is automatically installed on the cluster in the openshift-gitops namespace and a link to this instance is added to the application launcher in OpenShift Web Console.

Log in to Argo CD dashboard

Upon installation, Argo CD generates an initial admin password and stores it in a Kubernetes secret. Run the following command to retrieve the admin password:

oc extract secret/openshift-gitops-cluster -n openshift-gitops --to=-

Click Argo CD from the OpenShift Web Console application launcher, and then log in to Argo CD with the admin username and the password retrieved in the previous step.

Configure OpenShift with Argo CD

We are going to follow, as much as we can, a GitOps methodology in this demo. So we will have everything in our Git repository and use Argo CD to deploy it in the cluster.

In the current Git repository, the gitops/cluster-config directory contains OpenShift cluster configurations such as:

  • Namespace gitops
  • Role binding for Argo CD to the namespace gitops
  • Argo Rollouts project

Let's configure Argo CD to recursively sync the content of the gitops/cluster-config directory into the OpenShift cluster.
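Such an Argo CD Application might look like the following. This is a sketch; the field values are assumptions based on the repository layout, not the exact contents of the demo's file:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-configuration
  namespace: openshift-gitops
spec:
  destination:
    server: 'https://kubernetes.default.svc'
  source:
    path: gitops/cluster-config
    repoURL: https://github.com/your_user/cloud-native-deployment-strategies.git
    targetRevision: canary
    directory:
      recurse: true   # sync the directory tree recursively
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

The directory.recurse flag is what makes Argo CD pick up manifests in subdirectories of gitops/cluster-config.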

Execute this command to add a new Argo CD application that syncs a Git repository containing cluster configurations with the OpenShift cluster.

oc apply -f canary-argo-rollouts/application-cluster-config.yaml

Looking at the Argo CD dashboard, you will notice that an application has been created.

You can click the cluster-configuration application to check the details of sync resources and their status on the cluster.

Create Shop application

We are going to create the shop application that we will use to test canary deployment. Because we will make changes in the application's GitHub repository, we have to use the repository that you have just forked. Edit the file canary-argo-rollouts/application-shop-canary-rollouts.yaml and set your own GitHub repository in the repoURL.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: shop
  namespace: openshift-gitops
spec:
  destination:
    name: ''
    namespace: gitops
    server: 'https://kubernetes.default.svc'
  source:
    path: helm/quarkus-helm-umbrella/chart
    repoURL:  https://github.com/change_me/cloud-native-deployment-strategies.git
    targetRevision: canary
    helm:
      parameters:
      - name: "global.namespace"
        value: gitops
      valueFiles:
        - values/values-canary-rollouts.yaml
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Apply the file to create the application:

oc apply -f canary-argo-rollouts/application-shop-canary-rollouts.yaml

Looking at the Argo CD dashboard, you'll notice that we have a new shop application as shown in Figure 4.

Figure 4: Argo CD applications.

Test shop application

We have deployed the shop with Argo CD. We can test that it is up and running.

We have to get the online route:

curl -k "$(oc get routes products-umbrella-online -n gitops --template='https://{{.spec.host}}')/products"

Notice that we have added metadata to each microservice response so we can better identify the version of each application. This will help us see the changes while we do the canary deployment. We can see that the current version is v1.0.1:

{
   "products":[
      {
         ...
         "name":"TV 4K",
         "price":"1500€"
      }
   ],
   "metadata":{
      "version":"v1.0.1", <--
      "colour":"none",
      "mode":"online"
   }
}

We can also see the rollout's status. Argo Rollouts offers a kubectl plug-in to enrich the experience with Rollouts.

kubectl argo rollouts get rollout products --watch -n gitops
NAME                                  KIND        STATUS     AGE  INFO
⟳ products                            Rollout     ✔ Healthy  38s  
└──# revision:1                                                   
   └──⧉ products-67fc9fb79b           ReplicaSet  ✔ Healthy  38s  stable
      ├──□ products-67fc9fb79b-4ql4z  Pod         ✔ Running  38s  ready:1/1
      ├──□ products-67fc9fb79b-7c4jw  Pod         ✔ Running  38s  ready:1/1
      ├──□ products-67fc9fb79b-lz86j  Pod         ✔ Running  38s  ready:1/1
      └──□ products-67fc9fb79b-xlkhp  Pod         ✔ Running  38s  ready:1/1

Products canary deployment

We have already deployed products version v1.0.1 with 4 replicas, and we are ready to release a new products version, v1.1.1, that has a new description attribute.

Figure 5 shows the current status.

Figure 5: Current status.

This is how we have configured Argo Rollouts for this demo:

strategy:
    canary:
      steps:
        - setWeight: 10
        - pause:
            duration: 30s
        - setWeight: 50
        - pause:
            duration: 30s

We have split a cloud native canary deployment into three automatic steps:

  1. Deploy canary version at 10%
  2. Scale canary version to 50%
  3. Scale canary version to 100%

This is just an example; the key point is that we can easily define whichever canary steps best fit our needs. To make this demo faster, we have not set an indefinite pause (a pause without a duration) in any step, so Argo Rollouts goes through all the steps automatically.

Step 1: deploy canary version at 10%

We will deploy the new version v1.1.1. To do that, edit the file helm/quarkus-helm-umbrella/chart/values/values-canary-rollouts.yaml and, under products-blue, set the tag value to v1.1.1.

products-blue:
  image:
    tag: v1.1.1

Argo Rollouts will automatically deploy a new products revision, and the canary version will be 10% of the replicas. In this demo we are not using traffic management; Argo Rollouts makes a best-effort attempt to achieve the percentage listed in the setWeight step between the new and old versions. This means it will create only one replica of the new revision, because the value is rounded up. All requests are load balanced between the old and new replicas.
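The replica arithmetic described above can be sketched as follows. This is a simplification for illustration, not Argo Rollouts' actual code:

```python
import math

def canary_replicas(total_replicas: int, set_weight: int) -> int:
    """Approximate the canary replica count for a given setWeight.

    Without traffic management, Argo Rollouts can only approximate the
    weight through replica counts; fractional results are rounded up,
    so even small weights get at least one canary pod.
    """
    return math.ceil(total_replicas * set_weight / 100)

# With 4 replicas, setWeight: 10 rounds 0.4 up to 1 canary pod.
print(canary_replicas(4, 10))   # 1
print(canary_replicas(4, 50))   # 2
print(canary_replicas(4, 100))  # 4
```

This is why, with 4 replicas, the 10% step yields exactly one canary pod in the rollout output below.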

Push the changes to start the deployment.

git add .
git commit -m "Change products version to v1.1.1"
git push origin canary

Argo CD will refresh the status after a few minutes. If you don't want to wait, you can refresh it manually from the Argo CD UI, or configure the Argo CD Git webhook so that pushes trigger a sync immediately.

Figure 6 shows the current status.

Figure 6: Step 1.

kubectl argo rollouts get rollout products --watch -n gitops
NAME                                  KIND        STATUS     AGE    INFO
⟳ products                            Rollout     ॥ Paused   3m13s  
├──# revision:2                                                     
│  └──⧉ products-9dc6f576f            ReplicaSet  ✔ Healthy  8s     canary
│     └──□ products-9dc6f576f-fwq8m   Pod         ✔ Running  8s     ready:1/1
└──# revision:1                                                     
   └──⧉ products-67fc9fb79b           ReplicaSet  ✔ Healthy  3m13s  stable
      ├──□ products-67fc9fb79b-4ql4z  Pod         ✔ Running  3m13s  ready:1/1
      ├──□ products-67fc9fb79b-lz86j  Pod         ✔ Running  3m13s  ready:1/1
      └──□ products-67fc9fb79b-xlkhp  Pod         ✔ Running  3m13s  ready:1/1

In the products URL's response, some of the requests will now return the new version.

New revision:

{
  "products":[
     {
        "discountInfo":{...},
        "name":"TV 4K",
        "price":"1500€",
        "description":"The best TV" <--
     }
  ],
  "metadata":{
     "version":"v1.1.1", <--
  }
}

Old revision:

{
  "products":[
     {
        "discountInfo":{...},
        "name":"TV 4K",
        "price":"1500€"
     }
  ],
  "metadata":{
     "version":"v1.0.1", <--
  }
}

Step 2: scale canary version to 50%

After 30 seconds, Argo Rollouts will automatically increase the number of replicas in the new release to 2. Instead of increasing automatically after 30 seconds, we could configure Argo Rollouts to wait indefinitely until that pause condition is removed, but that is not part of this demo.
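For reference, an indefinite pause is simply a pause step without a duration. This sketch shows what such a configuration could look like (it is not part of this demo's setup):

```yaml
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}     # no duration: wait here until manually promoted
      - setWeight: 50
      - pause: {}
```

A rollout paused this way can be resumed with the plug-in's promote command, for example kubectl argo rollouts promote products -n gitops.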

Figure 7 shows the current status.

Figure 7: Step 2.

kubectl argo rollouts get rollout products --watch -n gitops
NAME                                  KIND        STATUS     AGE    INFO
⟳ products                            Rollout     ॥ Paused   3m47s  
├──# revision:2                                                     
│  └──⧉ products-9dc6f576f            ReplicaSet  ✔ Healthy  42s    canary
│     ├──□ products-9dc6f576f-fwq8m   Pod         ✔ Running  42s    ready:1/1
│     └──□ products-9dc6f576f-8qppq   Pod         ✔ Running  6s     ready:1/1
└──# revision:1                                                     
   └──⧉ products-67fc9fb79b           ReplicaSet  ✔ Healthy  3m47s  stable
      ├──□ products-67fc9fb79b-lz86j  Pod         ✔ Running  3m47s  ready:1/1
      └──□ products-67fc9fb79b-xlkhp  Pod         ✔ Running  3m47s  ready:1/1

Step 3: scale canary version to 100%

After another 30 seconds, Argo Rollouts will increase the number of replicas in the new release to 4 and scale down the old revision.

Figure 8 illustrates the current status.

Figure 8: Step 3.

kubectl argo rollouts get rollout products --watch -n gitops
NAME                                 KIND        STATUS        AGE    INFO
⟳ products                           Rollout     ✔ Healthy     4m32s  
├──# revision:2                                                       
│  └──⧉ products-9dc6f576f           ReplicaSet  ✔ Healthy     87s    stable
│     ├──□ products-9dc6f576f-fwq8m  Pod         ✔ Running     87s    ready:1/1
│     ├──□ products-9dc6f576f-8qppq  Pod         ✔ Running     51s    ready:1/1
│     ├──□ products-9dc6f576f-5ch92  Pod         ✔ Running     17s    ready:1/1
│     └──□ products-9dc6f576f-kmvdh  Pod         ✔ Running     17s    ready:1/1
└──# revision:1                                                       
   └──⧉ products-67fc9fb79b          ReplicaSet  • ScaledDown  4m32s

We now have the new version, v1.1.1, in the online environment:

{
  "products":[
     {
        "discountInfo":{...},
        "name":"TV 4K",
        "price":"1500€",
        "description":"The best TV" <--
     }
  ],
  "metadata":{
     "version":"v1.1.1", <--
  }
}

Rollback

Imagine that something goes wrong (we know this never happens, but just in case). We can do a very quick rollback just by undoing the change.

Argo Rollouts has an undo command to perform the rollback. Personally, I don't like this procedure because it is not aligned with GitOps: the changes that Argo Rollouts makes do not come from Git, so Git becomes out of sync with what we have in OpenShift. In our case, the commit we made changed not only the ReplicaSet but also the ConfigMap, and the undo command only reverts the ReplicaSet, so it does not work for us.

I recommend making the changes in Git. We will revert the last commit:

git revert HEAD --no-edit

If we just revert the changes in Git, we will go back to the previous version, but Argo Rollouts will treat this revert as a new release and roll it out through the steps we have configured. We want a quick rollback, not a step-by-step rollout. To achieve this, we will configure Argo Rollouts without steps for the rollback.

Because we keep our Argo Rollouts configuration as values in our Helm chart, we just have to edit the values.yaml file that we are using.

In the file helm/quarkus-helm-umbrella/chart/values/values-canary-rollouts.yaml, under products-blue, delete all the steps and set a single step, - setWeight: 100.

The file helm/quarkus-helm-umbrella/chart/values/values-canary-rollouts.yaml should look like this:

products-blue:
  mode: online
  image:
    tag: v1.0.1
  version: none
  replicaCount: 4
  fullnameOverride: "products"
  rollouts:
    enabled: true
    canary:
      steps:
        - setWeight: 100

Execute these commands to push the changes:

git add .
git commit -m "delete steps for rollback"
git push origin canary

Argo CD will get the changes and apply them. Argo Rollouts will create a new revision with the previous version.

As you can see in Figure 9, the rollback is done!

Figure 9: Rollback.

{
  "products":[
     {
"discountInfo":{...},
        "name":"TV 4K",
        "price":"1500€"
     }
  ],
  "metadata":{
     "version":"v1.0.1", <--
  }
}

To get the application ready for a new release, we should configure Argo Rollouts with the steps again.
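That means restoring the steps we deleted, so the products-blue section of values-canary-rollouts.yaml would again contain the original canary configuration:

```yaml
products-blue:
  rollouts:
    enabled: true
    canary:
      steps:
        - setWeight: 10
        - pause:
            duration: 30s
        - setWeight: 50
        - pause:
            duration: 30s
```

Committing and pushing this change completes the cycle, leaving the rollout ready to canary the next version.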

Delete environment

To delete everything created for this demo:

  • In GitHub, delete the canary branch.
  • In Argo CD, delete the cluster-configuration and shop applications.
  • In OpenShift, go to the openshift-operators project and uninstall the OpenShift GitOps operator.