Azure Functions Migration to Kubernetes

INTO Global

Project description:

The Challenge:

INTO University Partnerships is an independent organisation focused on expanding higher education opportunities for students worldwide. INTO wanted to migrate from Azure Functions to Kubernetes (AKS), with clusters in multiple regions. Despite having a sizeable amount of infrastructure in Azure, INTO didn't have any Infrastructure as Code (ARM/Bicep or Terraform).

Our initial work was a short discovery exercise to understand the current state: how applications were hosted and how code was deployed. We also needed to consider IaC adoption, weighing the suitability of ARM against Terraform in terms of INTO's roadmap and the relative complexity of each.

What we did:

  • Set up Git repositories in Azure DevOps.
  • Ran a small PoC to provision AKS and an Application Gateway configured for ingress, deploy an app, and check it worked, experimenting with different configurations (the Azure CLI takes care of a lot under the covers).
  • Created the initial AKS Terraform module, deploying AKS and an Application Gateway configured as the ingress controller.
  • Created a RefData module for storing and looking up reference data such as environment differences (e.g., IP address ranges), reducing the number of module inputs.
  • Set up Terraform service principal credentials and a remote backend storage account in the DevTest subscription.
  • Created the Azure DevOps Terraform pipeline with templated (reusable) stages, jobs, and steps. It consisted of an Init & Plan stage followed by a Plan & Apply stage, requiring an approval on the Apply job (using an ADO environments approval gate), and included AAD authentication to the cluster for role-based access to the Kubernetes API.
  • Added the cluster to a Log Analytics workspace for Container Insights.
  • Ran a small PoC for Prometheus and Grafana. Concluded that switching to Prometheus offered little value: INTO already uses native Azure Monitor, and Prometheus metrics can be fed into Log Analytics by deploying a ConfigMap into the cluster. Grafana is most useful when running a Prometheus server, since many of its out-of-the-box dashboards are built for one, while the dashboards available for Azure Monitor are few and of poor quality. The recommendation was to stick with Azure Monitor.
  • Added a templated pipeline step to deploy the ConfigMap that enables Prometheus metrics in the clusters.
  • Added a templated pipeline step to deploy Certificate Manager (for generating SSL certificates on the Application Gateway via Let's Encrypt). Installed the Helm chart and deployed Issuer resources.
  • Added a templated step to create application namespaces.
  • Ran end-to-end testing of the pipeline and Terraform in DevTest.
  • Set up service principals for the Production and China subscriptions.
  • Set up backend storage for Terraform state in the Production and China subscriptions.
  • Refactored the Terraform modules and pipeline to run from a single repo, improving maintainability and simplicity.
  • Set up Azure DevOps service connections.
  • Deployed to the Production and China subscriptions and tested.
  • Discovered that Azure China doesn't support the AGIC (Application Gateway Ingress Controller) add-on, so ran a PoC to install and configure it via the Helm chart, which was successful.
  • Implemented AGIC via the Helm chart in the pipeline.
  • Created modules for all five production regions, deployed via the pipeline, and tested. Needed to resolve some RBAC issues to fix the Application Gateway Ingress Controller.
  • Switched to the application deployment pipelines, fixing and improving some of the existing templates.
  • Added all clusters into the deployment pipeline, with a 'pre-prod' deployment acting as a validation step.
  • Refactored and improved the PR validation pipeline.
  • Completed a full end-to-end deployment of the Browse Web application into all clusters.
  • Implemented equivalent deployment and PR pipeline changes for the Search Api application and achieved full deployment into all clusters.
  • Browse Web and Search Api act as reference pipelines for other applications; developers can continue the roll-out from these.
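To illustrate the shape of the Terraform pipeline described above, a minimal Azure DevOps YAML sketch follows. The stage, template, and environment names are illustrative placeholders, not INTO's actual configuration:

```yaml
# Illustrative sketch: Init & Plan stage, then Plan & Apply behind an approval gate.
trigger:
  branches:
    include: [main]

stages:
  - stage: Plan
    jobs:
      - job: terraform_plan
        steps:
          - template: templates/terraform-init.yml   # hypothetical: init against the remote backend
          - template: templates/terraform-plan.yml   # hypothetical: terraform plan

  - stage: Apply
    dependsOn: Plan
    jobs:
      - deployment: terraform_apply
        environment: devtest        # ADO environment carrying the approval gate on the Apply job
        strategy:
          runOnce:
            deploy:
              steps:
                - template: templates/terraform-plan.yml
                - template: templates/terraform-apply.yml
```

Using a `deployment` job bound to an ADO environment is what allows the approval gate to sit in front of the Apply step.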
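The ConfigMap used to enable Prometheus metric collection into Log Analytics follows the pattern Azure documents for the Container Insights agent. A trimmed sketch (the scrape settings shown are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig   # name expected by the Azure Monitor agent
  namespace: kube-system
data:
  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
      interval = "1m"
      # Scrape pods annotated with prometheus.io/scrape=true
      monitor_kubernetes_pods = true
```

Deploying this into the cluster is what lets Azure Monitor ingest Prometheus metrics without running a Prometheus server.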
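The Issuer resources deployed alongside the cert-manager Helm chart take roughly the form below; the issuer name, email, and secret name are placeholders:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                    # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key      # secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            class: azure/application-gateway  # solve HTTP-01 challenges via the App Gateway ingress
```

cert-manager then watches Ingress/Certificate resources and provisions Let's Encrypt certificates exposed through the Application Gateway.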
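The templated namespace-creation step can be sketched as a reusable step file taking the namespace as a parameter (the file path and parameter name are hypothetical):

```yaml
# templates/create-namespace.yml (hypothetical path)
parameters:
  - name: namespace
    type: string

steps:
  - script: |
      # Idempotent: render the namespace manifest client-side and apply it
      kubectl create namespace ${{ parameters.namespace }} \
        --dry-run=client -o yaml | kubectl apply -f -
    displayName: Ensure namespace ${{ parameters.namespace }} exists
```

The dry-run/apply pattern makes the step safe to re-run on clusters where the namespace already exists.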
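Because Azure China lacks the AGIC add-on, the controller was installed from its Helm chart instead. A values sketch for the `application-gateway-kubernetes-ingress` chart, with all IDs and names as placeholders (the exact authentication mode used by INTO is not stated in this write-up):

```yaml
# Illustrative values for the AGIC Helm chart.
appgw:
  subscriptionId: "<subscription-id>"
  resourceGroup: "<resource-group>"
  name: "<app-gateway-name>"
armAuth:
  type: servicePrincipal
  secretJSON: "<base64-encoded service principal credentials>"
rbac:
  enabled: true        # required on RBAC-enabled AKS clusters
```

The RBAC issues mentioned above are typical here: the controller's identity needs Contributor rights on the Application Gateway and Reader on its resource group before ingress reconciliation succeeds.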

Benefits:

  • Enabled INTO to move from deployments every two weeks to multiple releases per day.
  • Improved business agility.
  • Improved pipeline standards through reusability.
  • Upskilled the internal team to be self-sufficient.