Earlier this week I was working with a customer on a terraform template. We were building out a virtual machine scale set, which would need to connect to the internet with a well-known IP. For this, we decided to leverage an outbound rule in the Load Balancer Standard, using a public IP prefix. We needed a public IP prefix, as the scale set can become quite large and we didn’t want to hit port exhaustion.
The funny thing is, we couldn’t use terraform to create a load balancer front-end using a public IP prefix. We spent about 2 hours together concocting possible solutions. We even tried injecting an ARM template in terraform, but that made the solution too complicated.
So, I decided to spend my Friday on updating the terraform Resource provider for Azure. The goal was to be able to use a public IP prefix in an outbound rule, verify that it works, and make a pull request to the Azure RP in GitHub. Spoiler alert: it worked!
Let me walk you through how you can get started making changes to terraform providers in general, and then through the work I did earlier today for the Azure RP. If you prefer to read the live updates I made during that day, you can follow along here:
My goal for today: add support for Public IP Prefix in a Load Balancer definition in the AzureRM module for Terraform.
— Nills Franssens (@NillsF) June 14, 2019
Getting started with making changes to terraform providers
The AzureRM provider does a decent job of explaining at a very high level what is required. You need terraform on your system, and the Go language installed. I installed both (I already had terraform), and decided to also pimp out my VSCode with the Go extension. This all went pretty well. (BTW: make sure to set up your Path variable to include terraform and Go.)
Next up, if you want to do this work on Windows (which I discovered made things a bit harder), you’ll need ‘make’ for Windows as well as Git Bash. That last part I believe is not strictly mandatory, as I assume WSL can do the same job, and I would certainly have loved to do this work in WSL v2. I stuck with the recommendations however, as I didn’t want to struggle with setup, I wanted to struggle with code. I also decided to make Git Bash the default terminal in VSCode for the duration of this work.
Finally, you’ll need a copy of the source code repo of the provider you’ll work on. I would recommend making a fork before doing a git clone so you can easily check in changes to your own GitHub repo, instead of doing a PR from your machine to the main terraform provider repo directly. Forking is easy in the GitHub interface, and for the cloning, you’ll need to do two steps:
- Navigate to %GOPATH%/src/github.com, and create a folder in here called terraform-providers
- Open that folder, and execute git clone from this folder.
Now, you’re all set and done. From the Git Bash console you should be able to execute make build to build your own local copy of the Azure Provider for terraform.
If you have a project you’re working on that you want to test this version with, you can copy-paste this executable to the .terraform/plugins/windows_amd64/ subfolder of that project, to use your own executable to connect to Azure. To test that this works, do a new terraform init and then do a terraform plan to verify that your executable can connect to Azure. If this works, you’re ready to make changes!
How I made the changes to the load balancer definition
Looking into how I could make the changes to the load balancer definition, I started out by having a look at the terraform source file describing the load balancer.
Looking at that file, I noticed there was a definition for public_ip_address_id in the frontend_ip_configuration, but not for public_ip_prefix_id. This made it clear that this was what I needed to add.
Browsing further down the file, there are two important functions that allow terraform to translate terraform to Azure and Azure to terraform. These functions are called expandAzureRmLoadBalancerFrontendIpConfigurations and flattenLoadBalancerFrontendIpConfiguration. The expand function takes the terraform object and translates it into an object in the Azure Go SDK, which can then be used to communicate with the Azure API. The flatten function does the reverse.
So, if I wanted to add my public_ip_prefix_id to the load balancer definition, I needed to add it to the schema of the frontend_ip_configuration and to the expand and flatten functions. That is exactly what I did, and you can see that work in the file.
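To make the expand/flatten idea a bit more concrete, here is a minimal sketch in Go. It is not the provider's actual code: SubResource, FrontendIPConfig, expandFrontendIPConfig and flattenFrontendIPConfig are hypothetical stand-ins I made up for illustration, and the real Azure Go SDK types carry many more fields. Only the two field names, public_ip_address_id and public_ip_prefix_id, come from the provider schema discussed above.

```go
package main

import "fmt"

// SubResource and FrontendIPConfig are hypothetical, simplified stand-ins
// for the Azure Go SDK types. They are NOT the real SDK definitions.
type SubResource struct {
	ID *string
}

type FrontendIPConfig struct {
	Name            *string
	PublicIPAddress *SubResource
	PublicIPPrefix  *SubResource
}

// expandFrontendIPConfig turns one terraform-style config map into the
// SDK-style struct (terraform -> Azure).
func expandFrontendIPConfig(data map[string]interface{}) FrontendIPConfig {
	name, _ := data["name"].(string)
	cfg := FrontendIPConfig{Name: &name}

	if v, ok := data["public_ip_address_id"].(string); ok && v != "" {
		cfg.PublicIPAddress = &SubResource{ID: &v}
	}
	// The new field: only set the prefix reference when an ID was supplied.
	if v, ok := data["public_ip_prefix_id"].(string); ok && v != "" {
		cfg.PublicIPPrefix = &SubResource{ID: &v}
	}
	return cfg
}

// flattenFrontendIPConfig does the reverse (Azure -> terraform), skipping
// any reference that is not set.
func flattenFrontendIPConfig(cfg FrontendIPConfig) map[string]interface{} {
	out := map[string]interface{}{}
	if cfg.Name != nil {
		out["name"] = *cfg.Name
	}
	if cfg.PublicIPAddress != nil && cfg.PublicIPAddress.ID != nil {
		out["public_ip_address_id"] = *cfg.PublicIPAddress.ID
	}
	if cfg.PublicIPPrefix != nil && cfg.PublicIPPrefix.ID != nil {
		out["public_ip_prefix_id"] = *cfg.PublicIPPrefix.ID
	}
	return out
}

func main() {
	in := map[string]interface{}{
		"name":                "outbound",
		"public_ip_prefix_id": "example-prefix-id", // placeholder, not a real resource ID
	}
	// Round trip: expand, then flatten again.
	out := flattenFrontendIPConfig(expandFrontendIPConfig(in))
	fmt.Println(out["name"], out["public_ip_prefix_id"])
}
```

The point of the round trip in main is that whatever expand writes, flatten has to be able to read back, otherwise terraform would report a difference on every plan.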
After doing this, I compiled the source files (which worked tremendously well, without error), and was able to update my terraform files to create a load balancer frontend using a public ip prefix. Great win!
Step 12: Adding the outbound rule worked great. The outbound rule shows up in https://t.co/GajrfpnYBJ. And doing a curl on https://t.co/wUTRhpDOI5 shows we are using the outbound rule! GREAT SUCCESS! pic.twitter.com/o2JvkOpDF4
— Nills Franssens (@NillsF) June 14, 2019
How testing showed I need to make some more code changes.
With this done, I did some functional testing. I logged in to a VM, and did a curl to icanhazip.com in a while loop. I quickly noticed that for outbound traffic, the load balancer was cycling through the Public IP Prefix (YAY) and the Public IP used for the inbound rule (NAY). A quick Bing search showed this was expected behavior. To prevent this, there is an additional data field on a load balancer rule in Azure that can disable that rule from being used for SNAT.
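If you would rather tally the results than eyeball a curl loop, the same check can be sketched in Go. This is only an illustration of the idea: fetchIP and tallyIPs are hypothetical helper names, the https:// scheme and the 20 iterations are my assumptions, and keep-alives are disabled so that every request opens a new connection, roughly like a fresh curl invocation would.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// client opens a new connection per request, so each call can be
// SNAT'ed independently by the load balancer.
var client = &http.Client{
	Timeout:   5 * time.Second,
	Transport: &http.Transport{DisableKeepAlives: true},
}

// fetchIP asks an "echo my IP" endpoint which public IP it sees.
func fetchIP(url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

// tallyIPs calls fetch n times and counts how often each IP was returned.
func tallyIPs(n int, fetch func() (string, error)) (map[string]int, error) {
	seen := map[string]int{}
	for i := 0; i < n; i++ {
		ip, err := fetch()
		if err != nil {
			return seen, err
		}
		seen[ip]++
	}
	return seen, nil
}

func main() {
	seen, err := tallyIPs(20, func() (string, error) {
		return fetchIP("https://icanhazip.com") // assumed URL form
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for ip, count := range seen {
		fmt.Printf("%s seen %d times\n", ip, count)
	}
}
```

If the output lists an address outside the public IP prefix, the inbound rule's frontend is still being used for outbound SNAT.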
A quick look in the terraform file for a Load Balancer rule showed me this wasn’t possible using the current terraform provider for Azure. Having made it this far, I decided I’d quickly make this change as well. The logic for a load balancer rule is exactly the same as for the frontend configuration: you have your schema, a flatten and an expand function. I adapted all three to include the disable outbound SNAT option. You can see those changes here.
So, another build, another couple changes to my terraform files, and testing this out worked out super well. (admittedly, I had to build twice, as in my first compile I had a space where I shouldn’t have had a space).
Step 18: The error above was an accidental space in a definition. The 'terraform plan works', the 'terraform apply' as well. pic.twitter.com/XAYcUZ7c02
— Nills Franssens (@NillsF) June 14, 2019
Now comes the hard part, writing (acceptance) tests for the changes.
Doing the changes was hard, writing tests for the changes was harder. And let me clarify, there were three hard parts here: getting my system to build a version of the provider that actually executes tests, understanding the test structure and how they work, and then writing new tests.
If you want to write test cases, you need to set the following Environment Variables:
ARM_CLIENT_ID
ARM_CLIENT_SECRET
ARM_SUBSCRIPTION_ID
ARM_TENANT_ID
ARM_ENVIRONMENT
ARM_TEST_LOCATION
ARM_TEST_LOCATION_ALT
Make sure your service principal (that’s what ARM_CLIENT_ID represents) has contributor rights to your subscription.
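Forgetting one of these variables is an easy way to lose time, so a small pre-flight check can help. The sketch below is my own helper, not part of the provider: missingEnv and requiredEnv are hypothetical names, and the list simply mirrors the variables named above.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv mirrors the variables listed in this post.
var requiredEnv = []string{
	"ARM_CLIENT_ID",
	"ARM_CLIENT_SECRET",
	"ARM_SUBSCRIPTION_ID",
	"ARM_TENANT_ID",
	"ARM_ENVIRONMENT",
	"ARM_TEST_LOCATION",
	"ARM_TEST_LOCATION_ALT",
}

// missingEnv returns the names that are unset or empty according to lookup.
// Passing the lookup function in keeps the check easy to test.
func missingEnv(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredEnv, os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("all acceptance test variables are set")
}
```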
Next you’ll want to run a specific make command:
make testacc TEST=./azurerm TESTARGS='-run=TestAccAzureRMLoadBalancerRule_disableoutboundsnat'
where TestAccAzureRMLoadBalancerRule_disableoutboundsnat represents the test you want to run. You can also use wildcards in the test name to run multiple tests.
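On the wildcard part: the -run flag of go test takes a regular expression and runs the tests whose names match it. The sketch below mimics that selection with Go's regexp package. It is a simplification (the real flag also splits the pattern on slashes to select subtests), selectTests is a hypothetical helper, and the test names are the ones I added for this work.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectTests returns the names matched by pattern. The match is
// unanchored, so a partial name selects every test containing it.
func selectTests(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var selected []string
	for _, name := range names {
		if re.MatchString(name) {
			selected = append(selected, name)
		}
	}
	return selected, nil
}

func main() {
	names := []string{
		"TestAccAzureRMLoadBalancer_frontEndConfigPublicIPPrefix",
		"TestAccAzureRMLoadBalancerOutboundRule_withPublicIPPrefix",
		"TestAccAzureRMLoadBalancerRule_disableoutboundsnat",
	}
	// A partial name acts like a wildcard and picks up both prefix tests.
	selected, err := selectTests("PublicIPPrefix", names)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return
	}
	for _, name := range selected {
		fmt.Println(name)
	}
}
```

So passing -run=TestAccAzureRMLoadBalancer would select all three tests at once, along with any other test whose name contains that string.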
Now, that’s in a nutshell what you need to do to get tests to run. Next up is understanding how to write your own tests. The terraform docs have a decent article describing this, but I learnt the most by actually looking at the source code.
So, I ended up writing new tests to cover my updates. I essentially wrote 3 tests to cover my changes:
- TestAccAzureRMLoadBalancer_frontEndConfigPublicIPPrefix: which tests creating and deleting a Load Balancer using a public IP prefix.
- TestAccAzureRMLoadBalancerOutboundRule_withPublicIPPrefix: which tests creating an outbound rule using a public IP prefix.
- TestAccAzureRMLoadBalancerRule_disableoutboundsnat: which tests creating a load balancer rule and disabling outbound SNAT.
Make sure to have a look at the code here. It’s not super complex, but it took me a while to get my head around it.
Once tests are written, you can run them locally right away with the make command I shared earlier. You can choose to execute a single test or the full test suite. The choice is up to you. (BTW: please remember, tests create real resources and will spin the meter for you.)
So, that’s it, ready for a pull request.
So, with those changes made, I was ready for my first pull request to the Azure Provider for Terraform. I made the pull request, added some comments, and was done for the day. After I made the pull request, I saw a CircleCI job start, do a CI and two tests, all of which I saw complete successfully.
Now I am playing the waiting game to get some feedback on my work. As this is my first pull request to the terraform project, I do not expect it to get merged right away. I’m looking forward to the feedback, in the hope of making my addition to terraform even more valuable.
What did I learn today
I learned a lot today, both about the inner workings of the load balancer itself and about how the terraform provider for Azure works. I didn’t know about this cycling through the public IPs of inbound rules and outbound rules. I assumed that once there was an outbound rule, that one automatically took precedence. It was really neat to see how the provider integrates with the Go SDK for Azure, and to see that the Go SDK was complete for the work I needed to get done.
This was a fun contribution journey. On to even more contributions?