Azure Bastion Standard SKU Autoscale?

The Standard SKU of Azure Bastion fixed a lot of the pain points of the Basic SKU, such as running multiple instances and choosing the port to use for Linux. The one thing I did not see was autoscale. The Microsoft docs state: "Each instance can support 10 concurrent RDP connections and 50 concurrent SSH connections. The number of connections per instances depends on what actions you are taking when connected to the client VM. For example, if you are doing something data intensive, it creates a larger load for the instance to process. Once the concurrent sessions are exceeded, an additional scale unit (instance) is required." Imagine that we are using a hub-and-spoke topology with a bastion sitting in our hub. We would need to set up monitoring around concurrent sessions and alert when the session count was getting close to the limit, but why not autoscale it?

I was curious why this setting was missing, so I spun up a test environment with 2 RDP sessions. Remember that the default deployment has 2 instances. Looking at the metric for session count, we can see the following:

Now, I was totally confused why the average kept going from 1 to about 0.44 every few minutes. I understand the 1 for the average, since it’s 2 sessions across 2 instances, but I couldn’t understand why it kept dipping.

Here is the graph using sum as my aggregation. Same thing! At this point, I tried to split the graph on instance:

If I had to guess, there is a scale set running bastion internally. That 0 on vm000000 was screwing up my metric count! Now that I understood the metrics, how could I scale this automatically? I could set up an alert rule that fires a webhook when the session count is above X or below Y. I just didn’t feel comfortable with these metrics, as the service could provision multiple scale set instances reporting 0 and I wouldn’t know. I started doing some research and found an API call, getActiveSessions, which would return my session count. This was ideally what I wanted, so I started going down this path. I figured I could create an Azure function or runbook that runs every so often and scales the bastion out by +1 or -1 based on some switch.

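# Assumes $authHeader holds a bearer token for https://management.azure.com and that
# $bastionResourceGroupName, $bastionHostName and $bastionResourceGroupLocation are already set.
$subscriptionId = (Get-AzContext).Subscription.Id
$restUri = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/$bastionResourceGroupName/providers/Microsoft.Network/bastionHosts/$bastionHostName/getActiveSessions?api-version=2021-03-01"
# The POST only starts the operation; the result is fetched from the operationResults endpoint.
$getStatus = Invoke-WebRequest -UseBasicParsing -Uri $restUri -Headers $authHeader -Method Post
$asyncUri = "https://management.azure.com/subscriptions/$subscriptionId/providers/Microsoft.Network/locations/$bastionResourceGroupLocation/operationResults/$($getStatus.Headers['x-ms-request-id'])?api-version=2020-11-01"
$sessions = Invoke-RestMethod -Uri $asyncUri -Headers $authHeader
# Poll every 2 seconds until the operation result is available.
while ($sessions -eq 'null') {
    Start-Sleep -Seconds 2
    $sessions = Invoke-RestMethod -Uri $asyncUri -Headers $authHeader
}
Write-Output "Current session count is: $($sessions.count)"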

The docs made it seem like this was a synchronous call, but it is actually asynchronous. You need to query the operation results to pull back the session count. For more information, check out this article.

Now that I have my session count, I can use a simple switch statement to set my bastion instance count. I started with the numbers below:

$bastionObj = Get-AzBastion -ResourceGroupName $bastionResourceGroupName -Name $bastionHostName
switch ($sessions.count) {
    # 2 instances by default. Each can hold up to 12 sessions
    {0..22 -contains $_} { Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force }
    {23..34 -contains $_} { Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 3 -Force }
    {35..45 -contains $_} { Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 4 -Force }
    {46..58 -contains $_} { Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 5 -Force }
    # Anything outside the ranges above falls back to the default of 2 instances
    Default { Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force }
}

When I started to test the autoscale, I noticed one big problem! Setting the scale unit count disconnects all sessions. That is a horrible end-user experience. I am thinking this is why Microsoft did not implement autoscale 🙂

Well, the next best scenario is resizing at the end of the working day to keep costs low. Add the code to authenticate into Azure via a runbook or function and set it to run on a schedule. Maybe at 8pm we resize based on the user session count, and before the work day starts we resize to an instance count that fits our requirements. I’d imagine Microsoft will implement autoscale, but they need to figure out how to move existing sessions gracefully to another bastion host.
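For a scheduled resize, the fixed switch ranges could be swapped for a small calculation. This is only a sketch: Get-BastionScaleUnit is a hypothetical helper (not an Az cmdlet), the 12 sessions per instance comes from the comment in my switch statement, and the headroom of 2 and the 2 to 50 unit limits are my assumptions, so the results will not match the switch ranges exactly.

```powershell
# Hypothetical helper: map a session count to a Bastion scale unit count.
# All defaults are assumptions; tune them to your own workload.
function Get-BastionScaleUnit {
    param(
        [Parameter(Mandatory)] [int] $SessionCount,
        [int] $SessionsPerInstance = 12,  # assumed capacity of one instance
        [int] $Headroom = 2,              # spare sessions to keep free
        [int] $MinUnits = 2,              # assumed lower limit of scale units
        [int] $MaxUnits = 50              # assumed upper limit of scale units
    )
    # Round up so a partially used instance still counts as a whole unit
    $needed = [int][math]::Ceiling(($SessionCount + $Headroom) / $SessionsPerInstance)
    # Clamp the result between the minimum and maximum unit counts
    [math]::Min($MaxUnits, [math]::Max($MinUnits, $needed))
}

# Usage inside the scheduled runbook or function (needs a live Azure session):
# $units = Get-BastionScaleUnit -SessionCount $sessions.count
# Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit $units -Force
```

Because the resize still disconnects every session, I would only run this from the off-hours schedule and never while people are working.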

Azure Bastion Alternatives

I had a project come up where I needed two-factor auth and RDP access with no public IP. I instantly thought Azure Bastion would be great for this. I can use conditional access and hit my private-IP VMs. Well, the VM had to be Ubuntu running the GNOME desktop with xrdp. Azure Bastion is tied to the OS profile, where it is SSH for Linux or RDP for Windows. There is an open feedback item to allow RDP to Linux. With all of that being said, let me present… Apache Guacamole. Nothing like presenting to executives and saying let’s use Guacamole to solve our issue, haha.

I found an Azure Marketplace image from Bitnami that provisions a VM with Guacamole installed and HTTP-to-HTTPS redirection enabled using some dummy certificates.

Once you provision the image, it has a public IP already assigned, with an NSG on the NIC opening ports 80, 443 and 22. I’d modify that NSG to remove port 80, and either lock down port 22 to your IP or remove it and just use the serial console. Now, going back to my original requirement of 2FA, there is a SAML extension you can use, and we can easily create a new SAML application in Azure Active Directory. Before we do this, we want to add a new user account with admin permissions in the format of user@aadDomain. Otherwise, when we browse to the UI with SAML configured, we won’t be able to log in unless we use the API with the default guacadmin account. You can certainly use the API to create new SAML accounts in Guacamole, but logging in first with the guacadmin creds makes testing easier. In order to get the default guacadmin password, look here. Make sure you change it!

Log in and add a new user with admin permissions. For the username, put in the user’s full AAD sign-in name (user@aadDomain). Do not set a password.

Once we log in with the AAD creds, we can delete the guacadmin account.

Get on the Guacamole VM, download the SAML extension, extract it with tar -xf and copy the jar into /opt/bitnami/guacamole/extensions. When Guacamole is restarted, it will automatically load the jar. We don’t want to restart just yet, as we still need to add the SAML entries to the configuration file. Let’s create a new Azure Enterprise Application and select Create your own application.

Give your app a new name and hit Create.

You will be taken to your new application, where you now select Single Sign On.

Select SAML

Edit the basic configuration.

First, modify the Entity ID and Reply URL. We want to put in the FQDN where end users will access it via their browser. I have a domain mapped to the public IP of the VM. Hit save, then grab the Login URL from #4.

Back on the VM, edit /opt/bitnami/guacamole/ file and add these 3 lines:

saml-idp-url: the Login URL from our enterprise SAML app

saml-entity-id and saml-callback-url: our FQDN mapped to the public IP
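As a rough illustration, the three entries might look like the fragment below. The values are placeholders I made up for this example: guac.example.com stands in for your own FQDN and <tenant-id> for your AAD tenant, and I am assuming Guacamole is served at the root of the site over https. Copy the real Login URL from the enterprise application rather than building it by hand.

```properties
# Placeholder values only; replace with your own tenant and domain.
# Login URL copied from the enterprise SAML app
saml-idp-url: https://login.microsoftonline.com/<tenant-id>/saml2
# Must match the Entity ID set in the app's basic SAML configuration
saml-entity-id: https://guac.example.com
# Must match the Reply URL set in the app's basic SAML configuration
saml-callback-url: https://guac.example.com
```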

Save this file. The last step is getting a valid certificate for our domain. I already had one, so I replaced the server.crt and server.key in /opt/bitnami/apache/conf/bitnami/certs. There is also a tool from Bitnami that does Let’s Encrypt for you.

Restart the required services with sudo /opt/bitnami/ restart

Now, either add the AAD user to the enterprise application or toggle “User assignment required” to No.

Have your user navigate to the FQDN and they will be redirected to auth against AAD.

A couple of things to note. I took this project one step further: you can use an ARM template and store the secrets in a key vault along with your certificate. If you have a WAF in front, such as Azure Front Door, assign a custom domain name with TLS and set up your AAD application to use that FQDN. I have a custom script extension that preps the VM with the steps we did above. For my project, I just pushed the ARM template to Template Specs for quick and easy provisioning.