Azure Bastion Standard Sku Autoscale?

The standard sku of Azure Bastion fixed a lot of the pain points of the basic sku. Things like setting up multiple instances and setting the port to use for Linux. The one thing I did not see was autoscale. The Microsoft doc’s state Each instance can support 10 concurrent RDP connections and 50 concurrent SSH connections. The number of connections per instances depends on what actions you are taking when connected to the client VM. For example, if you are doing something data intensive, it creates a larger load for the instance to process. Once the concurrent sessions are exceeded, an additional scale unit (instance) is required. Imagine the scenario that we are using a hub and spoke topology with a bastion sitting in our hub. We would need to setup monitoring around concurrent sessions and alert us when session connectivity was getting close, but why not autoscale it?

I was curious why this setting was missing, so I spun up a test environment with 2 RDP sessions. Remember that the default deployment has 2 bastions deployed. Looking at the metric for session count, we can see the following:

Now, I was totally confused why it kept showing 1 to .44ish every few minutes. I understand the 1 for average since its 2 sessions across 2 instances, but couldn’t understand why it kept dipping.

Here is the graph using sum as my aggregation. Same thing! At this point, I tried to split the graph on instance:

Seems to be a scale set internally running bastion if I had to guess. That 0 on vm000000 screwing my metric count up! Now that I had an understanding of the metrics, how could I scale this automatically? I could setup an alert rule that fires a webhook when the session count is above X or below Y. I just didn’t feel comfortable with these metrics as it could provision multiple scaleset instances of 0 and I wouldn’t know. I started doing some research and found an API call for getActiveSessions which would return my session count. This is ideally what I wanted, so I started going down this path. I figured I could create an Azure function or runbook that runs every so often and scales the bastion out by +1 or -1 based on some switch.

$restUri = "$((Get-AzContext).Subscription.Id)/resourceGroups/$bastionResourceGroupName/providers/Microsoft.Network/bastionHosts/$bastionHostName/getActiveSessions?api-version=2021-03-01"
$getStatus = Invoke-webrequest -UseBasicParsing -uri $restUri -Headers $authHeader -Method Post
$asyncUri = "$((Get-AzContext).Subscription.Id)/providers/Microsoft.Network/locations/$bastionResourceGroupLocation/operationResults/$($getStatus.headers['x-ms-request-id'])?api-version=2020-11-01"
$sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
while ($sessions -eq 'null' ) {
    start-sleep -s 2
    $sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
write-output "Current session count is: $($sessions.count)"

The docs made it seem like this was a sync call, but it is actually async. You need to query out operation results to pull back the session count. For more information, check out this article

Now that I have my session count, I could do a simple switch statement on setting my bastion instance count. I started with these numbers below:

$bastionObj = Get-AzBastion -ResourceGroupName $bastionResourceGroupName -Name $bastionHostName
switch ($sessions.count)
    #2 instances by default. Each can hold up to 12 sessions
    {0..22 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force  }
    {23..34 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 3 -Force  }
    {35..45 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 4 -Force  }
    {46..58 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 5 -Force  }
    Default {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force}

When I started to test the autoscale, I noticed one big problem! When setting the scaleunit count, it disconnects all sessions. That is a horrible end user experience. I am thinking this is why Microsoft did not implement autoscale 🙂

Well, next best scenario is resizing at the end of the working day to keep costs low. Add the code to authenticate into Azure via runbook or function and set it to run on a schedule. Maybe 8pm at night we resize based on user session count and before the work day starts we would resize to an instance count that fits our requirements. I’d imagine Microsoft will implement autoscale, but they need to figure out how to move existing sessions gracefully to another bastion host.