Azure VM Applications

Azure has Template Specs which lets you create a self service infrastructure as code model for your end users. You can use RBAC and they can deploy versioned templates. Microsoft introduced VM applications which lets your end users do something very similar to template specs, but with applications installed inside your VM. Let’s look at a quick demo and some things to watch out for.

Assuming you have an Azure compute gallery deployed, you need to create an application then a version of that application. I pasted a snippet below to get us started.

$applicationName = 'visualStudioCode-linux'
New-AzGalleryApplication `
  -ResourceGroupName $rgName `
  -GalleryName $galleryName `
  -Location $location `
  -Name $applicationName `
  -SupportedOSType Linux `
  -Description "Installs Visual Studio Code on Linux."
$version = '1.0.0'
New-AzGalleryApplicationVersion `
   -ResourceGroupName $rgName `
   -GalleryName $galleryName `
   -GalleryApplicationName $applicationName `
   -Name $version `
   -PackageFileLink $sasVscode `
   -Location $location `
   -Install "mv visualStudioCode-linux && bash install" `
   -Remove "bash remove" `
   -Update "mv visualStudioCode-linux && bash update" 

The cmdlet I want to focus on is New-AzGalleryApplicationVersion. The parameter PackageFileLink is required. Not only is it required, it must be a readable storage page blob aka you cannot use a raw github link to a file. I tried using a public repo for an install script, but when running this cmdlet, it just hangs. Now, I will get to a workaround on that, but let’s continue. The Install and Remove parameter are required, but update is optional. With that, I thought a simple framework can be used.

if [ $1 == "install" ];
    echo "Installing...";
elif [ $1 == "remove" ];
    echo "Removing...";
elif [ $1 == "update" ]
    echo "Updating...";
    echo "Incorrect argument passed. Please use install, remove or update";

Now I can easily just call an argument for install, remove and update. Reading about VS Code, we can use snap to handle our application installation.

if [ $1 == "install" ];
    echo "Installing...";
    sudo snap install --classic code 
elif [ $1 == "remove" ];
    echo "Removing...";
    sudo snap remove code
elif [ $1 == "update" ]
    echo "Updating...";
    sudo snap refresh --classic code
    echo "Incorrect argument passed. Please use install, remove or update";

Now, going back where I said there is a workaround on referencing a public repo. This is partially true. What you can do is reference a valid url dummy file in the PageFileLink parameter then execute your commands directly in the Install, Update and Remove parameter.

   -Install "apt-get update && apt-get install ubuntu-gnome-desktop xrdp gnome-shell-extensions -y && reboot" `
   -Remove "apt-get --purge remove ubuntu-gnome-desktop xrdp gnome-shell-extensions -y && reboot" 

A couple of other things to note. Once a user starts an application install, it executes fast, but the status could use some work. If I am an end user in the portal and click install for a published application, I have to click back on the extension to see the status. The status isn’t in the VM application tab. Also, development is some what painful. If the install, update or remove section fails, it seems to go in this endless loop and a giant pain to make it stop. I couldn’t figure it out. The documentation states to uninstall the extension, which I did, but it still keeps it on the VM. It is still in preview, so I can’t complain too much. Lastly, unlike template spec’s which lets you select a spec in another subscription in the portal, this does not exist for VM Apps. You need to make sure a compute gallery is in the subscription, then it will show for the end user. This limitation does not exist when using AZ Cli, Rest or AZ Powershell, as long as you have the correct permissions to talk with the compute gallery in another subscription.

I think VM Apps has a lot of potential to make the end user experience better. Think of apps typically installed for developers to test with. We can now ensure approved and validated applications are installed which I think it is a great win!

Guest Configuration Extension Broke in Azure Gov for RHEL 8.x+

UPDATE 4/11/2022 This has been fixed!

UPATE 4/3/2022 Still broke…waiting on product to fix.

UPDATE 3/4/2022 Microsoft product group will be pushing a fix out in 2 weeks to Azure Gov. I asked what the cause was, but nothing yet.

One of the great features of Azure Policy is the capability to audit OS settings for security baselines and compliance checking. I was deploying RHEL 8.4 and noticed the Guest Assignment was always hung in the pending state. I had no issues with Ubuntu, so it had to be something happening on the RHEL vm.

I navigated to /var/lib and saw the GuestConfig folder created, but when I was inside, it was empty. Hrm, this should be populated with folders and MOF files.

[root@rhel84 GuestConfig]# pwd
[root@rhel84 GuestConfig]# ls -al
total 4
drwxr--r--.  2 root root    6 Feb 26 22:13 .
drwxr-xr-x. 41 root root 4096 Feb 26 22:13 ..

Next step was to tail the messages log to see if anything can pin point what is actually happening.

[root@rhel84 GuestConfig]# tail -f /var/log/messages | grep -i GuestConfiguration
Feb 26 22:27:36 rhel84 systemd[7442]: gcd.service: Failed at step EXEC spawning /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service: Permission denied
Feb 26 22:27:46 rhel84 systemd[7458]: gcd.service: Failed at step EXEC spawning /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service: Permission denied

Alright, a permission denied. It’s something to start looking into, but I was confused why this is happening. I headed over to Azure commercial and spun up a RHEL 8.4 vm with the same Azure Policy to execute my security baseline. Well, to my surprise, everything worked just fine. Looking at /var/lib/GuestConfig showed the Configuration folder with mof files. Looking at the Guest Assignments, it was showing NonCompliant, so I know it is OK there. I did notice the Guest Extension in commercial is using 1.26.24 and gov is using 1.25.5. I tried deploying that version with no auto upgrade in gov, but same error.

After some research, I set selinux to permissive mode and instantly the Configuration folder was created and starting pulling the mof files down. OK, now I am really puzzled. Working with Azure support, they were able to reproduce this same issue in Gov, but not in commercial. I was shocked no other cases have been open. I am not sure when this problem started happening, but this means security baselines on RHEL 8.x+ are not working.

While I wait for Microsoft to investigate more why this is happening, I tried to find a workaround. Knowing it is selinux causing the issue, I thought I could just create a policy allowing the execution of the gc_linux_service.

I tested first by making sure selinux is set to Enforcing then using chcon to set the selinux context:

[root@rhel84 GuestConfig]# getenforce
chcon -t bin_t /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service

We’re all good. No error’s in the messages log. Since this could revert by a restorecon command being ran later, I added it to the selinux policy by running:

semanage fcontext -a -t bin_t /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service
restorecon -v /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service

I will update my post once Microsoft comes back with a reason why this is only happening in Azure Gov and see what proposed solution they have. For now, i’d not depend on the Guest Extension to perform your compliance checking for RHEL 8.x until a fix has been pushed.

Azure Bastion Standard Sku Autoscale?

The standard sku of Azure Bastion fixed a lot of the pain points of the basic sku. Things like setting up multiple instances and setting the port to use for Linux. The one thing I did not see was autoscale. The Microsoft doc’s state Each instance can support 10 concurrent RDP connections and 50 concurrent SSH connections. The number of connections per instances depends on what actions you are taking when connected to the client VM. For example, if you are doing something data intensive, it creates a larger load for the instance to process. Once the concurrent sessions are exceeded, an additional scale unit (instance) is required. Imagine the scenario that we are using a hub and spoke topology with a bastion sitting in our hub. We would need to setup monitoring around concurrent sessions and alert us when session connectivity was getting close, but why not autoscale it?

I was curious why this setting was missing, so I spun up a test environment with 2 RDP sessions. Remember that the default deployment has 2 bastions deployed. Looking at the metric for session count, we can see the following:

Now, I was totally confused why it kept showing 1 to .44ish every few minutes. I understand the 1 for average since its 2 sessions across 2 instances, but couldn’t understand why it kept dipping.

Here is the graph using sum as my aggregation. Same thing! At this point, I tried to split the graph on instance:

Seems to be a scale set internally running bastion if I had to guess. That 0 on vm000000 screwing my metric count up! Now that I had an understanding of the metrics, how could I scale this automatically? I could setup an alert rule that fires a webhook when the session count is above X or below Y. I just didn’t feel comfortable with these metrics as it could provision multiple scaleset instances of 0 and I wouldn’t know. I started doing some research and found an API call for getActiveSessions which would return my session count. This is ideally what I wanted, so I started going down this path. I figured I could create an Azure function or runbook that runs every so often and scales the bastion out by +1 or -1 based on some switch.

$restUri = "$((Get-AzContext).Subscription.Id)/resourceGroups/$bastionResourceGroupName/providers/Microsoft.Network/bastionHosts/$bastionHostName/getActiveSessions?api-version=2021-03-01"
$getStatus = Invoke-webrequest -UseBasicParsing -uri $restUri -Headers $authHeader -Method Post
$asyncUri = "$((Get-AzContext).Subscription.Id)/providers/Microsoft.Network/locations/$bastionResourceGroupLocation/operationResults/$($getStatus.headers['x-ms-request-id'])?api-version=2020-11-01"
$sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
while ($sessions -eq 'null' ) {
    start-sleep -s 2
    $sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
write-output "Current session count is: $($sessions.count)"

The docs made it seem like this was a sync call, but it is actually async. You need to query out operation results to pull back the session count. For more information, check out this article

Now that I have my session count, I could do a simple switch statement on setting my bastion instance count. I started with these numbers below:

$bastionObj = Get-AzBastion -ResourceGroupName $bastionResourceGroupName -Name $bastionHostName
switch ($sessions.count)
    #2 instances by default. Each can hold up to 12 sessions
    {0..22 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force  }
    {23..34 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 3 -Force  }
    {35..45 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 4 -Force  }
    {46..58 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 5 -Force  }
    Default {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force}

When I started to test the autoscale, I noticed one big problem! When setting the scaleunit count, it disconnects all sessions. That is a horrible end user experience. I am thinking this is why Microsoft did not implement autoscale 🙂

Well, next best scenario is resizing at the end of the working day to keep costs low. Add the code to authenticate into Azure via runbook or function and set it to run on a schedule. Maybe 8pm at night we resize based on user session count and before the work day starts we would resize to an instance count that fits our requirements. I’d imagine Microsoft will implement autoscale, but they need to figure out how to move existing sessions gracefully to another bastion host.

Can’t add an Azure budget after a new subscription?

Create a budget automatically after provisioning a new subscription

I am sure you ran into the situation where you create a new subscription, but want to add an Azure budget to help monitor and control spend. As you know, it can take some time for the subscription to sync with the EA portal. Here is a snippet from stating If you have a new subscription, you can’t immediately create a budget or use other Cost Management features. It might take up to 48 hours before you can use all Cost Management features. I don’t want to wait or try and remember adding a budget the next day. Let’s use Azure tools to solve this problem to automatically create a budget for us.

When I first read that statement above, I was thinking how to keep track of the new subscription details and have it automatically create the budget. I thought, why not use an Azure storage queue? I can start a runbook that creates the subscription, pops a message on the queue and will try every so often to create the budget. If successful, remove the message from the queue, but if not, keep it on and retry a few hours later. Let’s take a look at a snippet of the relevant code below.

$storageAccount = get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName 
$ctx = $storageAccount.Context
# Retrieve a specific queue
$queue = Get-AzStorageQueue –Name $queueName –Context $ctx
#create message
# Create a new message using a constructor of the CloudQueueMessage class
$queueMessage = [Microsoft.Azure.Storage.Queue.CloudQueueMessage]::new("$subName;$ownerupn")
# Add a new message to the queue

The code above is self explanatory. Get the queue information and pop a message with the subscription name and owner. We can create another runbook that runs every few hours to process messages on the queue.

$storageAccount = get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName 
$ctx = $storageAccount.Context
$invisibleTimeout = [System.TimeSpan]::FromSeconds(60)
$queue = Get-AzStorageQueue –Name $queueName –Context $ctx

if ($queue.QueueProperties.ApproximateMessagesCount -gt 0) {
    $queueMessage = $queue.CloudQueue.GetMessageAsync($invisibleTimeout, $null, $null)
    $msg = $queueMessage.Result.AsString
    Select-AzSubscription $msg.Split(';')[0]
    New-AzConsumptionBudget -ErrorAction SilentlyContinue -ErrorVariable cmdletError -Amount 1000 -Name "$($msg.Split(';')[0])-budget" -Category Cost -TimeGrain Monthly -StartDate (Get-Date -Format yyyy-MM).ToString() -ContactEmail '', $($msg.Split(';')[1]) -NotificationKey Key1 -NotificationThreshold 90 -NotificationEnabled 

    if ($cmdletError) {
        Write-Warning "Subscription $($msg.Split(';')[0]) might still be provisioning to ea portal. Will try again in a couple of hours..."
    else {
        $queue.CloudQueue.DeleteMessageAsync($queueMessage.Result.Id, $queueMessage.Result.popReceipt)

The runbook will check if the queue has a message, process the message, select into the newly create Azure subscription and create a new budget. If it throws an exception, write a warning and keep the message on the queue to try again later. If it does create a budget, we can safely delete the message.

It’s simple and does the job. There are 10 ways to solve a challenge and this is just one of them. Hope it helps!

Reactivate an Azure Subscription via API – Gov Cloud Edition

I recently had to reactivate an Azure subscription that was cancelled, but I noticed the instructions do not work in Azure Gov Cloud. There is no button to reactivate, so I was forced to submit a ticket to Microsoft and they fixed me up. Typically, if a subscription was cancelled, it was done by mistake and the end user needs access ASAP. I didn’t want to wait hours by submitting a ticket to Microsoft in the future, so I started figuring out how I could do this self service style in Azure gov.

I started to research the AZ CLI and PowerShell cmdlets, but nothing was coming up. As a last resort, I look at the API documentation and to my surprise, I found the POST call to enable a subscription If you noticed, I linked to API version 2019-03-01-preview. The latest version of 2020-09-01 was not working in I put a code snippet below:

$azContext = Get-AzContext
$azProfile = [Microsoft.Azure.Commands.Common.Authentication.Abstractions.AzureRmProfileProvider]::Instance.Profile
$profileClient = New-Object -TypeName Microsoft.Azure.Commands.ResourceManager.Common.RMProfileClient -ArgumentList ($azProfile)
$token = $profileClient.AcquireAccessToken($azContext.Subscription.TenantId)
$authHeader = @{
    'Authorization'='Bearer ' + $token.AccessToken 

#commercial uri
#gov uri
$restUri = "$($subscriptionId)/providers/Microsoft.Subscription/enable?api-version=2019-10-01-preview"
Invoke-RestMethod -uri $restUri -Method POST -Headers $authHeader

In larger organizations, this code could be used towards Service Now automation, Azure Automation, Azure Functions, etc to get the client up and running faster. I hope this helps you with your Azure journey. 🙂

I spy, with my little eye…Encryption at Host in Azure Gov Cloud?

One of the features that has been missing from Azure gov cloud is encryption at host. The restriction of dm-crypt specific to certain Linux operating systems and the cpu overhead using bitlocker makes this a big win, not to forget federal compliances you are trying to achieve. It feels like it is some kept secret and I am not sure why? You still need to access the portal with a special link just to provision with it enabled in commercial cloud. No bicep/arm template examples and a lot of the documentation seems to be from 3rd party blogs. Well, look no further!

I published a quick arm template that enables encryption at host, but before we deploy, we need to make sure the feature is enabled. Check if it is enabled by running Get-AzProviderFeature -FeatureName "EncryptionAtHost" -ProviderNamespace "Microsoft.Compute" and if it is not registered, register it by running Register-AzProviderFeature -FeatureName "EncryptionAtHost" -ProviderNamespace "Microsoft.Compute"

Once the feature has been registered, you can create a VM using this link for gov cloud When you get to the disk section, there will be an option to enable encryption at host.

Screenshot of the virtual mahine creation disks pane, encryption at host highlighted.

Using an ARM template is as easy as adding a securityProfile with encryptionAtHost set to true

          "securityProfile": {
              "encryptionAtHost": true

For a complete sample, please go here

I haven’t seen any announcements for encryption at host for gov cloud, but then again, I don’t see many for gov cloud to begin with. Hopefully, this makes your FedRAMP and CMMC journey a little easier 🙂

Azure Run Command via API

I had a scenario where I needed an end user to be able to run a few adhoc commands via Azure automation runbook and return the results. I am a big fan of Azure Automation as it has a nice display of the jobs and how it categorizes exceptions, warnings and output. The VM is running Ubuntu, but unfortunately, you cannot run adhoc commands using the Invoke-AzVmRunCommand cmdlet. You need to pass in a script 😦 I tried to do an inline script and also export it out then reference it in the runbook, but it would just display nothing. Knowing that az cli can run adhoc commands, I figured I would research the API.

I was getting no where with the Microsoft docs as the response was not the one I was getting. One simple trick I did was run the web browser developer tools and just monitor the API call being sent from the portal. In the picture below, you can see the API call and the JSON body which has a simple command of calling date. You can copy the API call directly from the devs tools in the specific format you want.

Now that I can make the call, I noticed it is sent asynchronous. Looking at the next call in my dev tools, I saw this URI being called with some GUIDs.

I tried to research this call but I didn’t see an explanation for the guid’s in the URI. What I did figure out is that the response from invoke-webrequest has a header key called Location and azure-asyncoperation which both have a URI that matches the call Azure was using in the portal. We can do a simple while loop to wait until the invoke-webrequest populates content which has our stdout from the runcommand. It will look something like this in an Azure runbook:

Connect-AzAccount -Identity

$azContext = Get-AzContext
$azProfile = [Microsoft.Azure.Commands.Common.Authentication.Abstractions.AzureRmProfileProvider]::Instance.Profile
$profileClient = New-Object -TypeName Microsoft.Azure.Commands.ResourceManager.Common.RMProfileClient -ArgumentList ($azProfile)

$token = $profileClient.AcquireAccessToken($azContext.Subscription.TenantId)

$auth = @{
    'Content-Type'  = 'application/json'
    'Authorization' = 'Bearer ' + $token.AccessToken 

$response = Invoke-WebRequest -useBasicParsing -Uri "$($((Get-AzContext).Subscription.Id))/resourceGroups/ubuntu/providers/Microsoft.Compute/virtualMachines/ubuntu/runCommand?api-version=2018-04-01" `
-Method "POST" `
-Headers $auth `
-ContentType "application/json" `
-Body "{`"commandId`":`"RunShellScript`",`"script`":[`"date`"]}"

Foreach ($key in ($response.Headers.GetEnumerator() | Where-Object {$_.Key -eq "Location"}))
       $checkStatus = $Key.Value

$contentCheck = Invoke-WebRequest -UseBasicParsing -Uri $checkStatus -Headers $auth
while (($contentCheck.content).count -eq 0) {
$contentCheck = Invoke-WebRequest -UseBasicParsing -Uri $checkStatus -Headers $auth
Write-output "Waiting for async call to finish..."
Start-Sleep -s 15

($contentCheck.content | convertfrom-json).value.message

As you can see, I am using a managed identity and logging in with it. The runbook calls the runcommand with a POST then it hits a while loop to wait for it to finish then output the results.

Azure Devtest Labs “Install Windows Update” not working

I was building out a formula in Devtest labs the other day and added a few artifacts, including “Install Windows Updates”. My goal was to build an ARM template that deploys DTL, creates a VM, then a custom image off that VM. Everything worked great, but as a good IT pro, I double checked my work. Looking at the VM, you can drill down into the artifacts section and see the status of each artifact applied. I had green everywhere, so it looked good. Upon further inspection, I noticed the Windows Update task finished extremely quick.

Clicking on the task to get more details, I saw this:

It just displayed the updates and rebooted. Well, maybe it did install? I checked the update history on the VM and it was empty. It also displayed those updates to be installed. Curious, I went to the PowerShell file in the packages folder C:\Packages\Plugins\Microsoft.Compute.CustomScriptExtension\1.10.12\Downloads\4\PublicRepo\master\0b7a713c381a8cbecf04f92122d1c0b07324871e\Artifacts\windows-install-windows-updates\scripts\artifact.ps1 and saw the cmdlet:

I was able to run this script and reproduce the same output as above. I did some digging and it looks like this cmdlet needs additional parameters now unlike in the past.

I updated it and re-ran the script which displayed:

This is what I would expect to see. Now, the bigger problem is that this is in Microsoft’s artifacts repo If you are using the public repo for your artifacts and have this specific artifact being consumed, i’d double check to actually make sure you are indeed patching your operating system. I did submit a pull request with the fix, so hopefully they review it soon.

Edit: Microsoft approved my pull request to fix this. Shouldn’t be an issue now 🙂

Azure Gov B2C

While most people consume Azure commercial cloud, Azure gov is another beast. It seems the lack of documentation makes each project a bit more challenging.

I am currently doing a POC for using B2C in gov cloud. B2C supports local accounts which makes it great to put application end user accounts in their own tenant instead of creating my own identity provider. Typically, when you create the B2C tenant, you link it to a subscription. I did not have this option as I could only create just the tenant. It typically looks like this when all is working:

Create a new Azure AD B2C tenant selected in Azure portal

Mine looked like this:

I looked for the feature provider Microsoft.AzureActiveDirectory and it is missing all together. I popped a ticket to Microsoft and they said B2C is supported and you don’t need to link it to a subscription. I was a bit confused because a subscription is a billing boundary and if I used MFA or conditional access, how could it be billed? Well, you can’t do this. After pleading my css case, I was told that this is in preview and engineering knows about this. What stings a bit more is that the Azure feedback item has been open since 2017 😦

Either way, I kept moving forward to see what I can do with this POC. The first thing to call out is that the endpoints are not documented at the site. You must use the endpoints button in your b2c registration you created. What I noticed is that instead of and your typical tenant tld, you need to add .us.

Once I had my endpoints configured correctly, my user flows were working just fine as accounts were being created in the tenant. Now, let me create local accounts in the portal in the b2c tenant. Nope, said could not create user. I am confused why my app could do it just fine. I found the api doc for create user and tested against my commercial sub and it worked, but when I tested against the gov graph endpoint, it failed. It said the property creationType was missing. Alright, so I added creationType=LocalAccount into the json body and the api call worked. Guess this is an issue.

The last issue I found is that the tenant type is set to Preview tenant. I couldn’t find anything what this meant until I stumbled across the 2016 announcement post. Information about your tenant type is available in your B2C Admin UI. If it says “Production-scale tenant”, you are good to go. If you have an existing “Preview tenant”, you must use it ONLY for development and testing. The lack of documentation of what is preview and not is hurting. This sounds like a red flag as I can’t deploy production apps into a preview tenant.

I popped a couple of tickets to Microsoft and will update this post once I get more information. More to come!

Edit: no date yet when it will GA in gov 😦