Missing Microsoft Applications in GCC High

An awesome feature to bring some sanity to Azure VM authentication and authorization is using Microsoft Azure Windows and Linux Virtual Machine Sign-in functionality. You can quickly test this by selecting the Login with Azure AD check box during provisioning.

I wanted to add MFA and User Sign In risk checks using conditional access before a user can actually log into the VM. When setting up my policy, I could not find Microsoft Azure Windows Virtual Machine Sign-in or Microsoft Azure Linux Virtual Machine Sign-in app. I was puzzled, so I quickly checked my commercial tenant and sure enough it existed. I initially thought it was one of those not in gov cloud, but only commercial cloud situation. I created a ticket to support and they came back noting that they have seen Microsoft applications missing in GCC High tenants. The quick fix is just to manually add the missing applications. Once they told me the application Id’s are the same, we can quickly just create it.

New-AzureADServicePrincipal -AppId '372140e0-b3b7-4226-8ef9-d57986796201' #Microsoft Azure Windows Virtual Machine Sign-in
New-AzureADServicePrincipal -AppId 'ce6ff14a-7fdc-4685-bbe0-f6afdfcfa8e0' #Microsoft Azure Linux Virtual Machine Sign-In

After running those PowerShell cmdlet’s in my cloud shell, I can now successfully see the apps during conditional access creation.

Guest Configuration Extension Broke in Azure Gov for RHEL 8.x+

UPDATE 4/11/2022 This has been fixed!

UPATE 4/3/2022 Still broke…waiting on product to fix.

UPDATE 3/4/2022 Microsoft product group will be pushing a fix out in 2 weeks to Azure Gov. I asked what the cause was, but nothing yet.

One of the great features of Azure Policy is the capability to audit OS settings for security baselines and compliance checking. I was deploying RHEL 8.4 and noticed the Guest Assignment was always hung in the pending state. I had no issues with Ubuntu, so it had to be something happening on the RHEL vm.

I navigated to /var/lib and saw the GuestConfig folder created, but when I was inside, it was empty. Hrm, this should be populated with folders and MOF files.

[root@rhel84 GuestConfig]# pwd
/var/lib/GuestConfig
[root@rhel84 GuestConfig]# ls -al
total 4
drwxr--r--.  2 root root    6 Feb 26 22:13 .
drwxr-xr-x. 41 root root 4096 Feb 26 22:13 ..

Next step was to tail the messages log to see if anything can pin point what is actually happening.

[root@rhel84 GuestConfig]# tail -f /var/log/messages | grep -i GuestConfiguration
Feb 26 22:27:36 rhel84 systemd[7442]: gcd.service: Failed at step EXEC spawning /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service: Permission denied
Feb 26 22:27:46 rhel84 systemd[7458]: gcd.service: Failed at step EXEC spawning /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service: Permission denied

Alright, a permission denied. It’s something to start looking into, but I was confused why this is happening. I headed over to Azure commercial and spun up a RHEL 8.4 vm with the same Azure Policy to execute my security baseline. Well, to my surprise, everything worked just fine. Looking at /var/lib/GuestConfig showed the Configuration folder with mof files. Looking at the Guest Assignments, it was showing NonCompliant, so I know it is OK there. I did notice the Guest Extension in commercial is using 1.26.24 and gov is using 1.25.5. I tried deploying that version with no auto upgrade in gov, but same error.

After some research, I set selinux to permissive mode and instantly the Configuration folder was created and starting pulling the mof files down. OK, now I am really puzzled. Working with Azure support, they were able to reproduce this same issue in Gov, but not in commercial. I was shocked no other cases have been open. I am not sure when this problem started happening, but this means security baselines on RHEL 8.x+ are not working.

While I wait for Microsoft to investigate more why this is happening, I tried to find a workaround. Knowing it is selinux causing the issue, I thought I could just create a policy allowing the execution of the gc_linux_service.

I tested first by making sure selinux is set to Enforcing then using chcon to set the selinux context:

[root@rhel84 GuestConfig]# getenforce
Enforcing
chcon -t bin_t /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service

We’re all good. No error’s in the messages log. Since this could revert by a restorecon command being ran later, I added it to the selinux policy by running:

semanage fcontext -a -t bin_t /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service
restorecon -v /var/lib/waagent/Microsoft.GuestConfiguration.ConfigurationforLinux-1.25.5/GCAgent/GC/gc_linux_service

I will update my post once Microsoft comes back with a reason why this is only happening in Azure Gov and see what proposed solution they have. For now, i’d not depend on the Guest Extension to perform your compliance checking for RHEL 8.x until a fix has been pushed.

Azure Bastion Standard Sku Autoscale?

The standard sku of Azure Bastion fixed a lot of the pain points of the basic sku. Things like setting up multiple instances and setting the port to use for Linux. The one thing I did not see was autoscale. The Microsoft doc’s state Each instance can support 10 concurrent RDP connections and 50 concurrent SSH connections. The number of connections per instances depends on what actions you are taking when connected to the client VM. For example, if you are doing something data intensive, it creates a larger load for the instance to process. Once the concurrent sessions are exceeded, an additional scale unit (instance) is required. Imagine the scenario that we are using a hub and spoke topology with a bastion sitting in our hub. We would need to setup monitoring around concurrent sessions and alert us when session connectivity was getting close, but why not autoscale it?

I was curious why this setting was missing, so I spun up a test environment with 2 RDP sessions. Remember that the default deployment has 2 bastions deployed. Looking at the metric for session count, we can see the following:

Now, I was totally confused why it kept showing 1 to .44ish every few minutes. I understand the 1 for average since its 2 sessions across 2 instances, but couldn’t understand why it kept dipping.

Here is the graph using sum as my aggregation. Same thing! At this point, I tried to split the graph on instance:

Seems to be a scale set internally running bastion if I had to guess. That 0 on vm000000 screwing my metric count up! Now that I had an understanding of the metrics, how could I scale this automatically? I could setup an alert rule that fires a webhook when the session count is above X or below Y. I just didn’t feel comfortable with these metrics as it could provision multiple scaleset instances of 0 and I wouldn’t know. I started doing some research and found an API call for getActiveSessions https://docs.microsoft.com/en-us/rest/api/virtualnetwork/get-active-sessions/get-active-sessions which would return my session count. This is ideally what I wanted, so I started going down this path. I figured I could create an Azure function or runbook that runs every so often and scales the bastion out by +1 or -1 based on some switch.

$restUri = "https://management.azure.com/subscriptions/$((Get-AzContext).Subscription.Id)/resourceGroups/$bastionResourceGroupName/providers/Microsoft.Network/bastionHosts/$bastionHostName/getActiveSessions?api-version=2021-03-01"
$getStatus = Invoke-webrequest -UseBasicParsing -uri $restUri -Headers $authHeader -Method Post
$asyncUri = "https://management.azure.com/subscriptions/$((Get-AzContext).Subscription.Id)/providers/Microsoft.Network/locations/$bastionResourceGroupLocation/operationResults/$($getStatus.headers['x-ms-request-id'])?api-version=2020-11-01"
$sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
while ($sessions -eq 'null' ) {
    start-sleep -s 2
    $sessions = invoke-restmethod -uri $asyncUri -Headers $authHeader
}
 
write-output "Current session count is: $($sessions.count)"

The docs made it seem like this was a sync call, but it is actually async. You need to query out operation results to pull back the session count. For more information, check out this article https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/async-operations

Now that I have my session count, I could do a simple switch statement on setting my bastion instance count. I started with these numbers below:

$bastionObj = Get-AzBastion -ResourceGroupName $bastionResourceGroupName -Name $bastionHostName
switch ($sessions.count)
{
    #2 instances by default. Each can hold up to 12 sessions
    {0..22 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force  }
    {23..34 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 3 -Force  }
    {35..45 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 4 -Force  }
    {46..58 -contains $_} {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 5 -Force  }
    Default {Set-AzBastion -InputObject $bastionObj -Sku "Standard" -ScaleUnit 2 -Force}
 
}

When I started to test the autoscale, I noticed one big problem! When setting the scaleunit count, it disconnects all sessions. That is a horrible end user experience. I am thinking this is why Microsoft did not implement autoscale 🙂

Well, next best scenario is resizing at the end of the working day to keep costs low. Add the code to authenticate into Azure via runbook or function and set it to run on a schedule. Maybe 8pm at night we resize based on user session count and before the work day starts we would resize to an instance count that fits our requirements. I’d imagine Microsoft will implement autoscale, but they need to figure out how to move existing sessions gracefully to another bastion host.

Can’t add an Azure budget after a new subscription?

Create a budget automatically after provisioning a new subscription

I am sure you ran into the situation where you create a new subscription, but want to add an Azure budget to help monitor and control spend. As you know, it can take some time for the subscription to sync with the EA portal. Here is a snippet from https://docs.microsoft.com/en-us/azure/cost-management-billing/costs/tutorial-acm-create-budgets stating If you have a new subscription, you can’t immediately create a budget or use other Cost Management features. It might take up to 48 hours before you can use all Cost Management features. I don’t want to wait or try and remember adding a budget the next day. Let’s use Azure tools to solve this problem to automatically create a budget for us.

When I first read that statement above, I was thinking how to keep track of the new subscription details and have it automatically create the budget. I thought, why not use an Azure storage queue? I can start a runbook that creates the subscription, pops a message on the queue and will try every so often to create the budget. If successful, remove the message from the queue, but if not, keep it on and retry a few hours later. Let’s take a look at a snippet of the relevant code below.


$storageAccount = get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName 
$ctx = $storageAccount.Context
 
# Retrieve a specific queue
$queue = Get-AzStorageQueue –Name $queueName –Context $ctx
 
#create message
# Create a new message using a constructor of the CloudQueueMessage class
$queueMessage = [Microsoft.Azure.Storage.Queue.CloudQueueMessage]::new("$subName;$ownerupn")
 
# Add a new message to the queue
$queue.CloudQueue.AddMessage($queueMessage,$null)

The code above is self explanatory. Get the queue information and pop a message with the subscription name and owner. We can create another runbook that runs every few hours to process messages on the queue.

 
$storageAccount = get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName 
 
$ctx = $storageAccount.Context
$invisibleTimeout = [System.TimeSpan]::FromSeconds(60)
$queue = Get-AzStorageQueue –Name $queueName –Context $ctx

if ($queue.QueueProperties.ApproximateMessagesCount -gt 0) {
 
    $queueMessage = $queue.CloudQueue.GetMessageAsync($invisibleTimeout, $null, $null)
    $msg = $queueMessage.Result.AsString
    Select-AzSubscription $msg.Split(';')[0]
 
    New-AzConsumptionBudget -ErrorAction SilentlyContinue -ErrorVariable cmdletError -Amount 1000 -Name "$($msg.Split(';')[0])-budget" -Category Cost -TimeGrain Monthly -StartDate (Get-Date -Format yyyy-MM).ToString() -ContactEmail 'IT@contoso.com', $($msg.Split(';')[1]) -NotificationKey Key1 -NotificationThreshold 90 -NotificationEnabled 

    if ($cmdletError) {
        $cmdletError
        Write-Warning "Subscription $($msg.Split(';')[0]) might still be provisioning to ea portal. Will try again in a couple of hours..."
    }
    else {
        $queue.CloudQueue.DeleteMessageAsync($queueMessage.Result.Id, $queueMessage.Result.popReceipt)
 
    }
}

The runbook will check if the queue has a message, process the message, select into the newly create Azure subscription and create a new budget. If it throws an exception, write a warning and keep the message on the queue to try again later. If it does create a budget, we can safely delete the message.

It’s simple and does the job. There are 10 ways to solve a challenge and this is just one of them. Hope it helps!

I spy, with my little eye…Encryption at Host in Azure Gov Cloud?

One of the features that has been missing from Azure gov cloud is encryption at host. The restriction of dm-crypt specific to certain Linux operating systems and the cpu overhead using bitlocker makes this a big win, not to forget federal compliances you are trying to achieve. It feels like it is some kept secret and I am not sure why? You still need to access the portal with a special link just to provision with it enabled in commercial cloud. No bicep/arm template examples and a lot of the documentation seems to be from 3rd party blogs. Well, look no further!

I published a quick arm template that enables encryption at host, but before we deploy, we need to make sure the feature is enabled. Check if it is enabled by running Get-AzProviderFeature -FeatureName "EncryptionAtHost" -ProviderNamespace "Microsoft.Compute" and if it is not registered, register it by running Register-AzProviderFeature -FeatureName "EncryptionAtHost" -ProviderNamespace "Microsoft.Compute"

Once the feature has been registered, you can create a VM using this link for gov cloud https://portal.azure.us/?feature.enabledoubleencryption=true&feature.enablehostbasedencryption=true When you get to the disk section, there will be an option to enable encryption at host.

Screenshot of the virtual mahine creation disks pane, encryption at host highlighted.

Using an ARM template is as easy as adding a securityProfile with encryptionAtHost set to true

          "securityProfile": {
              "encryptionAtHost": true
          },

For a complete sample, please go here https://raw.githubusercontent.com/jrudley/vmencathost/main/azuredeploy.json

I haven’t seen any announcements for encryption at host for gov cloud, but then again, I don’t see many for gov cloud to begin with. Hopefully, this makes your FedRAMP and CMMC journey a little easier 🙂