I had a project that required HPC pack, so I went right to CycleCloud to provision my new cluster. When I came to the Secrets and Certificate section, I was going to use a Key Vault to store my certificate and password. After selecting the MSI Identity dropdown, it was empty.
Using KeyVault is a hard requirement as I don’t want to be passing PFX and passwords into CycleCloud. After unsuccessfully trying to figure out why my MSI is MIA, I popped a support ticket. After explaining the situation, the rep was able to reproduce it, but eventually came back with a solution. CycleCloud runs a job every hour that will discover new Azure resources. We can force this update by ssh’ing into the CycleCloud node and as root run the following command:
/opt/cycle_server/cycle_server run_action Run:Application.Timer -eq Name plugin.azure.monitor_reference
Select your Subscription in the CycleCloud UI and click the Tasks tab. You will see a task running that is collecting reference data from Azure.
Once this task is finished, navigate back to creating the HPC Pack cluster and the dropdown should populate the managed identity.
Either have patience or run the command above to force the discovery. 🙂
I’ve been doing a lot of HPC work using Azure CycleCloud. It can quickly deploy an entire HPC cluster in minutes with the benefit of autoscale on the compute node side. I will have a couple of posts about some things that gave me grey hair, but let’s first look at the Jetpack installation error that CycleCloud kept showing me.
I know that CycleCloud makes use of Jetpack after doing some research and was totally confused why Jetpack would be throwing an error about Alma Linux not being supported when it is the default image used for a Slurm deployment. I was trying to find some pattern, so I would delete the Slurm cluster and reprovision it. I SSH’ed into the node and verified jetpack installed just fine from the /opt/cycle logs dir. I started looking around and noticed a custom script extension is used to start the install of something.
Drilling down into the cse, you can see the output is referencing jetpackd.
At this point, on my new cluster, everything is green. No errors, so I waited for it to throw the jetpack install error again. About 30 minutes in, the error came back. Totally confused, I started looking at the logs again. I browsed to the package install path on the node and noticed that the CSE was replaced with a CSE I automatically deploy. I started thinking, is there some kind of dependency on the package the CSE initially downloads onto the VM? Well, after reviewing the logs of my CSE that is deployed, guess what, it said Fatal error: Unrecognized OS: AlmaLinux. The The issues field in the CycleCloud UI is reporting the output of the CSE. Jetpack installation error is misleading because Jetpack installed just fine. It was Crowdstrike which does not like Alma Linux. I switched over to CentOS and the error went away. Long story short, if you deploy CSE’s, know that CycleCloud will display the stderr if a problem happens and it’ll fall under a Jetpack installation error.