I’ve been doing a lot of HPC work using Azure CycleCloud. It can quickly deploy an entire HPC cluster in minutes with the benefit of autoscale on the compute node side. I will have a couple of posts about some things that gave me grey hair, but let’s first look at the Jetpack installation error that CycleCloud kept showing me.
I know that CycleCloud makes use of Jetpack after doing some research and was totally confused why Jetpack would be throwing an error about Alma Linux not being supported when it is the default image used for a Slurm deployment. I was trying to find some pattern, so I would delete the Slurm cluster and reprovision it. I SSH’ed into the node and verified jetpack installed just fine from the /opt/cycle logs dir. I started looking around and noticed a custom script extension is used to start the install of something.
Drilling down into the cse, you can see the output is referencing jetpackd.
At this point, on my new cluster, everything is green. No errors, so I waited for it to throw the jetpack install error again. About 30 minutes in, the error came back. Totally confused, I started looking at the logs again. I browsed to the package install path on the node and noticed that the CSE was replaced with a CSE I automatically deploy. I started thinking, is there some kind of dependency on the package the CSE initially downloads onto the VM? Well, after reviewing the logs of my CSE that is deployed, guess what, it said Fatal error: Unrecognized OS: AlmaLinux. The The issues field in the CycleCloud UI is reporting the output of the CSE. Jetpack installation error is misleading because Jetpack installed just fine. It was Crowdstrike which does not like Alma Linux. I switched over to CentOS and the error went away. Long story short, if you deploy CSE’s, know that CycleCloud will display the stderr if a problem happens and it’ll fall under a Jetpack installation error.