Repair a Broken Boot Partition on an Azure Linux VM

What is this post about?

A couple of days ago, some colleagues were complaining about the unbearable performance of our SAP application, which is currently hosted on an Azure VM running openSUSE. Shortly after, my manager called me in on Monday morning and told me to reboot the SAP production environment.

I knew something would hit the fan, because I clearly remembered that the last time we rebooted that machine was 700+ days ago.

As expected, the VM got stuck closing the database process for an hour and a half. After I briefed my manager, he said "Hard reset." and it died, completely.

I tried to stop it but NooOoOOooOOoo they didn't listen to me!

So this post is about how I spent 26 hours relying only on Microsoft's messy documentation and trial and error.

I'm writing it down so that I won't step into this mud again.

CAUTION

Performing a hard reset while Linux is in the middle of shutting down is highly likely to damage your boot partition.

Diagnose Problem

On the Azure Portal, find your VM and check the boot log by clicking Serial Console in the left-hand menu.

If a message like the one below appears, you might have a broken boot partition.

Failed to start File System Check on /dev/disk/by-uu...d-121f462d7e8d
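
If you have the boot log saved as a text file, a quick check for that message could look like the sketch below. The function name and the idea of saving the log to a file are my own assumptions for illustration; it only greps for the message shown above.

```shell
# Hypothetical helper: returns 0 (success) if the saved boot log
# contains the fsck failure message, non-zero otherwise.
boot_log_has_fsck_failure() {
  # $1: path to a text file holding the boot log
  grep -q 'Failed to start File System Check' "$1"
}
```

For example, after saving the log to boot.log (I believe `az vm boot-diagnostics get-boot-log -g {{MyResourceGroup}} -n {{myVM}} > boot.log` does this when boot diagnostics is enabled), `boot_log_has_fsck_failure boot.log && echo "boot partition looks broken"` prints the warning only when the message is present.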

Solution

Prerequisite

Install the Azure CLI

Refer to the Installation Guide.

Install vm-repair Extension

If you haven't installed this extension before:

$ az extension add -n vm-repair

If you have already installed this extension, it's always a good idea to check for updates.

$ az extension update -n vm-repair

Setup Repair VM

$ az vm repair create -g {{MyResourceGroup}} -n {{myVM}} --repair-username {{username}} --repair-password {{password!234}} --verbose

This command copies the system disk of the broken VM and attaches the copy to a new repair VM, which is created automatically.
Wait a couple of minutes and you can SSH into the new VM.

Start Repair

Repair Command

SSH into your repair VM, use lsblk to find the device name of your broken partition, and run:

$ fsck /dev/{{device_name}}

Answer Y to all questions.
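
To narrow down which device to check, the sketch below filters lsblk output for partitions that are not mounted; the copied disk usually shows up that way on the repair VM, though that is an assumption about your setup, and the function name is made up for this example. It reads the key="value" format that `lsblk -P` prints.

```shell
# Hypothetical helper: reads `lsblk -P -o NAME,TYPE,MOUNTPOINT` output on stdin
# and prints the names of partitions that have no mount point.
unmounted_partitions() {
  grep 'TYPE="part"' \
    | grep 'MOUNTPOINT=""' \
    | sed -E 's/^NAME="([^"]*)".*/\1/'
}
```

Run it as `lsblk -P -o NAME,TYPE,MOUNTPOINT | unmounted_partitions`, then check the candidate with fsck. If you don't want to type Y over and over, `fsck -y /dev/{{device_name}}` answers yes to every prompt, so only use it once you are sure you picked the right device.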

Unattended Script

Azure also provides an automatic repair script, but I haven't tried it.

$ az vm repair run -g {{MyResourceGroup}} -n {{MyVM}} --run-on-repair --run-id 2 --verbose

Complete and Restore

Delete Repair VM

Use the command below to swap the fixed disk back and delete your repair VM.

$ az vm repair restore -g {{MyResourceGroup}} -n {{MyVM}} --verbose

This will replace the broken partition with the one we just fixed.

Depending on your needs, you can choose to keep or delete your repair VM, but you will still be charged for it even if it's not running.

Boot up VM

Now you can boot up your VM with the fixed boot partition.
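
For next time, the two az steps from this post can be wrapped in a small script with a dry-run switch, so the exact commands can be reviewed before anything touches production. This is only a sketch: the function names and the DRY_RUN variable are my own invention, and the az commands are the same ones used above.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: create the repair VM with a copy of the broken disk attached.
# Args: resource group, VM name, repair username, repair password
repair_create() {
  run az vm repair create -g "$1" -n "$2" \
    --repair-username "$3" --repair-password "$4" --verbose
}

# Step 2 (after running fsck on the repair VM): put the fixed disk back.
# Args: resource group, VM name
repair_restore() {
  run az vm repair restore -g "$1" -n "$2" --verbose
}
```

Calling `DRY_RUN=1 repair_restore {{MyResourceGroup}} {{MyVM}}` only prints the az command; drop DRY_RUN=1 to actually run it.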