The 6 Most Helpful Backup Alarms
In a perfect world, backups would have a 100% success rate and backup speeds would always be over 1 GB/s. Unfortunately, IT infrastructure is ever-changing, and a number of variables can cause a backup to fail or fall short of expected performance. That is why it is important to have alarms that notify you, or even take automatic action, as changes in the environment inevitably affect backups. There are over 200 alarms in Veeam ONE, but below are my six favorites that every customer should use.
1. Automatically Remove VM Snapshots Based on Age
Veeam monitoring goes beyond typical capacity planning and protected machine alarms. In order to successfully troubleshoot an issue, it is crucial to understand the network, storage and infrastructure components. For example, a common reason for slower backup performance is that the source storage on the VMware datastore is busy or running out of space. Veeam understands this and digs into the virtual infrastructure itself. You certainly can set an alarm about datastore utilization, but you can also set an alarm that automatically deletes VM snapshots that are older than "x" days.
Removing old snapshots is a great way to save space on the underlying storage for the VMs. While creating this alarm, you can define whether you would simply like to be notified of snapshots older than "x" days, or you can instruct Veeam to actually delete them.
In addition, any image-based backup software might occasionally forget to delete the snapshot used for backup. Not everyone or everything is perfect. You can set a similar alarm that deletes orphaned snapshots automatically.
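To make the age-based rule concrete, here is a minimal Python sketch of the selection logic only. It is not Veeam ONE's implementation or API: the `Snapshot` record, the `snapshots_older_than` helper and the sample data are all hypothetical, and a real alarm would get its snapshot inventory from the virtual infrastructure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """Hypothetical snapshot record; field names are assumptions."""
    vm_name: str
    name: str
    created: datetime


def snapshots_older_than(snapshots, max_age_days, now):
    """Return the snapshots created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in snapshots if s.created < cutoff]


# Illustrative data only.
now = datetime(2024, 1, 31)
inventory = [
    Snapshot("web01", "pre-patch", datetime(2024, 1, 2)),
    Snapshot("db01", "before-upgrade", datetime(2024, 1, 29)),
]
stale = snapshots_older_than(inventory, max_age_days=7, now=now)
print([s.vm_name for s in stale])
```

Whether the result is only reported or actually deleted is the same choice the alarm gives you: notify first, then enable automatic removal once you trust the threshold.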
2. Power on VMs Automatically
At first glance, you might think, "Brad, I don't want to power on any VM that's powered off." And I would completely agree with you. What is great about this alarm is that you can assign it to a specific part of your virtual infrastructure rather than the entire vCenter or even a cluster. You can define a specific subset by tag to identify mission-critical VMs that should never be turned off, and if they are, this rule will automatically turn them back on.
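The scoping idea, acting only on tagged VMs rather than on everything that is powered off, can be sketched in a few lines of Python. The `Vm` record, the tag name `mission-critical` and the `vms_to_power_on` helper are hypothetical and only illustrate the filter, not how Veeam ONE evaluates the alarm.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    """Hypothetical VM record; field names are assumptions."""
    name: str
    powered_on: bool
    tags: set = field(default_factory=set)


def vms_to_power_on(vms, critical_tag="mission-critical"):
    """Return names of powered-off VMs that carry the critical tag."""
    return [vm.name for vm in vms if critical_tag in vm.tags and not vm.powered_on]


# Illustrative data only.
fleet = [
    Vm("erp-db", powered_on=False, tags={"mission-critical"}),
    Vm("test-box", powered_on=False, tags={"lab"}),
    Vm("erp-app", powered_on=True, tags={"mission-critical"}),
]
print(vms_to_power_on(fleet))
```

Only `erp-db` qualifies here: the lab VM is off but untagged, and the other critical VM is already running.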
3. Back Up VMs that Missed RPO Window
An alarm that checks that all VMs have been backed up within the defined RPO is great, but an alarm that will automatically back up any VM that has fallen out of its RPO is even better! Veeam ONE lets you define your RPO for your whole virtual infrastructure, or you can assign this alarm to a subset of VMs like we did in the previous alarm.
In this example, Veeam ONE looks for any in-scope VMs that have not been backed up in the last 24 hours and triggers a backup. You can assign this alarm to a whole vCenter, a cluster or a tag that marks mission-critical workloads.
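The RPO check itself is a simple comparison between the last successful backup time and the allowed window. The sketch below assumes a hypothetical mapping of VM name to last backup time (with `None` meaning never backed up); the function name and data are illustrative, not part of any Veeam interface.

```python
from datetime import datetime, timedelta


def vms_outside_rpo(last_backup_by_vm, rpo, now):
    """Return sorted VM names whose last backup is older than the RPO.

    A value of None is treated as 'never backed up' and always flagged.
    """
    overdue = []
    for vm, last_backup in last_backup_by_vm.items():
        if last_backup is None or now - last_backup > rpo:
            overdue.append(vm)
    return sorted(overdue)


# Illustrative data only, using the 24-hour window from the text.
now = datetime(2024, 3, 10, 8, 0)
last_backups = {
    "erp-db": datetime(2024, 3, 10, 1, 0),   # 7 hours ago
    "file01": datetime(2024, 3, 8, 23, 0),   # 33 hours ago
    "new-vm": None,                          # never backed up
}
print(vms_outside_rpo(last_backups, rpo=timedelta(hours=24), now=now))
```

The VMs returned by a check like this are the ones the alarm would hand to a remediation action, such as starting a backup.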
4. Compare Backup File Size Growth for Possible Ransomware Activity
It wouldn't be a backup monitoring blog without mentioning ransomware. It is important to keep your infrastructure hardened, as attackers know to target backup servers in an enterprise environment. At the bare minimum, I suggest enabling two-factor authentication on the Veeam server and Veeam repositories. In a scenario where the attacker has already penetrated the network and started encrypting files, monitoring backup file size growth can be a great way to look for ransomware. Encryption will cause significantly larger backup files, as there are more block changes and less deduplication and compression.
In this example, the alarm looks for backup files that have grown by 150% or more. It analyzes the last three backup file sizes for comparison, and it does this for all the jobs. If extreme backup file size growth is discovered, an email will be sent out. You could also enable a script to run that would spin up the backup in an isolated environment to search for encrypted files.
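As a rough illustration of the growth comparison, the Python sketch below flags a job when its newest backup file is at least 150% larger than a baseline. The exact comparison Veeam ONE performs is not spelled out here, so this sketch makes an assumption: the baseline is the mean of the three backups preceding the newest one. The function names and sample sizes are hypothetical.

```python
def growth_percent(sizes, window=3):
    """Percent growth of the newest size over the mean of the prior `window` sizes.

    `sizes` is ordered oldest to newest. Returns None when there is not
    enough history or the baseline is zero.
    """
    if len(sizes) < window + 1:
        return None
    *history, latest = sizes
    baseline = sum(history[-window:]) / window  # assumed baseline: simple mean
    if baseline == 0:
        return None
    return (latest - baseline) / baseline * 100.0


def is_suspicious(sizes, threshold_percent=150.0, window=3):
    """True when growth meets or exceeds the threshold."""
    growth = growth_percent(sizes, window)
    return growth is not None and growth >= threshold_percent


# Illustrative sizes in GB, oldest first.
normal_job = [100, 110, 105, 120]
spiking_job = [100, 100, 100, 250]
print(is_suspicious(normal_job), is_suspicious(spiking_job))
```

A large one-off spike can also come from a legitimate change, such as a bulk data load, which is why following up with a scan of the restored backup in an isolated environment is a sensible next step.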
5. Scan for Possible Ransomware Activity
Another helpful way to detect ransomware is an alarm that looks for high CPU usage and high write rates on the underlying datastores.
High CPU usage is a great indicator that files on the VM are being encrypted, since encryption is a CPU-intensive task. It might be common in your environment to have VMs with high CPU usage, though, so you can suppress the alarm during snapshot deletion or creation activities.
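The combined condition and the suppression rule can be sketched as follows. Everything here is an assumption made for illustration: the `VmSample` record, the 90% CPU and 100 MB/s write thresholds, and the requirement of three consecutive samples are hypothetical values, not Veeam ONE defaults.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    """Hypothetical performance sample; field names are assumptions."""
    cpu_percent: float
    datastore_write_mbps: float
    snapshot_task_running: bool = False


def ransomware_suspected(samples, cpu_threshold=90.0, write_threshold=100.0,
                         min_consecutive=3):
    """True when CPU and write rate are both high for several samples in a row.

    Samples taken while a snapshot is being created or deleted are ignored
    and reset the streak, mirroring the suppression described above.
    """
    streak = 0
    for sample in samples:
        if sample.snapshot_task_running:
            streak = 0
            continue
        if (sample.cpu_percent >= cpu_threshold
                and sample.datastore_write_mbps >= write_threshold):
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False


# Illustrative data only.
sustained = [VmSample(95, 150), VmSample(97, 180), VmSample(96, 160)]
during_snapshot = [VmSample(95, 150, True), VmSample(97, 180, True),
                   VmSample(96, 160, True)]
print(ransomware_suspected(sustained), ransomware_suspected(during_snapshot))
```

Requiring both signals for several samples in a row is one way to cut down on false positives from VMs that are simply busy.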
6. SQL Permissions or Volume Shadow Copy Issues
One of the most common reasons for SQL backup failures is that permissions have changed. With Veeam ONE, you and the SQL DBAs can be notified if SQL backups are failing due to a permissions issue.
Another common reason for SQL backups failing is that VSS (Volume Shadow Copy), which is used to quiesce the database during backup operations, is not running. You can create an alarm that notifies you and the SQL DBAs if the SQL Writer service for VSS has stopped.
It is crucial to have a backup monitoring tool that goes beyond just the backup components. Oftentimes, slow performance or failed backup jobs can be a sign of a network, application, storage or permissions issue, just to name a few. Without a tool that can dig deeper into the root of the issue, the problem is likely to recur. I see many customers choose to purchase Veeam without Veeam ONE, only to change their minds within a few months.