How Snapshots Make Better Backup

Posted 05/15/2018 by Roy Child

In his recent blog post, “Why Snapshots Break Backup,” George Crump said that snapshots can’t stand on their own as backups and that they need to complement backup to ensure recoverability. And while many businesses have realized they need snapshots to solve backup and recovery challenges for their critical data, without a tool that integrates snapshots into the data lifecycle, they end up creating new problems for themselves. With the right integration, however, snapshots can make your backups better.

Save your production servers for production

One of the key benefits of snapshots is that they take the backup load off production systems. Snapshots complete in seconds, so your critical production apps no longer see significant load over a number of hours. Instead you transfer that load to a separate proxy server, which runs backups on behalf of production. And you can consolidate backups for multiple production servers through the same proxy to gain efficiency.

Offloading backup brings challenges

It turns out that’s easier said than done. Using separate tools for snapshots and backups leads to a disconnect in your backup environment that has big implications for administration and recovery. Your backups happen on the proxy server, so your backup tool only knows what the proxy knows. The proxy server, and therefore your backup tool, doesn’t know about your production server, where it keeps the important files, or any databases or applications that might be running there.

Why does that matter? You have to keep track of these differences yourself. As your production environment changes and grows, you have to keep both your snapshot and backup tools up to date. And when it comes time to recover data, you have a couple of extra challenges. First, you have to know exactly what files to restore for any databases. You also need a DBA on hand to get the database running again and back to the right point in time, since you can’t use the database recovery feature you bought in your backup tool.

Second, you want to make sure you’re restoring the right files to the right place, so you need to know exactly where your data came from originally - remember, your backup tool doesn’t know anything about production. Tracking is going to become critical to your recovery strategy.

Remember, this adds up quickly. Every database, disk, VM, file system, server, storage system and proxy you snap makes the list longer and harder to search. Then remember that you have to update the tracking data as the environment changes. And if you have to do large-scale recovery, unwinding all the relationships to perform recovery could add hours to the overall downtime.
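
To make the tracking burden concrete, here is a minimal Python sketch of the kind of lookup table you would end up maintaining by hand. Everything in it is hypothetical and for illustration only: the SnapshotRecord type, the host names, the mount paths and the database name are assumptions, not part of any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnapshotRecord:
    """One hand-maintained row: where a proxy-side path really came from."""
    proxy_host: str      # server the backup tool actually sees
    proxy_path: str      # mount point of the snapshot on the proxy
    source_host: str     # production server the data belongs to
    source_path: str     # original location on production
    database: Optional[str] = None  # database living on that volume, if any


# Hypothetical catalog; every new disk, server, or proxy adds rows here.
CATALOG = [
    SnapshotRecord("proxy01", "/mnt/snap/sql01_data", "sql01", "/data", "orders"),
    SnapshotRecord("proxy01", "/mnt/snap/sql01_logs", "sql01", "/logs", "orders"),
    SnapshotRecord("proxy01", "/mnt/snap/web01_www", "web01", "/var/www"),
]


def restore_target(proxy_host: str, backed_up_file: str) -> Optional[tuple]:
    """Translate a path recorded by the backup tool into its production home."""
    for rec in CATALOG:
        prefix = rec.proxy_path.rstrip("/") + "/"
        if rec.proxy_host == proxy_host and backed_up_file.startswith(prefix):
            relative = backed_up_file[len(prefix):]
            return rec.source_host, f"{rec.source_path.rstrip('/')}/{relative}"
    return None  # untracked path: nobody knows where this file belongs


def volumes_for_database(database: str) -> list:
    """List every proxy path that must be restored together for one database."""
    return [rec.proxy_path for rec in CATALOG if rec.database == database]
```

A restore then starts with a lookup such as restore_target("proxy01", "/mnt/snap/sql01_data/orders.mdf"), and any path that returns None is data your backup tool holds but nobody can place. Every row in the catalog is something a person has to remember to update.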

Something else that often becomes an afterthought is visibility, especially with scripted solutions. Your backup tool has reports and alerts to tell you when things aren’t working properly and, just as important, to show when they are. Your snapshot management solution needs the same.
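
With a scripted solution, even a basic health check is something you have to build yourself. As a rough sketch, assuming your scripts record the time of the last successful snapshot for each volume (the function name, the volume names and the 24-hour threshold below are all hypothetical), a staleness report might look like this:

```python
from datetime import datetime, timedelta


def stale_snapshots(last_success, now, max_age=timedelta(hours=24)):
    """Return names of volumes whose last good snapshot is missing or too old."""
    stale = []
    for volume, finished_at in last_success.items():
        # None means the script never recorded a successful snapshot.
        if finished_at is None or now - finished_at > max_age:
            stale.append(volume)
    return sorted(stale)


# Hypothetical status data, as a script might collect it.
now = datetime(2018, 5, 15, 8, 0)
last_success = {
    "sql01_data": datetime(2018, 5, 15, 2, 0),   # six hours old: fine
    "sql01_logs": datetime(2018, 5, 13, 2, 0),   # over two days old: stale
    "web01_www": None,                           # never succeeded: stale
}
print(stale_snapshots(last_success, now))
```

This covers only one of the things a backup tool’s reporting already does for you, and it still says nothing about whether the snapshots that did run are recoverable.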

Because these challenges and gaps are only due to the disconnect between tools, it doesn’t matter how good your snapshot and backup tools are on their own; you will have to deal with this.

One tool to manage snapshots and backups

When the backup tool also manages snapshots, all of this complexity goes away. The tool tracks all the proxy information. It manages all the integration elements - OS, hypervisor, databases - to ensure recoverability and lets you restore a database as a database, even to a point in time, without pulling a DBA off something else. It lets you standardize on one recovery process for snapshots and backups. And it gives you the operational visibility you need without duplicating reports or making you build capabilities.

It’s important to note that even if your backup tool manages snapshots, it can still break your backups if your solution doesn’t support all your applications or all your storage. You’re going to have to use scripts and manual processes to plug any gaps in your tools, bringing you right back to the problems I just described.

Scripting is not the answer

I’ve seen this firsthand. When I was an IT engineer, we had an application that needed to take coordinated, application-consistent backups across a pair of servers. There was far too much data and complexity to run backups in production and stay in the window. We needed snapshots, and at the time there was no snapshot management tool for our storage. I had to script.

The scripted solution worked a lot like I described above. The snapshots were backed up through a proxy server, with no awareness of the SQL databases or the source servers. Recovery was always a huge pain, because on top of looking up the relationships to make sure the right data was being recovered, the admin had to recover the database and files on both servers to the same point in time.

We had to maintain this process for many years, and it was a burden. We had to update configuration files frequently when the application team added disks, and more than a couple of times someone forgot an update and broke backups. We also had server and storage migrations that needed even bigger configuration changes and heavy testing. We changed backup products and had to completely rework the process.

Largely because of the complexity, we only ever used the solution on two of these server pairs, and even with a small footprint it took a significant amount of time to maintain and troubleshoot the process. Honestly, we wouldn’t even have bothered if we couldn’t meet our backup windows any other way.

If I could build this process today using Commvault IntelliSnap® technology, it would be a snap (pun intended). I would have no scripts to write, no configuration files to manage and nothing to track, and I would get SQL awareness built in. My reporting and alerting would cover both backups and snapshots. Recovery would be simple. Storage migrations would be virtually a non-event.

In short, snapshots would become so integrated with backup that you might not even notice they were there at all, other than the load they take off production. And that level of integration is how snapshots make backup better. In this era of “good enough” point solutions, it’s easy to overlook the hidden costs and challenges that come along with them, until you run into them at the worst possible time.