The End of Backup
As many IT professionals have discovered too late, backups are also unreliable, a situation made even worse by the fact that bad backups typically aren’t discovered until there’s a need to restore.
March 22, 2017
Andres Rodriguez is CEO of Nasuni.
No one loves their backup, and when I was CTO of the New York Times, I was no exception. Traditional data protection has only survived this long because the alternative is worse: losing data is one of those job-ending, if not career-ending, events in IT. Backup is like an insurance policy. It provides protection against an exceptional and unwanted event.
But like insurance policies, backups are expensive, and they add no functionality of their own. Your car doesn’t go any faster because you’re insured, and your production system doesn’t run any better with backup. As many IT professionals have discovered too late, backups are also unreliable, a situation made worse by the fact that bad backups typically aren’t discovered until there’s a need to restore. At that point, IT is out of luck.
Fortunately, backup as we have known it is ending. Significant improvements in virtualization, synchronization and replication have converged to deliver production systems that incorporate point-in-time recovery and data protection as an integral component. These new data protection technologies are no longer engaged only when a system fails. Instead, they run constantly within live production systems.
With technology as old and entrenched as backup, it helps to identify its value and then ask whether we can get the same result in a better way. Backup accomplishes two distinct and crucial jobs. First, it captures a point-in-time version, or snapshot, of a data set. Second, it writes a copy of that point-in-time version to a different system, a different location or, preferably, both. Then, when IT needs to restore, it must find the right version and copy the data back to a fresh production system. When backup works, it protects us against point-in-time corruptions such as accidentally deleted files, ransomware attacks or complete system meltdowns.
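To make those two jobs concrete, here is a minimal, hypothetical Python sketch of the traditional flow: snapshot, copy offsite, and restore by copying a chosen version back. All names are illustrative, not any vendor's API.

```python
import shutil
import time
from pathlib import Path

def snapshot(dataset: Path, versions: Path) -> Path:
    """Job 1: capture a point-in-time version of the data set."""
    version = versions / time.strftime("%Y%m%dT%H%M%S")
    shutil.copytree(dataset, version)
    return version

def copy_offsite(version: Path, offsite: Path) -> Path:
    """Job 2: write the version to a different system or location."""
    target = offsite / version.name
    shutil.copytree(version, target)
    return target

def restore(offsite: Path, name: str, fresh_system: Path) -> None:
    """Recovery: find the right version and copy it back."""
    shutil.copytree(offsite / name, fresh_system)
```

Note that the restore path runs only after something has already gone wrong, which is exactly the weakness the rest of this article addresses.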
Ensuring that backups restore as advertised, in my experience, requires periodic testing and the meticulous simulation of failures without affecting production systems. Many IT teams lack the resources or the cycles to make sure their backups are really functioning, and backups can fail in ways that are hard to detect. These failures may have no impact on a production system until something goes wrong. And when the backup fails to restore, everything goes wrong.
Modern data protection relies on a technology trifecta: virtualization, synchronization and replication. Together, they address the critical flaws in traditional backups.
Virtualization separates live, changing data from stable versions of that data.
Synchronization efficiently moves the changes between one stable version and the next into a replication core.
Replication then spreads identical copies of those versions to target servers distributed across multiple geographic locations.
Essentially, this describes the modern “cloud,” but these technologies have been around and evolving for decades.
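To make the trifecta concrete, the sketch below (in Python, with entirely hypothetical names) versions data by content hash, synchronizes only the chunks that changed between versions, and replicates each version to several targets. It is a toy model of the idea, not a production design.

```python
import hashlib

class VersionStore:
    """Virtualization: stable versions live apart from the live copy."""
    def __init__(self):
        self.chunks = {}    # content hash -> chunk bytes
        self.versions = []  # each version = ordered list of hashes

    def commit(self, live_data: bytes, chunk_size: int = 4096) -> list:
        """Capture a stable version of the live data."""
        hashes = []
        for i in range(0, len(live_data), chunk_size):
            chunk = live_data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # unchanged chunks are reused
            hashes.append(h)
        self.versions.append(hashes)
        return hashes

def sync(store: VersionStore, core: dict) -> None:
    """Synchronization: ship only the chunks the core has not seen."""
    for h, chunk in store.chunks.items():
        if h not in core:
            core[h] = chunk

def replicate(core: dict, targets: list) -> None:
    """Replication: spread identical copies across locations."""
    for target in targets:
        target.update(core)
```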
Because this approach unifies live data and protected versions of that data in a single system, it dramatically increases the utility and the overall reliability of the system. Operators can slice into the version stream of the data to fork a DevTest instance of their databases. They can create multiple live instances of the production front end to synchronize files across geographies and provide instant recoverability to previous versions of a file. Most importantly, because modern data protection does not copy the data as a separate process, it eliminates the risk of that process failing silently. In this way, data protection becomes an integrated component of a healthy production system.
Two distinct approaches to this new breed of data protection have emerged: SAN and NAS. The SAN world relies on block-level virtualization and is dominated by companies like Actifio, Delphix and Zerto. With blocks, the name of the game is speed. The workloads tend to be databases and VMs, and the production system’s performance cannot be handicapped in any significant way. SAN vendors can create point-in-time images of volumes at the block level and copy them to other locations. Besides providing data protection and recovery, these technologies make it much easier to develop, upgrade and test databases before launching them into production. And while this ability to clone volumes is compelling for DevTest in the block world, in the world of files the same approach changes everything.
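The clone trick is easiest to see at the block level. This hypothetical sketch models a volume as a map from block number to immutable block data; a clone copies only the map, so a DevTest instance shares unchanged blocks with production and diverges only on write.

```python
class Volume:
    """A volume as a map from block number to immutable block data."""
    def __init__(self, block_map=None):
        self.block_map = dict(block_map or {})

    def clone(self) -> "Volume":
        # Copy-on-write: duplicate the map, not the blocks themselves.
        return Volume(self.block_map)

    def write(self, block_no: int, data: bytes) -> None:
        # Writes replace the mapping; other clones keep the old block.
        self.block_map[block_no] = data

prod = Volume({0: b"users", 1: b"orders"})
devtest = prod.clone()            # instant point-in-time image
devtest.write(1, b"orders-test")  # diverges without touching production
assert prod.block_map[1] == b"orders"
```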
The NAS world relies on file-level virtualization to accomplish the same goal. Files in the enterprise depend on the ability to scale, so it’s critical to support file systems that can exceed the physical storage footprint of a device. Vendors have used caching and virtualization to scale file systems beyond the limitations of any one hardware device. NAS contenders capture file-level versions and synchronize them against cloud storage (a.k.a. object storage) backends. Essentially, these are powerful file replication backends running across multiple geographic locations, operating as a service backed by the likes of Amazon, Microsoft and Google. The benefit of these systems is not only their unlimited scale, but also the fact that files are protected automatically as their versions are synchronized into the cloud storage core. The cloud, in this sense, takes the place of the inexpensive but ultimately unreliable traditional media used to store backups. This shift to cloud is not only more reliable, but it can also dramatically reduce RPO (recovery point objective) from hours to minutes.
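Mechanically, protecting files this way can be as simple as writing each file version to an object store under a versioned key on a short sync interval; that interval effectively becomes the RPO. A minimal sketch, where `object_store` is a hypothetical stand-in for any S3-style bucket:

```python
import time

object_store = {}  # stand-in for an S3-style bucket (key -> bytes)

def push_version(path: str, contents: bytes) -> str:
    """Synchronize one file version into the cloud storage core."""
    key = f"{path}@{time.strftime('%Y%m%dT%H%M%S')}"
    object_store[key] = contents
    return key

def versions_of(path: str) -> list:
    """Every protected point-in-time version of a file, oldest first."""
    return sorted(k for k in object_store if k.startswith(path + "@"))

def restore_version(key: str) -> bytes:
    """Recovery is just a read, not a separate restore pipeline."""
    return object_store[key]
```

If versions are pushed every five minutes, the worst-case data loss after a failure is five minutes of changes, which is the hours-to-minutes RPO improvement described above.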
And there is more. Unlike SAN volumes, NAS file volumes can be active in more than one location at the same time. Think file sync & share, but at the scale of the data center and its branch offices. This approach is already being used by media, engineering and architecture firms to collaborate on large projects across geographies. It can also simplify disaster recovery from one site to another, since any active-passive configuration is just a more restricted version of these active-active NAS deployments.
We are entering a new era for data protection. Simply put, backup is ending, and, based on my experience as a CTO and the many conversations I've had with our customers, that is a good thing. We are moving away from a process that treats a restore as a failure-mode scenario and toward one where data is protected continuously. In doing so, we are not only taking the risk out of data protection. We are unlocking exciting new capabilities that make organizations more productive.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Penton.