The Opportunities and Challenges of Long-term Data and Large-Scale Analytics
The secret to realizing the opportunities of petabyte-scale storage, and to overcoming its challenges, is to evolve from storing all data in the same way to storing different kinds of data in different ways, writes Jeff Flowers of SageCloud.
April 11, 2014
Jeff Flowers is CEO of SageCloud.
Data is growing at an unprecedented rate – by some estimates 60 to 70 percent per year. While companies look to turn big data into valuable insight, they are struggling to find the best ways to effectively store, manage and use that data without increasing IT spend.
The Opportunity
While petabyte-scale data sets can be difficult to manage, these massive data sets also contain invaluable strategic information. This data – if readily accessible – can give businesses unique insight into patterns and trends not otherwise identifiable. For example, financial managers who seek to understand shifts in trading dynamics, or government policymakers who wish to better adapt to changing economic conditions, can benefit from the analysis of exceptionally large information sets, particularly when viewed over long periods of time.
The Challenges
The challenges for those who manage large data sets can be categorized into three areas: limited budgets, the demands of big data analytics, and the emergence of new data types.
Limited Budgets
First, most IT professionals have flat budgets but are seeing 60 to 70 percent growth in data per year. At that rate the data under management roughly doubles every year and a half, and this disconnect means storage consumes an ever-increasing portion of the budget at the expense of other needs.
Data Analysis Demands
Alongside this growth, users of big data solutions are demanding faster access to more information over longer periods, challenging IT professionals to deliver information quickly at the very time budgets are under increasing scrutiny.
New Data Sources
Finally, the number of devices that collect and produce data is also increasing, with much of the new data in unstructured form including photos, video and audio. Unstructured data can be prohibitively expensive to store on traditional file systems. As a result, some organizations are choosing not to store and use the new data in order to avoid the cost – at the expense of leveraging the valuable analytic information it contains.
Adding to storage costs is the overhead of vendor lock-in, a dynamic in which buyers are wedded to the proprietary storage systems and high-margin pricing of a single vendor. This is a significant pain point for many buyers, and one leading to a shift towards an open storage architecture in which open-standards hardware and storage management software are purchased separately at competitive prices.
The Solution: Multi-tier, Object Storage and Open Standards
Realizing Cost Savings through a Multi-tier Strategy
The secret to realizing the opportunities of petabyte-scale storage, and to overcoming its challenges, is to evolve from storing all data in the same way to storing different kinds of data in different ways.
Although a multi-tiered strategy is already commonplace, until recently the model has been missing a true “cold storage” tier for retaining high volumes of data at the lowest possible TCO (total cost of ownership).
The emergence of disk-based, cold storage solutions changes the economics of the storage model dramatically. Companies using these systems can store data at one-tenth the cost of I/O-intensive storage, with the tradeoff of data access in 30 seconds instead of milliseconds. Still, with more than 70 percent of data accessed infrequently anyway, cold data storage is emerging as the most practical way for enterprises to manage data growth and take advantage of large-scale analytics.
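To make the arithmetic concrete, the short Python sketch below estimates the blended cost of a two-tier strategy using the figures cited above: cold storage at roughly one-tenth the cost of I/O-intensive storage, and about 70 percent of data accessed infrequently. The unit costs are normalized placeholders, not vendor pricing.

```python
# Hypothetical back-of-the-envelope model of a two-tier storage strategy.
# Unit costs are normalized placeholders (hot = 1.0), not vendor pricing;
# the one-tenth cold-storage cost and 70 percent cold fraction come from
# the figures cited above.

HOT_COST_PER_TB = 1.0    # normalized cost of I/O-intensive ("hot") storage
COLD_COST_PER_TB = 0.1   # roughly one-tenth the cost of hot storage
COLD_FRACTION = 0.70     # share of data that is infrequently accessed

def blended_cost(total_tb: float) -> float:
    """Cost of storing total_tb when infrequently accessed data sits on a cold tier."""
    hot_tb = total_tb * (1 - COLD_FRACTION)
    cold_tb = total_tb * COLD_FRACTION
    return hot_tb * HOT_COST_PER_TB + cold_tb * COLD_COST_PER_TB

if __name__ == "__main__":
    total = 1000.0  # 1 PB, expressed in TB
    single_tier = total * HOT_COST_PER_TB
    two_tier = blended_cost(total)
    print(f"Single-tier cost: {single_tier:.0f} units")
    print(f"Two-tier cost:    {two_tier:.0f} units")
    print(f"Savings:          {1 - two_tier / single_tier:.0%}")
```

Under those assumptions, moving the cold data to the cheaper tier cuts the storage bill by about 63 percent, before accounting for the slower (seconds rather than milliseconds) access to the cold tier.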
Leveraging Object-based Storage
The solution to the rise of unstructured data is object-based storage, which allows businesses to store large volumes of structured and unstructured data on a common platform. Unlike traditional file systems, which strain under billions of files and deep directory hierarchies, object-based storage addresses data in a flat namespace and scales out almost without limit, at a fraction of the cost.
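As a minimal illustration (not tied to any particular product), the sketch below stores a photo and its descriptive metadata through an S3-compatible object API using the boto3 client; the endpoint, bucket name, key and metadata fields are hypothetical placeholders.

```python
# Minimal sketch: writing an unstructured object (a photo) plus descriptive
# metadata to an S3-compatible object store. Endpoint, bucket, key and
# metadata values are hypothetical placeholders.
import boto3

s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

with open("turbine-inspection-0042.jpg", "rb") as image:
    s3.put_object(
        Bucket="sensor-archive",
        Key="2014/04/turbine-inspection-0042.jpg",  # flat namespace: the key is just a label
        Body=image,
        Metadata={                                  # metadata travels with the object itself
            "device": "camera-07",
            "captured": "2014-04-11T09:30:00Z",
        },
    )
```

Because each object is addressed by a key and carries its own metadata, the store can spread objects across commodity nodes without the directory hierarchies and file-count limits that make traditional file systems expensive at this scale.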
Separation of Software and Hardware
While software is a big part of the solution to managing very large data sets, the role of open-standards hardware is also critical. Organizations like the Open Compute Project are driving new standards resulting in the availability of affordable, commodity hardware. The shift enables enterprises to purchase best-of-breed software and hardware separately for the first time – removing the pain point of vendor lock-in that is common with legacy storage systems.
Realizing the Opportunities of Long-term Data
With new technologies making the storage, access, and analysis of big data more achievable every year, enterprises are able to glean value from huge sets of seemingly unrelated data. By embracing these technologies, organizations can turn their growing data sets into new insight into day-to-day business activities, revitalize the way they manage their businesses and improve the bottom line.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.