How Active Archives Address AI’s Growing Energy and Storage Demands
Active archiving provides an efficient solution to manage AI’s data demands, balancing storage access, energy use, and cost in data centers.
The explosive growth of AI has created the need for new approaches to energy usage, data management, and information aggregation. Active archiving can help solve many of these challenges, enabling organizations to exploit the full power of large AI datasets.
AI applications thrive with access to as much data as possible. However, today’s solutions to data management and storage have led to data centers being overwhelmed with expensive, energy-intensive high-performance networking and storage hardware.
As AI deployments expand, it’s clear that this game-changing technology will keep consuming vast amounts of energy. A single query into a large language model (LLM) such as ChatGPT generates one hundred times more carbon than a Google search.
Additionally, LLMs require training, which can consume up to 10 GWh for a single model. It is not just Google, Azure, and AWS that create LLMs. Many companies, governments, and organizations are working on their own models.
Much of the data gathered for these models is not frequently accessed after the first few weeks. However, some historical data needs to remain readily accessible. It makes sense to arrange multiple tiers of storage based on access frequency, latency, and cost rather than retain everything on expensive, energy-intensive primary storage. Such architectures must handle the data management issues that naturally crop up when diverse service levels are required.
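The trade-off between tiers can be made concrete with a small Python sketch that compares keeping a data set entirely on primary storage with spreading it across tiers. The tier names, the per-terabyte cost and power figures, and the placement fractions are hypothetical placeholders chosen only for illustration; they are not measurements from any vendor or study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_tb_month: float  # hypothetical currency units per TB per month
    watts_per_tb: float       # hypothetical average power draw per TB


# Hypothetical illustrative figures, not vendor or study data.
PRIMARY = Tier("primary-flash", 20.0, 5.0)
NEARLINE = Tier("nearline-disk", 8.0, 2.0)
ARCHIVE = Tier("active-archive", 2.0, 0.2)


def monthly_totals(total_tb, placement):
    """Return (monthly cost, power in watts) for total_tb spread over tiers.

    placement maps each Tier to the fraction of data it holds;
    the fractions must sum to 1.
    """
    if abs(sum(placement.values()) - 1.0) > 1e-9:
        raise ValueError("placement fractions must sum to 1")
    cost = sum(total_tb * frac * tier.cost_per_tb_month
               for tier, frac in placement.items())
    watts = sum(total_tb * frac * tier.watts_per_tb
                for tier, frac in placement.items())
    return cost, watts


# 1,000 TB kept entirely on primary storage versus spread across three tiers.
all_primary = monthly_totals(1000, {PRIMARY: 1.0})
tiered = monthly_totals(1000, {PRIMARY: 0.1, NEARLINE: 0.3, ARCHIVE: 0.6})

print("all primary:", all_primary)
print("tiered:     ", tiered)
```

With real figures substituted for the placeholders, the same calculation shows how much of the cost and power budget depends on what fraction of the data can tolerate a lower tier.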
Not Every AI Data Set Requires High-Performance Storage
Active archive intelligent data management software allows data to be stored in numerous locations and spread across multiple storage devices and tiers while keeping that data readily accessible whenever needed to support user needs, including AI workflows.
There are quite a few AI data sets that may be considered “cold” as they are infrequently accessed or used compared to active data sets that are regularly utilized and updated as part of ongoing AI workflows. Among these cold data sets may be historical data that is no longer being used or trained on; long-term compliance data to meet regulatory or legal requirements; data used for experimental purposes or preliminary training; unused or rejected data; and synthetic data used for testing, benchmarking or research outside of the AI production workflow.
Efficient management of cold data within an active archive is essential to optimize storage and energy resources, and to ensure that even data with only potential future value can be maintained cost-effectively for indefinite periods of time.
The Impact of AI on Archives
Archives were once considered repositories of data that would only be accessed occasionally, if at all. The advent of modern AI has changed the equation. Almost all enterprise data could be valuable if made available to an AI engine. Therefore, many enterprises are turning to archiving to gather organizational data in one place and make it available for AI and GenAI tools to access.
Massive data archives can be stored in an active archive cost-efficiently and at very low energy consumption levels, all while keeping that data readily available on the network. Decades of archived data can then be analyzed by an LLM or another machine learning or deep learning algorithm.
Intelligent Data Management Software
An intelligent data management software layer is the foundation of an active archive. This software layer plays a vital role in automatically moving data according to user-defined policies to where it belongs for cost, performance, and workload priorities.
High-value data that is often accessed can be retained in memory. Other data can reside on SSDs, lower tiers of disks, and within a tape- or cloud-based active archive. This allows AI applications to mine all that data without being subjected to delays due to content being stored offsite or having to be transferred to where AI can process it.
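The policy-driven placement described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular active archive product: the rule table, the thresholds, the `DataSet` fields, and the tier names are all hypothetical, and a real data management layer would weigh more signals (size, cost targets, workload priority) than the two used here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataSet:
    name: str
    last_access: datetime
    accesses_last_30_days: int


# Hypothetical user-defined policy: ordered rules, first match wins.
# Each rule is (tier name, max idle days, min accesses in the last 30 days).
POLICY = [
    ("memory", 1, 1000),
    ("ssd", 7, 50),
    ("disk", 90, 1),
]
ARCHIVE_TIER = "active-archive"  # tape- or cloud-based tier, the fallback


def choose_tier(ds, now):
    """Return the first tier whose rule the data set satisfies."""
    idle_days = (now - ds.last_access).days
    for tier, max_idle, min_accesses in POLICY:
        if idle_days <= max_idle and ds.accesses_last_30_days >= min_accesses:
            return tier
    return ARCHIVE_TIER


now = datetime(2025, 1, 1)
datasets = [
    DataSet("live-features", now - timedelta(hours=2), 5000),
    DataSet("last-week-logs", now - timedelta(days=3), 120),
    DataSet("q3-training-set", now - timedelta(days=30), 2),
    DataSet("2015-compliance", now - timedelta(days=400), 0),
]
placement = {ds.name: choose_tier(ds, now) for ds in datasets}
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

Because the archive tier is the fallback rather than a separate offline system, every data set still resolves to a location that applications can read from, which is the point of keeping the archive "active".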
Maintaining Storage Sustainability
As a result of the AI boom, data centers are becoming larger, denser, and more power-intensive; in fact, the industry already accounts for almost 2% of total electricity usage in the US. These trends are likely to continue as more graphics processing units (GPUs) are added to serve the needs of high-performance computing (HPC), GenAI, and other demanding applications.
Storing cold and infrequently accessed data within an active archive can significantly reduce both power usage and CO2e emissions. According to Brad Johns Consulting, in a study of 100 PB of data maintained over ten years, keeping 40% of that data on HDD systems and moving 60% to an automated tape library system reduces CO2e emissions by 58% and e-waste by 53%.
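A rough back-of-the-envelope reading of those figures is sketched below in Python. It rests on two assumptions that are not stated in this article: that the comparison baseline keeps all 100 PB on HDD, and that emissions scale linearly with the capacity held on each medium. Under those assumptions only, the quoted 58% reduction implies a per-petabyte emissions ratio between tape and HDD; the study itself may use a different baseline or method.

```python
def implied_tape_to_hdd_ratio(hdd_fraction, tape_fraction, reduction):
    """Per-PB emissions of tape relative to HDD implied by an overall reduction.

    Assumes (hypothetically) an all-HDD baseline normalized to 1.0 and
    emissions that scale linearly with capacity on each medium:
        hdd_fraction * 1 + tape_fraction * ratio = 1 - reduction
    """
    remaining = 1.0 - reduction
    return (remaining - hdd_fraction) / tape_fraction


total_pb = 100
hdd_pb = total_pb * 40 // 100    # capacity kept on HDD systems
tape_pb = total_pb - hdd_pb      # capacity moved to the tape library

ratio = implied_tape_to_hdd_ratio(0.40, 0.60, 0.58)
print(f"HDD: {hdd_pb} PB, tape: {tape_pb} PB")
print(f"implied tape-to-HDD emissions ratio per PB: {ratio:.3f}")
```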
The demand for enterprise storage capacity will undoubtedly accelerate in the years ahead. Massive AI-fueled growth has highlighted the need for effective data management from the edge to the core data center and the cloud.
Efficient management of huge quantities of data is at the heart of AI success. If organizations driving AI initiatives are to realize their potential for productive and beneficial outcomes, they must be able to process, analyze, correlate, and reach conclusions based on vast amounts of information. Once data volumes exceed a few petabytes, an active archive can provide AI applications with the right mix of access, performance, energy efficiency, and affordability.
The infrastructure for AI must be laid on a foundation of well-planned data storage and workflows. Otherwise, poorly planned data management negatively impacts costs, data security, cyber resiliency, legal compliance, customer experiences, decision-making, energy consumption, and even brand reputation.
In this AI era, effective data management is a necessary part of the core competencies that organizations must achieve for effective digital transformation. And this is where the active archive solution benefits the modern AI-based enterprise.