Big Data: What It Means for Data Center Infrastructure
Big Data, with all its computing and storage needs, is driving the development of storage hardware, network infrastructure and new ways of handling ever-increasing computing needs. The most important infrastructure aspect of Big Data analytics is storage, writes Krishna Kallakuri of DataFactZ.
September 5, 2013
Krishna Kallakuri is a founding partner, owner and vice president of DataFactZ. He is responsible for executing strategic planning, improving operational effectiveness, and leading strategic initiatives for the company.
Today, we collect and store data from a myriad of sources such as Internet transactions, social media activity, mobile devices and automated sensors, to name a few. Software requirements have always paved the way for new and improved hardware. In this case, Big Data, with all its computing and storage needs, is driving the development of storage hardware, network infrastructure and new ways of handling ever-increasing computing needs. The most important infrastructure aspect of Big Data analytics is storage.
Capacity
Data over the size of a petabyte is considered Big Data. Because the amount of data grows rapidly, storage must be highly scalable as well as flexible, so that capacity can be added without taking the entire system down. Big Data also translates into an enormous amount of metadata, more than a traditional file system can support. To maintain scalability, object-oriented file systems should be leveraged.
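As a rough sketch of what that looks like in practice (an S3-compatible object store is assumed, and the bucket and key names below are hypothetical), data is addressed by key in a flat namespace rather than walked through a directory tree, which is what lets metadata handling scale out:

# Minimal sketch: writing and reading analytics data in an S3-compatible
# object store. Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")  # credentials and endpoint assumed to be configured

# Objects live in a flat namespace addressed by key, so the store can
# scale out metadata handling instead of relying on one file-system tree.
s3.put_object(
    Bucket="analytics-landing",
    Key="sensors/2013/09/05/device-42.json",
    Body=b'{"device": 42, "reading": 18.4}',
)

obj = s3.get_object(Bucket="analytics-landing",
                    Key="sensors/2013/09/05/device-42.json")
print(obj["Body"].read())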
Latency
Big Data analytics involves social media tracking and transactions, which are leveraged for tactical decision making in real time. Big Data storage therefore cannot suffer from high latency, or the data risks becoming stale. Some applications might require real-time data for real-time decision making. Storage systems must be able to scale out without sacrificing performance, which can be achieved by implementing a flash-based storage system.
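The sketch below illustrates the kind of placement policy a scale-out system might apply to keep hot data on flash; the tier names and the access threshold are assumptions, not any specific product's behavior:

# Minimal sketch of a hot/cold placement policy: frequently accessed
# objects are kept on a flash tier, the rest on slower bulk storage.
from collections import Counter

access_counts = Counter()
FLASH_THRESHOLD = 100  # accesses per window before data is considered "hot"

def record_access(key: str) -> None:
    access_counts[key] += 1

def choose_tier(key: str) -> str:
    return "flash" if access_counts[key] >= FLASH_THRESHOLD else "disk"

record_access("clickstream/2013-09-05")
print(choose_tier("clickstream/2013-09-05"))  # "disk" until the key turns hot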
Access
Because Big Data analytics is used across multiple platforms and host systems, there is a greater need to cross-reference data and tie it all together to present the big picture. Storage must be able to handle data from various source systems at the same time.
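A toy example of what that cross-referencing looks like, joining records pulled from two hypothetical source systems on a shared key:

# Toy sketch: tying together records from two source systems by a shared key.
# Source systems, columns and values are hypothetical.
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2, 3], "segment": ["gold", "silver", "gold"]})
web = pd.DataFrame({"customer_id": [1, 2, 4], "page_views": [120, 35, 8]})

# Storage has to serve both feeds at once so joins like this stay cheap.
combined = crm.merge(web, on="customer_id", how="outer")
print(combined)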
Security
Cross-referencing data at this new level to yield a bigger picture may introduce data-level security requirements that existing IT scenarios have not had to address. Storage should be able to handle these kinds of data-level security requirements without sacrificing scalability or latency.
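A minimal sketch of what data-level security can mean in practice, filtering fields by role before records leave storage; the roles, fields and policy shown are assumptions:

# Minimal sketch of field-level access control applied before records
# are returned to a consumer. Roles, fields and policy are hypothetical.
POLICY = {
    "analyst": {"customer_id", "segment", "page_views"},
    "marketing": {"segment", "page_views"},
}

def filter_record(record: dict, role: str) -> dict:
    allowed = POLICY.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

row = {"customer_id": 1, "segment": "gold", "page_views": 120, "ssn": "xxx-xx-1234"}
print(filter_record(row, "marketing"))  # sensitive fields never reach the caller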
Cost
Big Data also translates into big prices. The most expensive component of Big Data analytics is storage. Techniques such as data de-duplication, using tape for backup, data redundancy and building custom hardware instead of buying commercially available storage appliances can bring costs down significantly.
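As a simplified illustration of how de-duplication cuts capacity needs, the sketch below stores identical chunks only once and keeps references by content hash; the chunk size is an arbitrary assumption:

# Minimal sketch of content-addressed de-duplication: identical chunks are
# stored once and referenced by their hash.
import hashlib

CHUNK_SIZE = 4096
store = {}  # hash -> chunk, each unique chunk stored only once

def write(data: bytes) -> list:
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks cost nothing extra
        refs.append(digest)
    return refs

refs = write(b"A" * 16384)  # four identical chunks...
print(len(refs), "references,", len(store), "chunk stored")  # ...one stored chunk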
Flexibility
Big Data typically incorporates a Business Intelligence application, which requires data integration and migration. However, given the scale of Big Data, the storage system should not require data migration while simultaneously remaining flexible enough to accommodate different types and sources of data, again without sacrificing performance or latency. Care should be taken to consider all possible current and future use cases and scenarios while planning and designing the storage system.
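A small sketch of the schema-on-read approach this implies, where heterogeneous records from different (hypothetical) sources are stored as-is and interpreted only when read:

# Minimal sketch of schema-on-read: records from different sources are kept
# in their raw form and parsed at query time. Fields and sources are made up.
import json

raw_records = [
    '{"source": "sensor", "device": 42, "temp_c": 18.4}',
    '{"source": "web", "user": "abc", "page": "/pricing"}',
]

# New sources can be added without migrating what is already stored;
# each consumer pulls only the fields it understands.
for line in raw_records:
    record = json.loads(line)
    if record["source"] == "sensor":
        print("sensor reading:", record["temp_c"])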