
Effective AI Requires A Healthy Diet of Intelligent Data

Through 2018, most data lakes will be useless because they are filled with raw data that few people have the skills to use, according to one estimate.


Amit Walia is President of Informatica.

In the current technology landscape, nothing elicits quite as much curiosity and excitement as artificial intelligence (AI). And we are only beginning to see the potential benefits of AI applications within the enterprise.

The growth of AI in the enterprise, however, has been hampered because data scientists too often have limited access to the relevant data they need to build effective AI models. These specialists are frequently forced to rely solely on a few known sources, such as existing data warehouses, rather than being able to tap into all the real-time, real-world data they need. In addition, many companies struggle to determine the business context and quality of massive amounts of data quickly, efficiently and affordably. Given these difficulties, it’s easy to understand some of the historical barriers to AI acceleration and adoption.

At the end of the day, data only becomes useful for AI—or for any other purpose—when you understand it. Specifically, this means understanding its context and relevance. Only then can you use it confidently and securely to train AI models. The only way to achieve this is with a foundation of “intelligent data.”

Over the years, we’ve moved beyond collecting and aggregating data to drive specific business applications (data 1.0), and organizations have built well-defined processes that let anyone access data even as its volume, variety and velocity explode (data 2.0). But this is no longer enough. We’ve now reached the point where intelligent data is needed to truly power enterprise-wide transformation (data 3.0).

As an example, consider the challenges a company would face in trying to redefine its traditional relationship with its customer base. Say the goal is to sell a disposable product by subscription rather than over the counter. Guiding such a disruptive change requires input from a multitude of data sources (databases, data warehouses, applications, big data systems, IoT, social media and more); a variety of data types (structured, semi-structured and unstructured); and a variety of locations (on-premises, cloud, hybrid and big data).

The data lake is becoming the repository of choice for the vast collection of disparate data required for transformative efforts like this. But without intelligent data, these lakes are of little value. Gartner estimates that, through 2018, a shocking 90 percent of data lakes will be useless because they are filled with raw data that few individuals have the skills to use.

In contrast, with intelligent data, data scientists can run a Google-like search on a word like “customer” and instantly discover all the potential sources of relevant data. Intelligent data saves the enormous amount of time data scientists would otherwise spend collecting, assembling and refining the data for their models, and it yields more reliable results, because models are trained on data whose context and quality are known.
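
To make this concrete, here is a minimal sketch in Python of what such a metadata-driven catalog search might look like. The CatalogEntry record and search_catalog function are hypothetical illustrations, not Informatica’s API; the point is that the search scans curated metadata (names, descriptions, business glossary terms) rather than the raw data itself.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One data asset registered in a metadata catalog (hypothetical schema)."""
    name: str                                            # e.g. "crm.customers"
    description: str                                     # business description of the asset
    business_terms: list = field(default_factory=list)   # glossary terms attached by stewards
    source: str = ""                                     # originating system

def search_catalog(catalog, term):
    """Return every asset whose metadata mentions the search term.

    Only metadata is scanned; the underlying tables are never read,
    which is why the lookup stays fast at enterprise scale.
    """
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or any(term in t.lower() for t in entry.business_terms)
    ]

catalog = [
    CatalogEntry("crm.customers", "Master customer records", ["customer"], "Salesforce"),
    CatalogEntry("web.clickstream", "Raw site events", ["visitor"], "web servers"),
    CatalogEntry("billing.accounts", "Customer billing accounts", ["customer"], "ERP"),
]

for hit in search_catalog(catalog, "customer"):
    print(hit.name, "-", hit.source)
# crm.customers - Salesforce
# billing.accounts - ERP
```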

So how do you ensure that your data is truly intelligent? By building an end-to-end data management platform that itself applies machine learning and AI to extensive metadata. Metadata is the key that unlocks the value of data.

There are four distinct metadata categories to look at if you want to ensure that you’re delivering comprehensive, relevant and accurate data to implement AI (a code sketch follows the list):

1. Technical metadata – includes database tables and column information as well as statistical information about the quality of the data.

2. Business metadata – defines the business context of the data as well as the business processes in which it participates.

3. Operational metadata – information about software systems and process execution, which, for example, will indicate data freshness.

4. Usage metadata – information about user activity including data sets accessed, ratings and comments.
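
As a rough illustration only (the field names are assumptions, not any product’s schema), the four categories might be modeled like this:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TechnicalMetadata:
    table: str              # database table holding the asset
    columns: list           # column names and types
    null_rate: float        # a simple statistical measure of data quality

@dataclass
class BusinessMetadata:
    glossary_terms: list    # business context, e.g. ["customer", "subscription"]
    processes: list         # business processes the data participates in

@dataclass
class OperationalMetadata:
    last_loaded: datetime   # when the pipeline last ran, i.e. data freshness
    source_system: str      # software system that produced the data

@dataclass
class UsageMetadata:
    access_count: int       # how often users pull this data set
    avg_rating: float       # user ratings
    comments: list          # user comments
```

A catalog entry would carry all four, giving the platform’s AI a complete picture of each asset.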

AI and machine learning applied to this collection of metadata not only help identify and recommend the right data; they can also process that data automatically, without human intervention, to render it suitable for use in enterprise AI projects.
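
Continuing the hypothetical schema above, a first-cut recommender could combine signals from each metadata category into a single relevance score. The weights below are illustrative assumptions; a real platform would learn them from user behavior rather than hard-code them.

```python
def score_asset(null_rate, days_since_load, access_count, avg_rating):
    """Naive relevance score built from metadata signals (illustrative weights)."""
    quality    = 1.0 - null_rate                    # technical metadata: data quality
    freshness  = 1.0 / (1.0 + days_since_load)      # operational metadata: freshness
    popularity = min(access_count / 1000.0, 1.0)    # usage metadata: adoption
    rating     = avg_rating / 5.0                   # usage metadata: crowd rating
    return 0.4 * quality + 0.3 * freshness + 0.2 * popularity + 0.1 * rating

# Rank two candidate "customer" data sets for a model-training task.
crm     = score_asset(null_rate=0.02, days_since_load=1,  access_count=840, avg_rating=4.6)
archive = score_asset(null_rate=0.15, days_since_load=90, access_count=120, avg_rating=3.1)
print("prefer crm" if crm > archive else "prefer archive")  # prefer crm
```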

Digital transformation is forcing organizations to look at data differently; it’s a matter of becoming “the prey or the predator.” Today, real-time, always-available access to data, along with tools for rapid analysis, has propelled AI and machine learning and enabled the transition to a data-first approach. The AI renaissance is flourishing because of digitization, the data explosion, and the transformative impact AI has on the enterprise.

Obviously, countless data inputs may shape the decisions of an AI application, so organizations need to sort out what is relevant and impactful from what is just noise. Before your organization adopts an AI-driven approach to data management, consider the following questions:

• What do you want to achieve from AI-enabled technologies?

• Do you have the right data strategy to support AI-driven decisions?

• Do you have the right skill sets?

Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating.
