Historically, data management within most enterprise IT environments has tended to be sloppy. Multiple copies of the same data reside in production applications and in various application development projects. Customer data is rendered in multiple, inconsistent ways, and nobody is quite sure whether any of it has been successfully backed up recently.
Artificial intelligence (AI) is about to change all that.
There’s a symbiotic relationship between the need for data management best practices and the desire to build AI applications. An AI application is only as accurate as the data used to train it. Machine and deep learning algorithms not only need access to reliable sources of data; it turns out that the more data they access, the more proficient they become at whatever task they are being trained to accomplish. To achieve that goal, organizations have been investing heavily in, for example, data scientists. The problem is that too many data scientists, who generally command six-figure salaries, spend most of their time manually addressing data management issues. Solving that problem requires organizations to invest in new data management software and storage systems that, as it happens, take advantage of machine learning algorithms both to cleanse data and to make sure that data is constantly available.
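None of this tooling is standardized, but the flavor of the manual cleansing work is easy to sketch. The short Python example below flags likely duplicate customer records by normalizing each record and applying a fuzzy string match; the field names, sample data and 0.8 similarity threshold are assumptions made purely for illustration, not any vendor's method.

```python
# Illustrative only: a toy duplicate detector of the sort data scientists
# often hand-roll when cleansing customer data. Field names and the 0.8
# similarity threshold are assumptions for the example.
from difflib import SequenceMatcher
from itertools import combinations

def normalize(record: dict) -> str:
    """Collapse a record to a lowercase, punctuation-free string so that
    trivially different renderings of the same customer compare as equal."""
    joined = " ".join(str(v) for v in record.values())
    cleaned = "".join(ch for ch in joined.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def likely_duplicates(records: list, threshold: float = 0.8):
    """Yield pairs of records whose normalized forms are highly similar."""
    normalized = [normalize(r) for r in records]
    for (i, a), (j, b) in combinations(enumerate(normalized), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            yield records[i], records[j]

customers = [
    {"name": "ACME Corp.", "city": "New York"},
    {"name": "Acme Corporation", "city": "New York"},
    {"name": "Globex", "city": "Springfield"},
]
for a, b in likely_duplicates(customers):
    print("possible duplicate:", a, "<->", b)
```

Trivial as it looks, this is exactly the kind of grunt work that consumes data scientists' time at scale, and the kind of task the new generation of tools aims to automate.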
A recent survey of 100 business executives conducted by Corinium Digital, an online community focused on analytics, and commissioned by Paxata, Accenture Applied Intelligence and Microsoft, highlights the extent of the challenge. A total of 70 percent of respondents say their IT/data management teams struggle to support analytics applications. More than a third (38 percent) say data quality is a problem. More than half (54 percent) say they are only somewhat confident in the quality of their analytics results, while 15 percent describe their confidence as low.
In effect, machine learning algorithms will first need to be applied narrowly to data management before machine and deep learning can eventually be applied across every enterprise application imaginable.
Recognizing this emerging requirement, storage system vendors are racing to apply machine learning algorithms up and down their portfolios to automate the management of the massive amounts of data that will need to be stored both on-premises and in the cloud. For example, Commvault, a provider of storage systems, has formed an alliance with LucidWorks, a provider of search and discovery applications infused with machine learning algorithms, under which Commvault will apply algorithms curated by LucidWorks to optimize backup and recovery. The two companies are also working together to apply algorithms to identify sensitive data and apply policies to it, as well as to automate workflows.
Patrick McGrath, director of solutions marketing for Commvault, says many of the so-called data lake projects organizations have launched are being driven by the anticipation of deploying advanced analytics applications against massive amounts of data. In fact, it’s not uncommon for archived “dark data” to be pulled into those data lakes so that algorithms can be applied against historical data sets.
“Academia has been doing that for years,” says McGrath. “Now enterprises are discovering they need to do the same thing.”
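Neither company publishes its implementation, but the sensitive-data piece of that work can be illustrated with a minimal sketch: scan content for telltale patterns and map whatever is found to a handling policy. The patterns and policy names below are invented for the example.

```python
# Illustrative sketch only -- not Commvault's or LucidWorks' implementation.
# Tags documents that appear to contain sensitive identifiers so that a
# retention/encryption policy can be applied automatically. The patterns
# and policy names are assumptions made for the example.
import re

SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set:
    """Return the set of sensitive-data categories detected in the text."""
    return {name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)}

def policy_for(categories: set) -> str:
    """Map detected categories to a (hypothetical) data-handling policy."""
    if {"us_ssn", "credit_card"} & categories:
        return "encrypt-and-restrict"
    if categories:
        return "retain-7-years"
    return "default"

doc = "Contact jane@example.com, SSN 123-45-6789."
cats = classify(doc)
print(cats, "->", policy_for(cats))  # {'email', 'us_ssn'} -> encrypt-and-restrict
```

Production systems layer machine learning on top of this kind of pattern matching to catch sensitive data that doesn't follow a tidy format, but the workflow is the same: detect, tag, then apply policy automatically.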
HPE, meanwhile, went out and acquired an entire company to gain access to algorithms optimized for storage management. Nimble Storage, which HPE acquired last year, was an early pioneer in applying machine learning algorithms within a predictive analytics application designed specifically for storage systems. Now HPE is applying that predictive analytics capability across its HPE 3PAR storage systems as well.
Vish Mulchand, senior director of product management and marketing for HPE Storage, says machine learning algorithms will have a huge impact on storage reliability.
“One survey suggests 54 percent of all IT outages are related to storage,” says Mulchand.
Such outages in the context of a mission-critical AI application operating at unimagined levels of scale could be even more problematic than they are today.
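HPE has not disclosed the internals of its predictive models, but the general technique is straightforward to sketch: train a classifier on storage telemetry labeled with past failures, then score live systems and flag risky ones before they become outages. The telemetry features, training data and risk threshold below are all invented for illustration.

```python
# Minimal sketch of predictive analytics for storage, in the spirit of what
# the article describes -- NOT HPE's actual model. The telemetry features
# (latency, reallocated sectors, temperature) and training data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [avg_read_latency_ms, reallocated_sectors, temperature_c]
telemetry = np.array([
    [2.1,   0, 35],
    [2.4,   1, 36],
    [9.8,  40, 51],
    [12.5, 72, 55],
    [3.0,   2, 38],
    [11.1, 55, 49],
])
failed_within_30d = np.array([0, 0, 1, 1, 0, 1])

model = LogisticRegression().fit(telemetry, failed_within_30d)

# Score a live drive; flag it for proactive replacement above a threshold.
live_drive = np.array([[8.9, 33, 50]])
risk = model.predict_proba(live_drive)[0, 1]
if risk > 0.5:  # the threshold is an assumption for the example
    print(f"drive at {risk:.0%} failure risk -- schedule replacement")
```

The point of the exercise is the shift in posture: instead of reacting to an outage, the storage system acts on a prediction, which matters far more once a mission-critical AI application is on the other end of it.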
Machine learning algorithms are also being applied within data discovery and data quality tools. Syncsort, a provider of data management software, last year acquired Trillium Software in part to gain access to algorithms that make it simpler to identify patterns in data, says Keith Kohl, vice president of product management for Syncsort.
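Trillium's algorithms are proprietary, but one common pattern-identification technique is easy to demonstrate: reduce every value in a column to a "shape" token and count the shapes, which quickly surfaces inconsistently formatted fields. The sketch below is illustrative only.

```python
# Toy data profiler in the spirit of pattern identification -- not Trillium's
# algorithms. It reduces each value to a shape token (digits -> 9, letters
# -> A) and reports the dominant patterns in a column, which is one common
# way inconsistently formatted fields get surfaced.
import re
from collections import Counter

def shape(value: str) -> str:
    """Map '212-555-0147' -> '999-999-9999', 'NY' -> 'AA', etc."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

phones = ["212-555-0147", "646-555-0199", "(917) 555-0123", "555.0108"]
profile = Counter(shape(v) for v in phones)
for pattern, count in profile.most_common():
    print(f"{pattern!r}: {count}")
# Anything outside the dominant pattern is a candidate for standardization.
```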
As organizations begin to experiment with advanced data science projects, many of them are rethinking the relationships between data scientists, developers, architects and storage administrators, notes Kohl. In an approach known as DataOps, many organizations are trying to apply to data management the same DevOps processes used to make application developers more agile.
“There’s starting to be a much bigger governance conversation,” says Kohl.
Another IT vendor tackling that same issue is Rubrik, which is making a case for unifying all aspects of data management, spanning everything from backup and recovery to security incident management. A major part of that effort involves applying advanced algorithms against metadata to automate data management functions, says Soham Mazumdar, chief architect for Rubrik.
“Machine learning is going to play a role at every level,” says Mazumdar.
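Rubrik hasn't detailed its algorithms, but one simple illustration of applying analytics to backup metadata is anomaly detection on change rates: a backup job that suddenly touches far more files than its recent baseline can signal ransomware or a runaway process. The window size and cutoff below are assumptions for the example, not Rubrik's implementation.

```python
# Illustrative only -- not Rubrik's implementation. One simple way to "apply
# algorithms against metadata": flag a backup job whose changed-file count
# deviates sharply from its recent baseline, which can surface ransomware
# or runaway jobs. The history window and z-score cutoff are assumptions.
from statistics import mean, stdev

def is_anomalous(history: list, latest: int, cutoff: float = 3.0) -> bool:
    """True if 'latest' is more than 'cutoff' standard deviations from
    the mean of the recent history of changed-file counts."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(latest - mu) / sigma > cutoff

changed_files = [1_020, 980, 1_105, 995, 1_060]   # recent nightly backups
tonight = 48_000                                   # sudden mass rewrite
print(is_anomalous(changed_files, tonight))        # True -> raise an alert
```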
The paradox many organizations will soon find themselves in is that, whatever investments they are making in developing AI applications today, deploying those applications in a production environment will almost certainly require them to adopt data and storage management technologies that themselves make extensive use of AI to manage the massive amounts of data that AI models depend on. Initially, most AI models will be deployed in the cloud. But even cloud service providers such as Google are making it apparent that AI models will be distributed all the way out to the network edge. As that happens, significantly more sophisticated approaches to managing the data pipelines on which those AI models depend will be required.
It’ll be interesting to see how data management evolves to meet that challenge. Very few IT organizations today would win the equivalent of a “Good Housekeeping Seal of Approval” in terms of how they manage data. But it’s also starting to become apparent that the organizations that don’t get their data management houses in order might not be around to see how AI applications have transformed the world as we once knew it.