As enterprise IT organizations invest more in artificial intelligence (AI) projects to advance digital business transformation initiatives, many of them are discovering that the way they manage data needs to change fundamentally.
Historically, IT teams have tended to manage data as an extension of the application employed to create it. That approach naturally led to large numbers of data silos. Worse yet, much of that data often conflicts, because different applications might render a company name in different ways or simply may not have been updated with the most recent transaction data.
The AI models that organizations create, however, require massive amounts of accurate data to be trained properly. That requirement is driving organizations to create huge data lakes on cloud computing platforms that can be accessed more easily by multiple AI models. As more data is added to the data lake, the machine learning algorithms that make up the AI model are continuously updated.
The challenge many organizations face, however, is that the processes and systems they have in place for managing data are antiquated. IT teams are starting to embrace DataOps as a methodology for automating the management of data across multiple platforms, based on a set of best practices implemented consistently across an organization. Similar in concept to the DevOps practices organizations have embraced to streamline application development, DataOps aims to eliminate as much of the friction that plagues data management today as quickly as possible.
New Management Platforms
Not surprisingly, providers of storage systems are now rolling out new platforms that make it simpler to automatically move data between on-premises IT environments and the data lakes that reside on a public cloud.
Lenovo, for example, has added the Lenovo ThinkSystem DM5100F, an offering that includes support for object storage based on the S3 application programming interface (API). Originally developed by Amazon Web Services, the S3 API has become a de facto standard for accessing cloud storage services built on object storage systems. IT teams can still store data locally as files or block storage, but the support for S3 will reduce friction when moving data into and out of a public cloud, says Kamran Amini, vice president and general manager, server, storage and software-defined infrastructure for the Lenovo Data Center Group.
“Data management is a critical part as we look at how we can serve enterprise customers and be able to lead in insight for AI and analytics around that data,” says Amini.
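That friction reduction is largely a property of the S3 API itself: because the interface is the same everywhere it is implemented, moving data between an on-premises array and a cloud data lake requires little more than pointing a client at a different endpoint. The minimal sketch below, written in Python with the open source boto3 library, illustrates the idea; the endpoint URL, bucket names, object key and credentials are hypothetical placeholders, not details of any particular vendor's system.

```python
import boto3

# Client for a hypothetical on-premises S3-compatible object store;
# only the endpoint URL and credentials differ from a cloud client.
on_prem = boto3.client(
    "s3",
    endpoint_url="https://objects.datacenter.example.internal",  # hypothetical endpoint
    aws_access_key_id="LOCAL_ACCESS_KEY",        # placeholder credentials
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

# Client for the public cloud data lake, using the default AWS endpoint.
cloud = boto3.client("s3")

# Copy one object from the local array into the cloud data lake using
# identical get/put calls on both sides of the hybrid environment.
obj = on_prem.get_object(Bucket="training-data", Key="batches/batch-001.parquet")
cloud.put_object(
    Bucket="ai-data-lake",
    Key="batches/batch-001.parquet",
    Body=obj["Body"].read(),
)
```

The same symmetry works in reverse, which is what makes S3 support on local storage systems attractive for feeding cloud-hosted AI pipelines and pulling results back on-premises.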
Of course, there are always multiple ways to manage data. Nutanix, for example, has made available software that also includes support for the S3 API. Nutanix Objects is designed to make it simpler to unify the management of compute and storage at scale across a hybrid cloud computing environment made up of systems running on-premises and in the cloud, says Greg Smith, vice president of product marketing for Nutanix.
“A lot more organizations are now trying to manage petabytes of data,” says Smith.
AI’s ROI
Regardless of approach, if data is the so-called new oil that fuels digital business transformation, then the way that oil is refined into fuel needs to change. AI, coupled with other forms of advanced analytics, is only as reliable as the data made accessible to it. If more rigor isn't applied to how data is managed, the return on investment in AI will be suboptimal at best. In some cases, it could even prove catastrophic, because the only thing worse than being wrong is being wrong at scale.
In fact, with so much now riding on how data is managed in the enterprise, the need to modernize data management has never been more pressing.