Data is the fuel that drives digital business processes, but most organizations today don’t have an efficient way of managing it across all the platforms on which they have deployed applications.
At its core, a data fabric architecture loosely describes any platform that reduces the friction associated with accessing and sharing data across a distributed network environment. As such, vendors that have historically positioned themselves as providers of everything from storage systems to data management platforms now claim, to varying degrees, to provide data fabrics that span multiple computing platforms.
Data Fabrics and Digital Business Transformation
The level of urgency driving the need for data fabrics has increased for two primary reasons:
- Digital business processes generally span multiple applications. If the data residing in those applications conflicts, the digital business initiatives that depend on those applications will ultimately fail.
- Organizations of all sizes are trying to address that issue by building cloud data lakes that normalize their data into a common repository that multiple applications can access (a step sketched in the example below).
Those data lakes are also the foundation upon which organizations are training the artificial intelligence (AI) models that many of them are relying on to automate their digital processes.
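That consolidation only works if records arriving from different applications are mapped onto a common schema before they land in the lake. The short Python sketch below illustrates the idea; the application names, field names, and target schema are hypothetical, invented purely for illustration rather than taken from any particular platform.

```python
# A minimal sketch of the kind of normalization a data lake pipeline performs
# before landing records in a common repository. All schemas and field names
# here are hypothetical.

from datetime import datetime, timezone

def from_crm(record: dict) -> dict:
    """Map a record from a hypothetical CRM application to the common schema."""
    return {
        "customer_id": record["AccountId"],
        "email": record["EmailAddress"].lower(),
        "updated_at": record["LastModified"],
        "source": "crm",
    }

def from_billing(record: dict) -> dict:
    """Map a record from a hypothetical billing application to the common schema."""
    return {
        "customer_id": record["cust_no"],
        "email": record["contact_email"].lower(),
        "updated_at": datetime.fromtimestamp(
            record["modified_epoch"], tz=timezone.utc
        ).isoformat(),
        "source": "billing",
    }

if __name__ == "__main__":
    crm_rows = [{"AccountId": "A-100", "EmailAddress": "Pat@Example.com",
                 "LastModified": "2021-05-01T12:00:00+00:00"}]
    billing_rows = [{"cust_no": "A-100", "contact_email": "pat@example.com",
                     "modified_epoch": 1619870400}]

    # Records from both applications land in one normalized list, ready to be
    # written to a shared repository that multiple applications can query.
    normalized = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
    print(normalized)
```

In practice that mapping is usually handled by an ETL/ELT pipeline or a data integration service rather than hand-written functions, but the normalization step it performs is the same.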
The challenge IT teams face is that it’s unlikely there will be just one data lake, notes Howard Dresner, founder and chief research officer for Dresner Advisory Services. Each business unit within an organization often launches its own data lake initiative.
As a consequence, IT organizations will need to employ some type of data fabric to move data not just from on-premises IT environments into data lakes residing in the cloud, but also between data lakes that will reside in multiple clouds.
Data Fabrics in Play
The latest generation of data fabrics takes advantage of Kubernetes clusters that can run anywhere, making it simpler to deploy them across heterogeneous environments. Hewlett Packard Enterprise (HPE), for example, has launched the HPE Ezmeral data fabric, based on technology it gained by acquiring MapR Technologies in 2019. That data fabric creates a global namespace that both containerized and non-containerized applications can access via application programming interfaces (APIs). A data mirroring capability makes it possible to move data within or between clusters using bi-directional, multi-master table or event stream replication.
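To make the idea of a global namespace concrete, here is a minimal sketch that assumes the fabric is exposed to applications as a POSIX-style mount, one common access method alongside programmatic APIs. The mount point, directory layout, and helper function are illustrative assumptions, not values documented by HPE.

```python
# A minimal sketch of what a global namespace gives an application: data is
# addressed by one logical path, and the fabric (not the application) decides
# where the bytes physically live and how they are replicated. The mount point,
# layout, and function below are illustrative assumptions.

from pathlib import Path
import tempfile

def write_event(fabric_root: Path, site: str, stream: str, payload: str) -> Path:
    """Append an event under the fabric's global namespace."""
    target = fabric_root / site / "streams" / f"{stream}.log"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a") as f:
        f.write(payload + "\n")
    return target

if __name__ == "__main__":
    # Stand-in for a real mount such as /mnt/datafabric; a temporary directory
    # keeps the example runnable anywhere.
    fabric_root = Path(tempfile.mkdtemp())

    # The same call works whether this code runs in a container at the edge or
    # as a non-containerized process in the data center.
    print(write_event(fabric_root, "edge-site-01", "sensor-readings", '{"temp_c": 21.4}'))
```

Behind that single logical path, capabilities such as the mirroring described above keep copies in sync across clusters without the application changing.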
HPE recently announced it is making its data fabric available as a standalone offering in addition to being an integrated component of the HPE Ezmeral Container Platform and HPE Ezmeral Machine Learning Operations (MLOps). The goal is to make it easier for organizations to consistently employ a data fabric from the edge to the cloud, says Anil Gadre, vice president of Ezmeral for HPE. “It’s about reducing the friction,” he adds.
In a similar vein, Diamanti makes available an integrated control plane and data plane, dubbed Spektra and Ultima, respectively. The company recently announced that Spektra is now available on Google Cloud Platform (GCP) in addition to Amazon Web Services (AWS) and Microsoft Azure. Spektra is also now available for on-premises IT environments, which can deploy it on servers from Lenovo, Dell Technologies, and HPE, or on x86 infrastructure provided by Diamanti.
That approach makes it simpler to manage data both at the point where it is created and where it is ultimately stored, says Brian Waldon, vice president of product for Diamanti. That’s a critical issue because applications will need to access data residing on multiple platforms.
Meanwhile, earlier this month, NetApp launched an entire managed service built on a Kubernetes-based data fabric. The NetApp Astra managed service protects, recovers, and moves application workloads deployed on Kubernetes without requiring organizations to download, install, manage, or upgrade any software.
Previously known as Project Astra, NetApp’s managed service handles backup, cloning, disaster recovery, data life cycle operations, data optimization, compliance, and security on behalf of an organization. It can even restore an application to a different Kubernetes cluster in the same or a different region, or move entire applications, along with their data, from one Kubernetes cluster to another.
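To give a sense of the kind of per-application data operation such a service automates, the sketch below uses the standard Kubernetes CSI VolumeSnapshot resource to capture every persistent volume claim in an application’s namespace. To be clear, this is not Astra’s API; it is a hand-rolled illustration of one underlying building block, and it assumes a cluster with the snapshot CRDs installed and a VolumeSnapshotClass named csi-snapclass (an illustrative name).

```python
# Not NetApp Astra's API: a hand-rolled sketch of one building block that such
# a managed service automates, using the standard Kubernetes CSI VolumeSnapshot
# resource. Assumes the snapshot CRDs are installed and a VolumeSnapshotClass
# named "csi-snapclass" exists (an illustrative name).

from kubernetes import client, config

def snapshot_app_volumes(namespace: str, snapshot_class: str = "csi-snapclass") -> list[str]:
    """Create a VolumeSnapshot for each PVC in the application's namespace."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    core = client.CoreV1Api()
    crds = client.CustomObjectsApi()

    created = []
    for pvc in core.list_namespaced_persistent_volume_claim(namespace).items:
        name = f"{pvc.metadata.name}-snap"
        body = {
            "apiVersion": "snapshot.storage.k8s.io/v1",
            "kind": "VolumeSnapshot",
            "metadata": {"name": name},
            "spec": {
                "volumeSnapshotClassName": snapshot_class,
                "source": {"persistentVolumeClaimName": pvc.metadata.name},
            },
        }
        crds.create_namespaced_custom_object(
            group="snapshot.storage.k8s.io", version="v1",
            namespace=namespace, plural="volumesnapshots", body=body,
        )
        created.append(name)
    return created

if __name__ == "__main__":
    print(snapshot_app_volumes("my-app"))  # "my-app" is a placeholder namespace
```

A managed service goes further, also capturing the application’s Kubernetes resources and copying the underlying data so the application can be restored into a different cluster or region, as described above.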
The goal is to offload data management tasks that previously would have fallen to an internal IT team, says Eric Han, vice president of product management for public cloud services at NetApp. It’s not precisely clear how much data will be moving between platforms, but moving it is one of several data management capabilities that IT organizations will require as the enterprise becomes more extended. “It will become more federated,” adds Han.
Data Fabrics and DataOps
Ultimately, there will be no shortage of data fabric options at a time when not only the volume of data being generated but also the variety of data types is increasing exponentially. Unstructured data types such as video, for example, increasingly permeate the enterprise. The challenge for IT teams is finding a way to unify the management of massive amounts of highly distributed data that, with each passing day, winds up being created and stored in more silos than ever.
Of course, picking a data fabric is just the beginning of a major effort to reconstruct how data is managed across the extended enterprise. Organizations will also need to modernize their internal data management processes. But as is the case with any IT initiative, the best way forward is to lay the strongest data fabric foundation possible.