The concept of the data fabric emerged from NetApp in 2015 and was redefined three years later as implementations matured. And as the rest of the data storage industry pushed its own data fabric solutions, the concept began to drift from its original meaning.
While it is common for emerging concepts to shift during their formative years, the resulting lack of clarity can confuse those in need of the technology. Here we’ll discuss how data fabrics are evolving – and how they can help distributed enterprises better manage their far-flung data operations.
See the Top 7 Data Management Trends to Watch in 2022
What is a Data Fabric?
In a 2018 talk, NetApp’s data fabric chief architect, Eiki Hrafnsson, outlined the Data Fabric 1.0 vision as “essentially being able to move your data anywhere; whether it’s on-prem, the enterprise data center, or to the public cloud.”
NetApp engineers debuted the technology in 2015 with a theatrical and entertaining tech demo, rapidly transferring 10GB of encrypted data between the AWS and Azure clouds from a simple drag-and-drop interface.
The demo spoke to a real need for fluid data transfer between mediums: something like a storage network for the Big Data and cloud era. Years later, however, that kind of performance is generally expected, which has shifted how the data fabric is developed and what it can be used for.
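NetApp’s demo ran on its own fabric tooling, but a rough sketch of what a cross-cloud copy involves without such a layer helps ground the idea. The Python snippet below uses the AWS and Azure SDKs; the bucket, container, and object names are hypothetical, and credentials are assumed to be configured in the environment.

```python
import os

import boto3
from azure.storage.blob import BlobServiceClient

# Hypothetical locations; a fabric layer would hide these details.
SRC_BUCKET, SRC_KEY = "demo-bucket", "dataset.enc"
DST_CONTAINER, DST_BLOB = "demo-container", "dataset.enc"

# Pull the (already encrypted) object from S3.
s3 = boto3.client("s3")
payload = s3.get_object(Bucket=SRC_BUCKET, Key=SRC_KEY)["Body"].read()

# Push it to Azure Blob Storage; assumes AZURE_STORAGE_CONNECTION_STRING is set.
azure = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
azure.get_blob_client(container=DST_CONTAINER, blob=DST_BLOB).upload_blob(
    payload, overwrite=True
)
```

A fabric wraps this kind of plumbing, plus the encryption, retries, and metadata handling, behind a single interface, which is what made the drag-and-drop demo notable at the time.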
According to Gartner, a data fabric is:
“ … a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of integrated and reusable data across all environments, including hybrid and multicloud platforms.”
By comparison, IBM defines a data fabric as:
“ … an architectural approach to simplify data access in an organization to facilitate self-service data consumption. This architecture is agnostic to data environments, processes, utility, and geography, all while integrating end-to-end data-management capabilities. A data fabric automates data discovery, governance, and consumption, enabling enterprises to use data to maximize their value chain.”
While both definitions borrow from the original concept, the idea of what a data fabric is has grown more complex to keep pace with current data trends.
Also read: Enterprise Storage Trends to Watch in 2022
Data Fabric 2.0
NetApp reassessed its idea of the data fabric in the years following its debut, redefining the concept as follows: “The NetApp Data Fabric simplifies the integration and orchestration of data for applications and analytics in clouds, across clouds, and on-premises to accelerate digital transformation.”
In other words, the scope and functionality expanded to better integrate existing enterprise applications with data sources, making applications agnostic to the underlying storage media.
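This is not NetApp’s implementation, but the underlying principle can be sketched in a few lines of Python: applications code against one interface, and the backing store, whether local disk or S3, becomes an interchangeable detail. The class and path names below are illustrative.

```python
from abc import ABC, abstractmethod

import boto3


class DataSource(ABC):
    """Applications code against this interface, never a specific store."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalSource(DataSource):
    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


class S3Source(DataSource):
    def __init__(self, bucket: str):
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def read(self, path: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=path)["Body"].read()


def load_report(source: DataSource) -> bytes:
    # The caller neither knows nor cares where the bytes physically live.
    return source.read("reports/q3.csv")
```

Swapping `LocalSource()` for `S3Source("analytics-bucket")` changes nothing in the application code; a data fabric makes the same promise of source-media agnosticism at much larger scale.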
NetApp claims this fabric architecture carries numerous benefits:
- It creates a better posture against vendor lock-in by liberating data, offering the freedom to choose between cloud providers or on-premises storage and to switch at any time.
- It empowers data management, increases mobility by knocking down silos, facilitates cloud-based backup and recovery, and may also improve data governance, the company says.
- It enhances data discovery by granting full-stack visibility through the company’s suite of visualization tools.
Other companies, such as Talend, offer their own data fabric analytical tools, many of which extend the fabric to both internal and external consumers and contributors through the use of APIs.
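Talend’s tooling is proprietary, but the general pattern of publishing fabric data through an API can be sketched with a generic REST endpoint. The route and dataset below are hypothetical stand-ins for a governed fabric view, written with Flask.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for a governed dataset served by the fabric.
CUSTOMERS = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APAC"},
]


@app.get("/api/v1/customers")
def list_customers():
    # Internal and external consumers hit the same governed view.
    return jsonify(CUSTOMERS)


if __name__ == "__main__":
    app.run(port=8080)
```

In practice, authentication, rate limiting, and lineage metadata would sit in front of an endpoint like this; the point is simply that the same curated data is reachable inside and outside the organization.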
Data Fabric Challenges
Most companies today house their data in multiple locations and in a variety of formats, so a data fabric cannot always reach all of it. Moreover, the distributed nature of the data often leads to poor data quality, which can skew analysis once the data is aggregated.
According to a study in the Harvard Business Review, a mere 3% of companies’ data met the study’s standard of data quality. The study also found that nearly half of all newly created records contained a critical error.
According to Talend, creating a unified data environment can alleviate these quality control issues by giving IT greater control and flexibility over the end product. Its tools, the company says, enable better data stewardship, more effective data cleansing, and stronger compliance and integrity through data lineage tracing.
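Talend’s cleansing tools are a product in their own right, but the kind of rule-based quality check they automate looks roughly like the pandas sketch below. The records and rules are invented for illustration.

```python
import pandas as pd

# Invented records; the rules stand in for a real data stewardship policy.
df = pd.DataFrame({
    "email": ["a@example.com", None, "not-an-email"],
    "signup_date": ["2021-04-01", "2021-13-05", "2021-06-20"],
})

checks = pd.DataFrame({
    "missing_email": df["email"].isna(),
    "malformed_email": df["email"].notna()
    & ~df["email"].fillna("").str.match(r"[^@\s]+@[^@\s]+$"),
    "bad_date": pd.to_datetime(df["signup_date"], errors="coerce").isna(),
})

print(checks.sum())  # error count per rule
print(f"{(~checks.any(axis=1)).mean():.0%} of records pass every check")
```

Run against the three sample records, only one passes every check, which tracks with the error rates the HBR study reports.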
Data Fabrics and Data Management
Tools like data fabrics can make the job of data quality control easier, but wielded incorrectly, they can leave a company spending more to compensate for flawed data or analyses.
How we interact with our data is only half the bigger picture. The other half is how we create it. Data tends to be created on the fly, and to serve a limited, time-sensitive purpose. A data fabric can help IT wrangle bad or outdated data more quickly, but ideally, we should also be mitigating these issues on the front end as data is created.
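One common way to mitigate these issues on the front end is schema validation at the point of ingest, so that malformed records never enter the fabric at all. Below is a minimal sketch using pydantic; the model and its fields are hypothetical.

```python
from datetime import date

from pydantic import BaseModel, EmailStr, ValidationError


class CustomerRecord(BaseModel):
    # Validated at creation, so bad values never reach downstream analytics.
    customer_id: int
    email: EmailStr  # requires the pydantic[email] extra
    signup_date: date


try:
    CustomerRecord(customer_id=42, email="not-an-email", signup_date="2021-04-01")
except ValidationError as exc:
    print(exc)  # the malformed record is rejected before it is ever stored
```

Rejecting bad records at the source is cheaper than cleansing them later, and it keeps the fabric’s downstream views trustworthy.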
If you’re curious to see data fabric tools in action and how you might leverage them in your company, check out this hour-long talk from NetApp’s data fabric chief architect.
Read next: Top Big Data Storage Tools