Airbyte has added a connector development kit to its open source data integration platform that cuts the time required to add a new data source to less than two hours.
Previously, IT teams had to build connectors against a REST application programming interface (API), an effort that typically required about two days, says Airbyte CEO Michel Tricot. “Our next goal is to get it down to 15 minutes,” he adds.
Ultimately, the goal is to democratize data engineering with an open source platform for which Airbyte plans to make additional commercial capabilities available next year. The company recently raised $5.2 million in seed funding to help achieve that goal.
The Airbyte Connector Development Kit (CDK) also standardizes the way connectors are built, maintained, and scaled for an extract, transform and load (ETL) tool that is already employed by more than 600 organizations, including Safegraph, Dribbble, Mercato, GraniteRock, Agridigital, and Cart.com.
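To give a sense of what the CDK standardizes, a minimal HTTP source built with the Python CDK might look something like the sketch below. The API endpoint, stream name, and class names are hypothetical placeholders, and the exact base-class methods may vary between CDK versions.

```python
# A minimal sketch of an HTTP source built with Airbyte's Python CDK.
# The endpoint, stream, and field names are hypothetical placeholders.
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    """One API endpoint exposed as one Airbyte stream."""

    url_base = "https://api.example.com/v1/"  # placeholder API
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None  # single-page API, so no pagination in this sketch

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield from response.json()["results"]


class SourceExample(AbstractSource):
    """The source declares its streams and validates credentials."""

    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        return True, None  # a real connector would probe the API here

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [Customers()]
```

Much of the repetitive work, such as pagination, retries, and rate limiting, is pushed into the CDK's base classes, which is where the claimed time savings come from.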
The Builder Community
Organizations have embraced Airbyte because they don’t have to wait for a provider of commercial ETL tools to create connectors for various data sources. Instead, the community collaboratively builds and supports the connectors it deems most critical. The community has, thus far, certified 50 connectors, each packaged in a Docker container so it can be deployed anywhere.
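Because every connector image responds to the same commands and emits the same JSON-lines messages, any certified connector can be exercised the same way regardless of the source it wraps. The sketch below, which assumes Docker is installed locally and uses a placeholder image name, asks a connector for its configuration spec and parses the messages it emits.

```python
# Hedged sketch: invoke a connector image and read the JSON-lines
# messages it prints to stdout. The image name is a placeholder; any
# certified source image should respond to the `spec` command.
import json
import subprocess

IMAGE = "airbyte/source-pokeapi:latest"  # placeholder connector image


def run_spec(image: str) -> dict:
    """Run the connector's `spec` command and return its configuration spec."""
    proc = subprocess.run(
        ["docker", "run", "--rm", image, "spec"],
        capture_output=True, text=True, check=True,
    )
    for line in proc.stdout.splitlines():
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON log output
        if message.get("type") == "SPEC":
            return message["spec"]
    raise RuntimeError("connector did not emit a SPEC message")


if __name__ == "__main__":
    print(json.dumps(run_spec(IMAGE), indent=2))
```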
Those connectors make it simpler to move and consolidate data from sources ranging from data warehouses and data lakes to databases running in on-premises IT environments or public clouds. Data engineers use tools such as Airbyte to set up data pipelines between these platforms as part of digital business transformation initiatives that might, for example, require data to be regularly shared across multiple platforms. Those requirements have led some organizations to define data operations (DataOps) best practices to automate the movement of data at scale across highly distributed cloud environments, often using one of the data fabric platforms that have recently emerged.
Developing DataOps
ETL tools have been around in many forms for decades. Historically, organizations employed these tools to programmatically move data into a backup and recovery platform. With the rise of the cloud, IT teams began using the same tools to move data into public clouds. Since then, DataOps processes have started to emerge, creating increased demand for data engineers with the skills required to build data pipelines. It’s not clear, however, to what degree the construction of data pipelines might one day be automated. Graphical tools for building and managing data pipelines that are accessible to the average IT administrator have started to appear.
In the meantime, the number of platforms holding data that needs to be integrated with other platforms continues to expand rapidly. In the short term, connecting all those data sources will require a data engineer, which is why data engineering remains, for now, one of the hottest job categories in the IT sector. How long that remains the case once machine learning algorithms are employed more widely within data management platforms to automate these processes remains to be seen.