The Apache Software Foundation (ASF) announced that Apache Gobblin, the open-source distributed Big Data integration framework, has reached top-level project status. According to the foundation, achieving top-level status means that the project has graduated from the Apache Incubator and has demonstrated that its community and products have been well-governed under the ASF’s meritocratic process and principles.
The project serves both streaming and batch data ecosystems. It can integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management of data across different types of environments. The project also simplifies data lake creation by supporting simple transformations and enabling organization within the lake through compaction, partitioning, and deduplication.
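To make the terms partitioning, deduplication, and compaction concrete, here is a minimal sketch in plain Python. It does not use Gobblin's API; the function name `compact`, the record fields (`date`, `id`, `version`, `value`), and the "keep the newest version" rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def compact(records, partition_key, dedup_key, order_key):
    """Illustrative only: group records into partitions and keep the
    newest record (highest order_key) for each dedup_key.

    This is a hypothetical helper, not part of Apache Gobblin.
    """
    partitions = defaultdict(dict)
    for rec in records:
        # Partitioning: route the record to a bucket by its partition value.
        bucket = partitions[rec[partition_key]]
        # Deduplication: keep one record per dedup key, preferring the newest.
        current = bucket.get(rec[dedup_key])
        if current is None or rec[order_key] > current[order_key]:
            bucket[rec[dedup_key]] = rec
    # Compaction: emit one consolidated, sorted list per partition.
    return {
        part: sorted(bucket.values(), key=lambda r: r[dedup_key])
        for part, bucket in partitions.items()
    }


# Hypothetical sample data: two versions of record 1 land on the same day.
records = [
    {"date": "2021-02-16", "id": 1, "version": 1, "value": "a"},
    {"date": "2021-02-16", "id": 1, "version": 2, "value": "b"},
    {"date": "2021-02-16", "id": 2, "version": 1, "value": "c"},
    {"date": "2021-02-17", "id": 1, "version": 1, "value": "d"},
]

compacted = compact(records, "date", "id", "version")
```

In this sketch the four input records collapse into two date partitions, and the older version of record 1 is dropped. A real data lake would apply the same idea to files on distributed storage rather than in-memory lists.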
Users can also benefit from lifecycle and compliance management of data within the lake, including data retention and fine-grained data deletions, the ASF explained in a blog post. “Apache Gobblin supports deployment models from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth,” said Shirshanka Das, founder and CTO at Acryl Data and a member of the Apache Gobblin Project Management Committee.
Gobblin originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in 2017. Apache Gobblin software is released under the Apache License v2.0 and overseen by a self-selected team of active contributors.