Apache Airflow — An All-in-one Scheduler For Seamless Workflows
As part of the open source community, the Polidea team developed and implemented an extensive set of operators that let Airflow work with different cloud service providers.
Every job demands all sorts of tools. Sometimes clicking through multiple tabs, making sure each box is checked, or simply overseeing the to-do list becomes work in itself. The IT world deals with this abundance in its own specific way… by building another tool. When dealing with Big Data and a plethora of tasks spread across different servers, it’s best to operate with one dependable platform that combines all the workflows. Apache Airflow is a widely recommended scheduler that executes tasks one after another in a precisely defined order, with the workflows themselves set up as code. As an Apache Software Foundation open source project, it is developed by a broad community of skilled software engineers, which helps make it robust. Polidea’s team proudly contributed integrations that connect Airflow to specific cloud provider services, making the platform even more versatile.
The project called for an extensive set of operators that would let Airflow work with different cloud service providers. An operator defines a specific unit of work: it determines exactly what a given task needs to do, in all its complexity. Operators are arranged into directed acyclic graphs (DAGs) that describe a workflow, so the given actions run smoothly and in the right order, even when they are performed on separate machines.
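To make the operator/DAG idea concrete, here is a minimal pure-Python sketch. It does not use Airflow's API: the `Operator` and `ToyDag` classes are hypothetical stand-ins invented for illustration, and the execution order comes from the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It shows only the core idea that each operator is a unit of work and the DAG's edges decide the order in which the work runs; real Airflow adds scheduling, retries, and distributed execution on separate workers.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence, Set


class Operator:
    """Hypothetical stand-in for an operator: one named unit of work."""

    def __init__(self, task_id: str, action: Callable[[], None]) -> None:
        self.task_id = task_id
        self.action = action


class ToyDag:
    """Hypothetical toy DAG: operators plus their upstream dependencies."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.upstream: Dict[str, Set[str]] = {}

    def add(self, op: Operator, upstream: Sequence[str] = ()) -> None:
        # Register the operator and the tasks that must finish before it.
        self.operators[op.task_id] = op
        self.upstream[op.task_id] = set(upstream)

    def run(self) -> List[str]:
        # static_order() yields each task only after all of its upstream tasks;
        # it raises graphlib.CycleError if the graph contains a cycle.
        order = list(TopologicalSorter(self.upstream).static_order())
        for task_id in order:
            self.operators[task_id].action()
        return order


# Usage: a simple extract -> transform -> load chain.
log: List[str] = []
dag = ToyDag()
dag.add(Operator("extract", lambda: log.append("extract")))
dag.add(Operator("transform", lambda: log.append("transform")), upstream=["extract"])
dag.add(Operator("load", lambda: log.append("load")), upstream=["transform"])
print(dag.run())  # ['extract', 'transform', 'load']
```

In real Airflow, dependencies between tasks are typically declared with the `>>` operator inside a DAG definition; the sketch keeps them as explicit upstream lists so the ordering logic stays visible.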
The implementation had to be a high-quality, independent, engineering-driven process. On top of that, the work required the approval of the Apache community.
Scope of the project
- operators development
- automated testing
- code quality reviews
- engaging with the community
Jarek, Principal Software Engineer
Working on an Open Source project such as Apache Airflow is very demanding, but it is equally rewarding when you realize how many businesses use it every day and how fantastic the people you interact with are.
Polidea’s team, as a self-reliant group of experienced engineers, became contributors to the Open Source Airflow project and provided 70+ operators for Airflow DAGs, all in line with the highest standards of open source projects.
Along the way, the team accomplished a number of side achievements that grew out of the main objective. First of all, Polidea’s team took part in many Apache community discussions, sharing its experience and expertise. These contributions helped shape high-level strategies for the evolution of the whole project.
What is more, our engineers tracked down and fixed bugs in core Airflow, which improved the system as a whole. These findings, combined with the groundwork laid during operator development, allowed the team to put forward a number of improvement proposals for Airflow itself.
Detailed documentation of every element is at the core of any collaborative work. The general documentation for the Airflow platform also needed some attention, which Polidea’s experts were happy to provide. As Airflow committers, Polidea’s engineers also take part in the Season of Docs initiative. Current and future members of the community involved in this open source initiative will certainly benefit from precise descriptions.
And last but not least, Polidea’s team created a toolset that makes developing further operators more efficient, which will make future work much easier.
With the operators successfully implemented, it became much easier for the end client to work with different external services. Building multidimensional data workflows became faster than ever before, and using cloud services combined in one integrated platform became a practical reality.
Polidea’s experts attracted the community’s attention through significant contributions to improving the product and became recognized as reliable contributors. Four members of Polidea’s team are contributors, and two of them were invited by the community to become official Apache Airflow committers because of their outstanding knowledge and engagement. Authority built on real expertise and shared know-how allowed them to convince the community to adopt changes that were important for the customer.
The full set of useful, solid operators released to the market, along with a number of labor-saving tools for building more of them, made it possible to create seamless task pipelines across multiple channels with higher quality and efficiency.