November 19, 2020 | 4 min read

Why Apache Airflow? Interview with Asseco Business Solutions

Today we talk to Konrad Łyda, Machine Learning Engineer & Team Lead at Asseco Business Solutions. He is responsible for delivering high-quality deep learning and machine learning models to production. He keeps up with the latest developments in machine learning and applies them in his work to solve business problems.

Please briefly describe what your company does.

Asseco Business Solutions has been developing and deploying business management software for more than 20 years. We develop and implement our own products across many business sectors. I work in the Computer Vision department, which is responsible for delivering solutions based on machine learning models that are then used by other teams developing mobile or web applications.

In this interview, I will speak from the perspective of my department, the Image Recognition and Machine Learning Department, as it’s my “mini company” inside the larger company :)

Before implementing Airflow, had you used any other orchestration tool?

Airflow was the first solution that we introduced to author, schedule and monitor workflows. We researched the topic and listed the pros and cons of Airflow and similar tools. At that point, we found Airflow to be the best fit for our needs.

Why did you decide to implement Apache Airflow? What problem were you trying to solve?

In our daily work, we build many machine learning models from data. We found that some tasks were being done manually even though they could easily be automated with code. However, we didn’t want to end up with a giant house of cards built from random scripts. We wanted a solution that would help us organize these pieces of code into workflows.

Because we are all programmers, we looked for a workflow solution based on Python. At the very beginning, we rejected solutions where workflows are defined in XML/YAML files or built by “clicking”. That is why we dropped Oozie, NiFi and StackStorm. Only Luigi and Airflow were left on the battlefield.

Airflow is a well-documented, open-source solution with a large community, many documented use cases and out-of-the-box integrations. These factors convinced us that it was the right solution for us.

Does your company use any other workflow solution right now and if so, why?

Nope, apart from Airflow my department doesn’t use any other solution of this type.

What type of Airflow deployment have you used, and why?

We wanted to start simple, so we took the Apache Airflow Dockerfile for Docker automated builds that was already available on GitHub, modified it slightly and deployed it on a self-hosted server. Two Docker containers (one with a Postgres database, the other with the web server and the scheduler) running the LocalExecutor have been doing the job for a year and a half now.

However, we recently noticed performance degradation on that server and decided that moving to our self-hosted Kubernetes cluster would be the best solution for us. At first we considered moving to Kubeflow, but thanks to Airflow’s continued development it is now possible to deploy Airflow on Kubernetes, and Airflow offers dedicated Kubernetes operators. So, since our workflows were already built in an “Airflow spirit”, we decided to make some tweaks to our pipelines and stick with Apache Airflow.

Do you have any internal support for Airflow users in your organization?

No, our team of machine learning engineers is also responsible for keeping our pipelines working and healthy. Since the solution itself is well documented, managing it ourselves is not a problem.


What were the challenges in the integration process? Did you face any technical difficulties?

I don’t think there were any such issues. Of course, we had to develop some operators and sensors tailored to our needs, but the fundamentals Airflow offers are a very stable base to build on.
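An Airflow sensor repeatedly checks (“pokes”) a condition until it holds or a timeout expires. The interview does not describe the team’s own sensors, so the following is only a minimal plain-Python sketch of that polling pattern, not Airflow’s API: `wait_for`, `SensorTimeout` and `FakeClock` are hypothetical names, and the `poke_interval`/`timeout` parameters merely borrow the names of the corresponding `BaseSensorOperator` arguments.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the condition is still false after the timeout."""


def wait_for(condition: Callable[[], bool],
             poke_interval: float = 60.0,
             timeout: float = 3600.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poke `condition` until it returns True and return how many pokes it took.

    Raises SensorTimeout once `timeout` seconds have passed without success.
    """
    deadline = clock() + timeout
    pokes = 0
    while True:
        pokes += 1
        if condition():
            return pokes
        if clock() >= deadline:
            raise SensorTimeout(f"condition still false after {pokes} pokes")
        sleep(poke_interval)  # wait before the next poke


class FakeClock:
    """Simulated clock so the example runs instantly instead of really sleeping."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


if __name__ == "__main__":
    # Hypothetical condition ("a new batch of data has arrived") that
    # becomes true on the third poke.
    arrivals = iter([False, False, True])
    clock = FakeClock()
    pokes = wait_for(lambda: next(arrivals), poke_interval=10, timeout=100,
                     clock=clock.now, sleep=clock.sleep)
    print(pokes, clock.t)  # 3 20.0
```

Injecting `clock` and `sleep` keeps the polling logic testable without real waiting. In Airflow itself, a custom sensor would instead subclass `BaseSensorOperator` and implement its `poke` method.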

How did you do it? Did you seek any help or assistance from a development studio?

Most of the time, we just used the documentation and Stack Overflow. Sometimes we looked directly at the Airflow source code because some features or configuration tricks weren’t mentioned in the docs, but all in all we developed the whole solution ourselves.

What effects has the Airflow implementation had on your business?

It helps us train machine learning models and deploy them quickly, directly to production. We’ve saved a lot of time using automated pipelines. Even though we sometimes need to tweak parts of a workflow, the overall time saved is impressive.

Of course, there were situations when we found that our pipeline could be improved a lot because we had missed some built-in feature or had simply developed it “our way”. That happens in every software development project.

To whom would you recommend Airflow?

First, you need to know Python to build pipelines in Airflow. Second, if you see that some tasks in your company are unnecessarily done by hand every day (especially tasks that move data from one place to another while applying standard, predictable processing steps), Airflow is the tool for you.

In our case of machine learning pipelines, where there is a continuous loop (prepare data, train, validate, store, repeat), Airflow has been a great choice for automating our daily routine.
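The team’s actual DAGs are not shown in the interview, so here is a minimal plain-Python sketch (not Airflow code) of the loop described above expressed as a dependency graph. The task names and the helpers `run_iteration` and `run_loop` are hypothetical; task ordering comes from the standard library’s `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical task names for one pass of the loop; each task maps to the set
# of tasks that must finish before it can start.
DEPENDENCIES: Dict[str, Set[str]] = {
    "prepare_data": set(),
    "train": {"prepare_data"},
    "validate": {"train"},
    "store": {"validate"},
}


def run_iteration(tasks: Dict[str, Callable[[], None]],
                  dependencies: Dict[str, Set[str]]) -> List[str]:
    """Run every task once, in dependency order, and return that order."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


def run_loop(tasks: Dict[str, Callable[[], None]],
             dependencies: Dict[str, Set[str]],
             iterations: int) -> List[List[str]]:
    """The "repeat" step: run the whole acyclic graph again, once per iteration."""
    return [run_iteration(tasks, dependencies) for _ in range(iterations)]


if __name__ == "__main__":
    executed: List[str] = []
    # Stand-in tasks that only record their own name when run.
    tasks = {name: (lambda n=name: executed.append(n)) for name in DEPENDENCIES}
    run_loop(tasks, DEPENDENCIES, iterations=2)
    print(executed)

    # Modelling "repeat" as an edge (store -> prepare_data) would create a cycle.
    try:
        TopologicalSorter({**DEPENDENCIES, "prepare_data": {"store"}}).prepare()
    except CycleError:
        print("cycle rejected")
```

Note that “repeat” is deliberately not an edge in the graph: a DAG must be acyclic, so the loop comes from running the whole graph again. Airflow works the same way, with the scheduler creating a new DAG run for each scheduled interval.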

Ula Rydiger

Content Marketing Manager

Konrad Łyda

Machine Learning Engineer & Team Lead, Asseco Business Solutions
