September 15, 2020 | 4 min read
Apache Airflow Use Case—An Interview with DXC Technology
Amr Noureldin is a Solution Architect for DXC Technology, focusing on DXC Robotic Drive, a data-driven development platform. Amr has over 12 years of experience working with both open-source technologies and commercial projects.
DXC Technology (NYSE: DXC) helps global companies run their mission-critical systems and operations while modernizing IT, optimizing data architectures, and ensuring security and scalability across public, private, and hybrid clouds. With decades of driving innovation, the world’s largest companies trust DXC to deploy our enterprise technology stack to deliver new levels of performance, competitiveness, and customer experience.
We are helping customers gain data-driven insights, automate operations, design effective products and services, and leverage complex software at scale. And we apply these capabilities to some of the most challenging issues and opportunities our customers face.
Our work on Robotic Drive is especially interesting, as we accelerate the development, testing, and validation of ADAS/AD capabilities to support Level 2+ to Level 5 autonomous functions. It is the largest known exabyte-scale development solution, leveraging industry-proven on-premises and cloud infrastructure, methodologies, tools, and accelerators for a highly automated AD development process. We are working with some of the leading automakers to help accelerate autonomous function development.
For example, working with one of the leading automakers, DXC Autonomous Driving built the complete platform for data-driven autonomous vehicle development from scratch in just three months. This involved hardware and infrastructure, configuration, and customization based on the customer’s requirements. The platform is designed to store more than 200 PB of data and to provide more than 100,000 cores of computational power and more than 200 GPUs. It is mainly based on OpenShift and MapR.
The project requires massive data storage, and numerous applications need to interact with different data sets and formats, which means we needed a stable orchestration engine. This brings us to Apache Airflow.
Airflow is an evolving technology that has been widely used in many large-scale projects. What’s nice about Airflow is that there is a helpful community that can support you at all times. Since Airflow is open source, we could customize it for the different use cases to achieve the desired scale and functionality. Furthermore, our automotive partners already had some experience with Airflow and were more comfortable with us using it as well.
When we started designing the architecture for the Airflow deployment on the project, we had two options: deploy it on bare metal together with some Hadoop services, or deploy it on top of Kubernetes. We chose the latter because the scalability of Airflow was crucial to support the orchestration of numerous daily tasks.
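As a rough illustration of what such a deployment involves, running Airflow on Kubernetes in the Airflow 1.10 era typically meant switching to the KubernetesExecutor, which launches each task in its own pod. A minimal airflow.cfg sketch (the key names are from Airflow 1.10; the namespace, repository, and tag values are placeholders, not the project’s actual settings):

```ini
[core]
# Launch one Kubernetes pod per task instead of local worker processes
executor = KubernetesExecutor

[kubernetes]
# Placeholder values -- adjust to your cluster and image registry
namespace = airflow
worker_container_repository = apache/airflow
worker_container_tag = 1.10.12
```

With this executor, scaling out is handled by the Kubernetes scheduler itself: each queued task becomes a pod request, so capacity grows with the cluster rather than with a fixed pool of workers.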
For most of our projects we can deploy Airflow out of the box. Nevertheless, we needed to customize some specific, fine-grained elements. Autonomous driving is an innovative field that requires the agility to hit roadblocks, identify possible solutions, and quickly bring them into production. Examples include extending the authentication of the Airflow REST API to integrate with LDAP/Active Directory, and integrating with the Hadoop cluster so that Airflow can retrieve more meaningful log messages from running jobs.
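For context on the first example: Airflow’s API lets you plug in a custom auth backend module (configured via `[api] auth_backend`) that exposes a `requires_authentication` decorator. The sketch below shows only the shape of such a decorator with standard-library code; `ldap_bind` is a hypothetical stand-in for a real LDAP/Active Directory bind (which in practice would use a library such as ldap3), and the credential check is a stub, not DXC’s actual implementation.

```python
import base64
from functools import wraps

def ldap_bind(username, password):
    """Hypothetical stand-in for binding against LDAP/Active Directory.
    A real backend would open a connection to the directory server here."""
    return (username, password) == ("svc_airflow", "secret")

def parse_basic_auth(header_value):
    """Decode an 'Authorization: Basic <base64>' header into (user, password)."""
    scheme, _, payload = header_value.partition(" ")
    if scheme.lower() != "basic" or not payload:
        return None
    try:
        decoded = base64.b64decode(payload).decode("utf-8")
    except Exception:
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None

def requires_authentication(handler):
    """Shape of an API auth-backend decorator: reject the request unless
    the supplied credentials bind successfully against the directory."""
    @wraps(handler)
    def wrapper(auth_header, *args, **kwargs):
        creds = parse_basic_auth(auth_header or "")
        if creds is None or not ldap_bind(*creds):
            return ("Unauthorized", 401)
        return handler(*args, **kwargs)
    return wrapper

@requires_authentication
def trigger_dag():
    # Placeholder for an API endpoint the decorator protects
    return ("OK", 200)
```

The point of the extension mechanism is that the endpoint code stays unchanged; only the decorator wrapping it decides which directory the credentials are checked against.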
Certainly there were some challenges, and that is how we got to so many innovative solutions. Without challenges there wouldn’t be innovation.
The scalability of the scheduler was one thing we managed to overcome and solve successfully. Another was the lack of high availability in the Airflow deployment, especially for the scheduler component: when it is not running properly, things quickly fall apart. This puts emphasis on the importance of a strong monitoring infrastructure in the background. That’s why we’re looking forward to Airflow 2.0.
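One building block for that kind of monitoring is the health endpoint the Airflow webserver exposes, which reports the scheduler’s status based on its recent heartbeats. A minimal sketch of checking such a payload (the JSON shape shown matches the Airflow `/health` endpoint as documented for the 1.10 line; this is an illustration, not the project’s monitoring stack):

```python
import json

def scheduler_is_healthy(payload: str) -> bool:
    """Return True if an Airflow /health JSON payload reports a
    healthy scheduler (i.e., recent heartbeats were observed)."""
    health = json.loads(payload)
    return health.get("scheduler", {}).get("status") == "healthy"
```

An external monitor can poll this endpoint periodically and page an operator, or restart the scheduler, when the status flips; without such a check, a silently stalled scheduler simply stops launching tasks.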
Yes, we have multiple teams that are working with Airflow.
We organize regular knowledge-exchange sessions to support our people in learning and developing new skills. We get together to tackle those challenges collaboratively. When it comes to Airflow, the nice thing is that you can develop and deploy it easily and quickly.
No, it was only our DXC Autonomous Driving team working on the implementation.
This is where Airflow helped with the orchestration of different applications on one of the projects we were working on. The orchestration allows different users to easily identify what is happening at each stage and see which issues need to be addressed.
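Conceptually, what is being orchestrated here is a DAG: tasks plus upstream/downstream dependencies, executed in order once their prerequisites finish. A standard-library sketch of that model (illustrating the idea only, not Airflow’s actual scheduler):

```python
from collections import defaultdict, deque

def run_dag(tasks, dependencies):
    """Run callables in dependency order via topological sort -- the
    essence of what an Airflow DAG encodes with 'upstream >> downstream'.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: list of (upstream, downstream) name pairs
    """
    indegree = {name: 0 for name in tasks}
    downstream = defaultdict(list)
    for up, down in dependencies:
        downstream[up].append(down)
        indegree[down] += 1

    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()  # execute the task itself
        order.append(name)
        for down in downstream[name]:
            indegree[down] -= 1
            if indegree[down] == 0:
                ready.append(down)

    if len(order) != len(tasks):
        raise ValueError("cycle detected: not a DAG")
    return order
```

In Airflow, this structure is what gives users the stage-by-stage visibility mentioned above: because every task’s position in the graph is explicit, the UI can show exactly which stage a pipeline is in and which task failed.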
When selecting a tool, we look for one that works well and that we can customize so we can quickly address any bottlenecks. For example, one of our projects runs at a very large scale, and not yet being able to reach a particular scale level is something we are looking to resolve. Because Airflow is open source, it gives us a chance to discuss and improve the solution together with the community.
There are many use cases where Airflow can help. For example, on small projects it lets people easily schedule and track the activities related to those projects on a regular basis. It can also help larger projects with recurring or sequential tasks that need to run at a large scale.
Senior Communication Specialist
Solution Architect, DXC Technology