October 22, 2020 | 13min read
How to Open Source? A Guide for New Contributors and Maintainers
It is easy to jump to statements like “it’s up to developers to learn how to open source.” But by that logic we would need no schools, right? If we expect everyone to learn on their own, there’s no reason to have schools. Of course, everyone in their right mind knows that this is a fool’s statement. In this article, we share a few lessons that should help both maintainers and new contributors.
Some observations from having merged thousands of pull requests in the past few years:
- Almost no one writes a good pull request title
- More than half don't know about the `Fixes #112` syntax
- ~30% don't run tests locally before submitting a PR
- ~40% don't include docs/tests
Source: Sindre Sorhus (@sindresorhus), May 21, 2019
We think that maintainers should support contributors and educate them about how open source works. However, as a new contributor, you should think twice before asking questions that spring to mind.
It’s good to remember that no one is obligated to answer your question in the blink of an eye. Your inquiry may go unanswered for days, or be forgotten in a continuous stream of other issues. So before reaching out to the community for help, it’s good to show some engagement in solving the problem yourself:
- Check the documentation. It’s okay not to understand everything, but it’s good to have a reference point for further discussion.
- Google it! See whether your question was raised before on Stack Overflow, in GitHub issues, or in a blog post.
- Search community communication channels such as mailing lists, Slack, Discord, etc.
If this doesn’t help, ask the question. By now you should be able to reference existing information (docs, Stack Overflow posts, etc.), which strengthens your case because it shows that there may be a real gap in the information or a real problem.
These simple steps may not only solve your problem but will also reduce the time maintainers and reviewers have to spend on your request. This means that they can handle more issues, and the whole community moves faster. Do not be selfish. Remember that we all thrive on making the world better, but the number of hours in a day is limited.
While being empathetic and happy to help, the community usually asks new community members to first get acquainted with a few rules usually described in CONTRIBUTING files.
Imagine that you have added an awesome new fix to the school bus. But you have never tested the bus. And now dozens of children are riding this bus. How would you feel if your fix broke the bus and resulted in a crash? Now, imagine that the bus is open-source software, and instead of a few children, you have hundreds of developers. You will not kill the developers, but imagine what your untested change may do in production systems, e.g., self-driving cars.
Tests are not fun. Tests are often boring. But this is the only way to make sure that an open-source project works as expected. Contributors come and go. Code and tests stay. That allows safe continuity of development.
It is worth remembering that CI systems are meant to catch regressions, not to provide the fast feedback you need during development. That’s why you should consider testing your code locally before pushing it to a remote repository.
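As a rough illustration of that fast local feedback loop, the sketch below runs a tiny test suite in-process with Python's standard `unittest` module. The `slugify` function and its tests are hypothetical stand-ins for your own change, not code from any particular project.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("How to Open Source"), "how-to-open-source")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  Hello   World "), "hello-world")


def run_tests_locally() -> bool:
    """Run the suite in-process and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_tests_locally()` before pushing takes well under a second here, which is the kind of turnaround a CI queue usually cannot offer.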
I think every developer has had a moment of frustration when the documentation of a tool was missing some crucial information, a simple explanation, or an example. Having experienced this once, you should know why docs are essential. Unfortunately, we often forget to document our code. When it’s a closed-source project, it’s only your problem or the problem of your successors. In the case of open source, it may be an issue for many users who will have no one to ask for advice.
On the other hand, every developer has had a moment of pleasure when a new tool worked so smoothly that they were positively surprised. In many cases, that is due to good documentation. So, remembering those two feelings, write documentation, both as comments and docstrings in code and as separate text files. Documentation-only changes are crucial for the project and are as important as code changes. It’s not unusual to become a project maintainer without writing a single line of code.
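To make the point about docstrings concrete, here is a minimal sketch in which the examples inside a docstring double as tests through Python's standard `doctest` module. The `chunk` function is a hypothetical example, and `docs_are_accurate` is just one possible way to wire the check up.

```python
import doctest


def chunk(items: list, size: int) -> list:
    """Split ``items`` into consecutive lists of at most ``size`` elements.

    The examples below serve as documentation and as tests:

    >>> chunk([1, 2, 3, 4, 5], 2)
    [[1, 2], [3, 4], [5]]
    >>> chunk([], 3)
    []
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def docs_are_accurate() -> bool:
    """Return True when every docstring example for ``chunk`` still holds."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(chunk, "chunk", globs={"chunk": chunk}):
        failed += runner.run(test).failed
    return failed == 0
```

If someone later changes the behavior of `chunk` without updating the docstring, the check fails, so the documentation cannot silently drift away from the code.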
First of all, people may judge you on your contribution. Think about your contribution as creating a personal brand. It doesn’t have to be perfect or bring a lot of value. But it must be neat. The so-called “clean code” is a good starting point. You don’t have to implement fancy design patterns. Just try to keep everything clear, readable, and easy to understand. If you have doubts, contribute what you have and ask for the community’s opinion. The most important thing is your willingness to cooperate and adjust to the suggestions you get. We live with people, and writing code is nothing more than a sophisticated way of communicating.
That’s why it’s important to write good PR and commit messages. Good communication increases understanding and reviewers’ willingness to review the PR. Another thing to do is reference existing, connected issues, PRs, or links that may help people get the context of your changes.
But in the end, every accepted contribution is your accomplishment. Every single typo you fix is a significant contribution. It’s something you should be proud of. OSS projects get better slowly, not in great leaps. And learning to fix minor things every time you face them will make you a better developer and a real open-source citizen.
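The sketch below shows one way a project could check a PR message automatically before a human looks at it. It looks for the GitHub closing keywords (such as `Fixes #112`) that link a PR to an issue. The `review_message` helper and the 72-character title limit are assumptions made for illustration, not rules of any specific project.

```python
import re

# Keywords GitHub recognizes for linking a pull request to an issue it closes.
CLOSING_KEYWORDS = ("close", "closes", "closed", "fix", "fixes", "fixed",
                    "resolve", "resolves", "resolved")

ISSUE_REFERENCE = re.compile(
    r"\b(?:%s)\s+#(\d+)" % "|".join(CLOSING_KEYWORDS), re.IGNORECASE
)


def referenced_issues(message: str) -> list:
    """Return issue numbers that the message links with a closing keyword."""
    return [int(number) for number in ISSUE_REFERENCE.findall(message)]


def review_message(title: str, body: str) -> list:
    """Collect complaints about a PR message (hypothetical project rules)."""
    problems = []
    if len(title) > 72:  # assumed limit, adjust to your own convention
        problems.append("Title is longer than 72 characters.")
    if not referenced_issues(body):
        problems.append("Body does not reference an issue, e.g. 'Fixes #112'.")
    return problems
```

For example, `review_message("Add retry to uploader", "Fixes #112")` returns an empty list, while a body without any issue reference produces a message that tells the contributor exactly what to add.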
Schools give us a place where we feel safe to make errors (at least that’s the assumption). Through this process of failure, learning, and repetition, we gain experience and knowledge. The same goes for open-source communities. Project maintainers and contributors should strive to create a place where anyone can feel safe to make mistakes. That is a big statement, and a vague one: what does it mean to make a community “safe” or “welcoming”? In our opinion, the simplest definition of a welcoming community is an empathetic community where no one is going to be shouted at for asking questions. But that’s another big word: community. So what does it mean?
As project maintainers embedded in the community, we should encourage people and proactively offer them help. We should also encourage other community members to do so. A good chunk of everyone’s time in the community should go to helping others become community members and be proactive in it.
Community members need to show people that they are very welcome to participate and that they will not be left alone with a PR they have committed to. For many users, the first contact with the community seems to be reporting a bug. Often the reporter even has a solution ready, but believes it is not in their power to change anything in an open-source project because there are “right people” designated to do the job. In fact, they ARE the right people.
Maintainers carry the burden of code reviews and of pointing out where the code could be improved. To bring in more contributors without being overloaded, they should keep one important thing in mind: automation.
It is much more important in open-source projects than in commercial ones. In open-source projects, you have no opportunity to personally guide every single new contributor on what is essential and what the rules and best practices are. People come and go, and there is no benefit in explaining it all individually. But there must be rules to follow, because if people are contributing code to your project, you have to be sure that the quality, standards, and best practices adopted by your community are upheld.
Suppose there are only a handful of maintainers and lots of contributors. How do you handle the overhead of explaining what else needs to be done in a scalable way? Document it? Of course. You definitely should. But do not expect that all casual contributors will spend time trying to understand all the best practices. Many people prefer the “let’s try and see” approach, and this is perfectly fine. The documentation is great if you have more time and you expect to be contributing more. However, you should never expect your developer documentation to be thoroughly read and understood by those first-time contributors, especially if your project is mature and has many best practices.
The best answer to that is—automation. Automation of tests, which everyone nowadays understands as an absolute necessity, I hope. Automation of all the various checks that you would usually have to spend time reviewing.
Did your community agree on TABs vs. SPACES? It should. And if you do not allow TABs—automating it is trivial. Do you want to keep specific file formatting? Apply code formatter. Linting? Apply linter. Ensure that those checks are part of your CI builds and that when they fail, the whole build fails, and your contributors get a clear message on what is wrong.
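To show how little it takes to automate such a rule, here is a minimal sketch of a "no TABs" check. The function names and the message format are made up for this example. The important part is that the check reports the file and line, says what is wrong, and returns a non-zero code that a CI job can fail on.

```python
from pathlib import Path


def find_tab_violations(paths: list) -> list:
    """Return 'path:line: message' entries for every line containing a TAB."""
    violations = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if "\t" in line:
                violations.append(
                    f"{path}:{number}: TAB found; this project uses spaces"
                )
    return violations


def check(paths: list) -> int:
    """Print violations and return an exit code a CI job can fail on."""
    violations = find_tab_violations(paths)
    for violation in violations:
        print(violation)
    return 1 if violations else 0
```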
Pretty much every standard and practice you agree on in your community can be checked automatically on the contributor’s machine, and (surprise!) it can often be fixed automatically as well, before the commit even reaches the CI.
Achieving the “check everything before it even reaches CI” state is a dream for both maintainers and contributors. What if you, as a contributor, never had to say to yourself, “Oh, no! It failed again in CI after 20 minutes”? What if, as a maintainer, you never even saw a PR that had not passed all the checks? No attention lost, no time wasted: you only focus on reviewing the commits you KNOW follow all the best practices. This should be your ultimate goal: to make sure that your contributors have tools and an environment that perform the checks and corrections, that the tools are seamless for them to install and integrate, and that they actually want to use them.
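As an illustration of a check that fixes the problem instead of only reporting it, the sketch below strips trailing whitespace from a file in place. It is a simplified example with hypothetical function names. Real fixers also deal with details such as line-ending styles, which this version ignores.

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and TABs removed from each line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_file(path: str) -> bool:
    """Rewrite the file in place; return True when something was changed."""
    file = Path(path)
    original = file.read_text(encoding="utf-8")
    fixed = strip_trailing_whitespace(original)
    if fixed != original:
        file.write_text(fixed, encoding="utf-8")
        return True
    return False
```

Returning `True` when the file was modified lets the caller tell the contributor "this was fixed for you, please review and commit again" rather than only rejecting the change.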
What properties should those tools and checks have?
- They should be very easy to install and use. Ideally, they should seamlessly integrate into everyone’s development framework so that they don’t have to think about them once installed.
- Whenever you add a new check, everyone should start using the checks without even knowing it. Once you have the tools installed, they should automatically upgrade themselves and use all the checks that are the latest set of “best practices.”
- The errors should result in clear, actionable problems to solve. Many tools like pylint, mypy, and flake8 have additional options that make the errors stand out (colorful display!) and provide more information, such as showing the code surrounding the error to give it more context.
- If your checks produce unclear, non-actionable error messages, you might lose the whole benefit of automating them. The time you would have spent explaining the practices, you will now spend explaining the error messages and guiding people on how to solve them. Even if your tool is not very helpful, you can always scan for those “unfriendly” messages and add an extra explanation that clearly describes how to fix the problem. Watch out for those.
- The same checks that are run on CI should be run on developers’ machines. It’s super-important that contributors can reproduce the errors they see in the CI in their local environment, and it has to be effortless for them. Also, the local checks should be optimized to run only for the files or areas impacted by the user’s change, whereas on CI a “full check” should be executed. Running checks locally should take seconds rather than minutes, while CI checks might take minutes without a problem.
- When errors are displayed by the CI environment, the output should encourage people to install the tool and run the local equivalent so that they do not have to think about it anymore. All further checks should then run locally and automatically whenever they attempt to make a new commit or push their branch to the repository.
- There should be plenty of pre-defined checks that you should be able to effortlessly add to your project. Most of the best practices are shared between projects, and there is no need to reinvent the wheel. It is great if such a tool allows you to stand on the shoulders of other people who already added their checks.
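The idea of scanning for "unfriendly" messages and adding an explanation can be sketched as a small wrapper around a tool's output. The patterns and hints in `HINTS` are hypothetical. A real project would fill the table with the messages its own contributors stumble over most often.

```python
import re

# Hypothetical mapping from patterns in raw tool output to project advice.
HINTS = [
    (re.compile(r"trailing whitespace", re.IGNORECASE),
     "Run the formatter locally; it removes trailing whitespace for you."),
    (re.compile(r"line too long", re.IGNORECASE),
     "Wrap the line or let the project's formatter reflow it."),
]


def explain(raw_message: str) -> str:
    """Append an actionable hint to a tool message when one is known."""
    for pattern, hint in HINTS:
        if pattern.search(raw_message):
            return f"{raw_message}\n  How to fix: {hint}"
    return raw_message
```

Messages without a known hint pass through unchanged, so the wrapper never hides information from the contributor.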
In our experience with Airflow, one fantastic tool fulfills all these requirements: pre-commit. There are plenty of ready-to-use checks. It installs effortlessly. It tells you on CI what you should do to install and run it locally. It integrates with git’s pre-commit hooks, and it optimizes the checks on local machines automatically. It takes advantage of as many processors as your development machine has and limits checks to only those files that you changed. More than that, plenty of checks not only detect problems but fix them for you automatically: for example, inserting licenses in files that need them, removing trailing whitespace, or regenerating indexes in your Markdown files. It also checks that you have not left debugging statements in your code, verifies that your XML, YAML, and RST files are properly structured, and much more. In Airflow, we have more than 50 checks now, touching nearly every best practice we agreed on. And we do not even remember all of those. They are silently checked for us by our helpful build-bots.
If you want to keep your best practices applied and scale your project by inviting more contributors, even casual ones, make sure you adopt the “automate-everything” approach.
And yes, of course, all those CI runs should also automatically run all the unit, integration, and possibly system tests before your PR gets into the hands of the reviewer. But I consider this a given in the modern development workflow. If you have no automated unit tests in your open-source project, you have many more problems than how to handle new contributors. Go fix that first.
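To give a feel for what such a check does under the hood, here is a sketch of a debug-statement detector written in the style of a hook script: it receives file names and returns a non-zero code on failure. It is an illustrative example built on Python's standard `ast` module, not the implementation of any existing pre-commit hook, and it only looks for `import pdb` and `breakpoint()`.

```python
import ast


def find_debug_statements(source: str) -> list:
    """Return (line, description) pairs for leftover debugging code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pdb":
                    found.append((node.lineno, "import pdb"))
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == "breakpoint"):
            found.append((node.lineno, "breakpoint() call"))
    return sorted(found)


def main(filenames: list) -> int:
    """Check each given file; a non-zero return value means failure."""
    exit_code = 0
    for filename in filenames:
        with open(filename, encoding="utf-8") as handle:
            for line, what in find_debug_statements(handle.read()):
                print(f"{filename}:{line}: {what} left in code")
                exit_code = 1
    return exit_code
```

A hook runner would call `main` with the file names it was given on the command line, which keeps the check limited to the files that actually changed.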
If you want to write good software documentation, there’s one thing you need to know. There isn’t one thing called documentation; there are four—tutorials, how-to guides, reference, and concepts.
They require four different approaches to their creation, as they serve four specific purposes and functions. Ideally, each type of documentation should be clearly separate. Each of them is needed in a different situation and for a different audience.
Reference documentation is the type of documentation experienced users imagine they need. It precisely describes each component it focuses on. Everyone is happy when it is automatically generated from the source code into a user-friendly form. Examples of such documentation are a list of classes/packages/modules with descriptions, an OpenAPI specification, or a list of commands available in your tools or in the CLI. You refer to such documentation for detailed information, as you would to an encyclopedia or dictionary. It contains dry descriptions of all the machinery.
Tutorials are intended for the novice user who wants to get to know your project. A tutorial is a lesson for a person who has no experience with your product and does not know how to start. It should take the reader through a simple project that shows the steps needed to use the product. Many projects have very poor tutorials, which makes it difficult for new people to start working with them. If you want new people to be happy to use your product, you should pay special attention to this documentation type. You can think of it as the first lesson with a teacher.
Concepts documentation is a chance to take a broader view and describe the product from a higher level, and even from different perspectives. It may explain things that are not unique to your product but are required to use it. You can imagine a concepts document being read in your spare time, away from the code. It is often included in sections called “Concept,” “Background,” or “Overview,” but it is also often available as separate documents such as PEPs. Reading this documentation may not always allow you to use the product, but it will make it easier to ask further questions. You can imagine it as a conference presentation. Such documents can also explain why things are the way they are: design decisions, historical reasons, technical constraints. A concepts document touches on one or more topics, but you will probably need to read some more guides if you want to use this knowledge in practice.
In the world of techies, this is the documentation that is often forgotten because everyone is goal-oriented, but you should take care of it, as there isn’t always someone around who can tell you the broader story of a feature. Sometimes this documentation lives on contributors’ blogs. However, a good project should have this kind of documentation in its own docs as well.
How-to guides are strictly goal-oriented. The audience reading such a guide knows what problem they have and is looking for a solution. These guides are also fun and easy to write, but you must remember to write them down if you want your change to be used by a broad audience.
By sticking to this categorization in your product documentation, you make it easier for readers to find information and easier for yourself to find gaps and keep the documentation in good condition.
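As a small illustration of reference documentation generated from source code, the sketch below uses Python's standard `inspect` module to list the public methods of a class with their signatures and the first line of each docstring. The `render_reference` function and the `Greeter` class are hypothetical. Real projects typically rely on a dedicated documentation generator.

```python
import inspect


def render_reference(obj) -> str:
    """Build a plain-text reference entry for each public function of obj."""
    entries = []
    for name, member in inspect.getmembers(obj, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        signature = inspect.signature(member)
        summary = (inspect.getdoc(member) or "Undocumented.").splitlines()[0]
        entries.append(f"{name}{signature}\n    {summary}")
    return "\n\n".join(entries)


class Greeter:
    """Hypothetical class used only to demonstrate the generator."""

    def greet(self, name: str) -> str:
        """Return a greeting for the given name."""
        return f"Hello, {name}!"

    def _internal(self):
        return None
```

Calling `render_reference(Greeter)` yields an entry for `greet` only. Because the text comes straight from the code, the reference stays in sync with the implementation as long as the docstrings are maintained.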
If there’s one thing we would love you to remember from this blog post, it is:
“Be empathetic and value the community over code”
This should guide you in the open-source world whether you are a maintainer or a contributor. Open source is all about people, because code without a community will die. That’s the last, crucial lesson about open source.