How to Build a Swift Compiler-Based Tool? The Step-by-Step Guide

Why would you want to use the Swift compiler as a library?

We’ve recently open-sourced Sirius obfuscator, a tool for obfuscating the code of iOS, macOS and tvOS apps written in Swift. One of the main challenges during its development was parsing and analyzing the source code. Properly identifying the symbols to rename is harder than it looks, so we decided to leverage the Swift compiler itself for the most demanding parts.

Along the way, we learned how to use the functionality offered by the Swift compiler in third-party development tools. We believe this might come in handy for other software development problems that can be expressed as analysis or transformation of source code. Whether you’re planning to create a code generation tool, a linter, a refactoring engine, a code visualizer with various metrics, a migrator between multiple versions of your library or a transpiler, you always need a powerful parser and analyzer of the source code.

Using the Swift compiler as a library is not the only solution available, far from it! There’s SourceKit, which backs successful projects like Sourcery, and there’s also lib/Syntax, created with great effort to simplify the refactoring and transformation of source code. These tools might serve your needs much better than the Swift compiler internals, since both expose public APIs that are more stable and easier to work with. Unfortunately, they did not provide the functionality required for code obfuscation. That’s why we decided to take the road less traveled.

And it’s all for the best! Now we can share what we’ve learned. In this two-part article I’ll guide you through using the Swift compiler as a library, with Sirius obfuscator as the example. The first part (the one you’re reading now) covers extending the compiler and integrating your tool with its build and test infrastructure. The second part shows how to use the compiler internals and how to take advantage of a number of functionalities provided by LLVM, Clang and Swift itself. But let’s start with the basics: downloading and building the compiler so that it can be extended.

How to build the Swift compiler for extending?

The most up-to-date instructions on how to download the Swift compiler and its dependencies are available on the GitHub page. The quick version consists of:

$ brew install cmake ninja
$ git clone git@github.com:apple/swift.git
$ ./swift/utils/update-checkout --clone-with-ssh

Using these commands, however, will give you the master branch. This might be what you’re after, but if you want to ensure that your tool works seamlessly with source files written in a released version of the Swift language (for example, the one distributed with a particular Xcode version), you’d be better served by checking out the compiler and its dependencies at a particular tag. For Xcode 9.4, which uses Swift 4.1.2 (the stable version at the time of writing), the tag name is swift-4.1.2-RELEASE, so you should write:

$ brew install cmake ninja
$ git clone --branch swift-4.1.2-RELEASE git@github.com:apple/swift.git
$ ./swift/utils/update-checkout --clone-with-ssh
$ ./swift/utils/update-checkout --tag swift-4.1.2-RELEASE

After you’ve checked out the proper version of the source code, it’s time to build it. Before that, however, it’s good to decide which IDE you want to use for development. Because the Swift compiler uses CMake as its build system, there are multiple options: CMake can generate project files for various IDEs, including Xcode, Eclipse and more. However, if you’re familiar with Apple’s Xcode (and I suppose you are, since we’re talking Swift here), I strongly recommend sticking with it. Its support for C++11, the language the compiler is written in, is great. You can build a debuggable, development-friendly Swift and LLVM with Xcode projects generated using:

$ ./swift/utils/build-script --clean --xcode --release-debuginfo --debug-swift

One gotcha, however, is that building for Xcode generates the standard library and the shims of the system SDKs for only one platform and architecture: macOS x86_64. Limitations of Xcode prevent it from cross-compiling for iOS, watchOS and tvOS. If you want to work with files that import any non-macOS framework (like UIKit), you must compile Swift a second time, this time with Ninja, and copy the resulting shims into the Xcode build directory:

$ ./swift/utils/build-script --ios --tvos --watchos --release-debuginfo --debug-swift
$ rm -r -f swift/build/Xcode-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-x86_64/Debug/lib/swift
$ cp -r swift/build/Ninja-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-x86_64/lib/swift swift/build/Xcode-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-x86_64/Debug/lib/swift
$ rm -r -f swift/build/Ninja-RelWithDebInfoAssert+swift-DebugAssert

Congratulations! You’ve got the Xcode project ready to work with all the Apple platforms. You can now open it with:

$ open swift/build/Xcode-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-x86_64/Swift.xcodeproj

Xcode will ask you whether to autogenerate the schemes. I recommend NOT doing it, because the project consists of so many targets that there would be far too many schemes to work with comfortably. A better approach is to create schemes only for the targets you’ll actually use. Since we’re extending the Swift compiler, these will most probably be command line tools and libraries. Before adding them, however, let’s have a quick overview of the plan for building a compiler-based tool.

What are the components of the compiler-based tool?

The basic interface to all the goodies that the compiler internals can offer is the command line. It’s easy to create, and easy to integrate both with users’ workflows (including CI/CD scripts and Xcode build settings) and with the existing Swift build architecture. It’s also the most natural environment for a tool that relies on the compiler, since the compiler itself is a command line tool. Therefore I recommend starting development by creating a command line tool. In Sirius obfuscator we’ve made three of them: obfuscator-symbol-extractor, obfuscator-name-mapper and obfuscator-renamer.

The CLI is the path to the compiler’s internals, but most probably you will need to write some additional logic for your tool to achieve the greatness you’ve envisioned. While you could include this logic in the interface layer, it’s better to move it into a separate library that your CLI links. You could even provide multiple libraries, if needed, to further modularize your code. Therefore the second step we took while working on Sirius was to create the swiftObfuscation library.

The third and fourth steps are all about testing. Tests are crucial for compiler-based tools, because the Swift source code and all the other formats it’s processed into (AST, SIL, LLVM IR) form an enormous domain, full of edge cases and possibilities for regressions. Unit tests will help keep regressions out of your library, while integration tests will ensure that the tool works end-to-end. In our case, we chose unit tests for the library and integration tests for the command line tools.

After these four steps, you’ll have a complete setup ready for working on your compiler-based tool. The best thing is that you don’t need to write much yourself while doing it: everything can be configured by integrating with the Swift compiler build system.

How to add a command line tool to Swift compiler?

There’s already an existing infrastructure for creating command line tools that link the compiler libraries. In fact, the swift executable itself is defined this way. This means you get a lot for free by following the structure and conventions of the Swift compiler source code. And this is exactly what we did for Sirius obfuscator.

As I’ve mentioned before, the Swift compiler uses CMake as its build tool. Therefore, defining a new component comes down to creating the proper directory structure and build file. All the command line tools are located in the tools directory. Its CMakeLists.txt file uses a custom function called add_swift_tool_subdirectory to enumerate the subdirectories that should be treated as tool sources. Each of these directories contains at least two files: a C++ implementation file with the main function that serves as the entry point of the resulting executable, and a build file that defines the sources, the dependencies and the name of the command line tool. It’s enough to provide your own directory with these two files (.cpp and CMakeLists.txt), register it with add_swift_tool_subdirectory in tools/CMakeLists.txt, and voilà! You’ve just created a new command line tool. The executable will be located in the bin directory of the Swift compiler build, alongside all the other binaries. If you use the build settings provided in this article, it will be Debug/bin.

If you’re working with Xcode, there’s one gotcha: no new target will be created for you automatically. You’ll need to regenerate the Xcode project using the following command (notice the final --skip-build flag; it’ll save you hours):

$ swift/utils/build-script --xcode --release-debuginfo --debug-swift --skip-build

After opening the updated project, you’ll see the target for the new tool. You can now create a scheme and run it. As you’ll see, whatever you’ve put into the main function will get executed, and whatever you’ve linked in the CMakeLists.txt file will appear as a dependency of your target. You can configure the debugger and pass arguments via the scheme configuration. The project is ready to be worked on. If you get lost, you can check the reference setup in the Sirius obfuscator repo.

One important thing, though: remember that the Xcode project is merely a derivative generated from the actual build structure. Files added in Xcode will build correctly, but they will not be included in the CMake structure. Regenerating the Xcode project will remove them from their targets, and they will not be picked up by any other build system (like Ninja).

Once you’ve got your command line tool running, it’s time to define its interface. Fortunately, you don’t have to do it by hand. LLVM provides the CommandLine library, which takes care of parsing the arguments and generating the help message. The only thing you need to do is declare the options your tool requires. You can also specify the format of each option, such as whether it takes a value and, if so, its type or its allowed values. The API is declarative and a pleasure to use, so it’s a shame not to use it!

The most important thing to remember is to put the initializing macro (PROGRAM_START and/or INITIALIZE_LLVM) at the top of the main function. It might also be a good idea to hide the options that don’t belong to your tool using the llvm::cl::HideUnrelatedOptions function. After that, it’s enough to call llvm::cl::ParseCommandLineOptions to get the options read and parsed, or to get an error if a required argument is missing. Finally, we recommend using the llvm::ExitOnError mechanism for returning early on errors. Again, the Sirius obfuscator source code might come in handy as a reference.
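To illustrate what “declarative” buys you without depending on LLVM headers, here’s a small standard-library-only sketch of the same idea. It is not the LLVM CommandLine API: OptionSpec and parseArgs are hypothetical names, and the real llvm::cl::opt also handles typed values, enumerated choices and help generation. The point is the shape of the approach: options are declared once as data, and one generic routine does the matching and reports missing required arguments.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical declarative option description, loosely inspired by the idea
// behind llvm::cl::opt. This is NOT the LLVM API.
struct OptionSpec {
  std::string name;  // matched as "-name=value"
  std::string help;  // would feed a generated help message
  bool required = false;
};

struct ParseResult {
  std::map<std::string, std::string> values;  // option name -> value
  std::vector<std::string> errors;            // empty on success
};

// Generic parsing driven entirely by the declared specs.
ParseResult parseArgs(const std::vector<OptionSpec> &specs,
                      const std::vector<std::string> &args) {
  ParseResult result;
  for (const std::string &arg : args) {
    bool matched = false;
    for (const OptionSpec &spec : specs) {
      const std::string prefix = "-" + spec.name + "=";
      if (arg.compare(0, prefix.size(), prefix) == 0) {  // arg starts with prefix
        result.values[spec.name] = arg.substr(prefix.size());
        matched = true;
        break;
      }
    }
    if (!matched) result.errors.push_back("unknown argument: " + arg);
  }
  // Required options are checked once, after all arguments are consumed.
  for (const OptionSpec &spec : specs)
    if (spec.required && result.values.count(spec.name) == 0)
      result.errors.push_back("missing required option: -" + spec.name);
  return result;
}
```

In a real tool, llvm::cl::ParseCommandLineOptions plays the role of parseArgs, and ExitOnError-style handling would stop the program as soon as an error is reported.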

Once the interface is defined, we can move to writing the actual logic. It’s best to put it into the library, not only for the separation of the interface and logic, but also to enable unit testing. Fortunately, creating the library is as simple as creating the command line tool. In fact, it’s very similar!

How to add a library to Swift compiler?

The library differs from the command line tool in two main ways. Firstly, there’s no implementation file with a main function serving as the execution entry point, since a library is not built to be executed, but to be linked to and used by other modules. Secondly, the separation of implementation and interface is crucial for creating and maintaining the library API. This separation is expressed through the build system conventions: the .h header files are located in the swift/include/swift directory, and the .cpp implementation files are under swift/lib.

To leverage the Swift build architecture, you need to create directories for your new library in both of these locations. To let CMake know about your new library, the implementation directory must be referenced in swift/lib/CMakeLists.txt with a standard CMake add_subdirectory call. After creating the source files, you can write the library’s build file, which defines the implementation files (but NOT the header files!), the dependencies and the library name. Now regenerate the Xcode project, and that’s all! The target for your library is properly configured and visible in the IDE.

After you build the new library, it will be generated in the lib directory (Debug/lib if you’re using the settings from this article) as a .a file that can be linked into the executable. Add the library name as a dependency in your command line tool’s build file, regenerate the Xcode project and include the library’s headers in the tool. There are no additional gotchas in this step; just remember to refrain from manipulating target membership or adding/removing files in the Xcode project, as it’s just a derivative of the underlying CMakeLists.txt file structure.

Your library might (and probably should!) add some dependencies on the compiler internals. You can do it by referencing them in the LINK_LIBRARIES section of the library’s build file. If you get lost, you can check the reference setup in the Sirius obfuscator repo. At this point, you might dive into the feature-happy coding, but I recommend spending a little more time on the setup so that your library and tool could be easily tested.

How to unit test the library?

Since the library has no execution entry point, just a public interface, testing it directly is best done with unit tests. Unit tests in the Swift compiler build architecture are simply executables that link the library of interest. They use the Google Test framework for defining and running the tests. The framework provides assertions, mocking (via the separate Google Mock) and various flavors of tests. Its macros (such as TEST_F or EXPECT_EQ) help keep the testing code clean and maintainable.

Since the test target is simply another executable target, the setup is very similar to that of the command line tool. It consists of creating a subdirectory in swift/unittests, adding it to the build system with the familiar add_subdirectory call, and defining the target in the usual way: with a function call (this time add_swift_unittest) in the unit tests’ CMakeLists.txt file.

Yet again, the Xcode project must be regenerated and cannot be used for target management. There’s one more gotcha to remember. Since the library under test is statically linked into the test executable, you must rebuild the unit tests target before running it so that all the new changes in the library are included. Xcode does this automatically, but if you’re running tests from the command line, you need to make sure that the freshest version of the library is linked.

Unit tests greatly help with developing features and keeping your library’s behavior stable, but they don’t ensure that the whole tool works properly end-to-end. Also, if your library depends heavily on compiler internals that are hard to mock in tests (such as AST nodes), it might be difficult to rely solely on unit tests to cover all the use cases. There is a supplementary solution, which also happens to be the last step in our recipe: a suite of integration tests.

How to test the command line tool end-to-end?

Integration tests are defined differently than the targets we’ve been configuring until now, and for good reason: they use a separate infrastructure for running, called lit (LLVM Integrated Tester). It’s a Python tool that consumes the test definitions and runs them according to a configuration file called lit.cfg.

As a separate terminal tool, lit can access the subjects under test only through their command line interface, that is, by actually executing the tools. There is a huge number of options and custom features available (after all, anything you can do in the terminal, you can ask lit to do for you), but the core flow is always the same: lit calls the tool with the input and the proper flags, and then asserts on the output. In Sirius obfuscator we always provided the expected output and diffed this fixture against the result of the tool execution. It was invaluable for catching regressions and reproducing edge cases.

As usual, there’s a convention defining the location of integration tests in the Swift compiler file structure. This time the directory to look into is swift/test. It’s best to put your integration test files and any other resources (like the fixtures) in a subdirectory of it. No CMakeLists.txt is necessary; instead, you integrate with lit through its configuration file, lit.cfg. It contains all the custom substitutions you can use when writing the test cases. The most important thing to add to it is the path to the tool under test. I don’t recommend hardcoding it, since there is a handy inferSwiftBinary function that takes into account the possible variations in paths resulting from the build settings. By writing config.my_tool = inferSwiftBinary('my-tool') and registering a substitution with config.substitutions.append(('%my-tool', config.my_tool)), you make the command line tool available in tests through lit’s comment-based format, such as RUN: %my-tool. The sample tests from Sirius obfuscator might come in handy as a reference.

The lit.cfg file allows for a lot of freedom and customization (again, see the example in the Sirius repo). You’ll probably find it as crucial to your workflow as we did, regardless of the tool in question.

Now the fun begins!

This concludes our guide to setting up a compiler-based tool that calls the Swift compiler internals under the hood. Now it’s time to actually use these internal libraries! Which ones are useful to you depends on the details of your particular tool, so in the second part of this blog post we’ll use Sirius obfuscator as the example and dive into what it uses. While the case may seem specific, it may still illustrate the approach to take when calling the Swift compiler libraries directly.

PART 2

How can you leverage the Swift compiler internals?

Now it’s time to focus on how to use the Swift compiler internal libraries, treating Sirius obfuscator as the case study.

Apart from the main problem of analyzing the source code, there was also a second, smaller issue: the actual renaming. It’s just text substitution in the source file, but there are still concerns about potential bugs in offset calculation and about performance. Also, the biggest advantage of using the Swift compiler internals is access to a large body of functionality that is already battle-tested in production. To put it simply: there’s a lot of code that you don’t need to write.

Therefore our guiding principle was always to find and apply some part of the Swift compiler, LLVM or Clang before trying to write the functionality ourselves. One example, already mentioned in the first part, is creating the command line interface with llvm/Support/CommandLine.h. Many more great tools are available, and we’ve taken advantage of a number of them. Let’s start with Abstract Syntax Tree (AST) generation.

A quick overview of the Swift compilation process

Before we go into detail on how to use the parts of the Swift compiler responsible for transforming source code into the AST, let’s quickly go through the compilation process as a whole. At the highest level, we can think of it as a series of steps that convert the Swift program from one representation to another, each designed for a different purpose. Along the way, we move further and further away from the source code until we reach the actual machine code for the target architecture.

It all starts with lexing, the process of splitting the source code into tokens according to the language grammar. It’s similar to taking a text in English and identifying nouns, verbs, pronouns, adjectives, question marks etc. In Swift, example tokens are language keywords, brackets of various kinds, literals, identifiers and colons. The lexer performs lexical analysis: it doesn’t determine the role a particular token plays, only whether it’s a bracket, an identifier, a dot, etc. You can look through the source code (the lexer’s header and implementation in the swiftParse library) for details on how it works.
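To make the idea concrete, here’s a toy lexer for a tiny Swift-like subset. It’s a sketch, not how the swiftParse lexer is implemented: the token kinds and the keyword list are simplified assumptions. Notice that it only classifies tokens and records the byte offset of each one; it has no idea what role an identifier plays.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Simplified token kinds; the real Swift lexer distinguishes many more.
enum class TokenKind { Keyword, Identifier, IntegerLiteral, Punctuation };

struct Token {
  TokenKind kind;
  std::string text;
  size_t offset;  // byte offset in the source, useful for later source edits
};

std::vector<Token> lex(const std::string &source) {
  static const std::set<std::string> keywords = {"let", "var", "func", "struct", "return"};
  std::vector<Token> tokens;
  size_t i = 0;
  while (i < source.size()) {
    unsigned char c = source[i];
    if (std::isspace(c)) { ++i; continue; }  // whitespace only separates tokens
    size_t start = i;
    if (std::isalpha(c) || c == '_') {
      // Identifier or keyword: letters, digits and underscores.
      while (i < source.size() &&
             (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_'))
        ++i;
      std::string word = source.substr(start, i - start);
      tokens.push_back({keywords.count(word) ? TokenKind::Keyword : TokenKind::Identifier,
                        word, start});
    } else if (std::isdigit(c)) {
      while (i < source.size() && std::isdigit(static_cast<unsigned char>(source[i]))) ++i;
      tokens.push_back({TokenKind::IntegerLiteral, source.substr(start, i - start), start});
    } else {
      // Any other single character ('(', '=', ':', '.', ...) is punctuation here.
      ++i;
      tokens.push_back({TokenKind::Punctuation, source.substr(start, 1), start});
    }
  }
  return tokens;
}
```

For "let x = foo(42)" it produces seven tokens: one keyword, two identifiers, three punctuation tokens and one integer literal, with foo starting at offset 8.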

The tokens are then consumed by the parser. Its goal is to create the Abstract Syntax Tree, which captures the role of a particular token or group of tokens. Is this identifier a parameter name or a variable name? Which scope does this method belong to? Is this a declaration or a function call? There are also far more specific questions and constructs that must be expressed in the AST, like casting, implicit conversion or optional access to a property. Consult the source code (the parser’s header and implementation in the swiftParse library) for a deeper understanding of its functionality. You might also enjoy going through the swiftAST library, which contains all the AST building blocks, a.k.a. nodes, to get an idea of the concepts expressed in the abstract syntax tree.
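Continuing the toy example, the sketch below shows what a parser adds on top of a lexer: the same kind of identifier token can be a declaration, a reference or a called function, depending on where it appears. ToyParser and Role are hypothetical names. The grammar is a deliberate simplification: a single let statement with either an identifier or a call on the right-hand side. To keep the block self-contained, the parser takes token texts that have already been lexed.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical roles the toy parser assigns to identifiers.
enum class Role { VariableDeclaration, Reference, CalledFunction };

struct Identified {
  std::string name;
  Role role;
};

// Parses `let <name> = <name>` or `let <name> = <name>(<name>, ...)`.
class ToyParser {
public:
  explicit ToyParser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

  std::vector<Identified> parseLet() {
    expect("let");
    std::vector<Identified> out;
    out.push_back({takeIdentifier(), Role::VariableDeclaration});
    expect("=");
    std::string head = takeIdentifier();
    if (peek() == "(") {
      // Followed by '(' means the identifier is the callee of a call.
      out.push_back({head, Role::CalledFunction});
      expect("(");
      while (peek() != ")") {
        out.push_back({takeIdentifier(), Role::Reference});
        if (peek() == ",") expect(",");
      }
      expect(")");
    } else {
      out.push_back({head, Role::Reference});
    }
    return out;
  }

private:
  std::string peek() const { return pos_ < tokens_.size() ? tokens_[pos_] : ""; }

  void expect(const std::string &text) {
    if (peek() != text) throw std::runtime_error("expected '" + text + "'");
    ++pos_;
  }

  std::string takeIdentifier() {
    std::string tok = peek();
    if (tok.empty() || !(std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_'))
      throw std::runtime_error("expected identifier");
    ++pos_;
    return tok;
  }

  std::vector<std::string> tokens_;
  size_t pos_ = 0;
};
```

For let total = sum(a, b), total is reported as a declaration, sum as a called function, and a and b as references.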

The important characteristic of the Swift AST that we took advantage of in Sirius obfuscator is that because it’s derived from tokens, it’s directly relatable to the source code. It means that you can get the reference to the place in the source file that the particular AST node represents. Without that information the renaming of identifiers, the reason for which Sirius was made, would be impossible.

The AST that the parser generates from the tokens lacks some information related to the type system, because these relationships are hard to identify. Many of them require all of the source code to be parsed before we can determine whether a particular construct, such as a call to an overloaded method, is valid, and what it refers to. Filling in this information is the responsibility of the semantic analysis step, which uses the type checker to work out the relationships between the types.

For Sirius obfuscator, this was the crucial step and the main reason we decided to use the Swift compiler internals. Resolving types is a hard, computationally expensive, iterative process full of edge cases. You probably remember the infamous “expression was too complex to be solved in reasonable time” error that haunted earlier versions of Swift (and can still show up from time to time!). The constraint solver behind that error is at the heart of semantic analysis. If you are interested in how it works, check out the sources and the documentation with detailed explanations of the type checker design. I also encourage you to read the blog post by Slava Pestov, a core Swift team member, on how types are represented in the AST.
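Here’s a deliberately tiny model of why renaming needs semantic analysis. Two functions can share a name, and only the types decide which one a call refers to. The FuncDecl and resolveCall names are hypothetical. The matching here (exact parameter types only) is a drastic simplification of Swift’s constraint-based type checker, which also has to infer the argument types in the first place.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// A toy function declaration: an identity plus a name and parameter types.
struct FuncDecl {
  int id;  // stands in for the identity of the declaration node
  std::string name;
  std::vector<std::string> paramTypes;
};

// Returns the id of the single declaration matching the call, or
// std::nullopt when nothing matches or the call is ambiguous.
std::optional<int> resolveCall(const std::vector<FuncDecl> &decls,
                               const std::string &name,
                               const std::vector<std::string> &argTypes) {
  std::optional<int> match;
  for (const FuncDecl &d : decls) {
    if (d.name != name || d.paramTypes != argTypes) continue;
    if (match) return std::nullopt;  // a second candidate: ambiguous
    match = d.id;
  }
  return match;
}
```

An obfuscator that renames describe(_: String) must rename only the calls that resolve to that declaration. The lexer and the parser alone see identical describe tokens for both overloads.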

After the AST reaches its final form, it’s time for the next steps. They were not used in Sirius obfuscator, and I’m less familiar with them, so I’ll stick to a quick overview.

The SIL generation step takes the AST and creates, unsurprisingly, SIL: the Swift Intermediate Language. It’s a form of the Swift program that is fully typed and designed for ease of optimization. There’s documentation on both the format and the implementation. Since optimization is one of the crucial elements of this step, multiple rounds of analysis, transformation and optimization are performed on SIL before the compilation moves further. If you are interested, there’s a document on SIL optimizations and another one on the optimizer’s design. There’s also an interesting blog post by the aforementioned Slava Pestov on types in SIL.

After SIL reaches its final, most optimized form, it’s time for more processing. Here comes the LLVM IR generator. It consumes SIL and produces LLVM IR, the assembly-like language that LLVM works on. Once it’s generated, we leave the Swift compiler world and move to the world of LLVM. In fact, everything we’ve described until now can be seen as the “frontend” from LLVM’s perspective. If it’s not obvious why, please refer to the introductory article on the architecture of LLVM-based compilers.

Now further analyses, transformations and optimizations are performed. The main LLVM documentation is a great place to start if you’re interested in the details. Suffice it to say, even skimming through the list of analysis and transform passes in LLVM will give you an idea of how much work is done at this stage. Also, this stage is not really Swift-specific, as LLVM is the common backend for multiple production-ready compilers, including rustc and Clang (which compiles Objective-C).

After the LLVM IR is optimized, it’s time for the last step: machine code generation. What’s produced here is machine code that, once linked, becomes the actual binary that can be loaded and executed. It’s generated in the format appropriate for the target architecture. The compilation is finished, and so is our short overview. If some of the terms or concepts were not clear, I encourage you to look through the Swift compiler lexicon, a great reference guide with clear and concise definitions. There’s a similar one for LLVM.

The parts of the above process that Sirius obfuscator leverages lie between the initial Swift source code and the type-checked AST. This is where I’ll go into more detail, including how to use these parts for code analysis and transformation.

How to compile Swift code into the AST?

The starting point on our road to the AST is the CompilerInstance class from the swiftFrontend library. It exposes the performSema method, which, as the name suggests, performs parsing and semantic analysis of the source code. You provide the paths to the source files through the CompilerInvocation class, which serves as a bag of configuration options for the CompilerInstance. Apart from the source files, these options include the paths to the frameworks imported in the Swift source code, the path to the SDK, the architecture in target triple format and many others. Basically, anything that can be passed to the compiler as a flag becomes a field of the CompilerInvocation itself or of one of the many structures it gathers. You can find an example of its usage in Sirius obfuscator.

There are two gotchas here that we learned the hard way. First, make sure you set all the options that would otherwise be set by Xcode. Even if a flag looks useless to you, omitting it may lead to strange bugs that are extremely difficult to debug, since tracking them down requires reading through a huge amount of Clang, LLVM and Swift compiler internals. The second gotcha concerns system frameworks: do NOT pass their paths explicitly, but let the compiler infer them from the SDK path. While it could have been just a misconfiguration on our side or a missing linker option, we noticed that the Clang module importer behaves differently depending on whether the system framework path was set explicitly or inferred. This resulted in differences in the generated AST that led to bugs that were very difficult to squash.

After the compiler instance is configured, we’re just one method call away from the AST. Call performSema and it’s done! Now you’ve got an AST ready to be traversed. One important thing to notice is that errors might have appeared during semantic analysis. However, they won’t cause the method to throw an exception or return an error code. The only sign of potential trouble will be in the diagnostics. You can consume them with a DiagnosticConsumer instance, for example PrintingDiagnosticConsumer, which just prints the errors and warnings to the terminal. By checking whether any error was reported, you can ensure that the resulting AST is complete. It’s crucial that you do: if anything is missing from the AST, you’ll base your logic on incomplete information. In the case of Sirius obfuscator, it led to missing or wrong renaming, which in turn could easily cause a compilation or runtime error. Adding a DiagnosticConsumer and checking it is the best way to avoid that!
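The pattern is worth spelling out, because nothing forces you to follow it. The sketch below uses hypothetical stand-ins (DiagnosticSink, RecordingSink, analyze), not the real DiagnosticConsumer interface, whose methods take more parameters. It shows the idea: the analysis step reports problems only through the consumer, so the caller has to ask the consumer whether an error occurred before trusting the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Severity { Note, Warning, Error };

// Hypothetical, simplified consumer interface (not Swift's DiagnosticConsumer).
class DiagnosticSink {
public:
  virtual ~DiagnosticSink() = default;
  virtual void handle(Severity severity, const std::string &message) = 0;
};

// Records every diagnostic and remembers whether any of them was an error.
class RecordingSink : public DiagnosticSink {
public:
  void handle(Severity severity, const std::string &message) override {
    messages.push_back(message);
    if (severity == Severity::Error) hadError = true;
  }
  std::vector<std::string> messages;
  bool hadError = false;
};

// Stand-in for a semantic analysis step: like performSema, it never throws
// and returns nothing; problems surface only through the sink.
void analyze(const std::vector<std::string> &unresolvedNames, DiagnosticSink &sink) {
  for (const std::string &name : unresolvedNames)
    sink.handle(Severity::Error, "use of unresolved identifier '" + name + "'");
}
```

In the tool, you’d bail out (or at least refuse to rename) whenever the recording consumer saw an error, instead of walking an AST that may be incomplete.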

The resulting AST is available through the files of the main module, obtained with the CompilerInstance’s getMainModule()->getFiles() call; the files parsed from your sources are SourceFile instances. Each SourceFile can be thought of as a container for the AST nodes that represent the code from a particular source file. The great thing is that although these containers are not connected, the AST nodes are. When a node refers to a symbol from another file, the relationship is expressed as a pointer to the node in the other container. But we’re getting a little ahead of ourselves. Let’s first understand how the AST can be traversed.
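The sketch below models that connection with hypothetical types (Decl, Ref, SourceFileModel), not the real swiftAST classes. Each file owns its declarations, while a reference simply holds a pointer to the declaration it was resolved to, wherever that lives. Comparing pointers instead of names is what lets you find every occurrence of one particular declaration across files, even when another declaration happens to share its name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A declaration node owned by one file.
struct Decl {
  std::string name;
  size_t offset;  // where the name appears in its own file
};

// A reference node: points directly at the declaration it resolves to.
struct Ref {
  const Decl *target;  // filled in by (toy) semantic analysis
  size_t offset;       // where the reference appears in this file
};

// A per-file container; containers don't know about each other.
struct SourceFileModel {
  std::string path;
  std::vector<std::unique_ptr<Decl>> decls;  // owned nodes
  std::vector<Ref> refs;
};

// Collects every (path, offset) mentioning `decl`, across all files,
// by pointer identity rather than by name.
std::vector<std::pair<std::string, size_t>>
occurrencesOf(const Decl *decl, const std::vector<SourceFileModel> &files) {
  std::vector<std::pair<std::string, size_t>> result;
  for (const SourceFileModel &file : files) {
    for (const auto &d : file.decls)
      if (d.get() == decl) result.emplace_back(file.path, d->offset);
    for (const Ref &r : file.refs)
      if (r.target == decl) result.emplace_back(file.path, r.offset);
  }
  return result;
}
```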

How to walk the Swift AST?

AST nodes are represented in memory as objects, and the relationships between them as pointers from one node to another. So traversing the AST means following the pointers. There’s nothing magical about it; you could write this code yourself. It is, however, a lot of code to write, especially because various node types have their own names for accessing the pointers you might be interested in following. Sticking to our guiding principle of leveraging existing Swift compiler functionality as much as possible, we quickly discovered the ASTWalker abstract class, which does a lot of the heavy lifting for us. It exposes a number of callbacks for the various events that happen when traversing the AST. You can use them by subclassing ASTWalker, but there is an even simpler way: a class that wraps ASTWalker, does even more work for us and exposes an easier API. It’s called SourceEntityWalker.

In terms of usage, SourceEntityWalker is very similar to ASTWalker. It’s also an abstract class, and you need to subclass it to get the functionality. For us, the most important feature that differentiated SourceEntityWalker from ASTWalker was that when it encountered a declaration, it calculated and reported its place and range in the actual source file, in a format that could be used directly for making changes. But wait, you might say, I’m getting lost: what is this declaration thing you’ve mentioned? To understand it, here’s a lightning-quick introduction to the Swift AST.

There are three main types of nodes in the Swift AST: declarations (subclasses of the Decl type), expressions (subclasses of the Expr type) and statements (subclasses of the Stmt type). They correspond to three concepts that are prevalent throughout the Swift language. Declarations introduce named entities: the definitions of functions, structs, parameters and variables are all declarations. Expressions are constructs that produce a value, such as function calls or literals. Statements are the parts of the language that define the control flow but don’t produce a value, such as if or do-catch blocks.
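To show how a walker with pre-visit callbacks fits these three node kinds, here’s a simplified, self-contained model. The Node, Decl, Expr, Stmt and Walker types below are hypothetical and only mirror the style of ASTWalker; the real callbacks have different signatures (for example, the expression callback can also replace the node). Returning false from a pre-visit callback skips that node’s children, and subclasses override only the callbacks they care about.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical three-kind node hierarchy mirroring Decl / Expr / Stmt.
struct Node {
  virtual ~Node() = default;
  std::vector<std::unique_ptr<Node>> children;
};
struct Decl : Node { std::string name; };
struct Expr : Node {};
struct Stmt : Node {};

// Modeled on the pre/post callback style of ASTWalker (not its real API).
class Walker {
public:
  virtual ~Walker() = default;
  virtual bool walkToDeclPre(Decl &) { return true; }
  virtual bool walkToExprPre(Expr &) { return true; }
  virtual bool walkToStmtPre(Stmt &) { return true; }
  virtual void walkToNodePost(Node &) {}

  // Depth-first traversal: pre-callback, children (unless skipped), post-callback.
  void walk(Node &node) {
    bool descend = true;
    if (auto *d = dynamic_cast<Decl *>(&node)) descend = walkToDeclPre(*d);
    else if (auto *e = dynamic_cast<Expr *>(&node)) descend = walkToExprPre(*e);
    else if (auto *s = dynamic_cast<Stmt *>(&node)) descend = walkToStmtPre(*s);
    if (descend)
      for (auto &child : node.children) walk(*child);
    walkToNodePost(node);
  }
};

// Example subclass: collects the names of all declarations, in visit order.
class DeclNameCollector : public Walker {
public:
  std::vector<std::string> names;
  bool walkToDeclPre(Decl &d) override {
    names.push_back(d.name);
    return true;
  }
};
```

Subclassing and overriding a single callback, as DeclNameCollector does, is the same usage pattern you follow with ASTWalker and SourceEntityWalker.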

These three kinds of nodes can be seen in ASTWalker signatures, for example in methods like bool walkToDeclPre(Decl *D) or Stmt *walkToStmtPost(Stmt *S). SourceEntityWalker, being essentially a wrapper over ASTWalker, uses the same objects. This time, however, they are reported together with a lot of additional information. For example, see how declaration references are provided:

bool visitDeclReference(ValueDecl *D, CharSourceRange Range, TypeDecl *CtorTyRef, ExtensionDecl *ExtTyRef, Type T, ReferenceMetaData Data);

There are a lot of parameters here, but the crucial one I mentioned above is CharSourceRange Range. It’s essentially the location in the source buffer where the name starts, together with its length in bytes. So if we decide to make a change, such as a rename, we know exactly where the new name should be put and where the old one ends. So useful! It’s no coincidence that SourceEntityWalker is part of the swiftIDE library, which is designed for performing source code transformations.

So, to walk the Swift AST, it’s best to subclass either ASTWalker or SourceEntityWalker. While working on the Sirius obfuscator we were mainly interested in finding and analyzing all the identifiers in Swift source code, so we focused almost exclusively on declarations. One gotcha we encountered, however, was that SourceEntityWalker did not report some declarations; they had to be extracted from the containing expressions. This has guided the design of the Sirius AST traversal process.

How to walk the Swift AST?

SourceEntityWalker is the source of AST nodes. The first step is to obtain the declarations: some of them are available directly, others must be extracted from expressions. The class responsible for traversing expressions in search of declarations is called Processor. The declarations it returns, along with those taken directly from SourceEntityWalker, are passed to instances of Collector subclasses.

The Collector is responsible for generating symbols from a particular AST node. A symbol is a string that uniquely identifies the declaration. Each Collector has two building blocks: an Includer and a Symbol Generator. The Includer determines whether the processed declaration should be considered for renaming; if not, no symbols are generated. Otherwise, the declaration is passed to the Symbol Generator, which contains the logic for creating the symbol. In other words, it “knows” how to build the string that uniquely identifies the declaration.

The generated symbols are then passed to instances of Excluder subclasses, which are responsible for excluding some symbols from renaming. In a sense, Excluders are very similar to Includers, but with one crucial difference. Includers make decisions based on local information (they have access only to what’s available in the particular declaration), while Excluders look at the symbols from a global perspective. They know about external factors, such as the Sirius obfuscator configuration file, and about special cases that can only be found after parsing the whole source code.

After all the symbols are collected, the last step is to pass them to instances of Symbol Updater subclasses. They are designed to make changes to the whole set of symbols, such as removing the symbols that should be excluded according to the Excluders’ output, but also rewriting others when a particular special pattern is found (again, one identifiable only after the whole source code is parsed). After the Symbol Updaters finish their work, we’re left with the set of all the symbols that should be renamed, together with their locations in the Swift source files.

Finally, there are the Extractors, which are simply helper objects containing the logic for extracting information from a particular AST node. For example, they “know” how to find a function’s name in a FuncDecl instance.
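The flow described above can be modelled, very loosely, in standalone C++. Everything in this sketch (Declaration, Symbol, Collector, Excluder, collectSymbolsToRename) is a hypothetical simplification, not Sirius’ actual classes: it assumes the Processor has already produced a flat list of declarations, represents Includers, Symbol Generators and Excluders as std::function objects, and collapses the Symbol Updaters into a single “drop the excluded symbols” step.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real tool works on swift::Decl and friends.
struct Declaration {
  std::string Kind;   // e.g. "func", "var"
  std::string Module; // module that declares it
  std::string Name;   // identifier as written in the source
};

struct Symbol {
  std::string Identifier; // string that uniquely identifies the declaration
  std::string Name;       // the part that would be renamed
};

// A Collector = Includer (local yes/no decision) + Symbol Generator.
struct Collector {
  std::function<bool(const Declaration &)> Includer;
  std::function<Symbol(const Declaration &)> SymbolGenerator;
};

// An Excluder sees every collected symbol (the global view) and returns
// the identifiers that must not be renamed.
using Excluder =
    std::function<std::set<std::string>(const std::vector<Symbol> &)>;

std::vector<Symbol> collectSymbolsToRename(const std::vector<Declaration> &Decls,
                                           const std::vector<Collector> &Collectors,
                                           const std::vector<Excluder> &Excluders) {
  // 1. Collectors: local decisions, one declaration at a time.
  std::vector<Symbol> Symbols;
  for (const Declaration &D : Decls)
    for (const Collector &C : Collectors)
      if (C.Includer(D))
        Symbols.push_back(C.SymbolGenerator(D));

  // 2. Excluders: global decisions, made once all symbols are known.
  std::set<std::string> Excluded;
  for (const Excluder &E : Excluders) {
    std::set<std::string> Ids = E(Symbols);
    Excluded.insert(Ids.begin(), Ids.end());
  }

  // 3. A single "symbol updater" step: drop the excluded symbols.
  std::vector<Symbol> Result;
  for (const Symbol &S : Symbols)
    if (Excluded.count(S.Identifier) == 0)
      Result.push_back(S);
  return Result;
}
```

For example, a Collector that accepts only `func` declarations and builds `Module.Name` identifiers, combined with an Excluder that rejects anything from `UIKit`, would keep `App.login` but drop `UIKit.viewDidLoad`. Keeping the local (Includer) and global (Excluder) decisions in separate stages is what lets each stage stay small and testable.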

If you’ve found this quick guide through the Sirius architecture interesting and would like to talk about it more, I’d love to! Now it’s time to move away from the Sirius implementation details and on to another Swift compiler library that you might find useful when creating your own tool: swiftIDE.

How to perform the change in the Swift source code?

I’ve already mentioned swiftIDE when introducing SourceEntityWalker. There’s much more in it, especially when it comes to performing changes in the source code. To understand them, please give a warm welcome to two new actors on the stage: SourceManager and SourceEditOutputConsumer (an implementation of swiftIDE’s SourceEditConsumer interface).

SourceManager is the part of swiftBasic responsible for managing the buffers that hold the Swift source code read from the source files. You can obtain it from a SourceFile instance with the SourceFileInstance.getASTContext().SourceMgr call. Although I never needed to use SourceManager directly, it’s a crucial component for performing the renaming, since it guards access to the actual data that we want to change.

If SourceManager is the data, then SourceEditOutputConsumer is the logic that operates on it. Coming from swiftIDE, it uses the LLVM / Clang infrastructure under the hood to make the actual changes in the buffers containing the source code. You set it up with a reference to the SourceManager instance, the identifier of the buffer it should work on and a reference to the stream that writes to the file. After that, it takes care of writing to the file, calculating the offsets when replacing text and managing the buffer’s memory. From the caller’s side, using it is as simple as calling:

SourceEditOutputConsumerInstance.accept(
  SourceManagerInstance, CharSourceRangeInstance, TextToInsertInTheRange
);

That’s all! It really couldn’t be more straightforward. You can always look at the example usage in the Sirius obfuscator for reference. The actual writing to the filesystem happens when SourceEditOutputConsumerInstance is destroyed; after that, you can peek into the files to see the changes.
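Replacing several ranges in one buffer needs some offset bookkeeping: once one range is rewritten with text of a different length, every position after it shifts. The sketch below is not how SourceEditOutputConsumer or Clang’s rewriting machinery is implemented. It assumes hypothetical CharRange and Replacement types (a byte offset plus a length, standing in for CharSourceRange), non-overlapping ranges expressed against the original buffer, and sidesteps the shifting problem by applying edits from the end of the buffer backwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CharSourceRange: byte offset + byte length.
struct CharRange {
  std::size_t Offset;
  std::size_t Length;
};

struct Replacement {
  CharRange Range;     // range in the ORIGINAL buffer
  std::string NewText; // text to put in its place
};

// Applies non-overlapping replacements. Sorting by descending offset means
// rewriting one range never moves the offsets of the ranges still pending.
std::string applyReplacements(std::string Buffer, std::vector<Replacement> Edits) {
  std::sort(Edits.begin(), Edits.end(),
            [](const Replacement &A, const Replacement &B) {
              return A.Range.Offset > B.Range.Offset;
            });
  for (const Replacement &E : Edits) {
    assert(E.Range.Offset + E.Range.Length <= Buffer.size() && "range out of bounds");
    Buffer.replace(E.Range.Offset, E.Range.Length, E.NewText);
  }
  return Buffer;
}
```

Renaming `login` to `a` and both occurrences of `password` to `b` in `func login(password: String) { check(password) }` yields `func a(b: String) { check(b) }`, even though each edit changes the length of the text. The alternative design, applying edits front to back, would require adjusting every remaining offset by the accumulated length difference.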

The first time I saw it working, I was thrilled. The actual transformation of Swift source code, based on information from the AST and performed with swiftIDE, opens so many doors. Before we discuss the possibilities, however, let’s take a short tour of a few other libraries that you might find useful when writing your own compiler-based tool. See how much you don’t have to write yourself!

Other useful libraries found in Swift compiler

YAML and JSON serialization / deserialization

If you want to pass some data to your tool or return some output, you’ll most likely need a data format. Formats, however, have to be encoded and decoded according to their (possibly nontrivial) grammar. That’s why most large codebases, and some standard libraries, treat serializers and deserializers as important functionality to provide. The Swift compiler is no different, especially since it builds on LLVM, which has excellent support for reading and writing the YAML textual data format.

All you need to do to make your object deserializable from a string containing YAML is to specialize the MappingTraits template. It’s part of the llvm::yaml namespace (see the documentation), which provides a great, declarative way of defining the relationship between the YAML data and the object’s fields. A sample specialization for a Symbol structure may look like this:

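struct Symbol {
  std::string Identifier;
  SymbolType Type; // an enum; it needs a mapping of its own (see below)
};

namespace llvm {
namespace yaml {
template <> struct MappingTraits<Symbol> {
  static void mapping(IO &Io, Symbol &Object) {
    Io.mapRequired("identifier", Object.Identifier);
    Io.mapRequired("type", Object.Type);
  }
};
} // namespace yaml
} // namespace llvm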

That’s all! No additional setup is required. Deserializing is now just a matter of:

StringRef Yaml; // this is the reference to the YAML (or JSON) string
llvm::yaml::Input In(Yaml);
Symbol Deserialized;
In >> Deserialized; // operator>> uses MappingTraits<Symbol> internally
if (std::error_code EC = In.error()) { /* handle the error if needed */ }

If the input is valid, here comes the Symbol instance! And the great thing is that, since JSON is (for practical purposes) a subset of YAML, one MappingTraits specialization is enough to parse both this:

identifier: "identifier"
type: "parameter"

and this:

{
  "identifier": "identifier",
  "type": "parameter"
}

Another great thing is that if you want your instances to serialize to a YAML string, there’s nothing more you need to do: the MappingTraits declaration simply works both ways. The sad news is that it doesn’t support JSON output, since llvm::yaml can only write, unsurprisingly, YAML. Apparently, there was a need for JSON support in the Swift compiler, which led to the creation of the swift::json namespace that provides a mechanism for JSON serialization. It’s almost identical to LLVM’s; only the name of the trait differs, and this time it’s ObjectTraits. The actual implementation is, again, very similar:

namespace swift { namespace json {
template <> struct ObjectTraits<Symbol> {
  static void mapping(Output &Out, Symbol &Object) {
    Out.mapRequired("identifier", Object.Identifier);
    Out.mapRequired("type", Object.Type);
  }
};
} } // namespace swift::json

The serialization call is also very similar:

Symbol ToSerialize; // this is the instance we want to serialize
std::string OutputString;
llvm::raw_string_ostream OutputStringStream(OutputString);
swift::json::Output Out(OutputStringStream);
Out << ToSerialize; // operator<< uses ObjectTraits<Symbol> internally
std::string Json = OutputStringStream.str(); // str() flushes the stream and returns the JSON text

One thing to notice is that mappings for collections (like vectors or sets) and for enums are defined differently; in llvm::yaml, for instance, enums use ScalarEnumerationTraits and sequences use SequenceTraits. As always, I believe the Sirius obfuscator source code is a nice place to look for reference.

File I/O

Another common need of your tool might be access to the file system. This is also something that the Swift compiler and LLVM simply must provide, since they use it all the time. What you might find surprising, however, is how simple it is. The basic method for reading from a file consists of just one line:

auto Buffer = llvm::MemoryBuffer::getFile(Path); // llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>

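This is enough to read a file into memory. The call returns an llvm::ErrorOr wrapping a std::unique_ptr<llvm::MemoryBuffer>, so check it for an error first; after that, (*Buffer)->getBuffer() gives you the contents as a StringRef that can be converted to a string and used any way you like, including with the serialization and deserialization methods.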

The API for writing to a file is also a pleasure to use. You just need to create an llvm::raw_fd_ostream instance, a stream that writes to disk. Then you can use the << operator:

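std::string TextToWriteToFile;
std::error_code Error;
llvm::raw_fd_ostream File(Path, Error, llvm::sys::fs::F_None); // F_None is spelled OF_None in newer LLVM versions
if (Error) { /* handle error */ }
File << TextToWriteToFile; // the operator writes to the stream, which writes to the file
File.close(); // File is an object, not a pointer; its destructor would also close it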

Great! But it’s just a fraction of what’s available in LLVM. There are also great helper functions for working with paths, such as llvm::sys::path::replace_path_prefix, for iterating over directories and their contents (see, for example, llvm::sys::fs::recursive_directory_iterator) and for making changes to the file system, such as llvm::sys::fs::create_directories, llvm::sys::fs::copy_file or llvm::sys::fs::remove. There’s little need to reach for the standard library here (the C++11 STL, for one, has no file system API at all). The last thing I want to point out is LLVM’s error handling, a great example of how to bring modern patterns into the C++ world.

Error handling

The main place to look for the error-handling code is llvm/Support/Error.h, and the mechanism is well documented in the LLVM Programmer’s Manual. The basic idea is that instead of returning a std::error_code from a function that might fail, you return an llvm::Error. The advantage of the latter is that it MUST be checked before it is destroyed; if it isn’t, an assertion fires (in builds with assertions enabled).

But what if you want to return a value from a function? Should you pass it out through a pointer? No need! There’s the llvm::Expected<T> class template for exactly this case, which is roughly equivalent to the Result<T> type known from other programming languages. It has all the same advantages as llvm::Error, meaning that it must be checked under the threat of an assertion, but it can also carry a value.
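The “must be checked” idea is easy to model. The sketch below is a hypothetical, much-simplified class (MustCheck and parseCount are invented names; it needs C++17 for std::variant), not LLVM’s implementation: in real code, use llvm::Error and llvm::Expected<T> from llvm/Support/Error.h. It shows the core trick: a check flag tied to the destructor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, heavily simplified model of the idea behind llvm::Expected<T>:
// a value-or-error that asserts if it is destroyed without being checked.
// This is NOT LLVM's implementation.
template <typename T>
class MustCheck {
  struct Failure {
    std::string Message;
  };

public:
  MustCheck(T Value) : Storage(std::move(Value)) {} // implicit, so `return Value;` works
  static MustCheck failure(std::string Message) {
    return MustCheck(Failure{std::move(Message)});
  }

  // Moving transfers the obligation to check; copying is not allowed.
  MustCheck(MustCheck &&Other) noexcept
      : Storage(std::move(Other.Storage)), Checked(Other.Checked) {
    Other.Checked = true;
  }
  MustCheck(const MustCheck &) = delete;
  MustCheck &operator=(const MustCheck &) = delete;
  MustCheck &operator=(MustCheck &&) = delete;

  ~MustCheck() { assert(Checked && "result destroyed without being checked"); }

  // Testing for success is what counts as "handling" the result here.
  explicit operator bool() {
    Checked = true;
    return std::holds_alternative<T>(Storage);
  }
  T &get() {
    assert(Checked && "check the result before using the value");
    return std::get<T>(Storage);
  }
  const std::string &message() {
    assert(Checked && "check the result before reading the error");
    return std::get<Failure>(Storage).Message;
  }

private:
  explicit MustCheck(Failure F) : Storage(std::move(F)) {}

  std::variant<T, Failure> Storage;
  bool Checked = false;
};

// Hypothetical example: parse a non-negative integer (no overflow handling).
MustCheck<int> parseCount(const std::string &Text) {
  if (Text.empty())
    return MustCheck<int>::failure("empty input");
  int Value = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return MustCheck<int>::failure("not a number: " + Text);
    Value = Value * 10 + (C - '0');
  }
  return Value;
}
```

Calling `parseCount("42")` and then reading `get()` after testing the result works as expected; dropping the result without testing it trips the destructor’s assertion in builds where assertions are enabled. Unlike llvm::Expected, this model carries only a string message rather than a typed error payload.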

I must say that I’m really impressed by how elegant this pattern is. There is an actual mechanism enforcing that a possible error gets handled. It’s baked into the error type itself, but without the do-try-catch dance that some other languages (like Swift) require. It’s based on the return value, yet it’s impossible to ignore or forget. Great job, LLVM community!

Let’s finish our quick tour on this high note. Of course, it’s just the tip of the iceberg, both in terms of the specialized Swift libraries and the more general LLVM goodies (just look at all the data structures they provide!). I hope it’ll inspire you to check out the Swift compiler sources and hack around yourself. There’s so much to explore there.

So many possibilities!

While exploring and having fun with the Swift compiler is a great thing in itself, let’s not forget the bigger picture. The whole point of both the first part of this blog post and this second one, which we’re now concluding, is to show you how easy it is to build your own Swift compiler-based tool and how much such a tool can offer.

So the one thing I ask of you right now is to stop for a minute and think of the possibilities! The code obfuscation we did in Sirius is just one idea. Another is a code analysis tool that shows you various metrics for the complexity and maintainability of your source code. Yet another is a security analyzer that looks for vulnerabilities in your source code and proposes fixes. Yet another is a code migration tool that, given a migration configuration describing the API changes in a particular framework, finds the identifiers from that framework and upgrades them to the new version. And yet another is a transpiler that, instead of using the AST for code analysis, uses it to generate code in another programming language. These are just a few examples to inspire you. Many, many more things are possible with compiler-based tools.

Also, you’re always welcome to contribute to Sirius obfuscator! We’d be very happy to introduce you to any implementation details and help you get up to speed with the development.

Thank you for reading! I hope you’ve found this quick tour of the selected Swift compiler internals useful. In case of any comments or a need for further explanation, please contact us!



Krzysztof, Senior Software Engineer
