TutorialOrig
Written by Nico Weber (9/2008)
Updated by Justin LaPre (10/2009)
Updated by Larry Olson (12/2011)
From Clang's website
The goal of the Clang project is to create a new C, C++, Objective C and Objective C++ front-end for the LLVM compiler.
What does that mean, and why should you care? A front-end is a program that takes a piece of code and converts it from a flat string to a structured, tree-formed representation of the same program — an Abstract Syntax Tree or AST.
Once you have an AST of a program, you can do many things with the program that are hard without an AST. For example, renaming a variable without an AST is hard: You cannot simply search for the old name and replace it with a new name, as this will change too much (for example, variables that have the old name but are in a different namespace, and so on). With an AST, it’s easy: In principle, you only have to change the name field of the right VariableDeclaration AST node, and convert the AST back to a string. (In practice, it’s a bit harder for some codebases.)
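To make the renaming argument concrete, here is a minimal sketch of a toy AST. The types and functions below (VariableDeclaration as a plain struct, VariableReference, toSource, renameVariable) are hypothetical illustrations, not clang's classes. The point is only that uses refer to a declaration node instead of carrying their own copy of the name.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy AST node types for illustration; these are NOT clang's classes.
struct VariableDeclaration {
    std::string scope;  // e.g. the enclosing namespace
    std::string name;
};

// A use of a variable points at its declaration instead of storing a name.
struct VariableReference {
    const VariableDeclaration* decl;
};

// Convert a reference back to source text by reading the declaration's name.
std::string toSource(const VariableReference& ref) {
    return ref.decl->scope + "::" + ref.decl->name;
}

// Renaming touches exactly one node; every reference to it follows automatically.
void renameVariable(VariableDeclaration& decl, const std::string& newName) {
    decl.name = newName;
}
```

With two declarations that share the name count but live in different scopes, renaming one of them changes only the references that point at it. A textual search and replace would have changed both.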
Front-ends have existed for decades. So, what’s special about clang? I think the most interesting part is that clang uses a library design, which means that you can easily embed clang into your own programs (and by “easily”, I mean it. Most programs in this tutorial are well below 50 lines). This tutorial tells you how you do this.
So, do you have a large C code-base and want to perform non-trivial analysis? Would you like to have ctags that works better with C++ and at all with Objective-C? Would you like to collect some statistics about your program, and you feel that grep doesn’t cut it? Then clang is for you.
This tutorial will offer a tour through clang’s preprocessor, parser, and AST libraries.
A short word of warning: Clang is a work in progress. Although its API surface is more stable than it was in 2008 when the first version of this tutorial was written, it does not have a stable API, so this tutorial might not be completely up-to-date.
Clang works on all platforms. In this tutorial I assume that you have some Unix-based platform, but everything works on Windows, too.
As of this update, the official release of Clang is version 3.0. You can get it here. You can download it and add it to your binary, include, and library paths.
Alternatively, you can get the latest from SVN by following these instructions
A hint for folks on Unix-like systems who are pulling straight from SVN rather than using the official release build: after getting the source for llvm and clang and configuring it per the instructions, run the following:
make happiness
# Note: You'll almost certainly need to run the next command under sudo
make install
make happiness will check out the latest llvm and clang from svn, build them, and run the resulting binaries through a test suite. The result of this command should look like:
Testing Time: 202.10s
Expected Passes : 9678
Expected Failures : 74
Unsupported Tests : 13
If there are any lines that say:
Unexpected Failures: 3
or something like that, then run the command again. Often a fix will already be waiting and the last update just happened to miss it. Otherwise, check the mailing lists, as there may be a bug.
make install will install the built libs, binaries, and include files.
- Compilers: Principles, Techniques, and Tools by Aho, Lam, Sethi, and Ullman. Pay attention to the first two chapters, especially the discussions of Lexical Analysis and Syntax Trees. Skim anything else that looks interesting up to the chapter on Syntax Directed Translation (chapter 5 in the 1st edition). Note: All of this is what Clang is doing for you.
- Compiler Construction: Principles and Practice by Kenneth C. Louden. Again, pay attention to the first 3 chapters, up to section 3.3.
A front-end consists of multiple parts. First is usually a lexer, which converts the input from a stream of characters to a stream of tokens. For example, the input “while” is converted from the five characters ‘w’, ‘h’, ‘i’, ‘l’, and ‘e’ to the token kw_while. For performance reasons, clang does not have a separate preprocessor program, but does preprocessing while lexing.
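As a rough illustration of what "characters in, tokens out" means, here is a minimal toy lexer. It is a sketch under simplifying assumptions and not clang's lexer: the TokenKind enum, the Token struct, and the lex function are hypothetical, only one keyword is recognized, and no preprocessing is done.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for illustration; clang's real list is much longer.
enum class TokenKind { kw_while, identifier, l_paren, r_paren, unknown };

struct Token {
    TokenKind kind;
    std::string text;
};

// Turn a stream of characters into a stream of tokens.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalpha(c) || c == '_') {
            // Consume the whole word, then decide: keyword or identifier?
            std::size_t start = i;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_')) {
                ++i;
            }
            std::string word = input.substr(start, i - start);
            TokenKind kind =
                word == "while" ? TokenKind::kw_while : TokenKind::identifier;
            tokens.push_back({kind, word});
        } else {
            // Single-character punctuation; anything unrecognized is "unknown".
            TokenKind kind = c == '(' ? TokenKind::l_paren
                           : c == ')' ? TokenKind::r_paren
                                      : TokenKind::unknown;
            tokens.push_back({kind, std::string(1, input[i])});
            ++i;
        }
    }
    return tokens;
}
```

Note that the keyword decision is made only after the whole word has been read, so an input such as whiley becomes an identifier rather than kw_while followed by y.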
The Preprocessor class is the main interface to the lexer, and it’s a class you will need in almost every program that embeds clang. So, for starters, let’s try to create a Preprocessor object. Our first program will not do anything useful; it only constructs a Preprocessor and exits again.
The constructor of Preprocessor takes no fewer than six arguments: a DiagnosticsEngine object, a LangOptions object, a TargetInfo object, a SourceManager object, a HeaderSearch object, and finally a ModuleLoader object. Let’s break down what those objects are good for, and how we can build them.
First is DiagnosticsEngine. This is used by clang to report errors and warnings to the user. A DiagnosticsEngine object can have a DiagnosticConsumer, which is responsible for actually displaying the messages to the user. We will use clang’s built-in TextDiagnosticPrinter class, which writes errors and warnings to the console (it’s the same DiagnosticConsumer that is used by the clang binary).
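The split between an engine that receives reports and a consumer that displays them can be sketched as follows. All names here (Level, ToyDiagnosticConsumer, ToyTextPrinter, ToyDiagnosticsEngine) are hypothetical stand-ins chosen for illustration; clang's real classes have different and much richer interfaces.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; NOT clang's actual classes.
enum class Level { Warning, Error };

// The consumer decides how a message is presented to the user.
struct ToyDiagnosticConsumer {
    virtual ~ToyDiagnosticConsumer() = default;
    virtual void handle(Level level, const std::string& message) = 0;
};

// Writes messages as text to a stream, in the spirit of a console printer.
class ToyTextPrinter : public ToyDiagnosticConsumer {
public:
    explicit ToyTextPrinter(std::ostream& out) : out_(out) {}
    void handle(Level level, const std::string& message) override {
        out_ << (level == Level::Error ? "error: " : "warning: ")
             << message << '\n';
    }
private:
    std::ostream& out_;
};

// The engine is what the rest of the front-end reports to.
// It keeps an error count and forwards every message to its consumer.
class ToyDiagnosticsEngine {
public:
    explicit ToyDiagnosticsEngine(ToyDiagnosticConsumer& consumer)
        : consumer_(consumer) {}
    void report(Level level, const std::string& message) {
        if (level == Level::Error) ++errorCount_;
        consumer_.handle(level, message);
    }
    int errorCount() const { return errorCount_; }
private:
    ToyDiagnosticConsumer& consumer_;
    int errorCount_ = 0;
};
```

Because the engine only knows the abstract consumer interface, swapping the text printer for, say, a consumer that collects messages in a list requires no change to the code that reports errors.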
Next up is LangOptions. This class lets you configure if you’re compiling C or C++, and which language extensions you want to allow. Constructing this object is easy, as its constructor does not take any parameters.
The TargetInfo is easy, too, but we need to call a factory method as the constructor is private. The factory method takes a “host triple” as parameter that defines the architecture clang should compile for, such as “i386-apple-darwin”. We will get and pass the default host triple (getDefaultTargetTriple()), which contains the host triple describing the machine llvm was compiled on. But in principle, you can use clang as a cross-compiler very easily, too. The TargetInfo object is required so that the preprocessor can add target-specific defines, for example __APPLE__. You need to delete this object at the end of the program.
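To show why the preprocessor needs target information, here is a small sketch that splits a triple such as “i386-apple-darwin” into its parts and derives target-specific defines from them. ToyTriple, parseTriple, and targetDefines are hypothetical helpers, and the two rules shown are a deliberate simplification rather than clang's actual logic.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type; llvm and clang have their own, richer versions.
struct ToyTriple {
    std::string arch, vendor, os;
};

// Split "arch-vendor-os" on '-'. This is a simplification: real triples
// can have more components and many special cases.
ToyTriple parseTriple(const std::string& text) {
    std::istringstream in(text);
    ToyTriple t;
    std::getline(in, t.arch, '-');
    std::getline(in, t.vendor, '-');
    std::getline(in, t.os);
    return t;
}

// Pick predefined macros for the target (assumed, simplified rules).
std::vector<std::string> targetDefines(const ToyTriple& t) {
    std::vector<std::string> defines;
    if (t.vendor == "apple") defines.push_back("__APPLE__");
    if (t.arch == "i386") defines.push_back("__i386__");
    return defines;
}
```

Under these assumed rules, a source file containing #ifdef __APPLE__ would be preprocessed differently for “i386-apple-darwin” than for a non-Apple triple, which is why the target has to be known before lexing starts.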
SourceManager is used by clang to load and cache source files. Its constructor takes a DiagnosticsEngine for errors and a FileManager, which helps it manage files on disk and in cache.
The constructor of HeaderSearch also requires a FileManager, which helps it manage files on disk and in cache, and a DiagnosticsEngine for errors. HeaderSearch configures where clang looks for include files.
Finally, ModuleLoader is an abstract class whose concrete implementation helps resolve module names. In this case, we'll create a CompilerInstance as the default ModuleLoader.
So, to build a Preprocessor object, the following code is required:
clang::DiagnosticOptions diagnosticOptions;
clang::TextDiagnosticPrinter *pTextDiagnosticPrinter =
new clang::TextDiagnosticPrinter(
llvm::outs(),
diagnosticOptions);
llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> pDiagIDs;
clang::DiagnosticsEngine *pDiagnosticsEngine =
new clang::DiagnosticsEngine(pDiagIDs, pTextDiagnosticPrinter);
clang::LangOptions languageOptions;
clang::FileSystemOptions fileSystemOptions;
clang::FileManager fileManager(fileSystemOptions);
clang::SourceManager sourceManager(
*pDiagnosticsEngine,
fileManager);
clang::HeaderSearch headerSearch(fileManager, *pDiagnosticsEngine);
clang::TargetOptions targetOptions;
targetOptions.Triple = llvm::sys::getDefaultTargetTriple();
clang::TargetInfo *pTargetInfo =
clang::TargetInfo::CreateTargetInfo(
*pDiagnosticsEngine,
targetOptions);
clang::CompilerInstance compInst;
clang::Preprocessor preprocessor(
*pDiagnosticsEngine,
languageOptions,
pTargetInfo,
sourceManager,
headerSearch,
compInst);
Note that this is quite verbose. Since we're using a CompilerInstance anyway, I've rebuilt these tutorials using a CompilerInstance object and its helper methods. They make this setup a bit simpler.
Now that you've written your first tutorial, you need to compile it. To do that, use llvm-config to get the required compiler and linker flags. Compile with the -fno-rtti flag, otherwise you'll get a link error. Also, tell llvm-config which backend libraries to use, and link against the list of clang libraries. See the checked-in makefile in the project.