
improved description of "low-tech" AD vs current
Co-authored-by: Kyle Daruwalla <[email protected]>
MariusDrulea and darsnack authored Mar 20, 2024
1 parent fac8128 commit d989600
Showing 1 changed file with 1 addition and 1 deletion: gsoc.md
@@ -95,7 +95,7 @@ The AD engine will be used for the typical DNN architectures.

### Description

- The family of AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the LLVM intermediate representation (IR) output of the first compiler pass. They are very complex, takes many months or years to develop and requires specialized knowledge for this. Maintaining these packages is also big pain point: as the original developers often engage in other projects, over the years the community is left with these hard-to-maintain packages. These packages have their advantages of course, but we shall see them more like premium AD packages. They can be used, but we shall always have a baseline AD package which does the job and it's easy to maintain and improve.
+ The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers. In this project, we will develop a "lower-tech" tape-based AD engine, in the spirit of Tracker.jl, which will be easier to maintain while offering fewer features than the existing, complex engines.

In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape-based automatic differentiation is in use in PyTorch, TensorFlow and JAX. Despite their simplicity, tape-based ADs are the main tool in such successful deep learning frameworks. While PyTorch, TensorFlow and JAX are monoliths, the FluxML ecosystem consists of several packages, and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
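To make the tape idea concrete, here is a minimal, hypothetical sketch of tape-based reverse-mode AD (in Python for brevity): each operation records a small "backward" closure on a shared tape as it executes, and the gradient pass simply replays the tape in reverse. This is an illustration of the general technique only, not the proposed package's actual design or API.

```python
# Minimal tape-based reverse-mode AD sketch (illustrative only).
class Var:
    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape  # shared list of backward closures

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        def backward():
            # Chain rule for multiplication: propagate out.grad to both inputs.
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        self.tape.append(backward)
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        def backward():
            # Addition passes the output gradient through unchanged.
            self.grad += out.grad
            other.grad += out.grad
        self.tape.append(backward)
        return out

def grad(output):
    # Seed the output gradient, then replay recorded operations in reverse.
    output.grad = 1.0
    for backward in reversed(output.tape):
        backward()

# Usage: for z = x*y + x, dz/dx = y + 1 and dz/dy = x.
tape = []
x, y = Var(3.0, tape), Var(2.0, tape)
z = x * y + x
grad(z)
print(x.grad, y.grad)  # -> 3.0 3.0
```

The key property, which makes this approach easy to maintain, is that the tape only ever contains operations that actually ran, so control flow in user code needs no special handling.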

