Fixing typos in documentation (#2355)
* A typo fix in a docstring in src/layers/attention.jl

* A typo fix in src/layers/basic.jl

* Another typo in src/layers/basic.jl
poludmik authored Nov 17, 2023
1 parent 043b28f commit b0679f2
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/layers/attention.jl
@@ -7,7 +7,7 @@ const IntOrDims{N} = Union{Int, Dims{N}}
 The multi-head dot-product attention layer used in Transformer architectures [1].
-Returns the transformed input sequnce and the attention scores.
+Returns the transformed input sequence and the attention scores.
 [1] Vaswani et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
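For reference, the docstring being fixed describes Flux's `MultiHeadAttention` layer, which returns both the transformed sequence and the attention scores. A minimal self-attention sketch, assuming the Flux call signature around this release (the embedding size, head count, and shapes below are illustrative, not from the diff):

```julia
using Flux

# Multi-head dot-product attention with an embedding size of 64 and 8 heads.
mha = MultiHeadAttention(64; nheads = 8)

# An input sequence shaped (embed_dim, seq_len, batch).
x = rand(Float32, 64, 10, 32)

# Self-attention: returns the transformed input sequence `y`
# and the attention scores `α`.
y, α = mha(x)

size(y)  # (64, 10, 32), the same shape as the input sequence
size(α)  # (10, 10, 8, 32), i.e. (kv_len, q_len, nheads, batch)
```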
4 changes: 2 additions & 2 deletions src/layers/basic.jl
@@ -270,12 +270,12 @@ end
     Maxout(f, n_alts)
 This contains a number of internal layers, each of which receives the same input.
-Its output is the elementwise maximum of the the internal layers' outputs.
+Its output is the elementwise maximum of the internal layers' outputs.
 Instead of defining layers individually, you can provide a zero-argument function
 which constructs them, and the number to construct.
-Maxout over linear dense layers satisfies the univeral approximation theorem.
+Maxout over linear dense layers satisfies the universal approximation theorem.
 See Goodfellow, Warde-Farley, Mirza, Courville & Bengio "Maxout Networks"
 [https://arxiv.org/abs/1302.4389](https://arxiv.org/abs/1302.4389).
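The second file's fixes land in the `Maxout` docstring. A short sketch of the two construction styles that docstring describes, assuming the Flux API at this point in the repository's history (the layer sizes are illustrative):

```julia
using Flux

# Each internal layer receives the same input; the output is the
# elementwise maximum over the internal layers' outputs.
m = Maxout(x -> abs2.(x), x -> 2 .* x)
m([1.0, -2.0, 3.0])  # returns [2.0, 4.0, 9.0]

# Instead of defining layers individually, pass a zero-argument
# constructor and the number of copies to build:
m2 = Maxout(() -> Dense(5 => 7, tanh), 3)
size(m2(rand(Float32, 5, 8)))  # (7, 8)
```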
