Fixing typos in documentation (#2355)
* A typo fix in a docstring in src/layers/attention.jl

* A typo fix in src/layers/basic.jl

* Another typo in src/layers/basic.jl
poludmik authored Nov 17, 2023
1 parent 043b28f commit b0679f2
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/layers/attention.jl
@@ -7,7 +7,7 @@ const IntOrDims{N} = Union{Int, Dims{N}}
 The multi-head dot-product attention layer used in Transformer architectures [1].
-Returns the transformed input sequnce and the attention scores.
+Returns the transformed input sequence and the attention scores.
 [1] Vaswani et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
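For reference, the docstring being fixed describes Flux's `MultiHeadAttention` layer, which returns both the transformed sequence and the attention scores. A minimal self-attention sketch, assuming the Flux call signature around this release (the embedding size, head count, and shapes below are illustrative, not from the diff):

```julia
using Flux

# Multi-head dot-product attention with an embedding size of 64 and 8 heads.
mha = MultiHeadAttention(64; nheads = 8)

# An input sequence shaped (embed_dim, seq_len, batch).
x = rand(Float32, 64, 10, 32)

# Self-attention: returns the transformed input sequence `y`
# and the attention scores `α`.
y, α = mha(x)

size(y)  # (64, 10, 32), the same shape as the input sequence
size(α)  # (10, 10, 8, 32), i.e. (kv_len, q_len, nheads, batch)
```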
4 changes: 2 additions & 2 deletions src/layers/basic.jl
@@ -270,12 +270,12 @@ end
     Maxout(f, n_alts)
 This contains a number of internal layers, each of which receives the same input.
-Its output is the elementwise maximum of the the internal layers' outputs.
+Its output is the elementwise maximum of the internal layers' outputs.
 Instead of defining layers individually, you can provide a zero-argument function
 which constructs them, and the number to construct.
-Maxout over linear dense layers satisfies the univeral approximation theorem.
+Maxout over linear dense layers satisfies the universal approximation theorem.
 See Goodfellow, Warde-Farley, Mirza, Courville & Bengio "Maxout Networks"
 [https://arxiv.org/abs/1302.4389](https://arxiv.org/abs/1302.4389).
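The second file's fixes land in the `Maxout` docstring. A short sketch of the two construction styles that docstring describes, assuming the Flux API at this point in the repository's history (the layer sizes are illustrative):

```julia
using Flux

# Each internal layer receives the same input; the output is the
# elementwise maximum over the internal layers' outputs.
m = Maxout(x -> abs2.(x), x -> 2 .* x)
m([1.0, -2.0, 3.0])  # returns [2.0, 4.0, 9.0]

# Instead of defining layers individually, pass a zero-argument
# constructor and the number of copies to build:
m2 = Maxout(() -> Dense(5 => 7, tanh), 3)
size(m2(rand(Float32, 5, 8)))  # (7, 8)
```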
