megagonlabs · shirayu · Sep 29, 2022
diff --git a/.github/workflows/typos.yml b/.github/workflows/typos.yml
@@ -0,0 +1,21 @@
+---
+# yamllint disable rule:line-length
+name: Typos
+
+on:  # yamllint disable-line rule:truthy
+  push:
+  pull_request:
+    types:
+      - opened
+      - synchronize
+      - reopened
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: typos-action
+        uses: crate-ci/[email protected]
diff --git a/README.md b/README.md
@@ -91,7 +91,7 @@ If you need to install `ja_ginza_electra` along with `pytorch_model.bin` at the
 $ pip install -U ginza https://github.com/megagonlabs/ginza/releases/download/latest/ja_ginza_electra-latest-with-model.tar.gz
 ```
 
-If you hope to accelarate the transformers-based models by using GPUs with CUDA support, you can install `spacy` by specifying the CUDA version as follows:
+If you hope to accelerate the transformers-based models by using GPUs with CUDA support, you can install `spacy` by specifying the CUDA version as follows:
 ```console
 pip install -U "spacy[cuda110]"
 ```
@@ -287,7 +287,7 @@ Please read the official documents to compile user dictionaries with `sudachipy`
 - Important changes
   - Upgrade spaCy to v3
     - Release transformer-based `ja-ginza-electra` model
-    - Improve UPOS accuracy of the standard `ja-ginza` model by adding `morphologizer` to the tail of spaCy pipleline
+    - Improve UPOS accuracy of the standard `ja-ginza` model by adding `morphologizer` to the tail of spaCy pipeline
   - Need to insrtall analysis model along with `ginza` package
     - High accuracy model (>=16GB memory needed)
       - `pip install -U ginza ja-ginza-electra`

diff --git a/_typos.toml b/_typos.toml
@@ -0,0 +1,9 @@
+# Files for typos
+# Instruction:  https://github.com/marketplace/actions/typos-action#getting-started
+
+[default.extend-identifiers]
+
+[default.extend-words]
+
+[files]
+extend-exclude = ["requirements.txt"]
diff --git a/docs/index.md b/docs/index.md
@@ -315,7 +315,7 @@ Contains information from mC4 which is made available under the ODC Attribution
 - 重要な変更
   - プラットフォームをspaCy v3に変更
   - transformersモデルを採用して飛躍的に精度を向上した解析モデルパッケージ`ja-ginza-electra`をリリースしました。
-  - 従来型の解析モデルパッケージ`ja-ginza`のpiplelineに`morphologizer`を追加し、UD品詞解析精度を向上しました。
+  - 従来型の解析モデルパッケージ`ja-ginza`のpipelineに`morphologizer`を追加し、UD品詞解析精度を向上しました。
   - transformersモデルの追加に伴いGiNZA v5インストール時は`ginza`パッケージとともに解析モデルパッケージを明示的に指定する必要があります
     - 解析精度重視モデル (メモリ容量16GB以上を推奨)
       - `pip install -U ginza ja-ginza-electra`

diff --git a/ginza/compound_splitter.py b/ginza/compound_splitter.py
@@ -127,7 +127,7 @@ def morph(dtoken):
                     print(list(enumerate(doc.user_data["sub_tokens"])), file=sys.stderr)
                     raise e
 
-                # work-around: retokenize() does not consider the head of the splitted tokens
+                # work-around: retokenize() does not consider the head of the split tokens
                 if not compounds:
                     for t in doc:
                         if t.i < token_i or token_i + len(sub_tokens) <= t.i: