Fix typos and add typo checker to workflows #252

Open
wants to merge 2 commits into
base: develop
21 changes: 21 additions & 0 deletions .github/workflows/typos.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
---
# yamllint disable rule:line-length
name: Typos

on:  # yamllint disable-line rule:truthy
  push:
  pull_request:
    types:
      - opened
      - synchronize
      - reopened

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: typos-action
        uses: crate-ci/[email protected]
4 changes: 2 additions & 2 deletions README.md
@@ -91,7 +91,7 @@ If you need to install `ja_ginza_electra` along with `pytorch_model.bin` at the
$ pip install -U ginza https://github.com/megagonlabs/ginza/releases/download/latest/ja_ginza_electra-latest-with-model.tar.gz
```

If you hope to accelarate the transformers-based models by using GPUs with CUDA support, you can install `spacy` by specifying the CUDA version as follows:
If you hope to accelerate the transformers-based models by using GPUs with CUDA support, you can install `spacy` by specifying the CUDA version as follows:
```console
pip install -U "spacy[cuda110]"
```
@@ -287,7 +287,7 @@ Please read the official documents to compile user dictionaries with `sudachipy`
- Important changes
- Upgrade spaCy to v3
- Release transformer-based `ja-ginza-electra` model
- Improve UPOS accuracy of the standard `ja-ginza` model by adding `morphologizer` to the tail of spaCy pipleline
- Improve UPOS accuracy of the standard `ja-ginza` model by adding `morphologizer` to the tail of spaCy pipeline
- Need to insrtall analysis model along with `ginza` package
- High accuracy model (>=16GB memory needed)
- `pip install -U ginza ja-ginza-electra`
9 changes: 9 additions & 0 deletions _typos.toml
@@ -0,0 +1,9 @@
# Files for typos
# Instruction: https://github.com/marketplace/actions/typos-action#getting-started

[default.extend-identifiers]

[default.extend-words]

[files]
extend-exclude = ["requirements.txt"]
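
The two tables above are added empty. As a rough sketch only, assuming the `typos` configuration convention that mapping a word or identifier to itself marks it as accepted, they could later be filled in along these lines. The entries `teh` and `MyTypoedIdentifer` are hypothetical placeholders and are not part of this PR:

```toml
[default.extend-identifiers]
# Hypothetical entry: accept a whole identifier the checker would otherwise flag.
MyTypoedIdentifer = "MyTypoedIdentifer"

[default.extend-words]
# Hypothetical entry: a word mapped to itself is treated as valid.
teh = "teh"
```

Keeping such exceptions in `_typos.toml` rather than excluding whole files means the rest of each file is still checked.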
2 changes: 1 addition & 1 deletion docs/index.md
@@ -315,7 +315,7 @@ Contains information from mC4 which is made available under the ODC Attribution
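- Important changes
- Changed the platform to spaCy v3
- Released the analysis model package `ja-ginza-electra`, which adopts a transformers model and dramatically improves accuracy.
- Added `morphologizer` to the pipleline of the conventional analysis model package `ja-ginza`, improving UD part-of-speech analysis accuracy.
- Added `morphologizer` to the pipeline of the conventional analysis model package `ja-ginza`, improving UD part-of-speech analysis accuracy.
- With the addition of the transformers model, installing GiNZA v5 requires explicitly specifying an analysis model package along with the `ginza` package
- Accuracy-focused model (16 GB or more of memory recommended)
- `pip install -U ginza ja-ginza-electra`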
2 changes: 1 addition & 1 deletion ginza/compound_splitter.py
@@ -127,7 +127,7 @@ def morph(dtoken):
print(list(enumerate(doc.user_data["sub_tokens"])), file=sys.stderr)
raise e

# work-around: retokenize() does not consider the head of the splitted tokens
# work-around: retokenize() does not consider the head of the split tokens
if not compounds:
for t in doc:
if t.i < token_i or token_i + len(sub_tokens) <= t.i: