Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add latexocr docs and fix some typos #13532

Merged
merged 1 commit into from
Jul 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 2 additions & 4 deletions docs/FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ hide:

PaddleOCR收集整理了自从开源以来在issues和用户群中的常见问题并且给出了简要解答,旨在为OCR的开发者提供一些参考,也希望帮助大家少走一些弯路。

其中[通用问题](#1)一般是初次接触OCR相关算法时用户会提出的问题,在[1.5 垂类场景实现思路](#15)中总结了如何在一些具体的场景中确定技术路线进行优化。[PaddleOCR常见问题](#2)是开发者在使用PaddleOCR之后可能会遇到的问题也是PaddleOCR实践过程中的避坑指南。
其中[通用问题](#1)一般是初次接触OCR相关算法时用户会提出的问题,在[1.5 垂类场景实现思路](#15)中总结了如何在一些具体的场景中确定技术路线进行优化。[PaddleOCR常见问题](#2-paddleocr)是开发者在使用PaddleOCR之后可能会遇到的问题也是PaddleOCR实践过程中的避坑指南。

同时PaddleOCR也会在review issue的过程中添加 `good issue`、 `good first issue` 标签,但这些问题可能不会被立刻补充在FAQ文档里,开发者也可对应查看。我们也非常希望开发者能够帮助我们将这些内容补充在FAQ中。

Expand Down Expand Up @@ -234,9 +234,7 @@ A:训练集精度90,测试集70多的话,应该是过拟合了,有两个

#### Q: 对于小白如何快速入门中文OCR项目实践?

A:建议可以先了解OCR方向的基础知识,大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。目前来看,从内容的完备性来看,PaddleOCR的中英文双语教程文档是有明显优势的,在数据集、模型训练、预测部署文档详实,可以快速入手。而且还有微信用户群答疑,非常适合学习实践。项目地址:PaddleOCR

AI 快车道课程:<https://aistudio.baidu.com/aistudio/course/introduce/1519>
A:建议可以先了解OCR方向的基础知识,大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。目前来看,从内容的完备性来看,PaddleOCR的中英文双语教程文档是有明显优势的,在数据集、模型训练、预测部署文档详实,可以快速入手。而且还有微信用户群答疑,非常适合学习实践。项目地址:PaddleOCR AI 快车道课程:<https://aistudio.baidu.com/aistudio/course/introduce/1519>

## 2. PaddleOCR实战问题

Expand Down
Original file line number Diff line number Diff line change
@@ -1,20 +1,5 @@
# LaTeX-OCR

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Pickle File Generation](#3-1)
- [3.2 Training](#3-2)
- [3.3 Evaluation](#3-3)
- [3.4 Prediction](#3-4)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Original Project:
Expand All @@ -25,21 +10,19 @@ Using LaTeX-OCR printed mathematical expression recognition datasets for trainin

| Model | Backbone |config| BLEU score | normed edit distance | ExpRate |Download link|
|-----------|----------| ---- |:-----------:|:---------------------:|:---------:| ----- |
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|

<a name="2"></a>
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md) to clone the project code.

Furthermore, additional dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Please refer to [Text Recognition Tutorial](../../ppocr/model_train/recognition.en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.

Pickle File Generation:

Expand Down Expand Up @@ -90,10 +73,8 @@ Prediction:
python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml -o Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True Global.infer_img='./doc/datasets/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference
First, the model saved during the LaTeX-OCR printed mathematical expression recognition training process is converted into an inference model. you can use the following command to convert:

Expand All @@ -109,23 +90,16 @@ For LaTeX-OCR printed mathematical expression recognition model inference, the f
python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/" --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
```

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported

<a name="4-3"></a>
### 4.3 Serving

Not supported

<a name="4-4"></a>
### 4.4 More

Not supported

<a name="5"></a>
## 5. FAQ


```
Original file line number Diff line number Diff line change
@@ -1,48 +1,27 @@
# 印刷数学公式识别算法-LaTeX-OCR

- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 pickle 标签文件生成](#3-1)
- [3.2 训练](#3-2)
- [3.3 评估](#3-3)
- [3.4 预测](#3-4)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. 算法简介

原始项目:
> [https://github.com/lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)



<a name="model"></a>
`LaTeX-OCR`使用[`LaTeX-OCR印刷公式数据集`](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO)进行训练,在对应测试集上的精度如下:

| 模型 | 骨干网络 |配置文件 | BLEU score | normed edit distance | ExpRate |下载链接|
|-----------|------------| ----- |:-----------:|:---------------------:|:---------:| ----- |
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|

<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
请先参考[《运行环境准备》](../../ppocr/environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](../../ppocr/blog/clone.md)克隆项目代码。

此外,需要安装额外的依赖:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. 模型训练、评估、预测

<a name="3-1"></a>

### 3.1 pickle 标签文件生成
从[谷歌云盘](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO)中下载 formulae.zip 和 math.txt,之后,使用如下命令,生成 pickle 标签文件。

Expand All @@ -63,7 +42,7 @@ python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR

### 3.2 模型训练

请参考[文本识别训练教程](./recognition.md)。PaddleOCR对代码进行了模块化,训练`LaTeX-OCR`识别模型时需要**更换配置文件**为`LaTeX-OCR`的[配置文件](../../configs/rec/rec_latex_ocr.yml)。
请参考[文本识别训练教程](../../ppocr/model_train/recognition.md)。PaddleOCR对代码进行了模块化,训练`LaTeX-OCR`识别模型时需要**更换配置文件**为`LaTeX-OCR`的[配置文件](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)。

#### 启动训练

Expand All @@ -83,7 +62,6 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
python3 tools/train.py -c configs/rec/rec_latex_ocr.yml -o Global.eval_batch_step=[0,{length_of_dataset//batch_size*22}]
```

<a name="3-2"></a>
### 3.3 评估

可下载已训练完成的[模型文件](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar),使用如下命令进行评估:
Expand All @@ -96,7 +74,6 @@ python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_mode
python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
```

<a name="3-3"></a>
### 3.4 预测

使用如下命令进行单张图片预测:
Expand All @@ -106,12 +83,10 @@ python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml -o Architecture.Ba
# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/datasets/pme_demo/'。
```

<a name="4"></a>
## 4. 推理部署

<a name="4-1"></a>
### 4.1 Python推理
首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar) ),可以使用如下命令进行转换:
首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar),可以使用如下命令进行转换:

```shell
# 注意将pretrained_model的路径设置为本地路径。
Expand Down Expand Up @@ -140,7 +115,7 @@ python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.
```
&nbsp;

![测试图片样例](../datasets/pme_demo/0000295.png)
![测试图片样例](../../datasets/images/pme_demo/0000295.png)

执行命令后,上面图像的预测结果(识别的文本)会打印到屏幕上,示例如下:
```shell
Expand All @@ -155,22 +130,18 @@ Predicts of ./doc/datasets/pme_demo/0000295.png:\zeta_{0}(\nu)=-{\frac{\nu\varrh
- 如果您修改了预处理方法,需修改`tools/infer/predict_rec.py`中 LaTeX-OCR 的预处理为您的预处理方法。


<a name="4-2"></a>
### 4.2 C++推理部署

由于C++预处理后处理还未支持 LaTeX-OCR,所以暂未支持

<a name="4-3"></a>
### 4.3 Serving服务化部署

暂不支持

<a name="4-4"></a>
### 4.4 更多推理部署

暂不支持

<a name="5"></a>
## 5. FAQ

1. LaTeX-OCR 数据集来自于[LaTeXOCR源repo](https://github.com/lukas-blecher/LaTeX-OCR) 。
1 change: 1 addition & 0 deletions docs/algorithm/overview.en.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ On the TextZoom public dataset, the effect of the algorithm is as follows:
Supported formula recognition algorithms (Click the link to get the tutorial):

- [x] [CAN](./formula_recognition/algorithm_rec_can.en.md)
- [x] [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.en.md)

On the CROHME handwritten formula dataset, the effect of the algorithm is as follows:

Expand Down
1 change: 1 addition & 0 deletions docs/algorithm/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
已支持的公式识别算法列表(戳链接获取使用教程):

- [x] [CAN](./formula_recognition/algorithm_rec_can.md)
- [x] [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.md)

在CROHME手写公式数据集上,算法效果如下:

Expand Down
25 changes: 0 additions & 25 deletions docs/community/community_contribution.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,28 +114,3 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署
- 合入代码之后会在本文档第一节中更新信息,默认链接为github名字及主页,如果有需要更换主页,也可以联系我们。
- 新增重要功能类,会在用户群广而告之,享受开源社区荣誉时刻。
- **如果您有基于PaddleOCR的项目,但未出现在上述列表中,请按照 `4. 联系我们` 的步骤与我们联系。**

## 附录:社区常规赛积分榜

| 开发者| 总积分 | 开发者| 总积分 |
| ---- | ------ | ----- | ------ |
| [RangeKing](https://github.com/RangeKing) | 220 | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [hao6699](https://github.com/hao6699) | 145 | [v3fc](https://github.com/v3fc) | 35 |
| [mymagicpower](https://github.com/mymagicpower) | 140 | [imiyu](https://github.com/imiyu) | 30 |
| [raoyutian](https://github.com/raoyutian) | 90 | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 80 | [daassh](https://github.com/daassh) | 23 |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 70 | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 20 |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 70 | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 60 | [chccc1994](https://github.com/chccc1994) | 20 |
| [edencfc](https://github.com/edencfc) | 57 | [BeyondYourself](https://github.com/BeyondYourself) | 20 |
| [zhangyingying520](https://github.com/zhangyingying520) | 57 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [telppa](https://github.com/telppa) | 40 | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| sosojust1984 | 40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [redearly123](https://github.com/redearly123) | 40 | [JimEverest](https://github.com/JimEverest) | 10 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) | 40 | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
| [GreatV](https://github.com/GreatV) | 40 | | |
| CLXK294 | 40 | | |
2 changes: 2 additions & 0 deletions docs/ppocr/model_train/detection.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,8 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```

**注意:** 文本检测模型使用AMP时可能遇到训练不收敛问题,可以参考[discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions/12445)中的临时解决方案进行使用。

### 2.5 分布式训练

多机多卡训练时,通过 `--ips` 参数设置使用的机器IP地址,通过 `--gpus` 参数设置使用的GPU ID:
Expand Down