This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Update lip reading example #13647

Merged
merged 48 commits into from
Feb 13, 2019

Conversation

seujung
Contributor

@seujung seujung commented Dec 14, 2018

Description

Add lip reading model using gluon

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

Tech. Prototyping Group 정승환 and others added 2 commits December 14, 2018 14:56
@seujung seujung requested a review from szha as a code owner December 14, 2018 06:11
@roywei
Member

roywei commented Dec 14, 2018

@mxnet-label-bot add[Example, Gluon, pr-awaiting-review]

@marcoabreu marcoabreu added the Example, Gluon, and pr-awaiting-review (PR is waiting for code review) labels Dec 14, 2018
Contributor

@aaronmarkham aaronmarkham left a comment

I haven't been able to test this end-to-end yet. I've tried the data download process a couple of times, but have had to restart due to connectivity and space issues. I'll try again later, but I thought I'd at least give some initial feedback.
I'm looking forward to seeing this work. It seems like a very cool example. Thanks for sharing it.

example/gluon/lipnet/README.md (outdated, resolved)
example/gluon/lipnet/README.md (outdated, resolved)
example/gluon/lipnet/README.md (resolved)
"args = dict()\n",
"args['batch_size'] = 64\n",
"args['epochs'] = 100\n",
"args['image_path'] = '/home/ubuntu/works/2018/lips_model/data/datasets/'\n",
Contributor

Might make it easier if you used paths relative to where this is in the examples folder and where the data gets downloaded.
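
A minimal sketch of what relative paths could look like here, assuming the data is downloaded into a `data/` folder next to the script. The helper name `resolve_data_paths` and the `datasets` and `align` folder names are assumptions for illustration, not the confirmed layout of this example.

```python
import os

def resolve_data_paths(base_dir):
    """Build dataset paths relative to base_dir instead of hard-coding /home/... paths."""
    data_root = os.path.join(base_dir, "data")  # assumed download location
    return {
        "image_path": os.path.join(data_root, "datasets"),  # hypothetical folder name
        "align_path": os.path.join(data_root, "align"),     # hypothetical folder name
    }

# In a script, base_dir would typically be the folder that contains the script:
#   base_dir = os.path.dirname(os.path.abspath(__file__))
paths = resolve_data_paths(os.getcwd())
print(paths["image_path"])
```

With this, the notebook's `args['image_path']` could be filled from `paths["image_path"]` and would no longer depend on one user's home directory.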

Contributor

@soeque1 soeque1 Dec 31, 2018

Changed

example/gluon/lipnet/README.md (outdated, resolved)
example/gluon/lipnet/utils/download_data.py (resolved)

def split_seq(sam_num, n_tile):
"""
Spli the number(sam_num) into numbers by n_tile
Contributor

Suggested change
Spli the number(sam_num) into numbers by n_tile
Split the number(sam_num) into numbers by n_tile

Contributor

Fixed

example/gluon/lipnet/utils/multi.py (outdated, resolved)
example/gluon/lipnet/utils/multi.py (outdated, resolved)
example/gluon/lipnet/utils/preprocess_data.py (outdated, resolved)
@aaronmarkham
Contributor

aaronmarkham commented Dec 28, 2018

Please add these to your prerequisites list:

  • scikit-image
  • scikit-video
  • dlib
  • tqdm

Also, I tried to run it without a GPU and couldn't get it to work with:

python3 main.py --use_gpu False 
# or 0

Raises this error:

mxnet.base.MXNetError: [06:52:06] src/ndarray/ndarray.cc:1233: GPU is not enabled
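
The thread does not say what caused this. One common cause of this kind of symptom, offered here only as a guess, is parsing a boolean flag with `type=bool`, which treats any non-empty string (including `"False"` and `"0"`) as true. The sketch below shows a hypothetical `str2bool` converter; the `--use_gpu` flag name comes from the command above, and the helper names and the default value are assumptions.

```python
import argparse

def str2bool(value):
    """Convert common true/false spellings to a bool (hypothetical helper)."""
    text = str(value).strip().lower()
    if text in ("1", "true", "yes", "y"):
        return True
    if text in ("0", "false", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)

def build_parser():
    """Parser with a --use_gpu flag that accepts 'False' and '0' (default is assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_gpu", type=str2bool, default=True)
    return parser

# bool("False") is True, which is why type=bool does not work for flags like this.
print(bool("False"))
args = build_parser().parse_args(["--use_gpu", "False"])
print(args.use_gpu)
```

If the flag is parsed this way, the training script could then pick a CPU context whenever `args.use_gpu` is false.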

@aaronmarkham
Contributor

I built the project on a GPU instance this time and was able to run main.py. However, I immediately get a dump of a lot of these errors:

  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 166, in _recursive_fork_recordio
    if depth >= max_depth:
RecursionError: maximum recursion depth exceeded in comparison

Looks like it ran that 200 times and failed each time.

@soeque1
Contributor

soeque1 commented Jan 25, 2019

@aaronmarkham @seujung and I resumed the revisions yesterday. We plan to address all of the current comments by this weekend.

@vandanavk
Contributor

is this PR good to go @thomelane @larroy ?

@thomelane
Contributor

@aaronmarkham any chance you could zip up all of the preprocessed files, to avoid aws s3 sync downloading loads of small files? would definitely save time and bandwidth.

And then we need to get those instructions into the README, so people don't try to download and preprocess the data themselves.

@aaronmarkham
Contributor

@thomelane Ok, I'm creating the tar files now. The reason I didn't do that is that I kept getting disconnected and wanted to be able to resume a sync. If you're pulling a 15gb file and have to start over, well, that's no fun.
At least people will have the option for either route once I've uploaded the files.
Rather than force another CI pass, if the code is alright here, let's merge it and then I can update the README in a follow-up PR.

@aaronmarkham
Contributor

I put the tar files in a separate bucket so you can pick how you want to download.

To get the tar files:

 aws s3 sync s3://mxnet-public/lipnet/data-archives .

Or to download them by link:
https://mxnet-public.s3.amazonaws.com/lipnet/data-archives/align.tgz
https://mxnet-public.s3.amazonaws.com/lipnet/data-archives/datasets.tgz

To get the folders (unzipped):

 aws s3 sync s3://mxnet-public/lipnet/data .
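
For anyone without the AWS CLI, the link route could be scripted roughly as follows. This is a sketch using only the Python standard library; the two archive URLs are the ones listed above, while the function names and the `data` destination folder are assumptions.

```python
import os
import tarfile
import urllib.request

BASE_URL = "https://mxnet-public.s3.amazonaws.com/lipnet/data-archives/"
ARCHIVES = ("align.tgz", "datasets.tgz")

def download(url, dest_path):
    """Download url to dest_path unless a file with that name is already present."""
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(url, dest_path)
    return dest_path

def extract(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names

def fetch_all(dest_dir="data"):
    """Download and unpack both archives (a large download; not run here)."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in ARCHIVES:
        archive = download(BASE_URL + name, os.path.join(dest_dir, name))
        extract(archive, dest_dir)
```

Note that this sketch does not resume interrupted transfers, and its existence check cannot tell a partial file from a complete one, so `aws s3 sync` remains the better choice on an unreliable connection.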

@thomelane
Contributor

@aaronmarkham thanks for uploading! sure, you can add the instructions to the readme in a different commit, wouldn't that need a CI run anyway, or has that been optimised now to ignore markdown changes?

@seujung the model seems to be training okay (i.e. loss going down), but still noticeable differences between target and prediction. How good are the predictions on a correctly trained model? Also noticed that things like learning rate aren't explicitly defined, are the defaults correct for this model?

@soeque1
Contributor

soeque1 commented Feb 10, 2019

@aaronmarkham I checked the files you uploaded. They work as intended.
@thomelane @seujung and I are working together on this example.
(1) After some experiments, we decided to use the default learning rate.
(2) We checked the predictions, decoded using beam search.

Training this model takes a long time. The main reason is the decoding step (beam search) on the validation data (def infer_batch). We do not actually need to decode all of the validation examples during training, so we either skip this step or check only one mini-batch. To speed things up, we check the decoded results only with infer.py, not with main.py (training).
Decoding is still useful for understanding how good the results are.
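
The idea of decoding only one mini-batch during validation could be sketched like this. It is framework-free and not the code in this PR: `validate`, `loss_fn`, `decode_fn`, and `max_decode_batches` are hypothetical names, with `decode_fn` standing in for the expensive beam search step.

```python
def validate(batches, loss_fn, decode_fn, max_decode_batches=1):
    """Average the loss over all batches, but decode only the first few.

    batches: iterable of (data, label) pairs.
    loss_fn: callable returning a float loss for one batch (cheap).
    decode_fn: callable returning decoded output for one batch (expensive).
    """
    total_loss = 0.0
    count = 0
    decoded = []
    for index, (data, label) in enumerate(batches):
        total_loss += loss_fn(data, label)
        count += 1
        if index < max_decode_batches:
            # Only these batches pay the decoding cost.
            decoded.append(decode_fn(data))
    average = total_loss / count if count else float("nan")
    return average, decoded
```

Setting `max_decode_batches=0` skips decoding entirely, which matches the option of leaving all decoding to a separate inference script.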

(3) Although the loss is still decreasing, I have attached a pre-trained model.
Model
Sample

You can get the results with:

python infer.py model_path='checkpoint/epoches_81_loss_15.7157'

Or you can resume training:

python main.py model_path='checkpoint/epoches_81_loss_15.7157'
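
Assuming checkpoint files follow the `epoches_<epoch>_loss_<loss>` naming seen in the commands above, the epoch and loss could be recovered from the file name when resuming. This is a sketch only; `parse_checkpoint_name` is a hypothetical helper, not something taken from this PR.

```python
import os
import re

# Pattern assumed from the checkpoint name shown in this thread.
_CKPT_RE = re.compile(r"epoches_(\d+)_loss_(\d+(?:\.\d+)?)$")

def parse_checkpoint_name(path):
    """Return (epoch, loss) parsed from a checkpoint path, or None if it does not match."""
    match = _CKPT_RE.search(os.path.basename(path))
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))

print(parse_checkpoint_name("checkpoint/epoches_81_loss_15.7157"))  # (81, 15.7157)
```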

@thomelane
Contributor

Great, thanks for clarifying @soeque1!

@thomelane
Contributor

LGTM

@@ -0,0 +1,194 @@
# LipNet: End-to-End Sentence-level Lipreading
Member

Suggested change
# LipNet: End-to-End Sentence-level Lipreading
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
# LipNet: End-to-End Sentence-level Lipreading

Contributor

License isn't required on readme files. @szha if you feel strongly about adding it, I'm going to modify the readme in another PR later today and I can add it then.

@aaronmarkham aaronmarkham merged commit 7ff6ad1 into apache:master Feb 13, 2019
@szha szha mentioned this pull request Feb 13, 2019
4 tasks
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* update lipnet

* update utils

* Update example/gluon/lipnet/README.md

Co-Authored-By: seujung <[email protected]>

* Update example/gluon/lipnet/README.md

Co-Authored-By: seujung <[email protected]>

* Update example/gluon/lipnet/utils/multi.py

Co-Authored-By: seujung <[email protected]>

* Update example/gluon/lipnet/utils/preprocess_data.py

Co-Authored-By: seujung <[email protected]>

* Update example/gluon/lipnet/utils/multi.py

Co-Authored-By: seujung <[email protected]>

* Update example/gluon/lipnet/utils/download_data.py

Co-Authored-By: seujung <[email protected]>

* fix error for using gpu mode

* Add requirements

* Remove unnecessary requirements

* Update .gitignore

* Remove inappropriate license file

* Changed relative path

* Fix description

* Fix description

* Fix description

* Fix description

* Change doc strings and add url reference

* Fix align_path

* Remove zip files

* Fix bugs: source_path, n_process

* Fix target_path

* Fix exception handler and resume the preprocess

* Pass the output when it fails to detect the mouth

* Add exception during collecting images

* Add the disk space and fix default align_path

* Change optimizer

* Update readme for pip

* Update README

* Add checkpoint folder

* Apply to train using multiprocess

* update network.py

* delete batchnorm comment
*fix dropout
* fix loading ndarray as F
* add space

* Update readme

* Add the info of GRID Data
* Add the info of word alignments
* Add total download size
* Add time for preprocessing

* Add test code for beamsearch

* add space

* delete line and fix code

* Add shebang in BeamSearch

* Fix trainer

* Add space line

* Fix appeding losses

* Fix trainer

* Delete debug line in data_loader

* Move transpose of input into data_loader

* Delete trailing-whitespace

* Hybridize lip model

* Hybridize model

* Refactor the len of input sequence

* Fix the shape of model

* Apply to split train and validation

* Split data into train and valid

* Update Readme

* Add infer.py

* Remove ipynb

* Apply to continual learning

* Add images

* Update readme

* Fix typo and pylint

* Fix loss digits of save_file and typo

* Add info of data split and batch size
drivanov pushed a commit to drivanov/incubator-mxnet that referenced this pull request Mar 4, 2019
@soeque1 soeque1 deleted the lipnet branch March 10, 2019 12:40
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019