
official Docker images for Kaldi #3284

Closed
mdoulaty opened this issue May 2, 2019 · 49 comments
Labels: discussion, stale

@mdoulaty commented May 2, 2019

Are there any plans to add official Docker images for Kaldi on Docker Hub?
Running Kaldi inside containers can be quite helpful for some users and workloads, and I think having official Kaldi images on Docker Hub would be valuable.
We can set up automated builds for CPU- and GPU-based images, and I can help with the setup if you think this would benefit other users.
(We have good experience running containerized Kaldi ASR workloads, both training and decoding, on a Slurm cluster.)

@danpovey commented May 2, 2019 via email

@mdoulaty commented May 2, 2019

Yes, happy to help.
To start with, we'll need to set up a new public repository on Docker Hub (http://hub.docker.com/), the container registry we're going to use.
Since you're the Yoda master, it probably makes sense for you to own the organisation and the repository (as on GitHub), so the org name would be kaldi-asr and the repository name would be kaldi.
The account owner then needs to connect the Docker Hub and GitHub accounts so that automated builds run whenever something new is pushed.
Similar to other projects, we can have latest dev images (both CPU and GPU versions).
We can also have images for the more stable branches (I can see 5.0, 5.1, 5.2, 5.3 and 5.4 branches, which look like stable versions), again in both CPU and GPU variants.
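As a rough sketch, a per-branch automated build could boil down to something like the following (the kaldiasr/kaldi name mirrors the tag used later in this thread; the Dockerfile paths and tag scheme are assumptions, not the project's actual layout):

```sh
#!/usr/bin/env bash
# Hypothetical per-branch build: produces CPU and GPU tags for one stable branch.
set -euo pipefail

BRANCH="5.4"             # Kaldi branch to build
REPO="kaldiasr/kaldi"    # Docker Hub namespace/repository (assumed)

git clone -b "$BRANCH" https://github.com/kaldi-asr/kaldi.git "kaldi-$BRANCH"
cd "kaldi-$BRANCH"

# Dockerfile locations are illustrative
docker build -t "$REPO:$BRANCH-cpu" -f docker/ubuntu16.04-cpu/Dockerfile .
docker build -t "$REPO:$BRANCH-gpu" -f docker/ubuntu16.04-gpu/Dockerfile .

docker push "$REPO:$BRANCH-cpu"
docker push "$REPO:$BRANCH-gpu"
```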

@galv commented May 2, 2019 via email

@galv commented May 2, 2019 via email

@danpovey commented May 2, 2019 via email

@danpovey commented May 2, 2019 via email

@mdoulaty commented May 2, 2019

@galv Docker Hub supports building images on its own infrastructure; we can also use your existing Travis CI pipeline to build, tag and push images to Docker Hub. Please have a look here: https://docs.travis-ci.com/user/docker/#building-a-docker-image-from-a-dockerfile
There are no issues with absolute paths or anything like that.

@danpovey Sure, whatever you feel is most appropriate; and yes, I will include 5.4 onward.
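For reference, the Docker steps of such a CI job would roughly reduce to something like this (a sketch only; environment variable names, image tag and Dockerfile path are assumptions):

```sh
# Build, log in and push the CPU image from a CI job.
# DOCKER_USERNAME / DOCKER_PASSWORD are assumed to be set as encrypted CI variables.
docker build -t kaldiasr/kaldi:latest -f docker/ubuntu16.04-cpu/Dockerfile .
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
docker push kaldiasr/kaldi:latest
```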

@danpovey commented May 2, 2019 via email

@galv commented May 2, 2019 via email

@mdoulaty commented May 2, 2019

I probably haven't fully understood what you meant, then.
Regardless, inside the container you can train without changing Kaldi's folder structure, and absolute paths are fine (I can't think of why they would be an issue).

@galv commented May 2, 2019 via email

@mdoulaty commented May 2, 2019

Probably the easiest approach would be: I create my proposed changes in my own forks, both on GitHub and Docker Hub; then you have a look, and if all looks good, we integrate them into the main Kaldi repositories on GitHub and Docker Hub and continue there.

@danpovey commented May 2, 2019 via email

@mdoulaty commented May 6, 2019

So here is the first version:
https://github.com/mdoulaty/kaldi/tree/master/docker

It includes both CPU- and GPU-based images. I also pushed both images to Docker Hub:
https://cloud.docker.com/repository/docker/mdoulaty/kaldi/tags

I plan to add more image variants, such as a minimal image.

We also need to automate the build-and-push process, which can eventually be done (I'm not entirely sure about building GPU-based images on Docker Hub; we may need to build them somewhere else where we have access to a GPU).
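As a usage sketch, one might pull and try the pushed images roughly like this (the tag names are guesses; check the Docker Hub page above for the actual tags):

```sh
# CPU image: drop into a shell with the working directory mounted in
docker pull mdoulaty/kaldi:latest
docker run -it --rm -v "$PWD:/work" -w /work mdoulaty/kaldi:latest bash

# GPU image: needs the NVIDIA container runtime on the host
# (on Docker < 19.03 use `nvidia-docker run` or `--runtime=nvidia` instead of --gpus)
docker pull mdoulaty/kaldi:latest-gpu
docker run -it --rm --gpus all -v "$PWD:/work" -w /work mdoulaty/kaldi:latest-gpu bash
```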

@danpovey commented May 6, 2019

Great! @galv do you have time to look into this? Sorry I have a lot to do today.

@mdoulaty commented May 7, 2019

Sure. @galv, please have a look and let me know how you would like to proceed.

@mdoulaty commented:

@danpovey @galv Were you able to check the sample files?

@galv commented May 12, 2019

Seems okay to me, although I'm not sure you need this line anymore: https://github.com/mdoulaty/kaldi/blob/75338cbd787943537322cae194e3d1ae11e7f103/docker/ubuntu16.04-gpu/Dockerfile#L26

My understanding was that the default Python is Python 2.7 on all Linux distros except Arch.

@mdoulaty commented:

As far as I remember, in debian:9.8 there was no python binary and I had to explicitly symlink python2.7.
I will double-check both images and remove that line if it is redundant.
After double-checking that, should I create a PR to the main repo?
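For reference, a hedged sketch of the kind of symlink step being discussed (the exact Dockerfile line may differ):

```sh
# Inside the image build: make a `python` command available when the
# base image ships only python2.7.
apt-get update && apt-get install -y python2.7
ln -sf /usr/bin/python2.7 /usr/bin/python
```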

@danpovey commented:

I will let @galv comment on that.

@fabito commented May 14, 2019

> So here is the first version: https://github.com/mdoulaty/kaldi/tree/master/docker
> It includes both CPU- and GPU-based images. I also pushed both images to Docker Hub: https://cloud.docker.com/repository/docker/mdoulaty/kaldi/tags

Just tested the CPU image (for diarization). It works like a charm.

@danpovey commented May 14, 2019 via email

@mdoulaty commented:

@fabito Thanks for testing!
Those are temporary locations; hopefully they will be moved to the official Kaldi repositories on GitHub and Docker Hub very soon.

@galv commented May 14, 2019 via email

@mdoulaty commented:

#3322

@fabito commented May 15, 2019

@mdoulaty, what are your thoughts about the "minimal" image? Is the idea to remove all the build dependencies and copy over only the compiled binaries and utility scripts?

@galv commented May 15, 2019 via email

@mdoulaty commented:

> @mdoulaty, what are your thoughts about the "minimal" image? Is the idea to remove all the build dependencies and copy over only the compiled binaries and utility scripts?

Yes, something along those lines: have two stages in the Dockerfile, one for building Kaldi and one with just the compiled artifacts.
I'm still a bit unsure whether it's a good idea to include the scripts or to have only the core binaries there.
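A minimal sketch of that multi-stage idea, assuming the build context is the Kaldi source tree and Kaldi lives under /opt/kaldi (package lists and paths are illustrative, not the project's actual Dockerfile):

```dockerfile
# Stage 1: full toolchain, compiles Kaldi.
FROM ubuntu:16.04 AS builder
RUN apt-get update && apt-get install -y \
        g++ make automake autoconf bzip2 unzip wget sox libtool git \
        python2.7 python3 zlib1g-dev ca-certificates patch \
        libatlas3-base libatlas-base-dev
COPY . /opt/kaldi
WORKDIR /opt/kaldi
RUN cd tools && make -j"$(nproc)" && \
    cd ../src && ./configure --shared && \
    make depend -j"$(nproc)" && make -j"$(nproc)" && \
    find /opt/kaldi/src -name '*.o' -delete

# Stage 2: runtime image with only compiled binaries, shared libraries and scripts.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y sox python2.7 python3 libatlas3-base && \
    rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/kaldi/src /opt/kaldi/src
COPY --from=builder /opt/kaldi/tools/openfst /opt/kaldi/tools/openfst
COPY --from=builder /opt/kaldi/egs /opt/kaldi/egs
ENV LD_LIBRARY_PATH=/opt/kaldi/src/lib:/opt/kaldi/tools/openfst/lib
WORKDIR /opt/kaldi
```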

@danpovey commented May 15, 2019 via email

@jtrmal commented May 16, 2019 via email

@jtrmal commented May 16, 2019 via email

@mdoulaty commented:

Okay, then the minimal images will include all the scripts.
I think reproducibility gets more attention in the ML community than in the speech community, and containers are a good starting point. So I don't think this is just about productionising; research will also benefit.

@sayint-ai commented:

Does anyone actually use Kaldi Docker images for training? Just curious.

@hwiorn commented May 17, 2019

@sayint-ai Actually, I am using a Kaldi Docker container at my company, but my setup is complicated.

The image includes the compiled Kaldi binaries (CPU and GPU), some audio utilities (e.g. ffmpeg, sox) and SGE configuration; I have not installed any Kaldi scripts in it.
The image is pulled from an internal Docker registry and run by mounting a volume containing the Kaldi egs, recipe scripts and data.

I run this container on Kubernetes and use it for model training.
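A hedged sketch of that kind of workflow (the registry host, image name, mount paths and recipe are all invented for illustration):

```sh
# Pull the prebuilt image from an internal registry (names are illustrative)
docker pull registry.internal.example.com/speech/kaldi:gpu

# Train with the egs, recipe scripts and data kept outside the image and mounted in
docker run --rm --gpus all \
    -v /shared-fs/kaldi-egs:/opt/kaldi/egs \
    -v /shared-fs/data:/data \
    -w /opt/kaldi/egs/mini_librispeech/s5 \
    registry.internal.example.com/speech/kaldi:gpu \
    ./run.sh
```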

@mdoulaty commented:

@galv I enabled automated builds on Docker Hub (for the CPU-only image), but apparently there is a 4-hour timeout and, with the VM they provide, the image cannot be built within 4 hours (a sample failed build can be found here: https://cloud.docker.com/repository/registry-1.docker.io/mdoulaty/kaldi/builds/650bc55f-9f18-4aeb-b98f-1ced857246bd).

Then I tried integrating automated builds into Travis: I updated the Travis YAML and enabled Docker builds there (see https://github.com/mdoulaty/kaldi/blob/master/.travis.yml for reference on how to enable Docker builds). That wasn't successful either, since Travis has a maximum build time of 50 minutes (https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts).
In any case, neither option offers GPU support, and we would have had to use other VMs with GPUs anyway; I guess we'll have to build the CPU images there as well, so it's not a big deal.

I'll prepare some scripts that use cloud-provider-agnostic tooling such as Terraform to provision a VM with a GPU, pull Kaldi, build the images and push them to Docker Hub. We can then wrap this in another small Docker image and build that in Travis (which triggers the main build and returns well before the 50-minute timeout). Any thoughts?
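A hedged sketch of what that provision-build-push flow could look like (the Terraform output name, SSH user, Dockerfile paths and image tags are all assumptions):

```sh
#!/usr/bin/env bash
# Illustrative flow only: provision a GPU VM with Terraform, build the Kaldi
# images on it, push them to Docker Hub, then destroy the VM again.
set -euo pipefail

terraform init
terraform apply -auto-approve
VM_IP=$(terraform output -raw build_vm_ip)   # 'build_vm_ip' is a hypothetical output

# Docker Hub credentials are assumed to already be configured on the build VM.
ssh "builder@${VM_IP}" <<'EOF'
  set -e
  git clone https://github.com/kaldi-asr/kaldi.git && cd kaldi
  docker build -t kaldiasr/kaldi:latest     -f docker/ubuntu16.04-cpu/Dockerfile .
  docker build -t kaldiasr/kaldi:latest-gpu -f docker/ubuntu16.04-gpu/Dockerfile .
  docker push kaldiasr/kaldi:latest
  docker push kaldiasr/kaldi:latest-gpu
EOF

terraform destroy -auto-approve
```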

@danpovey commented May 18, 2019 via email

@mdoulaty commented:

The simple workaround (and what they officially suggest) is to break the build down into smaller images: for example, have one base image with some dependencies and build more layers on top. That way we can control the build time of each layer (and, of course, what you suggest about shared libraries can help in some of those layers, but not all).
It's certainly doable, but several layers would make the setup harder to understand.
As I said, we have to use external VMs for building the GPU images anyway, so running on our own infrastructure will probably give us more freedom to keep the images simple and understandable.
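For illustration, the split-into-layers idea might look roughly like this (image names and package lists are hypothetical):

```dockerfile
# Hypothetical "deps" base image, rebuilt rarely: the expensive dependency layer
# is built and pushed once (e.g. as kaldiasr/kaldi-deps:ubuntu16.04) and reused.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
        g++ make automake autoconf bzip2 unzip wget sox libtool git \
        python2.7 python3 zlib1g-dev libatlas3-base libatlas-base-dev

# The per-push image would then live in a separate, much faster Dockerfile:
#   FROM kaldiasr/kaldi-deps:ubuntu16.04
#   COPY . /opt/kaldi
#   RUN cd /opt/kaldi/tools && make -j"$(nproc)" && \
#       cd /opt/kaldi/src && ./configure --shared && \
#       make depend -j"$(nproc)" && make -j"$(nproc)"
```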

@mdoulaty commented:

@galv Here is a working version of the automated builds:
https://github.com/mdoulaty/kaldi-image-builder
It uses Terraform (a provider-agnostic tool) to provision a VM in the cloud, then builds and pushes the images from there.
Currently it is scheduled to run and push nightly builds. Please have a look around and let me know if you have any questions. I'll then expand it to build GPU images as well.

@mdoulaty commented Jun 5, 2019

@galv did you have a chance to check this?

@mdoulaty commented Jun 6, 2019

I updated the GPU image scripts and also added the GPU images to the daily builds, so the GPU images will now be built and pushed every day as well.

@danpovey commented Jun 6, 2019

Thanks! Let me know if there is anything you need from me, e.g. merging something.

@mdoulaty commented Jun 6, 2019

Sure, I just sent a PR with some minor changes.
This initial part can be considered done: two images (CPU-based and GPU-based) are provided in the main repo.
There is also a side repo containing the automated build-and-push scripts for the daily builds; it is probably better to keep that as a separate repo (though it can move to the kaldi-asr org if that makes sense). That repo includes code for provisioning VMs on any public or private cloud provider supported by Terraform, with the examples using Microsoft Azure. I'm running those builds daily on my account and pushing the latest versions of both the CPU and GPU images to Docker Hub.

@danpovey commented Jun 6, 2019 via email

@luitjens commented:

Semi-related: NVIDIA maintains a Kaldi Docker image with a monthly release cycle, and we try to keep the source reasonably close to top of tree (TOT).

https://ngc.nvidia.com/catalog/containers/nvidia:kaldi

Note that this container is tested on NVIDIA hardware to validate that things are functionally correct.

@lucgeo commented Oct 25, 2019

@hwiorn Hi, I want to run Kaldi training across multiple Docker containers on different physical machines. I have past experience with SGE and Kaldi, but I'm having trouble making the containers visible to SGE. Could you please give some hints on how you configured SGE inside the containers? My physical machines are on the same LAN. Thanks!

@danpovey commented:

This is really a GridEngine question; you should ask on the gridengine-users list. It's a networking issue more than anything else, as you need the Docker containers to be individually addressable on the local network.

@mdoulaty commented:

An alternative approach could be to keep SGE running outside the containers and change queue.pl to invoke the command through Docker. For example, when you run
queue.pl log.txt somescript.sh p1 p2 p3, it would write the wrapper command as something like:
docker run -v /shared-fs:/shared-fs ... kaldiasr/kaldi:TAG bash -c 'somescript.sh p1 p2 p3 &> log.txt'
This should give you more flexibility.
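A hedged sketch of such a per-job wrapper (queue.pl itself is not modified here; the shared filesystem path and image tag are assumptions, and the current directory is assumed to live on /shared-fs):

```sh
#!/usr/bin/env bash
# Hypothetical wrapper a modified queue.pl could emit: run the Kaldi command
# inside the container, with the shared filesystem mounted at the same path
# so absolute paths in the recipe keep working.
LOG="$1"; shift                      # e.g. log.txt
IMAGE="kaldiasr/kaldi:latest"        # assumed image tag

exec docker run --rm \
    -v /shared-fs:/shared-fs \
    -w "$PWD" \
    "$IMAGE" \
    bash -c "$* &> '$LOG'"
```

It would be called in the same shape as the original command, e.g. ./docker-wrap.sh log.txt somescript.sh p1 p2 p3 (the wrapper name is hypothetical).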

@stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jun 19, 2020.
@jtrmal commented Aug 18, 2022

Resolved; we now rely on GitHub Actions to build the Docker images.

jtrmal closed this as completed on Aug 18, 2022.