CapsNet-ASR

Phoneme recognition with Capsule Networks

Presentation (24 April): https://docs.google.com/presentation/d/149ZALP9stKvSWqu2N12QfzwSR06X3CsfPLHTvgIRO0U/edit?usp=sharing

Extra ideas (possible individual research questions):

Since a frame does not necessarily contain only one phone, we could label frames with soft targets such as 0.75A 0.25B and compare these with the classifier's output distribution. Under such a comparison, a frame labelled (0.75A 0.25B) that is predicted as (0.6A 0.2B 0.2C) scores better than one predicted as (0.6A 0.1B 0.3C), even though both predictions assign the same probability to A.
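A minimal sketch of how such a comparison could work, assuming cross-entropy against the soft label as the score. The choice of metric and the helper name `soft_cross_entropy` are illustrative assumptions, not something taken from this repository.

```python
import math


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution.

    Both arguments are dicts mapping a phone label to a probability.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for phone, t in target.items():
        if t > 0.0:
            # Clamp to eps so a predicted probability of 0 does not raise.
            p = max(predicted.get(phone, 0.0), eps)
            total -= t * math.log(p)
    return total


# Soft label for a frame that is 75% phone A and 25% phone B.
target = {"A": 0.75, "B": 0.25}

# Two predictions that give A the same probability.
pred_1 = {"A": 0.6, "B": 0.2, "C": 0.2}
pred_2 = {"A": 0.6, "B": 0.1, "C": 0.3}

loss_1 = soft_cross_entropy(target, pred_1)
loss_2 = soft_cross_entropy(target, pred_2)

# Lower is better: pred_1 keeps more mass on B, so its loss is smaller.
print(f"pred_1: {loss_1:.3f}  pred_2: {loss_2:.3f}")
```

With a hard label of A, both predictions would get the same loss, since only the probability of A would count. The soft label is what separates them.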

Extra reading:

First paper to use 48 phone labels (folded to 39 for scoring) instead of TIMIT's original 61: http://repository.cmu.edu/cgi/viewcontent.cgi?article=2768&context=compsci

Single recurrent neural network on TIMIT phoneme classification: http://people.idsia.ch/~santiago/papers/IDSIA-04-08.pdf

Good general info about TIMIT and phoneme classification: https://www.intechopen.com/books/speech-technologies/phoneme-recognition-on-the-timit-database
