[BUG]Dependency Issues with MFA Installation and Chinese Tokenization Support #843

Closed
3 tasks done
EurFelux opened this issue Nov 1, 2024 · 1 comment · Fixed by #853

@EurFelux

EurFelux commented Nov 1, 2024

Debugging checklist

Describe the issue
Aligning a Mandarin corpus raises an ImportError asking for Chinese tokenization support. Installing the suggested packages upgrades numpy to a version that conflicts with the sklearn version installed alongside MFA.

I installed MFA via conda, and the version of numpy is 1.26.4.

I tried aligning on a Mandarin corpus, but the terminal prompted that I needed to install dependencies.

ImportError: Please install Chinese tokenization support via pip install spacy-pkuseg dragonmapper hanziconv.

However, the spacy-pkuseg version that pip installs by default requires numpy>=2.0.0, so running the command from the error message upgraded numpy to 2.0.2. The sklearn installed with MFA is 1.2.2, and it seems to conflict with numpy 2.0.2. Running mfa align again produced this error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I found a suggestion on StackOverflow to downgrade numpy back to 1.26.4.
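To see whether an environment is in the state described above, a small check like the following may help. It is only a sketch: the helper names are made up for illustration, and it reports the installed numpy and scikit-learn versions without deciding which exact pairs are binary compatible.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major(version):
    """Extract the leading major version number from a string like '2.0.2'."""
    return int(version.split(".")[0])


def numpy_was_upgraded_past_1x(numpy_version):
    """True when numpy is 2.x or later, the situation reported in this issue."""
    return numpy_version is not None and major(numpy_version) >= 2


if __name__ == "__main__":
    np_version = installed_version("numpy")
    sk_version = installed_version("scikit-learn")
    print("numpy:", np_version, "| scikit-learn:", sk_version)
    if numpy_was_upgraded_past_1x(np_version):
        # Assumption: an older scikit-learn build, such as the 1.2.2 reported
        # here, was compiled against numpy 1.x and may fail to import.
        print("numpy 2.x detected; an older scikit-learn build may be incompatible")
```

Running it inside the conda environment before and after the pip install shows whether numpy was moved from 1.26.4 to a 2.x release.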

I ultimately found a temporary solution by specifying an older version of spacy-pkuseg:

pip install spacy-pkuseg==0.0.33

This version only needs numpy>=1.19.0.
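The numpy constraint that an installed spacy-pkuseg declares can be read from its package metadata. The sketch below uses the standard-library importlib.metadata module; the function names are hypothetical, and what it prints depends on which spacy-pkuseg version is installed.

```python
import re
from importlib import metadata

# Match requirement strings for numpy itself, not packages such as "numpydoc".
NUMPY_REQ = re.compile(r"^numpy(?![A-Za-z0-9._-])", re.IGNORECASE)


def numpy_constraints(requirements):
    """Keep only the requirement strings that constrain numpy."""
    return [r for r in (requirements or []) if NUMPY_REQ.match(r.strip())]


def declared_requirements(package):
    """Return the requirement strings a package declares, or [] if not installed."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Prints the numpy requirement(s) of whichever spacy-pkuseg is installed.
    print(numpy_constraints(declared_requirements("spacy-pkuseg")))
```

According to the report above, version 0.0.33 should list a numpy>=1.19.0 constraint, while the version installed by default asks for numpy>=2.0.0.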

I hope MFA can resolve this dependency conflict and update the documentation. As far as I can tell, no instructions mention that these packages are needed; the prompt only appears when executing mfa align ....
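Until the documentation covers this, a pre-flight check can show which of the tokenization packages are missing before a long mfa align run starts. This is a sketch, not MFA's own check: the import names spacy_pkuseg, dragonmapper, and hanziconv are assumed from the pip package names in the error message.

```python
import importlib.util

# Assumed import names for the pip packages named in the ImportError message.
CHINESE_TOKENIZATION_MODULES = ["spacy_pkuseg", "dragonmapper", "hanziconv"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CHINESE_TOKENIZATION_MODULES)
    if missing:
        print("Missing Chinese tokenization modules:", ", ".join(missing))
    else:
        print("All Chinese tokenization modules are importable.")
```

The check only confirms that the modules can be located; it does not detect the numpy and sklearn binary mismatch, which only surfaces when the packages are actually imported.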

For Reproducing your issue

  1. Corpus structure
    • What language is the corpus in? Mandarin
    • How many files/speakers? 1 speaker, 1 audio and 1 text. Just for test.
    • Are you using lab files or TextGrid files for input? No.
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? Yes. mandarin_china_mfa
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes. mandarin_mfa
    • If it's a model you've trained, what data was it trained on?

To reproduce:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
mfa model download acoustic mandarin_mfa
mfa model download dictionary mandarin_china_mfa
mfa validate CORPUS_DIRECTORY mandarin_china_mfa
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY
pip install spacy-pkuseg dragonmapper hanziconv
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY

Log file
sp1.log

Desktop (please complete the following information):

  • OS: Linux
  • Version: Ubuntu 20.04.6 LTS

Additional context

@chenchenzi

chenchenzi commented Nov 10, 2024

I encountered the same issue too. Thanks for the tip about installing spacy-pkuseg version 0.0.33.
