[BUG]Dependency Issues with MFA Installation and Chinese Tokenization Support #843

EurFelux · 2024-11-01T03:52:31Z

Debugging checklist

Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
Have you tried rerunning the command with the --clean flag?

Describe the issue
I encountered an ImportError asking me to install Chinese tokenization support, and installing it creates a conflict between the numpy and sklearn versions.

I installed MFA via conda, and the version of numpy is 1.26.4.

I tried aligning a Mandarin corpus, but MFA reported that I needed to install additional dependencies:

ImportError: Please install Chinese tokenization support via pip install spacy-pkuseg dragonmapper hanziconv.

However, the current spacy-pkuseg requires numpy>=2.0.0. Running the command from the error message upgraded numpy to 2.0.2, but the sklearn installed with MFA is 1.2.2, and the two appear to be incompatible. I then encountered this error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I found a solution on StackOverflow that suggested downgrading numpy to 1.26.4.

I ultimately found a temporary solution by specifying an older version of spacy-pkuseg:

pip install spacy-pkuseg==0.0.33

This version only needs numpy>=1.19.0.
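To check whether an environment is in the state described above, a small diagnostic along these lines may help. It is only a sketch: the helper names (`installed_versions`, `major`, `numpy_looks_risky`) are made up for illustration and are not part of MFA, and the only rule it encodes is the one reported in this issue, that numpy 2.x alongside the sklearn shipped with MFA caused the ValueError.

```python
from importlib import metadata

# Distribution names as published on PyPI (sklearn is distributed as "scikit-learn").
PACKAGES = ("numpy", "scikit-learn", "spacy-pkuseg")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def major(version):
    """Leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def numpy_looks_risky(versions):
    """True when numpy 2.x is installed, the situation reported in this issue."""
    numpy_version = versions.get("numpy")
    return numpy_version is not None and major(numpy_version) >= 2


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    if numpy_looks_risky(versions):
        # Assumption: the workaround from this issue still applies to your setup.
        print("numpy 2.x detected; consider pinning spacy-pkuseg==0.0.33.")
```

Run it inside the activated conda environment; it only reads package metadata and does not import numpy or sklearn, so it works even when those imports fail.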

I hope MFA can resolve this dependency issue and update the documentation: there are no instructions indicating that these dependencies are required, yet the prompt appears when running mfa align ....

For Reproducing your issue

Corpus structure
- What language is the corpus in? Mandarin
- How many files/speakers? 1 speaker, 1 audio and 1 text. Just for test.
- Are you using lab files or TextGrid files for input? No.
Dictionary
- Are you using a dictionary from MFA? If so, which one? Yes. mandarin_china_mfa
- If it's a custom dictionary, what is the phoneset?
Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes. mandarin_mfa
- If it's a model you've trained, what data was it trained on?

To reproduce:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
mfa model download acoustic mandarin_mfa
mfa model download dictionary mandarin_china_mfa
mfa validate CORPUS_DIRECTORY mandarin_china_mfa
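# first attempt: raises the ImportError shown above
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY
# installing the suggested packages upgrades numpy to 2.0.2
pip install spacy-pkuseg dragonmapper hanziconv
# second attempt: raises the ValueError shown above
mfa align CORPUS_DIRECTORY mandarin_china_mfa mandarin_mfa OUTPUT_DIRECTORY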

Log file
sp1.log

Desktop (please complete the following information):

OS: Linux
Version: Ubuntu 20.04.6 LTS

Additional context

chenchenzi · 2024-11-10T16:10:08Z

I encountered the same issue. Thanks for the tip about installing spacy-pkuseg version 0.0.33.

EurFelux added the bug label Nov 1, 2024

EurFelux assigned mmcauliffe Nov 1, 2024

This was referenced Nov 18, 2024

[BUG] Mandarin MFA not working #849

Closed

[BUG] mfa validate for mandarin not working #842

Closed

mmcauliffe mentioned this issue Dec 2, 2024

Update pinned dependencies #853

Merged

mmcauliffe closed this as completed in #853 Dec 2, 2024
