corpus.mallet returned non-zero exit status 1. #2876
Comments
What fixes have you tried? What output do you see when you run the I'd definitely recommend using an absolute path for the executable, not just |
Thanks for your reply. Fixes I have tried as follows:
All give the same error (non-zero exit status). How do I run the command manually outside of Gensim? When I navigate to the directory containing the mallet .bat file and paste the following into CMD (Anaconda):
mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.txt --output C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.mallet
I get the error "MALLET requires an environment variable MALLET_HOME".
I should mention that I have reverted all the above fixes and am working with a default unzipped mallet-2.0.8. Kind regards, |
Set your Also, that's not the command you posted earlier (the path at the beginning is different). |
Thanks for your speedy responses. The command I pasted was adjusted slightly because I navigated into the bin folder directly in CMD. I have now added MALLET_HOME to the environment variables in Windows and run the command again in CMD. No error message this time, so I assume it worked?
For future reference, if anyone else doesn't know how to do that step: go to System Properties and add a user environment variable (http://shiningmeadow.blogspot.com/2016/04/tutorial-for-installing-mallet-on.html). This was not clear to me when installing MALLET, as I assumed the os.environ.update command in Python would take care of this on a temporary basis.
Running again in Python, I still get the error, but the number has changed. It is now a non-zero exit status 2 (instead of 1). |
OK. Please post the exact command you're running from CLI (which you say works), and the exact command that Gensim outputs (when it fails with exit status 2). Exact, character-for-character. Cheers. |
Sure, though I can't imagine it makes much difference, since multiple commands all give the same error.
CMD command:
C:\Users\DraGoN\Documents\python\mallet-2.0.8\bin>mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.txt --output C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.mallet
Works fine. No error in CMD.
In Python:
os.environ.update({'MALLET_HOME':r'C:/Users/DraGoN/Documents/python/mallet-2.0.8/'})
mallet_path = r'C:/Users/DraGoN/Documents/python/mallet-2.0.8/bin/mallet' # update this path
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=20, id2word=id2word)
I have now also tried commenting out the os.environ.update command, and removing/adding the "/" at the end of the dir, but it makes no difference. This is the error in full:
CalledProcessError Traceback (most recent call last)
<ipython-input-19-d0c4d0ee93c2> in <module>
6 mallet_path = r'C:/Users/DraGoN/Documents/python/mallet-2.0.8/bin/mallet' # update this path
7
----> 8 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=20, id2word=id2word)
9
10 # Show Topics
D:\Apps\Anaconda\lib\site-packages\gensim\models\wrappers\ldamallet.py in __init__(self, mallet_path, corpus, num_topics, alpha, id2word, workers, prefix, optimize_interval, iterations, topic_threshold, random_seed)
129 self.random_seed = random_seed
130 if corpus is not None:
--> 131 self.train(corpus)
132
133 def finferencer(self):
D:\Apps\Anaconda\lib\site-packages\gensim\models\wrappers\ldamallet.py in train(self, corpus)
270
271 """
--> 272 self.convert_input(corpus, infer=False)
273 cmd = self.mallet_path + ' train-topics --input %s --num-topics %s --alpha %s --optimize-interval %s '\
274 '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '\
D:\Apps\Anaconda\lib\site-packages\gensim\models\wrappers\ldamallet.py in convert_input(self, corpus, infer, serialize_corpus)
259 cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
260 logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 261 check_output(args=cmd, shell=True)
262
263 def train(self, corpus):
D:\Apps\Anaconda\lib\site-packages\gensim\utils.py in check_output(stdout, *popenargs, **kwargs)
1930 error = subprocess.CalledProcessError(retcode, cmd)
1931 error.output = output
-> 1932 raise error
1933 return output
1934 except KeyboardInterrupt:
CalledProcessError: Command 'C:/Users/DraGoN/Documents/python/mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\DraGoN\AppData\Local\Temp\ea532c_corpus.txt --output C:\Users\DraGoN\AppData\Local\Temp\ea532c_corpus.mallet' returned non-zero exit status 2. |
That's weird. Can you try the exact same command, with |
Yes, I have tried it now in CMD. No error, everything seems fine. Out of interest, what does the non-zero exit status 2 mean? And how can I confirm that Mallet is running correctly in CMD? Is there a way to import the output file manually into Gensim to see if it produces anything? |
Mallet is running correctly if its output indicates training without errors. It prints a lot of information. And at the end of training, it will have created the requested output files (new files appear on your disk). Not sure what exit status 2 is. |
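For readers hitting the same error: the exit status alone says little, and whatever the failing command prints to stderr is often more informative. Below is a minimal sketch, assuming Python 3 and only the standard library, of capturing that output for a shell command string. `run_and_report` is a hypothetical helper, not part of Gensim or MALLET, and the demo command is a stand-in that simply exits with status 2, not a real MALLET call.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a shell command string; return (exit status, stdout, stderr).

    Hypothetical helper for debugging, not part of Gensim or MALLET.
    """
    completed = subprocess.run(
        cmd,
        shell=True,  # the Gensim wrapper in the traceback above also uses shell=True
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text instead of bytes
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command: a Python one-liner that writes to stderr and exits with 2.
    demo_cmd = subprocess.list2cmdline(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
    )
    status, out, err = run_and_report(demo_cmd)
    print("exit status:", status)
    print("stderr:", err)
```

To use it on the real failure, pass the exact command string quoted in the CalledProcessError message and read the returned stderr text.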
Ok, so new files do appear in the temp directory. So I can assume that, run manually, it works. Interestingly, if I restart the kernel and comment out the os.environ.update line so that it doesn't run, I then get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\DraGoN\AppData\Local\Temp\d941f0_state.mallet.gz'
As soon as I uncomment it and re-run the code, it goes back to the usual non-zero exit status 2 error. Does this help diagnose it at all? |
What kernel, what "manually"? What are you actually doing? |
Sorry, I always assume everyone does it the same way. I am using Jupyter Notebook. Restarting the kernel basically clears the cache and reloads the code. I thought that since I had already set the MALLET_HOME environment variable in Windows, I wouldn't need to set it again in Python. However, not doing so gave the FileNotFoundError. So I uncommented that particular line, and it went back to the usual non-zero exit status error. Basically, I am trying to figure out what exactly is causing the issue, and to separate the parts that work from those that don't. When I say "manually", I mean pasting the command into CMD. Don't forget I am a horrible noob, and don't really understand Gensim at the best of times. I was hoping, since I can't get the wrapper to work in Jupyter/Python, that perhaps there is a way to import the output file from Mallet directly, so that I can use the LDA analysis and continue working in Jupyter after this step. Any ideas as to what else could be wrong? |
Ok, scrap that. For whatever reason, it now works. I restarted my computer and then, boom, it spat out the LDA. I can only assume that one of the solutions above did the trick, or perhaps the environment variable was not properly set until Windows had been restarted? Fingers crossed it stays working. Thanks for your speedy responses, and for sticking in there with me today. It means a lot! |
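One plausible explanation, not confirmed in this thread, is that the already-running Jupyter process never saw the newly added variable: a process receives its environment when it starts, so a variable added through System Properties only reaches programs launched afterwards. A minimal sketch, assuming Python 3, of checking what the current process actually sees follows. `check_mallet_home` is a hypothetical helper, not part of Gensim, and the check for a `bin` folder is based only on the mallet-2.0.8 layout quoted above.

```python
import os


def check_mallet_home(env=None):
    """Return a list of problems found with MALLET_HOME; empty if none.

    Hypothetical helper, not part of Gensim or MALLET.
    """
    env = os.environ if env is None else env
    problems = []
    home = env.get("MALLET_HOME")
    if not home:
        # Variable missing (or empty) in the environment this process inherited.
        problems.append("MALLET_HOME is not set in this process")
        return problems
    if not os.path.isdir(home):
        problems.append("MALLET_HOME is not an existing directory: %s" % home)
    elif not os.path.isdir(os.path.join(home, "bin")):
        # Assumes the unzipped layout from this thread: <MALLET_HOME>/bin/mallet
        problems.append("no 'bin' folder under MALLET_HOME: %s" % home)
    return problems


if __name__ == "__main__":
    for problem in check_mallet_home():
        print(problem)
```

Running this in a notebook cell before creating the model shows whether the kernel itself has the variable, independently of what CMD reports.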
No problem :) |
Problem description
So many people have had this issue, and I have tried all the fixes suggested, to no avail. The path is correct, and I have changed it multiple times to remove spaces etc. I get the same error. Bearing in mind I have no idea how to provide all the information necessary, please respond with precise instructions as to how to debug this issue.
Error is
CalledProcessError: Command 'mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.txt --output C:\Users\DraGoN\AppData\Local\Temp\b76d8f_corpus.mallet' returned non-zero exit status 1.
I have looked and the temp files do exist in the temp directory. I have even tried editing the .bat file to hard code the mallet_home directory, and the java installation directory. Nothing works. I get the same error.