
mlcp still not working on Windows #542

Closed

patrickmcelwee opened this issue Oct 24, 2015 · 8 comments

Comments

@patrickmcelwee

There is something happening in mlcp.bat that we are not reproducing in Roxy. I discovered this while working with a consultant on a Windows box.

When we ran mlcp through Roxy, we got:

C:\MarkLogic\ml-hebrew-search>ml.bat local deploy_data
java -cp "C:/MarkLogic/mlcp/mlcp-1.3-3/lib/avro-1.7.4.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-cli-1.2.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-codec-1.4.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-collections-3.2.1.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-configuration-1.6.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-httpclient-3.1.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-io-2.4.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-lang-2.6.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-logging-1.1.3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/commons-modeler-2.0.1.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/guava-11.0.2.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-annotations-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-auth-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-common-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-hdfs-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-mapreduce-client-common-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-mapreduce-client-core-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-mapreduce-client-jobclient-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-yarn-api-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-yarn-client-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/hadoop-yarn-common-2.6.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/htrace-core-3.0.4.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jackson-core-asl-1.9.13.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jackson-jaxrs-1.8.3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jackson-mapper-asl-1.9.13.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jackson-xc-1.8.3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jena-arq-2.10.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jena-core-2.10.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jena-iri-0.9.5.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jersey-client-1.9.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/jersey-core-1.9.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/junit-4.11.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/log4j-1.2.17.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/marklogic-mapreduce2-2.1-3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/marklogic-xcc-8.0-3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/mlcp-1.3-3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/protobuf-java-2.5.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/slf4j-api-1.6.1.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/slf4j-log4j12-1.6.4.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/solr-commons-csv-3.5.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/xercesImpl-2.10.0.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/xml-apis-1.4.01.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/xpp3-1.1.3.3.jar;C:/MarkLogic/mlcp/mlcp-1.3-3/lib/xstream-1.4.2.jar" -Xmx512m com.marklogic.contentpump.ContentPump import -input_file_path data -output_uri_prefix / -output_collections hebrew -output_uri_replace "/Users/pmcelwee/work/dev/hebrew/,''" -transform_module /transform/hebrew-transform.xqy -transform_namespace http://marklogic.com/transform/hebrew -transform_function transform -output_permissions hebrew-role,read,hebrew-role,update,hebrew-role,insert,hebrew-role,execute -username admin -password ****** -host localhost -port 8041

log4j:WARN No appenders could be found for logger (com.marklogic.contentpump.ContentPump).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(Unknown Source)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:42)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1699)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1681)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:303)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
        at com.marklogic.contentpump.FileAndDirectoryInputFormat.getSplits(FileAndDirectoryInputFormat.java:80)
        at com.marklogic.contentpump.CombineDocumentInputFormat.getSplits(CombineDocumentInputFormat.java:64)
        at com.marklogic.contentpump.LocalJobRunner.run(LocalJobRunner.java:128)
        at com.marklogic.contentpump.ContentPump.runJobLocally(ContentPump.java:307)
        at com.marklogic.contentpump.ContentPump.runCommand(ContentPump.java:204)
        at com.marklogic.contentpump.ContentPump.main(ContentPump.java:67)

This was despite setting HADOOP_HOME and explicitly setting the mlcp/bin directory into the PATH. But using the bat file directly worked:

mlcp.bat import -input_file_path data -output_uri_prefix / -output_collections hebrew -transform_module /transform/hebrew-transform.xqy -transform_namespace http://marklogic.com/transform/hebrew -transform_function transform -output_permissions hebrew-role,read,hebrew-role,update,hebrew-role,insert,hebrew-role,execute -username admin -password ****** -host localhost -port 8041

@DALDEI

DALDEI commented Oct 24, 2015

Could you enable 'echo on', or point me to where the java call is made? I can't find it in ml.bat. A 'set' command just before the java call would be useful, to dump the environment variables.

The log4j file being referenced comes with mlcp, I believe, not Hadoop, so the relevant classpath or environment variables wouldn't be fixed by setting HADOOP_HOME.

I can check the source build along with mlcp.bat. If mlcp.bat is working, that means the log4j configuration file was found, probably as a classpath resource, so it must exist in the deployed build. There are log4j environment variables and Java properties that can be set to debug where it looked and whether it found the file or failed (I need to look that up; it should be on the log4j 1.2 web site).
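
For what it's worth, a minimal sketch of that kind of debugging tweak inside a wrapper .bat. The placement right before the java call, and the %MLCP_CLASSPATH% and %* placeholders, are assumptions for illustration, not the actual contents of ml.bat or mlcp.bat:

@echo on
rem Dump the environment to a file (and to the console) just before the java
rem invocation, so a Roxy-driven run can be diffed against a direct mlcp.bat run.
set > "%TEMP%\mlcp-env-dump.txt"
set
rem -Dlog4j.debug makes log4j 1.2 report where it searched for its configuration
rem and which file or classpath resource it actually loaded.
java -Dlog4j.debug -cp "%MLCP_CLASSPATH%" -Xmx512m com.marklogic.contentpump.ContentPump %*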

@grtjn
Contributor

grtjn commented Oct 24, 2015

Are you running the master branch? If so, upgrade to dev with ml upgrade --branch=dev.

@patrickmcelwee
Author

@DALDEI To clarify, this was the result of calling an mlcp method defined by Roxy. I meant that Roxy probably has to replicate something that mlcp.bat is successfully doing.

@grtjn Yes, we were running off the master branch.

This was the error I got once the log4j file was working, as committed in #480:

15/10/29 11:04:38 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
        at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
        at com.marklogic.contentpump.ContentPump.runCommand(ContentPump.java:94)
        at com.marklogic.contentpump.ContentPump.main(ContentPump.java:67)

I was able to fix this by setting HADOOP_HOME to the mlcp directory.
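
For anyone hitting the same winutils error, the workaround amounted to something like this (the path is the mlcp install directory used on this machine, as seen in the command below; adjust it for your own install):

rem Hadoop's Shell class resolves winutils.exe as %HADOOP_HOME%\bin\winutils.exe,
rem so HADOOP_HOME must point at a directory whose bin folder contains it.
rem Setting it to the mlcp install directory got past the
rem "Could not locate executable null\bin\winutils.exe" error here.
set HADOOP_HOME=C:\Users\IEUser\dev\mlcp-1.3-3\mlcp-1.3-3
rem Then re-run mlcp.bat (or ml.bat local deploy_data) from the same console.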

But then I got an error identical to this one on Stack Overflow: http://stackoverflow.com/questions/31113529/marklogic-error-while-importing-files-using-mlcp

I didn't see a solution there, but thought I would post this in case it rings a bell for someone. Hope to have a solution soon.

@patrickmcelwee
Author

This new problem occurs whether I run mlcp through Roxy or directly. I am going to close this because I think it is now mlcp-specific, not a Roxy problem. The Roxy-side problems would be solved by #480 and #529.

More details on the mlcp error: I am on Windows 10 and running mlcp 1.3-3, with Java 8. The input_file_path exists (I get a different error if I pass something non-existent). The file permissions look ok. Specifying a relative or an absolute path to the files did not make a difference. Messing with the hosts file and specifying the host directly (not as localhost) also did not have an impact.

C:\Users\IEUser\dev\ml-hebrew-search>C:\Users\IEUser\dev\mlcp-1.3-3\mlcp-1.3-3\bin\mlcp.bat import -input_file_path C:\Users\IEUser\dev\ml-hebrew-search\data -output_uri_prefix / -output_collections hebrew -transform_module /transform/hebrew-transform.xqy -transform_namespace http://marklogic.com/transform/hebrew -transform_function transform -output_permissions hebrew-role,read,hebrew-role,update,hebrew-role,insert,hebrew-role,execute  -username admin -password admin -host localhost -port 8041
15/10/29 13:28:58 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/10/29 13:28:58 INFO contentpump.LocalJobRunner: Content type is set to MIXED.  The format of the  inserted documents will be determined by the MIME  type specification configured on MarkLogic Server.
15/10/29 13:28:59 ERROR contentpump.ContentPump: Error running a ContentPump job
java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:42)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1699)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1681)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:303)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
        at com.marklogic.contentpump.FileAndDirectoryInputFormat.getSplits(FileAndDirectoryInputFormat.java:80)
        at com.marklogic.contentpump.CombineDocumentInputFormat.getSplits(CombineDocumentInputFormat.java:64)
        at com.marklogic.contentpump.LocalJobRunner.run(LocalJobRunner.java:128)
        at com.marklogic.contentpump.ContentPump.runJobLocally(ContentPump.java:307)
        at com.marklogic.contentpump.ContentPump.runCommand(ContentPump.java:204)
        at com.marklogic.contentpump.ContentPump.main(ContentPump.java:67)

        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:620)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:42)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1699)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1681)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:303)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
        at com.marklogic.contentpump.FileAndDirectoryInputFormat.getSplits(FileAndDirectoryInputFormat.java:80)
        at com.marklogic.contentpump.CombineDocumentInputFormat.getSplits(CombineDocumentInputFormat.java:64)
        at com.marklogic.contentpump.LocalJobRunner.run(LocalJobRunner.java:128)
        at com.marklogic.contentpump.ContentPump.runJobLocally(ContentPump.java:307)
        at com.marklogic.contentpump.ContentPump.runCommand(ContentPump.java:204)
        at com.marklogic.contentpump.ContentPump.main(ContentPump.java:67)

@grtjn
Contributor

grtjn commented Nov 2, 2015

Was this closed by accident?

@patrickmcelwee
Author

No, I closed it because I think the problem is specific to mlcp itself, and not Roxy.

@grtjn
Contributor

grtjn commented Nov 10, 2015

That depends on whether Roxy makes a wrong call to MLCP that causes this to show up.. :)
But you show output from running mlcp directly, so it sounds like you are right..

@patrickmcelwee
Author

For anyone stumbling on this thread: I eventually fixed java.lang.RuntimeException: Error while running command to get file permissions by changing -input_file_path to C:\Users\IEUser\dev\ml-hebrew-search\data\*.xml. Apparently mlcp on Windows 10 requires the individual files to be specified, not the containing directory.

I found the solution here: http://stackoverflow.com/a/34075351/971445
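
To spell out the difference, here is a trimmed before/after of the invocation; <other options> stands in for the rest of the options shown in the full command earlier in this thread:

rem Failing on Windows 10: pointing -input_file_path at the directory itself
rem triggers the "Error while running command to get file permissions" failure.
mlcp.bat import -input_file_path C:\Users\IEUser\dev\ml-hebrew-search\data <other options>

rem Working: point -input_file_path at the individual files with a wildcard.
mlcp.bat import -input_file_path C:\Users\IEUser\dev\ml-hebrew-search\data\*.xml <other options>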
