Cloning TensorFlow failed on Bazel CI #627

Closed
meteorcloudy opened this issue Apr 24, 2019 · 7 comments
@meteorcloudy
Member

https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/928#394d5b89-5927-4c2e-8c5b-0526a560c6dc


Fetching tensorflow sources at 187ac8ef39db7856bafe816c7b5d1fe565af2c25 (11m 6s)

    git clone --reference /var/lib/bazelbuild https://github.com/tensorflow/tensorflow.git /var/lib/buildkite-agent/builds/bk-docker-58fd/bazel-downstream-projects/tensorflow
    Cloning into '/var/lib/buildkite-agent/builds/bk-docker-58fd/bazel-downstream-projects/tensorflow'...
    error: RPC failed; curl 18 transfer closed with outstanding read data remaining
    fatal: The remote end hung up unexpectedly
    Traceback (most recent call last):
      File "bazelci.py", line 2418, in <module>
        sys.exit(main())
      File "bazelci.py", line 2398, in main
        bazel_version=task_config.get("bazel") or configs.get("bazel"),
      File "bazelci.py", line 668, in execute_commands
        clone_git_repository(git_repository, platform, git_commit)
      File "bazelci.py", line 937, in clone_git_repository
        ["git", "clone", "--reference", "/var/lib/bazelbuild", git_repository, clone_path]
      File "bazelci.py", line 1397, in execute_command
        return subprocess.run(args, shell=shell, check=fail_if_nonzero, env=os.environ).returncode
      File "/usr/lib/python3.6/subprocess.py", line 418, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['git', 'clone', '--reference', '/var/lib/bazelbuild', 'https://github.com/tensorflow/tensorflow.git', '/var/lib/buildkite-agent/builds/bk-docker-58fd/bazel-downstream-projects/tensorflow']' returned non-zero exit status 128.
    🚨 Error: The command exited with status 1


@meteorcloudy
Member Author

/cc @philwo Can you take a look?

@philwo
Member

philwo commented Apr 24, 2019

I guess we'll have to switch to cloning over SSH :( For some unknown reason the HTTP connections die during git clone, and there seems to be no fix.

@philwo
Member

philwo commented Apr 25, 2019

I tried cloning via the git:// and ssh:// protocols, but they're also very slow.

It looks a bit like we're being throttled, but we use git clone --reference, so there isn't that much network traffic... :( It's also weird that it's only happening for that repository. The Bazel repository is even larger than TensorFlow's, and we don't have these problems there.

@meteorcloudy Do you know if there's a mirror of the TensorFlow repository available on a Google server, similar to http://bazel.googlesource.com?
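
For anyone who wants to repeat the protocol comparison mentioned above, here is a minimal sketch that derives the git:// and ssh:// forms of an HTTPS clone URL. The helper name `protocol_variants` is hypothetical and not part of bazelci.py, and the `git@` SSH user is an assumption based on GitHub's usual convention.

```python
from urllib.parse import urlsplit


def protocol_variants(https_url):
    """Return the same repository URL spelled for several git transports.

    Hypothetical helper for illustration only; it is not taken from bazelci.py.
    """
    parts = urlsplit(https_url)
    host, path = parts.hostname, parts.path
    return {
        "https": https_url,
        "git": f"git://{host}{path}",
        # Assumption: the SSH login user is "git", as is customary on GitHub.
        "ssh": f"ssh://git@{host}{path}",
    }


if __name__ == "__main__":
    variants = protocol_variants("https://github.com/tensorflow/tensorflow.git")
    for name, url in variants.items():
        print(name, url)
```

Each resulting URL can then be passed to git clone in place of the HTTPS one to compare transfer behaviour.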

@meteorcloudy
Member Author

@yifeif @gunan Is there a mirror of TensorFlow somewhere? Have you ever encountered a similar issue before?

@gunan

gunan commented Apr 25, 2019

I am not sure if we have a mirror.
@yifeif Do we have a googlesource.com mirror of TF head?

@philwo
Member

philwo commented May 8, 2019

This should be fixed now, thanks to the new git-mirrors feature in Buildkite. It uses a repo-specific reference instead of our former "all-in-one" reference repo, which seems to have caused the problems.
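
To illustrate the difference between a shared reference and a repo-specific one, here is a rough sketch of how a clone command could pick a per-repository reference directory. This is not Buildkite's implementation and not the actual bazelci.py code: the mirror root `/var/lib/gitmirrors`, the directory naming scheme, and the helper names are all assumptions made for illustration. Only `git clone --reference` itself comes from the log above.

```python
import os
import re

# Hypothetical location for per-repository mirrors (an assumption, not from the issue).
MIRROR_ROOT = "/var/lib/gitmirrors"


def mirror_path_for(repo_url, mirror_root=MIRROR_ROOT):
    """Map a repository URL to its own mirror directory.

    Every character that is not a letter or digit becomes "-", so each
    repository gets a distinct, filesystem-safe directory name.
    """
    name = re.sub(r"[^0-9A-Za-z]", "-", repo_url)
    return os.path.join(mirror_root, name)


def clone_command(repo_url, clone_path, mirror_root=MIRROR_ROOT, is_dir=os.path.isdir):
    """Build a git clone command that uses a repo-specific reference if one exists."""
    cmd = ["git", "clone"]
    reference = mirror_path_for(repo_url, mirror_root)
    if is_dir(reference):
        # Borrow objects only from this repository's own mirror,
        # rather than from one "all-in-one" reference repo.
        cmd += ["--reference", reference]
    cmd += [repo_url, clone_path]
    return cmd


if __name__ == "__main__":
    print(clone_command("https://github.com/tensorflow/tensorflow.git", "tensorflow"))
```

If the mirror directory is missing, the sketch falls back to a plain clone instead of failing.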

@philwo philwo closed this as completed May 8, 2019