
SH fails to download Ubuntu 5.0.100 #127

Closed
donJoseLuis opened this issue Jan 19, 2021 · 15 comments

@donJoseLuis
Contributor

donJoseLuis commented Jan 19, 2021

SH script fails to download https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-dev-ubuntu.16.04-x64.5.0.100.tar.gz.

In this case (404 for a resource), the problem could be

  1. the script configures a BAD resource URL (we'll fix it), or
  2. a valid URL whose resource is unavailable on the server (we can ask the feed owner to take a look).

log:
https://dev.azure.com/dnceng/public/_build/results?buildId=954947&view=logs&j=73b7db51-6fbc-594d-25c9-15b2737b0bf6&t=590c8228-c55a-505d-ce75-e0df1b5682ea

@ViktorHofer heads up on this issue; it was transferred from the problem in dotnet/runtime#34015, which had a different root cause.

@dougbu
Member

dougbu commented Jan 20, 2021

@donJoseLuis should @jaredpar or @VictorHofer make this into another "live" issue❔ I'm not offering their services 😺 but do miss the information about recent events of this type.

@ViktorHofer
Member

I don't know a Victor so I don't feel obliged ;)

@dougbu
Member

dougbu commented Jan 20, 2021

Apologies for my spelling 😃

@AR-May
Member

AR-May commented Jan 20, 2021

Error 404 for the legacy link https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-dev-ubuntu.16.04-x64.5.0.100.tar.gz is expected, as those links are valid only for old SDK versions. The actual error was

dotnet-install: Downloading link: https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-sdk-5.0.100-linux-x64.tar.gz
curl: (18) transfer closed with 117694364 bytes remaining to read

That seems to be a temporary problem with either the storage or the network. In any case, the install script for SDK version 5.0.100 works well and the resource https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-sdk-5.0.100-linux-x64.tar.gz is now available. So, the bug is no longer reproducible.
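
A quick spot-check (hypothetical, not part of the script) is to ask the CDN for just the response status line:

  curl -sSI https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-sdk-5.0.100-linux-x64.tar.gz | head -n 1

A healthy resource should answer with a 200 status; the legacy link above answers 404.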

@AR-May
Member

AR-May commented Jan 20, 2021

BTW, we recently fixed the log messages to avoid this confusion with legacy-link errors. So, as soon as the changes for issue #11 are deployed, there should be less confusion about the actual error.

@donJoseLuis
Contributor Author

@AR-May thank you.
@ViktorHofer greetings. The root cause appears to be a temporary network glitch. Can you please confirm the impact is gone?

@donJoseLuis
Contributor Author

donJoseLuis commented Jan 21, 2021

We have a monitoring system which triggers script runs for both PowerShell and SH; when these runs fail, an incident is created. So far we have addressed such cases immediately and proactively, without customer impact. We are simply not observing these intermittent failures.

In this case, the socket was closed during the download: "curl: (18) transfer closed with 117694364 bytes remaining to read". This issue derives from dotnet/runtime#34015, whose root cause was "problems in a custom CURL used by the run-time team's build agents". That particular ticket had been reopened twice prior to this latest case.

The scripts are doing the right thing & we confirmed the feed system had the proper bits available. Thus, the likely suspects remain (1) custom CURL & (2) intermittent network issues. However, our monitoring is not encountering these networking hiccups.

One way to root-cause this would be to collect a network trace once the problem occurs on these agents, and to try a standard CURL.
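
For example (a hypothetical sketch; assumes tcpdump is installed on the agent and run as root, and the interface name is a placeholder):

  # Capture only traffic to/from the CDN host into a pcap for later analysis.
  sudo tcpdump -i eth0 -w dotnet-download.pcap host dotnetcli.azureedge.net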

We'll think about how to wisely improve resiliency to such external factors, particularly in SH. We may be able to detect specific failure reasons when sockets are closed & retry in some cases, as sketched below.
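
A minimal sketch of that idea (hypothetical, not the shipped script; the URL, output file, and backoff values are placeholders; curl exit codes 18 "partial file" and 56 "recv failure" are the transient errors seen in this thread):

  #!/usr/bin/env bash
  url="https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.100/dotnet-sdk-5.0.100-linux-x64.tar.gz"
  max_attempts=3
  attempt=1
  while true; do
    # Stop looping as soon as a download succeeds.
    curl --fail --location --output dotnet-sdk.tar.gz "$url" && break
    rc=$?
    # Give up on non-transient errors, or once the retry budget is spent.
    if { [ "$rc" -ne 18 ] && [ "$rc" -ne 56 ]; } || [ "$attempt" -ge "$max_attempts" ]; then
      exit "$rc"
    fi
    echo "Download attempt #$attempt failed (curl exit $rc); retrying in $((attempt * 10)) seconds."
    sleep $((attempt * 10))
    attempt=$((attempt + 1))
  done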

@dougbu
Member

dougbu commented Jan 22, 2021

@donJoseLuis was your monitoring system triggered for the failures discussed recently in the ASP.NET Core Build Teams channel?

Here's some of the conversation:

Error is usually curl: (56) Unexpected EOF though I see curl: (16) Error in the HTTP2 framing layer in https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-aspnetcore-refs-heads-master-3c9cec8f4bb24547ab/Microsoft.AspNetCore.Authentication.AzureAD.FunctionalTests--net6.0/console.03f3bf41.log?sv=2019-07-07 from https://dev.azure.com/dnceng/public/_build/results?buildId=953173. The EOF builds were:

Mostly 6.0.100-alpha.1.20523.3 but one failure w/ 6.0.100-alpha.1.21063.4

A few more examples of recent macOS SDK download failures:

@dougbu
Member

dougbu commented Jan 22, 2021

/fyi @halter73

@bekir-ozturk
Contributor

In hopes of improving resiliency against such issues, we are adding retry logic to the dotnet-install.sh script.
Please track the related issue here: #95.

@bekir-ozturk bekir-ozturk reopened this Jan 27, 2021
@donJoseLuis donJoseLuis removed their assignment Feb 3, 2021
@bekir-ozturk
Contributor

Hi @dougbu,
Now that the retry functionality is in place, can you observe for a bit and see whether it has had a positive effect on mitigating the issues?

@dougbu
Member

dougbu commented Feb 5, 2021

@bozturkMSFT yes, we're watching. /cc @dotnet/aspnet-build

@captainsafia in your handoff, please mention the need to report curl failures here, and the occasional need to check for Helix logging that matches "Download attempt #$attempts has failed: $http_code $download_error_msg", "Attempt #$((attempts+1)) will start in $((attempts*10)) seconds.", or the errors listed above. runfo or https://runfo.azurewebsites.net/ can help here.
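
For instance, a hypothetical grep over downloaded Helix console logs (the file names are placeholders; the patterns are taken from the messages quoted above):

  grep -E "Download attempt #[0-9]+ has failed|transfer closed with|Unexpected EOF|HTTP2 framing layer" console.*.log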

@bekir-ozturk
Contributor

@dougbu Any updates? Let me know if we can close this issue.

@dougbu
Member

dougbu commented Feb 23, 2021

@bozturkMSFT, @JunTaoLuo updated our Helix setup to retry the downloads but, last I heard, not much more information was visible.

@JunTaoLuo please link a few builds here that needed retries…

@bekir-ozturk
Contributor

Closing this. Let me know if the issue persists and there is something we can do on the install-scripts side.
