release(prod): Update miner to latest GA 2021.11.22.0 #269

Merged: 13 commits merged into production on Nov 23, 2021


shawaj
Member

@shawaj shawaj commented Nov 23, 2021

Update miner to latest GA 2021.11.22.0

Ref #268

Pushed to testnet at Tue Nov 23 06:15:44 UTC 2021

shawaj and others added 11 commits November 21, 2021 22:39
Create production PR automatically when pushing a miner release to testnet
Add testnet to name
Due to new multi architecture building in hm-miner we need to now add the arm64 tag here

Relates-to: #NebraLtd/hm-miner#56
bump miner to latest arm64-version
fix environment variable
release(testnet): Update miner to latest GA 2021.11.22.0
release(testnet): Update miner to latest GA 2021.11.22.0
@shawaj shawaj requested review from a team as code owners November 23, 2021 06:19
Fetch the whole repo
@shawaj shawaj changed the title from "release(prod): Update miner to latest GA 2021.11.22.0" to "rrelease(prod): Update miner to latest GA 2021.11.22.0" Nov 23, 2021
release(testnet): Update miner to latest GA 2021.11.22.0
@shawaj shawaj changed the title from "rrelease(prod): Update miner to latest GA 2021.11.22.0" to "release(prod): Update miner to latest GA 2021.11.22.0" Nov 23, 2021
@shawaj
Member Author

shawaj commented Nov 23, 2021

Contributor

@KevinWassermann94 KevinWassermann94 left a comment

Let's get it out to production asap. Testnet looking good

@usalis

usalis commented Nov 23, 2021

Guys, with all respect for what you are doing, why is the PR not auto-merged to the production branch after two approvals? It looks like the step where an authorized user clicks the "Merge" button is redundant, and it is creating a bottleneck.

@justkidding96

@usalis I think they forgot how important this is. Many devices are stuck now because this PR is not merged yet.

@usalis

usalis commented Nov 23, 2021

@justkidding96, I do not think they forgot about the importance, given that the PR and related actions were done at night. We appreciate it. I am questioning the release-to-production process. All checks have passed (2 reviews + testnet), but we still wait for one person to do the merge. Isn't that counterproductive?

@shawaj shawaj merged commit 24663c3 into production Nov 23, 2021
@shawaj
Member Author

shawaj commented Nov 23, 2021

Testnet checks are not always immediate, which is why we don't want to auto-merge. But thanks for the suggestions.

@usalis

usalis commented Nov 23, 2021

@shawaj, my understanding is that testnet checks are done by human observation: someone checks hotspot activity for the first beacon witnessed after deployment to testnet to confirm it is working. I am sure there will be volunteers in the community to help automate this if the team has no plans or resources to do it soon.

@shawaj
Member Author

shawaj commented Nov 23, 2021

@usalis it isn't really something the community can help with per se, as we need someone on our team who knows the codebase quite intimately to perform tests internally on our testnet (the internal part isn't currently documented in the README here)

Having said that, we potentially could put some customers' devices onto our testnet to help verify releases. However, I don't think many people will be interested in that, as the devices will have considerably more downtime on testnet than in production, and people don't like that

open to ideas though for sure

@rawrmaan

@shawaj We'd be happy to see if some of our hosts are open to running an internal beta hotspot in addition to their regular hotspot. They're all experiencing so much downtime due to firmware issues already that I don't think anyone would mind.

@shawaj
Member Author

shawaj commented Nov 23, 2021

@rawrmaan will discuss this internally and see if there is an easy way to achieve something like this (although having multiple onboarded hotspots in the same location may not be good for hosts either?). perhaps we could have two levels of testnet or something. need to have a think and see what can fit in with the existing pipeline/workflow. And will need to make sure that people are fully aware that being on the bleeding-edge update flow will be far more unreliable.

unfortunately though, the vast majority of issues at the moment are coming from the helium side: partly the excessive number of issues on the network (including a GA every 2 days on average over the last few weeks, 2 major chain halts and a "slow down"!), and partly the fact that GAs do not document breaking changes well, and breaking changes are often dropped with no prior warning.

To be honest, it feels like there needs to be something more substantial from the helium side - building a robust CI/CD pipeline with lots of automated testing but also a better / more standardised release flow as per helium/HIP#309

Whilst the manufacturer has responsibility for a lot of stuff on the hardware and software side, AFAIK helium right now does not test on every vendor's hardware/software stack when pushing GAs, which is bound to cause issues, and these will only increase as the network (and the number of different types of hardware) grows. Helium also has the resources and the recurring incentives (large % of HNT earnings) to fix this - more so than any other network stakeholder IMO.

Probably worth further discussion though I guess from all aspects.

@rawrmaan

I understand that Helium has been in massive flux and nothing is stable right now. It has to be very frustrating to deal with this as you're trying to improve your own software and systems.

However, writing 3 paragraphs about how Helium needs to fix things when Nebra is uniquely failing to get their hotspots working is not a good look.

image

No other major manufacturer has such an awful PoC success rate. We have first-hand experience with a large sample size that shows Nebra hotspots are failing to even boot at a 10-30% rate. This happens before any of Helium's code comes into the picture.

Look, I get it. Software is hard. I've built companies. Sometimes you ship to production 20 times in a row and it's still broken and you don't know why.

We need an acknowledgement that you guys are having issues stabilizing the firmware and openness about what issues you're seeing and how you're planning to fix it. Since this is an open source project, maybe we could even help fix the issues if we know what's wrong.

What will it be, @shawaj? Continued denial and gaslighting your customers, or acknowledging the situation so we can all move forward constructively?

@shawaj
Member Author

shawaj commented Nov 23, 2021

We aren't having issues stabilising the software (over and above the huge number of outages and GAs lately). In fact, many others operating large quantities of our miners are telling us that our software is far superior to others. We use balenaCloud which is extremely reliable (I believe Emrit and others use it too).

We have had a small percentage of hotspots facing the issues described here #266 (there are about 4 separate issues, which relate to balena-engine not killing containers properly) after the large number of GAs in recent weeks. But other than that there are no issues we are aware of.

If you are having particular problems, please reach out to our support at [email protected]

@NebraLtd NebraLtd locked as off-topic and limited conversation to collaborators Nov 23, 2021
7 participants