Staging/Versioning tags #692

Closed
mstormi opened this issue Aug 28, 2019 · 24 comments · Fixed by #882

Comments

@mstormi
Contributor

mstormi commented Aug 28, 2019

Currently any user starting openhabian-config gets an immediate update of the latest code because HEAD/master is cloned/pulled.
I suggest introducing versioning and tags "current_install" and "current_active" (or similar) that openhabian-config clones instead on image boot and interactive startup, respectively.
OH itself uses snapshot/milestone/release in addition to versioning and we also already have these tags in the issue/PR tracking system for good reason. But our code is still "flat" unversioned, all-or-nothing.
This change will reduce the risk (number of affected installations, actually) of providing bad or insufficiently tested code to all of our users.
Note that if it's a fix everybody needs or would benefit from, we can always put the "current_active" tag on commits that we want to be distributed asap. But not every change is necessarily a step forward; see e.g. those frontail issues.

@EliasGabrielsson
Contributor

@ThomDietrich and I discussed this topic a while ago.
We recognized that introducing a dev branch or tag semantics could reduce the risk of breaking systems but would also slow development, as the build-test cycle would be longer.

We settled on taking the "risk" of breaking some systems as a tradeoff for identifying issues early and speeding up the development cycle. So far in openHABian's history, every "bad patch" has been solved relatively fast.

As a way of reducing this risk, we identified the need for a solid "install-test" infrastructure, which the CI infrastructure partly provides.

@mstormi
Contributor Author

mstormi commented Aug 28, 2019

Having a solid test infrastructure is good, but you cannot cover all use cases because you simply don't think of all of them. And looking at history isn't a valid argument about the future. Atomic power was also considered safe until there was Chernobyl ...
We should at least give users the choice like OH does, by adding tags for a 'release' version and presenting them with a menu switch to clone either HEAD or release.
Experts will choose HEAD more often than ordinary users will, so the group affected by newly introduced issues will be smaller in size but better able to help analyze the issue. More helping hands and less "noise" to dig through.
Why do you think that slows development? I don't get that point. I'd be inclined to believe it will rather speed up fixing, and thus development, than slow it.

@holgerfriedrich
Member

I actually do very much like this approach and agree it is useful for bigger projects. But I think it could complicate a lot of things here. Tagging will only work until bugs are found in the tagged version. Then we end up with different development branches: handling of bugfixes on different branches, a staging process, more complicated handling of support tickets, etc. Not sure if we want to spend this effort. And I do agree with @EliasGabrielsson that this will slow down development (handling two branches and fewer people testing our latest stuff).

Regarding CI, I have the feeling that @mstormi is right: we will never get full coverage. Getting it more stable will improve things, and maybe we can get good coverage of the first-time install. Testing all the options in the interactive menu is lots of work. Still, CI is probably worth all the effort we can spend.

Anyway, I would vote for the approach we currently take: developer testing, CI, and fixing blockers quickly once reported. Just think about the response time we had for the last blockers (signature of openHAB repo etc.).

Just my few cents...

@mstormi
Contributor Author

mstormi commented Aug 28, 2019

> And I do agree with @EliasGabrielsson that this will slow down development (handling two branches and less people testing our latest stuff)

I think that's a fundamental misunderstanding here. I do not suggest different, parallel branches (I'd agree that's more work) but Release to be a 'delayed' HEAD. So anything found & fixed in HEAD does not need to be found and fixed again in Release. We can move any HEAD fix to release by advancing the tag 'border', but the advantage over our current handling is that anything found in HEAD won't hit the majority of users (who will be on Release).
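To make the "delayed HEAD" idea concrete, here is a minimal Python sketch. It is a toy model, not openHABian code, and the class and method names are hypothetical: there is one linear history, HEAD and the release tag are just two positions in it, and a fix reaches Release simply by advancing the tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DelayedHead:
    """Toy model: one linear branch with a 'release' tag lagging behind HEAD.

    HEAD and the release tag are both indices into the same commit list,
    so there is no second branch to keep in sync.
    """
    commits: List[str] = field(default_factory=list)
    release_index: int = -1  # -1: nothing tagged as release yet

    @property
    def head_index(self) -> int:
        return len(self.commits) - 1

    def commit(self, message: str) -> None:
        # New work always lands on HEAD only.
        self.commits.append(message)

    def advance_release(self, to_index: Optional[int] = None) -> None:
        """Move the release tag forward (never backward), by default up to HEAD."""
        target = self.head_index if to_index is None else to_index
        if not self.release_index <= target <= self.head_index:
            raise ValueError("release tag can only move forward, up to HEAD")
        self.release_index = target

    def visible(self, channel: str) -> List[str]:
        """Commits a user following 'head' or 'release' currently gets."""
        end = self.head_index if channel == "head" else self.release_index
        return self.commits[: end + 1]
```

For example, after `commit("feature A")`, `advance_release()`, `commit("risky change B")` and `commit("fix for B")`, a Release user still only sees `["feature A"]`; the next `advance_release()` ships the change and its fix together, with nothing to merge or cherry-pick.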

> Then it ends up in having different development branches. Handling of bugfixes on different branches and a staging process, more complicated handling of support tickets

Why? I'm not buying that either. Having releases will actually reduce the number of different versions out there in use and thus the effort we need to spend on average until we can classify an issue to be a duplicate. I don't think it'll have major impact there but if any then it would be a positive one rather than more work.

@holgerfriedrich
Member

> I think that's a fundamental misunderstanding here. I do not suggest different, parallel branches (I'd agree that's more work) but Release to be a 'delayed' HEAD. So anything found & fixed in HEAD does not need to be found and fixed again in Release. We can move any HEAD fix to release by advancing the tag 'border', but the advantage over our current handling is that anything found in HEAD won't hit the majority of users (who will be on Release).

Just trying to think what happens if we encounter bugs in the release tagged version. It gets fixed on HEAD. Then we need to decide if we can move the release tag - at least if the bug affects typical users. Maybe we cannot because too many unstable things happened on master in the meantime. How do we know? That's why I think at the end we might end up with branching because that is much more practical.

> more complicated handling of support tickets

I was thinking of what happens after bugs are fixed. Now it is just a matter of (automatically) updating the tool. Done.
With tagging: unless the tag is moved, users/bug reporters will not see this bugfix. What is the advice for affected users, switching to HEAD? How does a user know which bugs got resolved in a new release tag? Just imagine users searching for a bug that is marked as closed on GitHub: is the fix already included in the latest release tag? Not sure if everyone will read the commit logs....

...still not convinced if we can gain anything here.

@EliasGabrielsson
Contributor

EliasGabrielsson commented Aug 29, 2019

@holgerfriedrich you really put the trade-offs we discussed into good words, thanks.

I don't see that a release process using versions in the sense of x.y.z would be feasible for openHABian from a configuration management perspective. To actually use that method, for example to make a bugfix, we would need a higher degree of isolation and atomicity in our commits. In your case of using tags, Markus, we would need to git cherry-pick commits, or with branches we would need to merge them.

There is no way around that fundamental problem. Don't get misled by the tooling (git, svn, Dropbox, etc.).

From a communication perspective, using versions on the hardware images can make some sense, as it indicates the level of changes, e.g. 1.5 marked a new base OS compared to 1.4. I do wonder, though, whether we should move the images to incremental releases as well (1, 2, 3, 4, etc.), because we have no configuration management patching method in place.

@mstormi
Contributor Author

mstormi commented Aug 29, 2019

> In your case of using TAGs Markus we would need to git cherry pick commits or with branches we would need to merge them.

Well, I don't mean cherry-picking single (isolated) commits but a level of (aggregated) commits.
Cherry-picking single elements off a chain wouldn't work; it could break that chain or lead to a need for branching.

We could define a "grace" period of say for example 6 weeks.
'Release' tag would in that case point to the complete codebase that HEAD was pointing to 6 weeks ago. It doesn't have to be 6 weeks, that's just a figure. Could also be a dynamically changing timeframe.
We would move the 'Release' tag forward depending on time (this could even be automated), change activity, or (HEAD) user feedback (or the lack of issues found/raised).
We for sure can discuss the exact mechanism.
We would only need to branch in case a (critical) bug is found to affect Release and HEAD and HEAD has changed a lot since such that we cannot easily apply the same patch to both.
That's a very very rare case we shouldn't be bothered about.
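The time-based variant of that rule could look like the following minimal Python sketch: the 'Release' tag would point at the newest commit that is at least the grace period old. The helper name is hypothetical; it assumes a linear history listed oldest first with ordered commit times (real git commit dates need not be monotonic), and the 6 weeks default is just the example figure from this comment.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def release_candidate(
    history: List[Tuple[str, datetime]],
    now: datetime,
    grace: timedelta = timedelta(weeks=6),
) -> Optional[str]:
    """Return the newest commit that is at least `grace` old, or None.

    `history` holds (commit_id, commit_time) pairs of a linear branch,
    oldest first; the 'Release' tag would be moved to the returned commit.
    """
    cutoff = now - grace
    candidate = None
    for commit_id, committed_at in history:
        if committed_at > cutoff:
            break  # ordered history: every later commit is newer still
        candidate = commit_id
    return candidate
```

With commits from 2019-07-01, 2019-08-01 and 2019-09-01, evaluating on 2019-09-15 gives a cutoff of 2019-08-04, so the tag would land on the August commit. Because the cutoff only grows as time passes, the tag moves forward monotonically, and a longer grace period simply delays it further.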

> With tagging: Unless the tag is moved, users/bug reporter will not see this bugfix. What is the advice for affected users, switching to HEAD?

That would be the standard advice, yes. But it's up to each user to try that first... we could help them make that decision, e.g. by (auto-)generating the list of differences (commit messages) every time we advance the 'Release' tag.
But any user can also decide to stick with Release and to raise a GitHub issue.
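Generating that list of differences is cheap. On a linear branch, `git log --oneline <old>..<new>` already prints it. As a self-contained illustration, here is a minimal Python sketch (hypothetical helper; it assumes each tag resolves to a commit id in an oldest-first list of `(commit_id, message)` pairs):

```python
from typing import List, Tuple


def release_notes(history: List[Tuple[str, str]], old_release: str, new_release: str) -> str:
    """List the commit messages that advancing the Release tag would ship.

    `history` holds (commit_id, message) pairs of a linear branch, oldest
    first. The result covers commits after `old_release` up to and including
    `new_release`, the same set `git log old..new` shows on such a branch
    (git prints newest first; this keeps oldest first).
    """
    ids = [commit_id for commit_id, _ in history]
    start = ids.index(old_release) + 1  # raises ValueError for an unknown id
    end = ids.index(new_release) + 1
    if end < start:
        raise ValueError("the new release must not be older than the old one")
    return "\n".join(f"- {message} ({commit_id})" for commit_id, message in history[start:end])
```

Advancing the tag from the first to the third commit of a three-commit history yields two bullet lines, one per shipped commit; leaving the tag where it is yields an empty changelog.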

@mstormi
Contributor Author

mstormi commented Sep 7, 2019

The recent issue with the Java download is another strong sign in favor of this proposed change of mine: we (contributors & maintainers) are not necessarily available "on standby" and able to fix every critical issue right when it turns up.
Turning the main version into a 'delayed HEAD' as I suggested will temper or even avoid the effects altogether for the majority of users.

@holgerfriedrich
Member

holgerfriedrich commented Sep 7, 2019

As the current issue was caused by a change in an external source (which has been known not to be very automation-friendly in the past), I doubt that any kind of staging approach could have saved us here.

For me it is another clear indication that the base system should be built on standard Raspbian packages. For all external stuff, which we clearly need to provide as install options, we cannot guarantee that updates will not break functionality. There is no way to proxy all external repos and downloads out there. Maybe we should also highlight this to users in our menu structure.

@mstormi
Copy link
Contributor Author

mstormi commented Sep 7, 2019

> As the current issue was caused by some change in an external source (which is know not to be very automation friendly in the past), I am in doubt that any kind of staging approach could have saved us us here.

I believe you're missing the point.
First and foremost, I think it proves my statement that we would have to always be available to fix things immediately when needed. But we cannot guarantee that. Elias was unavailable today, and I for my part still want to be able to go on summer vacation without a grumpy feeling.

> For me it is another clear indication that the base system should be built on standard Raspbian packages. For all external stuff - which we clearly need to provide as install options - we cannot guarantee that updates will not break functionality. No way to proxy all external repos and downloads out there. Maybe we should highlight this towards users also in our menu structure.

Caching of external resources (the Java .tgz's) would have done the trick here. And contrary to what you say, yes: caching in combination with versioning would ensure that a broken latest update is never forwarded to users right away but only after a grace period (or if we enforce it).
But caching is a complementary measure; please discuss it in #655.
Yes, of course we can proxy (cache) all external repos. There aren't many, we can slowly introduce it one by one, and it can be implemented as a one-time "copy-to-vault" run or as automated repo mirroring.
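One possible reading of the "copy-to-vault" idea, as a minimal Python sketch: each external download is pinned to a SHA-256 checksum and kept in a local vault directory. An intact vaulted copy is served without contacting upstream, and changed upstream content is rejected instead of being passed on to users. The function name, the vault layout and checksum pinning are assumptions for illustration, not how openHABian currently fetches Java; the downloader is injected as a callable so any HTTP client could be plugged in.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_pinned(name: str, expected_sha256: str,
                 download: Callable[[], bytes], vault: Path) -> bytes:
    """Return the pinned artifact `name`, preferring the local vault copy."""
    cached = vault / name
    if cached.exists():
        data = cached.read_bytes()
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data  # intact vaulted copy: upstream is not contacted
    data = download()  # upstream errors (e.g. OSError) propagate to the caller
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        # Upstream changed or broke: do not forward it to users automatically.
        raise ValueError(f"{name}: upstream content changed, refusing to install it")
    vault.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

Moving to a newer upstream version then becomes a deliberate step (updating the pinned checksum), which is where a grace period or versioning rule would apply.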

@EliasGabrielsson
Contributor

EliasGabrielsson commented Sep 9, 2019

Caching the Java packages is one technical solution to ensure uptime for the Java installation. I would also argue that it's a bad one because a) it's a lot of work, b) I don't know the legal standpoint, and c) it is even more work to keep it updated.

Having a release branch or TAG would not solve the problem of an external system change.

The road ahead would be to use more stable endpoints and package managers. In the case of Java, I would say we should modify the current code to use Azul's download API. https://www.azul.com/downloads/zulu-community/api/

It is also worth noting that the downtime was roughly one day, which I would say is perfectly good enough for an open-source project. It got reported on GitHub as the instructions ask, and a fix was made fairly quickly.

@mstormi
Contributor Author

mstormi commented Oct 6, 2019

> It is also worth to note the downtime was roughly ~1day which I would say is perfectly good enough for an opensource project.

It was good work, but that's missing/fogging the point.
The point is that it always has to be a day or less, which by the way results in stress for us, plus we get user criticism nonetheless.
Here's another problem instance that lasted way more than a day: https://community.openhab.org/t/zram-status/80996/16
I hate reading stuff like that because he's right, and my proposal here would have avoided that, too.
And my point is that any downtime, short or not, can be avoided for all users except those few early adopters who are aware of the risk and intentionally go with HEAD.

@ecdye
Member

ecdye commented Apr 21, 2020

Perhaps an authoritative opinion from one on the @openhab/architecture-council would help resolve the indecision in this case?

@mstormi
Contributor Author

mstormi commented Apr 21, 2020

I don't think so. The AC focuses on openHAB core code; none of them is involved with openHABian.

@ecdye
Member

ecdye commented Apr 21, 2020

Well, nonetheless, I would have to agree with you in this case, @mstormi: the use of versioning tags in some form, or a delayed HEAD branch for the typical end user, would certainly be a good idea.

@rkoshak

rkoshak commented Apr 21, 2020

The AC is supposed to be a resource that any of the openHAB repos can appeal to if the maintainers of that repo cannot come to a consensus. If you need a tie breaker the AC can be that tie breaker. I'm not saying you need to bring the AC in for this, just mentioning that it is available to you should you choose to use them.

I don't have a strong opinion myself on this matter. But since I'm here... as a user, I think I would appreciate the tiny bit more stability in openHABian that a slightly delayed HEAD would provide. For many/most users, openHABian is going to be the very first impression they get of the openHAB ecosystem. Providing some extra time to test before a PR goes live would, I think, help improve that first impression, since the chances of something being broken in the delayed HEAD are lower than they currently are.

@ghys
Member

ghys commented Apr 21, 2020

You could perhaps have a look at following something like GitLab Flow, or something even simpler:

The development would happen on master (it's more natural than a develop branch or similar, as in Git flow), and there would also be a parallel stable branch. When you feel it's stable enough to update the stable version with the latest bits, you do a PR here on GitHub from master to stable and merge it. So there's no need to do proper releases or tagging if you don't want/need to. If there's a commit that urgently needs to go to the stable version while something else is being worked on, you would have to do some cherry-picking, but that should be the exception rather than the rule.
Just my 2c ;)

@mstormi
Contributor Author

mstormi commented Apr 21, 2020

FTR, now that #683 was finally merged yesterday, any user can set clonebranch= in openhabian.conf to choose their branch.
I like Yannick's 2c, simple and effective.
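As an illustration of how such a setting could be consumed, here is a minimal Python sketch that reads `clonebranch=` and builds the clone command. It assumes openhabian.conf uses shell-style `key=value` lines with `#` comments, that the last assignment wins (as when the file is sourced by a shell), and that an empty or missing value falls back to `master`; the helper names are hypothetical. Only `git clone --branch` itself is a standard git option.

```python
from typing import List


def clone_branch(conf_text: str, default: str = "master") -> str:
    """Return the branch selected by a clonebranch= line, else `default`."""
    branch = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        key, value = line.split("=", 1)
        if key.strip() == "clonebranch":
            # Drop surrounding quotes; an empty value keeps the default (assumption).
            value = value.strip().strip("\"'")
            branch = value or default
    return branch


def clone_command(branch: str, repo_url: str, target_dir: str) -> List[str]:
    """Build the argv for cloning the chosen branch."""
    return ["git", "clone", "--branch", branch, repo_url, target_dir]
```

Since `git clone --branch` also accepts a tag name, the same setting could point at a release tag as well as at a `stable` branch.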

@ghys
Member

ghys commented Apr 21, 2020

Making master>stable PRs might seem like a hassle at first, but you get a convenient way of reviewing what's added since the last stable "release", and the default PR messages proposed by GitHub (concatenation of the commits' messages) will essentially be your changelog.

@mstormi
Contributor Author

mstormi commented Apr 22, 2020

Just learned that Microsoft pursues a similar strategy. They have optional updates, which are previews of future patches that get enforced on regular patch days. Users can choose to install them ahead of time. So those optionals are the equivalent of our master, while the regular patches would be what's in stable.
(OK, to some M$ may not be a good reference, but let's be frank: the Steve Ballmer "Linux is cancer" days are long gone and they're doing a good job nowadays.)

@ecdye
Member

ecdye commented Apr 22, 2020

If implemented we should make it a new version release as it is a very radical change from past usage.

@holgerfriedrich
Member

I think we have planned to clean up pending PRs and issues, get the CI to work reliably, release another 1.x image.
Then prepare the 2.0, removing pinea64 support and other hardly maintained features. New features only in the 2.x...

Transition will be tricky, as old releases will update themselves from master....

@ecdye
Member

ecdye commented Apr 22, 2020

Perhaps drop 1.x support by having it no longer track changes from the master repo, or something akin to that, when the time comes.

@mstormi
Contributor Author

mstormi commented Apr 23, 2020

This is not a new feature but a stabilization improvement at its best.
So we could introduce it before 2.x and without invalidating our agreements.

> Transition will be tricky, as old releases will update themselves from master....

The clonebranch= parameter should help.
But we will certainly need to think thoroughly about a proper way of introducing this, and when.

@ThomDietrich @EliasGabrielsson, did the AC statements change your mind a little?
Would you be willing to agree with a change as described by @ghys ?

This was referenced May 9, 2020
@mstormi mstormi added this to the Refinement and Refocus milestone May 23, 2020

6 participants