
Big-bang bang update (#333) broke most of my LXC updates #404

Closed
1 of 2 tasks
dsiebel opened this issue Nov 20, 2024 · 16 comments

Comments

@dsiebel
Contributor

dsiebel commented Nov 20, 2024

Please verify that you have read and understood the guidelines.

yes

A clear and concise description of the issue.

In #333 the resource checks were changed in a way that now breaks almost all my LXC updates, e.g.:

⚠️  Required: 1 CPU, 1024MB RAM | Current: 1 CPU, 512MB RAM
Please ensure that the Uptime Kuma LXC is configured with at least 1 vCPU and 1024 MB RAM for the build process.

I fine-tuned the resources of my LXC containers to what they actually need; the new resource checks, however, would require me to (see the host-side sketch after this list):

  1. switch back to the default resource values (which might not even be available on the PVE node)
  2. reboot the container (downtime)
  3. run the update
  4. switch back to my chosen values (downtime, again)
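Roughly sketched, with a placeholder VMID of 106 and example values, that host-side workaround would look like this (as noted further down in the thread, the resource change itself does not actually require a reboot):

  # On the Proxmox host (VMID and values are placeholders)
  pct set 106 --cores 1 --memory 1024    # bump to the script's required minimum
  pct exec 106 -- update                 # run the container's update script
  pct set 106 --cores 1 --memory 512     # revert to the fine-tuned values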

Am I holding this wrong? Is there a better way to do the updates?

This only happened after running sed -i 's/tteck\/Proxmox/community-scripts\/ProxmoxVE/g' /usr/bin/update to switch to the new repo (😢).
The old update scripts still work fine, but they will be outdated soon enough.

To be honest, I think the previous approach of briefly over-provisioning the container for resource-intensive build processes was perfectly fine, especially since the resources were reverted to the chosen/previous values after the install or update, which made this effortless for the user.

What settings are you currently utilizing?

  • Default Settings
  • Advanced Settings

Which Linux distribution are you employing?

Debian 12

If relevant, including screenshots or a code block can be helpful in clarifying the issue.

No response

Please provide detailed steps to reproduce the issue.

  1. Adjust the resources on any LXC container, so they deviate from the defaults.
  2. Migrate the update scripts to the new community-scripts repo:
    sed -i 's/tteck\/Proxmox/community-scripts\/ProxmoxVE/g' /usr/bin/update
  3. run update
@MickLesk
Member

Under-provisioning LXC containers is generally not a recommended practice. Containers only use the resources they actually need, regardless of what is allocated to them. For example, even if you assign 64 GB of RAM, a container requiring only 250 MB will only use that amount. Setting resource limits too low can lead to avoidable issues, particularly during resource-intensive tasks like build or update processes.

We introduced these checks to ensure stability and avoid scenarios where builds fail or installations break due to insufficient resources. This was a frequent issue in the past and caused significant frustration for users. While the old approach of temporarily over-provisioning resources might have worked in some cases, it also introduced its own challenges, especially for users who were unaware of these requirements.

The current method ensures that containers meet the minimum resource requirements upfront, providing a more reliable experience overall. If you prefer to maintain custom resource limits, you could consider temporarily increasing resources before running updates and then reverting them afterward. However, consistently meeting the minimum requirements for updates will likely save you time and effort in the long run.
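As a quick way to compare allocation with actual consumption from the Proxmox host (the VMID 106 below is a placeholder):

  pct config 106 | grep -E '^(cores|memory)'   # what is allocated to the container
  pct exec 106 -- free -m                      # what the container actually sees and uses (cgroup-limited)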

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

That is all true and valid and I totally understand the reason why it was done.
I too was one of the users who was having issues with a container once that had insufficient resources to run a go build during updates.

Another reason for setting tighter resource limits, specifically for memory, is garbage collection. While it is true that containers only consume the resources they need, the runtimes of many (if not all) garbage-collected languages obey cgroup limits, which has a direct impact on garbage collection and on the resources they end up consuming. Running with lower resource limits can therefore reduce the amount of memory a container thinks it needs.
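To illustrate (the binary names and values below are only examples, not taken from any specific script): inside the container the runtimes see the cgroup limit, and several garbage-collected runtimes can be tuned against it.

  # Inside the LXC: the memory limit the runtimes actually see (cgroup v2)
  cat /sys/fs/cgroup/memory.max                 # e.g. 536870912 for a 512 MB container

  # Examples of GC'd runtimes honouring such limits:
  GOMEMLIMIT=400MiB ./some-go-service           # Go 1.19+ soft memory limit
  node --max-old-space-size=384 server.js       # cap V8's old-space heap
  java -XX:MaxRAMPercentage=75.0 -jar app.jar   # JVM sizes its heap from the container limit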

I am running 40 LXC containers on my Proxmox VE host. Having to temporarily change the resource settings to run an update turns a simple, automatable task into a full day of work (see the steps involved above).

So, yeah, I basically wanted to gauge the interest in changing the current approach.
One could think about still allowing updates if the container's resource "requirements" aren't met, maybe with a simple y/n prompt.

@MickLesk
Member

The problem is: if I add the Y/N question and we update a resource-intensive script like Vaultwarden or others, and the user only has 4 GB RAM, the installer can crash partway through the update and we end up doing nothing but bugfixing. That is if a fix is even still possible, because the process may have deleted the folder beforehand. In other words, we are talking about potentially major data loss, which will ultimately land back on us.

One idea would have been to allow a 10-25% deviation from the recommended resources, but even then we can't ensure that everything runs -> so we get bug reports again.

The right way would rather be to size the LXCs properly: e.g. if LXC “Test” has 2 cores and 2 GB RAM according to our specification, but only needs 786 MB RAM even when updating, the specification could be scaled down.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

The problem is: if I add the Y/N question and we update a resource-intensive script like Vaultwarden or others, and the user only has 4 GB RAM, the installer can crash partway through the update and we end up doing nothing but bugfixing. That is if a fix is even still possible, because the process may have deleted the folder beforehand. In other words, we are talking about potentially major data loss, which will ultimately land back on us.

I understand.
But having a prompt with a prominent warning would make it an easy close, no?
Maybe differentiating between scripts that run builds and scripts that don't?
So, I guess what I am asking here is to move some responsibility back to the user (#keinBackupKeinMitleid, i.e. "no backup, no pity" 😅).

One idea would have been to allow a 10-25% deviation from the recommended resources, but even then we can't ensure that everything runs -> so we get bug reports again.

Yeah, I don't think that 10-25% is going to cut it.
In most cases I could reduce the resources to 50% of the defaults or less.

The right way would rather be to size the LXCs properly: e.g. if LXC “Test” has 2 cores and 2 GB RAM according to our specification, but only needs 786 MB RAM even when updating, the specification could be scaled down.

That is kind of the problem.
If I scale the container down but the script's specs are set to higher values, it won't let me update.
Or am I misunderstanding your point here?

@havardthom
Contributor

Adding a data loss warning and requiring a full "yes" in a prompt like this might be acceptable, @MickLesk?

  if [[ "$current_ram" -lt "$var_ram" ]] || [[ "$current_cpu" -lt "$var_cpu" ]]; then
    echo -e "\n⚠️${HOLD} ${GN}Required: ${var_cpu} CPU, ${var_ram}MB RAM ${CL}| ${RD}Current: ${current_cpu} CPU, ${current_ram}MB RAM${CL}"
    echo -e "${YWB}Please ensure that the ${APP} LXC is configured with at least ${var_cpu} vCPU and ${var_ram} MB RAM for the build process.${CL}\n"
    read -r -p "⚠️ May cause data loss! ⚠️ Continue update with under-provisioned LXC? <yes/No>  " prompt  
    # Check if the input is 'yes', otherwise exit with status 1
    if [[ ! ${prompt,,} =~ ^(yes)$ ]]; then
      echo -e "❌${HOLD} ${YWB}Exiting based on user input.${CL}"
      exit 1
    fi
  else
    echo -e ""
  fi

@MickLesk
Member

We can do that; if this causes a lot of issues in the near future, we revert.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

I can also imagine a rollback mechanism that would not necessarily incur data loss: keep the old release next to the new one and switch a symlink between them.
It's a pattern I've seen quite often in real-world projects.
Not sure if that's feasible for all the scripts, since I don't know all of them (yet), but it's definitely something I'd be willing to give a try when I find the time.
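A minimal sketch of that pattern, with the app name, unit name, paths, and version numbers as placeholders:

  # Keep releases side by side and point a symlink at the active one
  APP=myapp
  NEW=1.2.3
  mkdir -p /opt/${APP}/releases/${NEW}
  # ... fetch/build the new release into /opt/${APP}/releases/${NEW} ...
  ln -sfn /opt/${APP}/releases/${NEW} /opt/${APP}/current
  systemctl restart ${APP}

  # Rollback is just re-pointing the symlink at the previous release
  ln -sfn /opt/${APP}/releases/1.2.2 /opt/${APP}/current && systemctl restart ${APP}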

@MickLesk
Member

Do an example with 2-3 CT.sh scripts and then we'll take a look.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

I think I will look at these for the implementation, for reasons mentioned below:

  • zigbee2mqtt - elaborate git clone update procedure
  • zoraxy - Simple GitHub release install and I contributed to it before
  • uptimekuma - because it was the first one that failed on me

If you have other suggestions which ones to look at, please throw them my way.
Will there still be an implementation of @havardthom 's suggestion in the meantime?

EDIT: while looking through a few more update scripts I noticed that in many cases, zigbee2mqtt for example, the dreaded data loss can easily be avoided by properly configuring the runtime so that user data is not stored inside the git checkout, i.e. by setting ZIGBEE2MQTT_DATA (see the sketch after this comment).

EDIT2: paperless-ngx might be an interesting candidate too, for the same reason as above (its data paths are not set to an external directory).
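A sketch of what that could look like for zigbee2mqtt; the unit name and paths below are assumptions about a typical install, not necessarily the script's actual layout:

  # Move user data out of the git checkout and point zigbee2mqtt at it
  mkdir -p /var/lib/zigbee2mqtt /etc/systemd/system/zigbee2mqtt.service.d
  cp -a /opt/zigbee2mqtt/data/. /var/lib/zigbee2mqtt/
  printf '[Service]\nEnvironment=ZIGBEE2MQTT_DATA=/var/lib/zigbee2mqtt\n' \
    > /etc/systemd/system/zigbee2mqtt.service.d/data-dir.conf
  systemctl daemon-reload && systemctl restart zigbee2mqtt
  # The checkout under /opt/zigbee2mqtt can now be replaced on update without touching user data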

@Aloe-recite

Hi,
sorry for intervening in a technical discussion despite being a complete newbie. I'm pretty sure that allocating RAM and CPU to an LXC does not require a reboot (unlike VMs). Having to modify them manually before and after updating is still a hassle, but at least there is no downtime. Maybe the update process could temporarily assign the required resources for the update and then revert them to what they were before? Since the assigned resources are not actively used unless needed and act more as a cap/limit than a real allocation, this shouldn't wreak havoc, although it might require running the update from the host to interact with the resources (unless a container is allowed to alter them from inside, which seems unlikely).
Honestly, I'll use the future updates to revert to the default resources; I manually tweaked many of them, but I recognize I did it for no real, practical, tested and tangible reason...
Sorry if I wasted your time by saying something stupid!

@MickLesk
Member

Yes, you are right, LXC resources can be adjusted as desired during active operation.

This is not possible from within the update script, because it does not run on the Proxmox host node. In other words, you cannot manipulate the RAM and CPU values from that location; that is only possible from the host.
This is why the install scripts used to contain the “Set resources to normal” step: it was executed via the Proxmox node and was therefore technically possible.

To realize that for updates, all >220 scripts would have to be completely rebuilt and a completely new logic would have to be created. That's not going to happen, also from a security point of view.

@Aloe-recite

Thanks, makes perfect sense. Sorry for butting in!

@MickLesk MickLesk closed this as completed Dec 2, 2024
@MickLesk
Member

MickLesk commented Dec 2, 2024

Closed for now; this is included in my PR.

@burgerga
Contributor

burgerga commented Dec 2, 2024

Hi @MickLesk which PR is this? (just curious :) )

And @dsiebel it would still be nice to explore the update/backup symlink logic, maybe we can start a discussion for that?

There are several scripts where, during an update, the /opt/$APP directory is removed and then a clean install is done. If this fails, there is no way to go back, because a clean install from scratch will also fail.

@MickLesk
Member

MickLesk commented Dec 2, 2024

Not pushed yet, I think tomorrow or today evening (GMT).

@dsiebel
Contributor Author

dsiebel commented Dec 2, 2024

it would still be nice to explore the update/backup symlink logic, maybe we can start a discussion for that?

@burgerga sure, feel free to start one and mention me for tracking.
Sadly, I haven't found the time to dig into it, yet.
