
Big-bang bang update (#333) broke most of my LXC updates #404

Closed
1 of 2 tasks
dsiebel opened this issue Nov 20, 2024 · 16 comments

Comments

@dsiebel
Contributor

dsiebel commented Nov 20, 2024

Please verify that you have read and understood the guidelines.

yes

A clear and concise description of the issue.

In #333 the resource checks were changed in a way that now breaks almost all my LXC updates, e.g.:

⚠️  Required: 1 CPU, 1024MB RAM | Current: 1 CPU, 512MB RAM
Please ensure that the Uptime Kuma LXC is configured with at least 1 vCPU and 1024 MB RAM for the build process.

I fine-tuned the resources of my LXC containers to what they actually need; the new resource checks, however, would require me to (see the host-side sketch after this list):

  1. switch back to the default resource values (which might not even be available on the PVE node)
  2. reboot the container (downtime)
  3. run the update
  4. switch back to my chosen values (downtime, again)
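Roughly sketched, with a placeholder VMID of 106 and example values, that host-side workaround would look like this (as noted further down in the thread, the resource change itself does not actually require a reboot):

  # On the Proxmox host (VMID and values are placeholders)
  pct set 106 --cores 1 --memory 1024    # bump to the script's required minimum
  pct exec 106 -- update                 # run the container's update script
  pct set 106 --cores 1 --memory 512     # revert to the fine-tuned values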

Am I holding this wrong? Is there a better way to do the updates?

This only happened after running sed -i 's/tteck\/Proxmox/community-scripts\/ProxmoxVE/g' /usr/bin/update to switch to the new repo (😢).
The old update scripts still work fine, but they will be outdated soon enough.

To be honest, I think the previous approach of briefly over-provisioning the container for resource-intensive build processes was perfectly fine, especially since the resources were reverted to the chosen/previous values after the install or update, which made this effortless for the user.

What settings are you currently utilizing?

  • Default Settings
  • Advanced Settings

Which Linux distribution are you employing?

Debian 12

If relevant, including screenshots or a code block can be helpful in clarifying the issue.

No response

Please provide detailed steps to reproduce the issue.

  1. Adjust the resources on any LXC container, so they deviate from the defaults.
  2. Migrate the update scripts to the new community-scripts repo:
    sed -i 's/tteck\/Proxmox/community-scripts\/ProxmoxVE/g' /usr/bin/update
  3. run update
@MickLesk
Member

Under-provisioning LXC containers is generally not a recommended practice. Containers only use the resources they actually need, regardless of what is allocated to them. For example, even if you assign 64 GB of RAM, a container requiring only 250 MB will only use that amount. Setting resource limits too low can lead to avoidable issues, particularly during resource-intensive tasks like build or update processes.

We introduced these checks to ensure stability and avoid scenarios where builds fail or installations break due to insufficient resources. This was a frequent issue in the past and caused significant frustration for users. While the old approach of temporarily over-provisioning resources might have worked in some cases, it also introduced its own challenges, especially for users who were unaware of these requirements.

The current method ensures that containers meet the minimum resource requirements upfront, providing a more reliable experience overall. If you prefer to maintain custom resource limits, you could consider temporarily increasing resources before running updates and then reverting them afterward. However, consistently meeting the minimum requirements for updates will likely save you time and effort in the long run.
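As a quick way to compare allocation with actual consumption from the Proxmox host (the VMID 106 below is a placeholder):

  pct config 106 | grep -E '^(cores|memory)'   # what is allocated to the container
  pct exec 106 -- free -m                      # what the container actually sees and uses (cgroup-limited)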

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

That is all true and valid and I totally understand the reason why it was done.
I too was one of the users who was having issues with a container once that had insufficient resources to run a go build during updates.

Another reason for setting tighter resource limits, specifically for memory, is garbage collection. While it is true that containers only consume the resources they need, the runtimes of many (if not all) garbage-collected languages obey cgroup limits, which has a direct impact on garbage collection and on the resources they end up consuming. Running with lower resource limits can therefore reduce the amount of memory a container thinks it needs.
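To illustrate (the binary names and values below are only examples, not taken from any specific script): inside the container the runtimes see the cgroup limit, and several garbage-collected runtimes can be tuned against it.

  # Inside the LXC: the memory limit the runtimes actually see (cgroup v2)
  cat /sys/fs/cgroup/memory.max                 # e.g. 536870912 for a 512 MB container

  # Examples of GC'd runtimes honouring such limits:
  GOMEMLIMIT=400MiB ./some-go-service           # Go 1.19+ soft memory limit
  node --max-old-space-size=384 server.js       # cap V8's old-space heap
  java -XX:MaxRAMPercentage=75.0 -jar app.jar   # JVM sizes its heap from the container limit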

I am running 40 LXC containers on my Proxmox VE host. Having to temporarily change the resource settings to run an update turns a simple, automatable task into a full day of work (see the steps involved above).

So, yeah, I basically wanted to gauge the interest in changing the current approach.
One could think about still allowing updates if the container's resource "requirements" aren't met, maybe with a simple y/n prompt.

@MickLesk
Member

The problem is: if I add the Y/N question and we update a resource-intensive script like Vaultwarden or others, and the user only has 4 GB RAM, the installer can crash partway through the update and we end up doing nothing but bugfixing. That is if a fix is even still possible, because the process may have deleted the folder beforehand. In other words, we are talking about potentially major data loss, which will ultimately land back on us.

One idea would have been to allow a 10-25% deviation from the recommended resources, but even then we can't ensure that everything runs -> so we get bug reports again.

The right way would rather be to size the LXCs properly: e.g. if LXC “Test” has 2 cores and 2 GB RAM according to our specification, but only needs 786 MB RAM even when updating, the specification could be scaled down.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

The problem is: if I add the Y/N question and we update a resource-intensive script like Vaultwarden or others, and the user only has 4 GB RAM, the installer can crash partway through the update and we end up doing nothing but bugfixing. That is if a fix is even still possible, because the process may have deleted the folder beforehand. In other words, we are talking about potentially major data loss, which will ultimately land back on us.

I understand.
But having a prompt with a prominent warning would make it an easy close, no?
Maybe differentiating between scripts that run builds and scripts that don't?
So, I guess what I am asking here is to move some responsibility back to the user (#keinBackupKeinMitleid, i.e. "no backup, no pity" 😅).

One idea would have been to allow a 10-25% deviation from the recommended resources, but even then we can't ensure that everything runs -> so we get bug reports again.

Yeah, I don't think that 10-25% is going to cut it.
In most cases I could reduce the resources to 50% of the defaults or less.

The right way would rather be to size the LXCs properly: e.g. if LXC “Test” has 2 cores and 2 GB RAM according to our specification, but only needs 786 MB RAM even when updating, the specification could be scaled down.

That is kind of the problem.
If I scale the container down but the script's specs are set to higher values, it won't let me update.
Or am I misunderstanding your point here?

@havardthom
Contributor

Adding a data loss warning and requiring a full "yes" in a prompt like this might be acceptable, @MickLesk?

  if [[ "$current_ram" -lt "$var_ram" ]] || [[ "$current_cpu" -lt "$var_cpu" ]]; then
    echo -e "\n⚠️${HOLD} ${GN}Required: ${var_cpu} CPU, ${var_ram}MB RAM ${CL}| ${RD}Current: ${current_cpu} CPU, ${current_ram}MB RAM${CL}"
    echo -e "${YWB}Please ensure that the ${APP} LXC is configured with at least ${var_cpu} vCPU and ${var_ram} MB RAM for the build process.${CL}\n"
    read -r -p "⚠️ May cause data loss! ⚠️ Continue update with under-provisioned LXC? <yes/No>  " prompt  
    # Check if the input is 'yes', otherwise exit with status 1
    if [[ ! ${prompt,,} =~ ^(yes)$ ]]; then
      echo -e "❌${HOLD} ${YWB}Exiting based on user input.${CL}"
      exit 1
    fi
  else
    echo -e ""
  fi

@MickLesk
Member

We can do that; if this causes a lot of issues in the near future, we revert.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

I can also imagine a rollback mechanism that would not necessarily incur data loss: keep the old release next to the new one and switch a symlink between them.
It's a pattern I've seen quite often in real-world projects.
Not sure if that's feasible for all the scripts, since I don't know all of them (yet), but it's definitely something I'd be willing to give a try when I find the time.
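A minimal sketch of that pattern, with the app name, unit name, paths, and version numbers as placeholders:

  # Keep releases side by side and point a symlink at the active one
  APP=myapp
  NEW=1.2.3
  mkdir -p /opt/${APP}/releases/${NEW}
  # ... fetch/build the new release into /opt/${APP}/releases/${NEW} ...
  ln -sfn /opt/${APP}/releases/${NEW} /opt/${APP}/current
  systemctl restart ${APP}

  # Rollback is just re-pointing the symlink at the previous release
  ln -sfn /opt/${APP}/releases/1.2.2 /opt/${APP}/current && systemctl restart ${APP}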

@MickLesk
Member

Do an example with 2-3 CT.sh scripts and then we'll take a look.

@dsiebel
Contributor Author

dsiebel commented Nov 21, 2024

I think I will look at these for the implementation, for reasons mentioned below:

  • zigbee2mqtt - elaborate git clone update procedure
  • zoraxy - Simple GitHub release install and I contributed to it before
  • uptimekuma - because it was the first one that failed on me

If you have other suggestions which ones to look at, please throw them my way.
Will there still be an implementation of @havardthom 's suggestion in the meantime?

EDIT: while looking through a few more update scripts I noticed that in many cases, zigbee2mqtt for example, the dreaded data loss can easily be avoided by properly configuring the runtime so that user data is not stored inside the git checkout, i.e. by setting ZIGBEE2MQTT_DATA (see the sketch after this comment).

EDIT2: paperless-ngx might be an interesting candidate too, for the same reason as above (its data paths are not set to an external directory).
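A sketch of what that could look like for zigbee2mqtt; the unit name and paths below are assumptions about a typical install, not necessarily the script's actual layout:

  # Move user data out of the git checkout and point zigbee2mqtt at it
  mkdir -p /var/lib/zigbee2mqtt /etc/systemd/system/zigbee2mqtt.service.d
  cp -a /opt/zigbee2mqtt/data/. /var/lib/zigbee2mqtt/
  printf '[Service]\nEnvironment=ZIGBEE2MQTT_DATA=/var/lib/zigbee2mqtt\n' \
    > /etc/systemd/system/zigbee2mqtt.service.d/data-dir.conf
  systemctl daemon-reload && systemctl restart zigbee2mqtt
  # The checkout under /opt/zigbee2mqtt can now be replaced on update without touching user data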

@Aloe-recite

Hi,
sorry for intervening in a technical discussion despite being a complete newbie. I'm pretty sure that allocating RAM and CPU to an LXC does not require a reboot (unlike VMs). Having to modify them manually before and after updating is still a hassle, but at least there is no downtime. Maybe the update process could temporarily assign the required resources for the update and then revert them to what they were before? Since the assigned resources are not actively used unless needed and act more as a cap/limit than a real allocation, this shouldn't wreak havoc, although it might require running the update from the host to interact with the resources (unless a container is allowed to alter them from inside, which seems unlikely).
Honestly, I'll use the future updates to revert to the default resources; I manually tweaked many of them, but I recognize I did it for no real, practical, tested and tangible reason...
Sorry if I wasted your time by saying something stupid!

@MickLesk
Member

Yes, you are right, LXC resources can be adjusted as desired during active operation.

This is not possible from within the update script, because it does not run on the Proxmox host node. In other words, you cannot manipulate the RAM and CPU values from that location; that is only possible from the host.
This is why the install scripts used to contain the “Set resources to normal” step: it was executed via the Proxmox node and was therefore technically possible.

To realize that for updates, all >220 scripts would have to be completely rebuilt and a completely new logic would have to be created. That's not going to happen, also from a security point of view.

@Aloe-recite

Thanks, makes perfect sense. Sorry for butting in!

@MickLesk MickLesk closed this as completed Dec 2, 2024
@MickLesk
Member

MickLesk commented Dec 2, 2024

Closed for now; this is included in my PR.

@burgerga
Contributor

burgerga commented Dec 2, 2024

Hi @MickLesk which PR is this? (just curious :) )

And @dsiebel it would still be nice to explore the update/backup symlink logic, maybe we can start a discussion for that?

There are several scripts where, during an update, the /opt/$APP directory is removed and then a clean install is done. If this fails, there is no way to go back, because a clean install from scratch will also fail.

@MickLesk
Member

MickLesk commented Dec 2, 2024

Not pushed yet, I think tomorrow or today evening (GMT).

@dsiebel
Contributor Author

dsiebel commented Dec 2, 2024

it would still be nice to explore the update/backup symlink logic, maybe we can start a discussion for that?

@burgerga sure, feel free to start one and mention me for tracking.
Sadly, I haven't found the time to dig into it, yet.
