add ram and non-containerized storage preflight check #3518
Conversation
memoryfree = get_var(task_vars, "facter_memoryfree")
recommended_memory = get_var(task_vars, "min_memory_gb")
recommended_storage = get_var(task_vars, "min_diskspace_gb")
@brenton not sure where these values should be coming from
You can get the supported values here:
https://docs.openshift.com/container-platform/latest/install_config/install/prerequisites.html#hardware
Even though there may be different requirements for older versions of the product, in general I think we should always encourage customers to meet the requirements of the latest version so they don't hit problems when they upgrade.
Force-pushed from a366cd2 to 0391ebc.
# any images are missing, or error occurs during pull
def run(self, tmp, task_vars):
    ansible_mounts = get_var(task_vars, "ansible_mounts")
    diskfree = self.to_gigabytes(self.get_disk_space(ansible_mounts))
@brenton or @rhcarvalho This is the total value of the / filesystem; I'm not sure if I should be checking that, or openshift-xfs-vol-dir. I am getting this information from ansible_mounts.
@detiber, do you have any idea where openshift-xfs-vol-dir came from?
Is it a disk label or a lv name? In any case, I'm not sure we can rely on specific values.
I think the best we can do is match a device with specific directories we care about.
For containerized, docker info would work for devicemapper with a thin pool; not sure about other backends.
For containerized docker info would work for devicemapper w/ a thin pool, not sure about other backends.
Thanks. For now, at least, I have added support for checking containerized installs that use devicemapper as the storage driver.
Force-pushed from 2ee2091 to 0deaaa8.
from openshift_checks.mixins import NotContainerizedMixin


class RamDiskspaceAvailability(NotContainerizedMixin, OpenShiftCheck):
Should the memory and diskspace checks be separated?
I would think that Memory checks would apply equally to containerized/non-containerized.
Are there plans to also add a disk space check for containerized installs as well?
👍 to splitting into 2 checks.
Force-pushed from 2420583 to d93cc16.
def run(self, tmp, task_vars):
    group_names = get_var(task_vars, "group_names", default=[])
    memoryfree = get_var(task_vars, "facter_memoryfree")
I am thinking free memory is not the right measure here: I believe cache use is excluded, so it's an arbitrarily changing value. Maybe "memory.system.available", though I'm not sure that's in the task_vars, and I wonder if we really want to base this on memory usage at the time of operation. I think it would be least surprising to use memorysize (total system memory).
group_names = get_var(task_vars, "group_names", default=[])
memoryfree = get_var(task_vars, "facter_memoryfree")

recommended_memory_gb = self.recommended_master_memory_gb
To me this begs for a ternary:
recommended_memory_gb = self.recommended_node_memory_gb if "nodes" in group_names else self.recommended_master_memory_gb
Pythonistas? :)
Also shouldn't this be the other way around? Because masters can be nodes...
Masters should be nodes, but in general they are unschedulable nodes, so they do not require additional disk space.
Pythonistas? :)
I'd go with a dict and max:
class MemoryAvailability(OpenShiftCheck):
    recommended_memory = {
        "nodes": X,
        "masters": Y,
        "etcd": Z,
        "nfs": M,
        "lb": N,
    }

    def run(...):
        # ...
        recommended_memory = max(self.recommended_memory.get(group, 0) for group in group_names)
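A runnable version of the dict-and-max idea might look like this; the nodes/masters/etcd figures are the GB values discussed later in this thread, while the nfs and lb entries are placeholder assumptions:

```python
# Sketch of the dict-and-max lookup. nodes/masters/etcd values are the GB
# figures from this review thread; nfs and lb are placeholder assumptions.
RECOMMENDED_MEMORY_GB = {
    "nodes": 8.0,
    "masters": 16.0,
    "etcd": 20.0,
    "nfs": 1.0,  # assumption
    "lb": 1.0,   # assumption
}

def recommended_memory_gb(group_names):
    # A host in several groups gets the largest minimum among its groups;
    # unknown groups contribute 0, so they never raise the requirement.
    return max((RECOMMENDED_MEMORY_GB.get(group, 0) for group in group_names),
               default=0)

print(recommended_memory_gb(["masters", "nodes"]))  # 16.0
```

This also answers the "other way around?" question: a host that is both a master and a node simply gets the larger of the two minimums.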
@staticmethod
def mem_to_float(mem):
    return float(mem.rstrip(" GB"))
What if it's GiB... or MB? I think it's the "display size" which would be MB if < 1 GB.
@@ -0,0 +1,31 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin
Why not run this in containerized installs too?
@@ -0,0 +1,44 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin
Maybe I'm misunderstanding but doesn't this exclude containerized installs?
This seems to be a leftover, unused import.
Unlike Go, the Python interpreter doesn't complain, but I'm sure this is failing our CI.
def get_disk_space(ansible_mounts, is_containerized):
    if len(ansible_mounts):
        for mnt in ansible_mounts:
            if mnt.get("mount") == "/":
Technically the requirements are for the filesystem containing /var (https://docs.openshift.com/container-platform/3.4/install_config/install/prerequisites.html#hardware). So we should check whether /var is mounted separately before using /.
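A minimal sketch of that selection logic (the helper name is hypothetical, not from the PR):

```python
def find_storage_mount(ansible_mounts):
    """Prefer a separately mounted /var; fall back to / otherwise.
    Hypothetical helper, assuming ansible_mounts entries are dicts
    with a "mount" key, as gathered by Ansible's setup module."""
    by_path = {mnt.get("mount"): mnt for mnt in ansible_mounts}
    for path in ("/var", "/"):
        if path in by_path:
            return by_path[path]
    return None

mounts = [
    {"mount": "/", "size_available": 10 * 1000**3},
    {"mount": "/var", "size_available": 40 * 1000**3},
]
print(find_storage_mount(mounts)["mount"])  # /var
```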
@staticmethod
def to_gigabytes(total_bytes):
    return total_bytes / 1073741824
Just a nit, but wouldn't this be gibibytes instead of gigabytes?
Thanks, fixed!
def run(self, tmp, task_vars):
    group_names = get_var(task_vars, "group_names", default=[])
    memoryfree = get_var(task_vars, "facter_memoryfree")
I'm not sure we can guarantee that facter will be available on the remote hosts; ansible_memfree_mb would be the equivalent. That said, I don't think you can trust either facter_memoryfree or ansible_memfree_mb here, since they both would include cached data. Instead of trying to get fancy, I think it might be best to stick with comparing against ansible_memtotal_mb. There is also ansible_memory_mb, which provides access to swap, physical memory info, and memory values adjusted for cached data.
👍 ansible_memtotal_mb
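A sketch of what comparing against total memory could look like (the recommended value and the return shape here are illustrative assumptions, not the merged code):

```python
def check_memory(task_vars, recommended_mb=16 * 1000):
    """Compare total system memory (ansible_memtotal_mb, gathered by
    Ansible's setup module) against a recommended minimum, instead of
    the fluctuating free-memory figure. recommended_mb is an assumed
    example value."""
    memtotal_mb = task_vars.get("ansible_memtotal_mb", 0)
    if memtotal_mb < recommended_mb:
        msg = "Only {} MB of memory; at least {} MB recommended".format(
            memtotal_mb, recommended_mb)
        return {"failed": True, "msg": msg}
    return {"changed": False}

print(check_memory({"ansible_memtotal_mb": 7821}))
```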
if "nodes" in group_names:
    recommended_memory_gb = self.recommended_node_memory_gb

if self.mem_to_float(memoryfree) < recommended_memory_gb:
Would it be better to convert the retrieved value to bytes and do an integer comparison instead?
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin

import json
Another case of imported and not used.
tags = ["preflight"]

recommended_node_diskspace_gb = 15.0
recommended_master_diskspace_gb = 40.0
What about things like etcd?
Force-pushed from 1fe382c to d2e2d58.
@sdodson, @rhcarvalho, or @sosiouxme wondering if you could give this patch one last pass? Thanks
if __name__ == '__main__':
    main()
Would it make sense to make this available to other roles? I could see this being of value in the docker-related roles as well. It might also be nice to return back at least some of the values as 'facts' instead of just a single info dict.
Sure, will update this to return its values under ansible_facts
hm, maybe it would be better to rename the module from docker_info to simply docker? That way, it makes sense for it to return not just info, but docker client info.
docker_facts would probably be better; docker would shadow the older (and deprecated) docker module upstream.
def main():
    client = AnsibleDockerClient(
        argument_spec=dict(
            name=dict(type='list'),
name doesn't appear to be used anywhere. I would add supports_check_mode=True to the params for AnsibleDockerClient.
Force-pushed from e7bc690 to 43680e3.
recommended_node_space_gb = 15.0
recommended_master_space_gb = 40.0
minimum_docker_space_percent = 5.0
These should be configurable at run time. They could be defined as variables in the preflight/check.yml playbook, and their names should have some prefix to avoid name collisions...
diskfree = self.to_gibibytes(self.get_disk_space(ansible_mounts))

# if running containerized installation check
# that available space is not below 5% instead
Comments should not refer to specific values like "5%" that are defined in a variable elsewhere.
(multiple occurrences, check all)
info = docker_facts.get("docker_facts")
status = dict(info.get("DriverStatus"))

if info.get("Driver", None) == self.supported_docker_dev_driver:
The behavior of this check doesn't seem to be clearly defined when the driver is not what we consider to be "supported".
We need to clarify the intent.
status = dict(info.get("DriverStatus"))

if info.get("Driver", None) == self.supported_docker_dev_driver:
    disktotal = status.get("Data Space Total", None)
What's the guarantee we have on these key names "Data Space Total", etc? I'm afraid those are not guaranteed APIs to rely on.
hm, they do not seem to be guaranteed as far as I have seen. Another idea to retrieve these values would be to use the docker_facts module to get the value of DockerRootDir (/var/lib/docker by default), and perform df on it:
$ df --output=avail -B G /var/lib/docker | grep "G" -m 1
20G
Simpler yet, we could just use df to get the mountpoint for /var/lib/docker, then look for that in ansible_mounts under task_vars and get the available and total values from there.
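The second idea could be sketched without shelling out at all: resolve the mount point that contains /var/lib/docker by longest matching prefix over ansible_mounts (the helper name is hypothetical):

```python
def mount_for_path(ansible_mounts, path="/var/lib/docker"):
    """Find the ansible_mounts entry whose mount point contains `path`,
    picking the longest (most specific) matching prefix."""
    best = None
    for mnt in ansible_mounts:
        mount = mnt.get("mount", "")
        # "/" matches everything; "/var" matches "/var/lib/docker"; etc.
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best["mount"]):
                best = mnt
    return best

mounts = [
    {"mount": "/", "size_available": 10 * 1000**3, "size_total": 50 * 1000**3},
    {"mount": "/var", "size_available": 40 * 1000**3, "size_total": 100 * 1000**3},
]
print(mount_for_path(mounts)["mount"])  # /var
```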
@rhcarvalho Upon looking at the documentation for devicemapper (https://github.com/docker/docker/tree/master/daemon/graphdriver/devmapper#information-on-docker-info), the values Data Space Total and Data Space Available appear in the linked readme. Although the values underneath the storage driver vary from driver to driver, since we are first checking that the host machine uses devicemapper, I would prefer to rely on these values.
if info.get("Driver", None) == self.supported_docker_dev_driver:
    disktotal = status.get("Data Space Total", None)
    diskavail = status.get("Data Space Available", None)
Unless in a context that would otherwise require None for clarity, dict.get in Python has a default value of None, so it is more common to write:
some_dict.get("some key")
However, in this case there is a problem with data types. I'd avoid mixing whatever type we expect to find in the dictionary (apparently a str) with NoneType.
"pb": 1000 ** 5
}

segments = strfmt.lower().split(" ")
There can easily be an AttributeError exception here. We use strfmt as if it were a str, but in the code above we may clearly be calling this function with None as the first argument... In a dynamically typed language, we have to keep our usage of data types sane.
Thanks, I now make sure to pass a string type to this method. Corrected this in other places as well
recommended_memory_gb = max(self.recommended_memory_gb.get(group, 0) for group in group_names)

if self.to_gigabytes(memoryfree) < recommended_memory_gb:
Why don't we store the recommended values in the unit we want to compare to (MB)?
recommended_memory_gb = {
    "nodes": 8.0,
    "masters": 16.0,
    "etcd": 20.0
It is good practice to include a comma at the end of every line in multiline definitions like this.
It makes it easier to add / remove / move items without having to make changes to other lines.
recommended_memory_gb = max(self.recommended_memory_gb.get(group, 0) for group in group_names)

if self.to_gigabytes(memoryfree) < recommended_memory_gb:
    kind = "master" if "masters" in group_names else "node"
This is incorrect. There are many other possibilities for group_names; we can't call everything else a "node" (there are etcd, nfs, ...).
Thanks, will fix. Originally referenced this check to determine this logic
The logic there is different, it is additive: start with an empty set and add items based on the group names. Here we are incorrectly assuming that everything that is not master is a node.
if info.get("Driver", None) == self.supported_docker_dev_driver:
    disktotal = status.get("Data Space Total", None)
    diskavail = status.get("Data Space Available", None)
    diskleft_percent = self.to_bytes(diskavail) / self.to_bytes(disktotal) * 100
There could be division by zero here...
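A guarded version of the percentage computation might look like this (a sketch; returning None on a missing or zero total is one possible convention):

```python
def percent_available(avail_bytes, total_bytes):
    # Guard against a zero or missing total before dividing.
    if not total_bytes:
        return None
    return float(avail_bytes) / total_bytes * 100

print(percent_available(50, 200))  # 25.0
```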
diskleft_percent = self.to_bytes(diskavail) / self.to_bytes(disktotal) * 100

# fail if less than 5% of available dataspace space left
if int(diskleft_percent) < self.minimum_docker_space_percent:
Be mindful of rounding problems here.
# fail if less than 5% of available dataspace space left
if int(diskleft_percent) < self.minimum_docker_space_percent:
    msg = "The amount of data space remaining (%s%%) for the docker storage driver is below %s%%" \
        % (int(diskleft_percent), self.minimum_docker_space_percent)
We should not use %s (string formatting) with int values. And we should prefer str.format over the % operator. (multiple occurrences, check all)
recommended_diskspace_gb = self.recommended_master_space_gb

if "nodes" in group_names:
    recommended_diskspace_gb = self.recommended_node_space_gb
This logic is wrong.
- There are more group names than masters and nodes.
- If "nodes" in group_names, we should not overwrite recommended_diskspace_gb -- what if "masters" is also in group_names?!
if "nodes" in group_names:
    recommended_diskspace_gb = self.recommended_node_space_gb

if float(diskfree) < recommended_diskspace_gb:
Why the conversion to float?!
# if running containerized installation check
# that available space is not below 5% instead
if is_containerized:
For ease of reading and understanding what this check does, please make the run method short and clear about its intents. For example, we could have run_containerized and run_non_containerized methods to which run delegates depending on is_containerized.
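A sketch of that delegation (class and method names follow the reviewer's proposal; the method bodies are stand-ins, not the merged implementation):

```python
class DiskAvailability(object):
    """Sketch: `run` stays short and delegates based on is_containerized."""

    def run(self, tmp, task_vars):
        if task_vars.get("is_containerized"):
            return self.run_containerized(task_vars)
        return self.run_non_containerized(task_vars)

    def run_containerized(self, task_vars):
        # stand-in for the docker-storage-driver check
        return {"changed": False, "mode": "containerized"}

    def run_non_containerized(self, task_vars):
        # stand-in for the ansible_mounts-based check
        return {"changed": False, "mode": "non-containerized"}

check = DiskAvailability()
print(check.run(None, {"is_containerized": True})["mode"])  # containerized
```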
if mnt.get("mount") == "/":
    root_size_available = mnt.get("size_available")
if mnt.get("mount") == "/var":
    return mnt.get("size_available")
Why do we return early on /var, but keep iterating when we see /?!
We need to clarify the intent here: what is this function promising to return? "get disk space" is not a great name; we can try to improve it once the expected output is clear.
From the recommended available storage, is that 100% towards Docker storage, or do we need to look at Docker storage plus other mount points?
I return immediately if /var exists at all. If not, / is the fallback mount path. I updated this method to be a bit more legible.
From the recommended available storage, is that 100% towards Docker storage or do we need to look at Docker storage + other mount points?
I meant for this to be a check on the mount point for the docker volume. At least in my case, the mount point for its volume was /, so I checked the overall disk space remaining on that point.
return {"changed": False}

@staticmethod
def get_disk_space(ansible_mounts):
A docstring here would make it clear what the ansible_mounts parameter should be for the function to work properly. It's a list of hashes, I guess?
ansible_mounts = []
ansible_mounts.append({'size_available': 12345, 'mount': '/'})
ansible_mounts.append({'size_available': 34512, 'mount': '/var'})
Right? (plus other possible params)
Thanks for the feedback. ansible_mounts is a map like you described. I updated the method to be more legible and added a docstring.
tags = ["preflight"]

recommended_memory_mb = {
    "nodes": 8000.0,
8000 or 8 * 1024?
I think 8000 is fine, unless we prefer MiB instead?
Our docs use the decimal prefixes, let's go with powers of 10 then.
https://docs.openshift.org/latest/install_config/install/prerequisites.html
I scanned over the output from docker info and saw that it reports different measurements in different prefix unit systems.
...
Storage Driver: devicemapper
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Data Space Used: 1.42 GB
Data Space Total: 44.43 GB
Data Space Available: 43.01 GB
Metadata Space Used: 1.507 MB
Metadata Space Total: 511.7 MB
Metadata Space Available: 510.2 MB
Thin Pool Minimum Free Space: 4.443 GB
...
Total Memory: 15.11 GiB
...
Sizes are reported using the SI base-10 system (kB, MB, GB) and memory is reported using the NIST base-2 system (KiB, GiB, etc).
As it was pointed out before, I do maintain a library, 'bitmath' (200 unit tests, in RHEL/Fedora, etc etc, etc) that handles all of that automatically (and more).
Here's a quick gist I made that takes code from this PR and shows the bitmath equivalent.
I think that trying to get bitmath available as an RPM in the RHSM channels would be a big effort, and getting it into Atomic is nigh impossible. I'm OK w/ vendoring/slipping the bitmath library into this project.
But that's a bigger conversation we should probably have as a team.
edit#2: If you want to try bitmath you can pip install bitmath or dnf/yum install python-bitmath (or, depending on your distribution version, python2-bitmath or python3-bitmath).
This is memory though which isn't coming from docker info... comes from ansible setup. Whatever our docs say, nobody measures RAM in GB (only disk manufacturers get to pull that crap), so it seems to me MiB/GiB should be used.
return {"changed": False}

@staticmethod
def to_gigabytes(size_in_mb):
Dead code?
Thanks, will remove
recommended_memory_mb = max(self.recommended_memory_mb.get(group, 0) for group in group_names)

if memoryfree < recommended_memory_mb:
    msg = "Available memory ({available} GB) below recommended storage. Minimum required memory: {recommended} GB".format(available=memoryfree, recommended=recommended_memory_mb)
Too long line...
The reported units are wrong.
Will fix, taking care of linting issues towards the end
segments = strfmt.lower().split(" ")
if not units.get(segments[1], False):
    return None
Types: why does this function sometimes return a number / float, sometimes NoneType?
Thanks, will address this throughout the code
"tib": 1024 ** 4,
"tb": 1000 ** 4,
"pib": 1024 ** 5,
"pb": 1000 ** 5
I'm still puzzled about us doing these conversions this way... But anyway, a nit, though part of learning "the Python way":
1000**5,
- The ** operator is typically not surrounded by spaces.
- Add a trailing comma to this type of 'list'.
I was not sure what specific units to expect, so this at least covers standard cases. @tbielawa has written a great python module that we could use in cases such as this one if we are willing to include it in our openshift-ansible image / make it a requirement on the host machine. wdyt?
I'd prefer not to do any conversions :-)
"the best code is no code at all"
While I am certainly for this, I am not sure to what extent we can guarantee that the units returned by devicemapper's Data Space Total and Data Space Available will be consistent in all cases.
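One defensive way to handle both unit systems is to fail loudly instead of silently returning None (a sketch only; as noted above, docker's output format is not a guaranteed API):

```python
# Both SI (base-10) and IEC (base-2) suffixes, lowercased for lookup.
UNITS = {
    "b": 1,
    "kb": 1000, "kib": 1024,
    "mb": 1000**2, "mib": 1024**2,
    "gb": 1000**3, "gib": 1024**3,
    "tb": 1000**4, "tib": 1024**4,
}

def to_bytes(size_str):
    """Parse strings such as '44.43 GB' or '15.11 GiB' into bytes.
    Raises ValueError on anything unparseable instead of returning None."""
    if not isinstance(size_str, str):
        raise ValueError("expected a string, got %r" % (size_str,))
    parts = size_str.strip().lower().split()
    if len(parts) != 2 or parts[1] not in UNITS:
        raise ValueError("unparseable size: %r" % size_str)
    return float(parts[0]) * UNITS[parts[1]]

print(to_bytes("2 KiB"))  # 2048.0
```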
@staticmethod
def to_gigabytes(total_bytes):
    return total_bytes / 1000000000
Here we also need some clarity if we should be using base 10^3 or 2^10.
https://en.wikipedia.org/wiki/Byte#Unit_multiples
Would be easier to read a formula than counting zeros:
total_bytes / 1000000000
total_bytes / (10**3)**3
total_bytes / 1000**3
def containerized_volume_check(self, task_vars):
    docker_facts = self.module_executor("docker_facts", {}, task_vars)
    info = docker_facts.get("docker_facts")
    status = dict(info.get("DriverStatus"))
>>> dict(None)
TypeError: 'NoneType' object is not iterable
>>> info = None
>>> info.get("DriverStatus")
AttributeError: 'NoneType' object has no attribute 'get'
        failed, msg = self.containerized_volume_check(task_vars)
        if failed:
            return {"failed": True, "msg": msg}
The way it stands now, we will call both `containerized_volume_check` and `noncontainerized_volume_check` when `is_containerized` is true and `failed` is false...
Thanks for catching that :)
        if "masters" in group_names:
            return self.recommended_master_space_gb

        return self.recommended_node_space_gb
This is also assuming that everything that is not a master is a node. If we expect to run this check on non-master-non-node hosts, we should follow the docs and use the proper storage required for each type of host.
hm, will go ahead and use info from https://docs.openshift.com/container-platform/3.4/install_config/install/prerequisites.html#hardware, however it only contains reqs for masters, nodes, and etcd
I'm not sure if it is showing up anymore, due to the folded 'Hide Outdated' segments above, but I did post another comment with more information on bitmath and the problem of converting/parsing inconsistent units. Direct link to above comment here
from openshift_checks import OpenShiftCheck, get_var


class DiskAvailability(OpenShiftCheck):
The name of the file is inconsistent with the check name and class name.
def main():
    client = AnsibleDockerClient(
        argument_spec=dict(
        ),
This looks odd. Why not make it one line...
    name = "disk_availability"
    tags = ["preflight"]

    recommended_diskspace = {
How about a comment about why it's base 10, and a link to origin docs on where this recommendation comes from?
    }

    minimum_docker_space_percent = 5.0
    supported_docker_dev_driver = "devicemapper"
seems conceptually like it should be a list (of one currently)
Yeah, I was thinking of removing the check for the storage driver from this PR altogether, and just handling it as part of the docker_storage_check
PR https://github.com/openshift/openshift-ansible/pull/3787/files#diff-c3d0f0c6a0d94ad7d6c5a62da695118dR17
Actually, remembering a very similar conversation I had with @rhcarvalho about this, it would be better to have the storage driver check be its own preflight check, allowing this PR and #3787 to focus on a single thing. WDYT?
Err, belatedly... yes.
        if is_containerized:
            failed, msg = self.containerized_volume_check(task_vars)
            if failed:
Could this not be simplified to:
if is_containerized:
    failed, msg = self.containerized_volume_check(task_vars)
else:
    failed, msg = self.noncontainerized_volume_check(ansible_mounts, task_vars)
return {"failed": failed, "changed": False, "msg": msg}
Also, even in the case of non-containerized components, https://docs.openshift.org/latest/install_config/install/prerequisites.html indicates nodes should have docker storage of 15GB free. Seems like the code could be reused for that.
        failed, msg = self.noncontainerized_volume_check(ansible_mounts, task_vars)
        return {"failed": failed, "msg": msg}

    def containerized_volume_check(self, task_vars):
A docstring is in order here, e.g.
"""A contained component uses only docker storage, so check docker storage info against requirements."""
Thanks, per the comments on #3518 (comment), I have decided to leave all aspects of the containerized check to #3787. (I will further refine that PR, rename it so the name is consistent with the check in this PR, and split out the storage driver check from it into its own check). Any of your feedback on #3787 is definitely welcome :)
        openshift_diskfree_bytes = self.get_openshift_disk_availability(ansible_mounts)

        if openshift_diskfree_bytes < recommended_diskspace_bytes:
            msg = ("Available disk space ({diskfree} GB) for OpenShift volume below recommended storage. "
"for volume containing /var" - there is no "OpenShift volume"
        if not units.get(segments[1]):
            return 0

        byte_size = float(segments[0]) * float(units.get(segments[1]))
why float? bytes are integers...
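A hedged sketch of parsing that keeps byte counts as integers (the `units` mapping and the `'107.8 GB'` input format are assumptions inferred from the snippet above):

```python
units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size_bytes(text):
    """Parse strings like '107.8 GB' into an integer byte count."""
    value, unit = text.split()
    multiplier = units.get(unit)
    if multiplier is None:
        return 0
    # docker info may report fractional values, so parse the number as a
    # float, but truncate the final byte count to an int -- bytes are integers.
    return int(float(value) * multiplier)


print(parse_size_bytes("107.8 GB"))
```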
class DiskAvailability(OpenShiftCheck):
    """Check that recommended disk space is available."""
Not sure how else we should annotate it but it should be clear that this is disk space prior to an install. If running preflight checks before an upgrade, they may have used a bunch of the space and that should be fine; we should probably not even run this check in that case. We would do a different kind of calculation for hosts that are already installed and running.
    tags = ["preflight"]

    recommended_memory_mb = {
        "nodes": 8000.0,
This is memory though which isn't coming from docker info... comes from ansible setup. Whatever our docs say, nobody measures RAM in GB (only disk manufacturers get to pull that crap), so it seems to me MiB/GiB should be used.
    def run(self, tmp, task_vars):
        group_names = get_var(task_vars, "group_names", default=[])
        memoryfree = get_var(task_vars, "ansible_memtotal_mb")
nit, shouldn't `memoryfree` be named `total_memory` or similar?
force-pushed from 81fef87 to 1af86d7
@ncdc thanks for taking a look! I have seen that, but I still do not understand it.
The stacktrace isn't being helpful here unfortunately, see openshift/origin#13759 -- I'll work on the stacktrace being better soon. The error may have been in container cleanup? Unclear, unfortunately.
- only support a fixed list of recommended values for now, no overwriting via Ansible variables (keep it simple, add features as needed). - implement is_active: run this check only for hosts that have a recommended disk space. - test priority of mount paths / and /var.
- Expose only is_active and no other method. - Move general comment to module docstring.
- Fix required memory for etcd hosts (10 -> 20 GB), as per documentation. - Some changes to make the code more similar to the similar DiskAvailability check. - Do not raise exception for hosts that do not have a recommended memory value (those are ignored anyway through `is_active`, so that was essentially dead code). - Test that the required memory is the max of the recommended memories for all groups assigned to a host. E.g. if a host is master and node, we should check that it has enough memory to be a master, because the memory requirement for a master is higher than for a node.
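The max-over-groups behavior described in that commit can be sketched roughly as follows (the node and etcd values come from this PR; the master value is illustrative, based on the prerequisites docs of the time):

```python
# Recommended memory per host group, in MB (illustrative values).
recommended_memory_mb = {
    "masters": 16000,  # assumption: documented master minimum
    "nodes": 8000,
    "etcd": 20000,     # the 10 -> 20 GB fix described above
}


def required_memory_mb(group_names):
    """Max recommended memory over all groups assigned to a host.

    Groups without a recommendation are ignored, mirroring is_active.
    """
    known = [recommended_memory_mb[g] for g in group_names
             if g in recommended_memory_mb]
    return max(known) if known else 0


# A host that is both master and node must meet the master requirement.
print(required_memory_mb(["masters", "nodes"]))  # 16000
```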
force-pushed from 9cb85c9 to 2aca8a4
Evaluated for openshift ansible test up to 2aca8a4
aos-ci-test
[merge]
merge test flaked on openshift/origin#13271 re[merge]
merge flaked on openshift/origin#13271
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/49/) (Base Commit: 00095b2)
Test failure is a bug in the CI job -- openshift/origin#13806 [merge]
@sdodson do we have documented anywhere what the required checks are before a PR can be merged? Now in this PR I do not know if we need to re-trigger aos-ci-test. Would be useful to operate like Origin -- where the [merge] automatically runs the tests it needs, without requiring 2 different bot commands. Is the plan going in that direction?
The configuration for branches is here. We do aim to move more and more tests away from AOS CI and onto the Origin CI, and aid in that effort is greatly appreciated, but it's not trivial and will take some time.
@stevekuznetsov thanks for the pointer. Which branch should we refer to for the latest version of that file / what is currently enforced for in-flight PRs?
re[merge]
Evaluated for openshift ansible merge up to 2aca8a4
continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/254/) (Base Commit: 9ace041)
Last flake was openshift/origin#13814. Since the code here is not related to installs nor upgrades, is strictly contained in the |
Related Trello card: https://trello.com/c/cLu85dOf/71-uhg-diagnostics-check-memory-matches-requirements?menu=filter&filter=[diagnostics]
Related Trello card: https://trello.com/c/5pr2WcQH/43-diagnostics-additional-pre-flight-checks?menu=filter&filter=[diagnostics]
cc @brenton @rhcarvalho