
add ram and non-containerized storage preflight check #3518

Merged

Conversation

memoryfree = get_var(task_vars, "facter_memoryfree")

recommended_memory = get_var(task_vars, "min_memory_gb")
recommended_storage = get_var(task_vars, "min_diskspace_gb")
Contributor Author:

@brenton I'm not sure where these values should be coming from.

Contributor:

You can get the supported values here:
https://docs.openshift.com/container-platform/latest/install_config/install/prerequisites.html#hardware

Even though there may be different requirements for older versions of the product, in general I think we should always encourage customers to meet the requirements of the latest version so they don't hit problems when they upgrade.

@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch from a366cd2 to 0391ebc (February 27, 2017 22:54)
# any images are missing, or error occurs during pull
def run(self, tmp, task_vars):
    ansible_mounts = get_var(task_vars, "ansible_mounts")
    diskfree = self.to_gigabytes(self.get_disk_space(ansible_mounts))
Contributor Author:

@brenton or @rhcarvalho This is the total value of the / filesystem; I'm not sure if I should be checking that or openshift-xfs-vol-dir. I am getting this information from ansible_mounts.

Contributor:

@detiber, do you have any idea where openshift-xfs-vol-dir came from?

Contributor:

Is it a disk label or an LV name? In any case, I'm not sure we can rely on specific values.

I think the best we can do is match a device with specific directories we care about.

For containerized installs, docker info would work for devicemapper with a thin pool; I'm not sure about other backends.

Contributor Author:

For containerized installs, docker info would work for devicemapper with a thin pool; I'm not sure about other backends.

Thanks. For now, at least, I have added support for checking containerized installs that use devicemapper as the storage driver.

@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch 4 times, most recently from 2ee2091 to 0deaaa8 (February 27, 2017 23:01)
from openshift_checks.mixins import NotContainerizedMixin


class RamDiskspaceAvailability(NotContainerizedMixin, OpenShiftCheck):
Contributor:

Should the memory and diskspace checks be separated?

I would think that Memory checks would apply equally to containerized/non-containerized.

Are there plans to also add a disk space check for containerized installs as well?

Contributor:

👍 to splitting into 2 checks.

@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch from 2420583 to d93cc16 (February 28, 2017 23:28)

def run(self, tmp, task_vars):
    group_names = get_var(task_vars, "group_names", default=[])
    memoryfree = get_var(task_vars, "facter_memoryfree")
Member:

I am thinking free memory is not the right measure here, as I believe cache use is excluded, so it's kind of an arbitrarily changing value. Maybe "memory.system.available"; however, I'm not sure that's in the task_vars, and I wonder if we really want to base it on memory usage at the time of operation. I think it would be least surprising to use memorysize (total system memory).

group_names = get_var(task_vars, "group_names", default=[])
memoryfree = get_var(task_vars, "facter_memoryfree")

recommended_memory_gb = self.recommended_master_memory_gb
Member:

To me this begs for a ternary:

    recommended_memory_gb = self.recommended_node_memory_gb if "nodes" in group_names else self.recommended_master_memory_gb

Pythonistas? :)

Also shouldn't this be the other way around? Because masters can be nodes...

Contributor:

Masters should be nodes, but in general they are unschedulable nodes, so they do not require additional disk space.

Contributor:

Pythonistas? :)

I'd go with a dict and max:

class MemoryAvailability(OpenShiftCheck):
    recommended_memory = {
        "nodes": X,
        "masters": Y,
        "etcd": Z,
        "nfs": M,
        "lb": N,
    }

    def run(...):
        # ...

        recommended_memory = max(self.recommended_memory.get(group, 0) for group in group_names)
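As a runnable sketch of that dict-and-max idea (the values and group names here are illustrative placeholders, not the documented requirements; the `or [0]` keeps max() happy on Python 2 when group_names is empty):

recommended_memory_gb = {
    "nodes": 8.0,
    "masters": 16.0,
    "etcd": 20.0,
}

def required_memory_gb(group_names):
    # A host in several groups must satisfy the largest requirement;
    # groups without a recommendation contribute 0.
    return max([recommended_memory_gb.get(group, 0) for group in group_names] or [0])

print(required_memory_gb(["masters", "nodes"]))  # 16.0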


@staticmethod
def mem_to_float(mem):
    return float(mem.rstrip(" GB"))
Member:

What if it's GiB... or MB? I think it's the "display size", which would be MB if < 1 GB.
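Worth noting: str.rstrip(" GB") strips a set of characters, not a literal suffix, so "1.5 MB".rstrip(" GB") yields "1.5 M" and float() then raises ValueError. A hedged sketch of a unit-aware alternative, assuming facter-style display strings such as "16.23 GB" or "512.00 MB":

units_in_gb = {"mb": 1.0 / 1000, "gb": 1.0, "tb": 1000.0}

def display_size_to_gb(size):
    # "512.00 MB" -> 0.512; unknown units fail loudly instead of silently.
    value, unit = size.split()
    if unit.lower() not in units_in_gb:
        raise ValueError("unrecognized unit in {!r}".format(size))
    return float(value) * units_in_gb[unit.lower()]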

@@ -0,0 +1,31 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin
Member:

Why not run this in containerized installs too?

@@ -0,0 +1,44 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin
Member:

Maybe I'm misunderstanding but doesn't this exclude containerized installs?

Contributor:

This seems to be a leftover, unused import.
Unlike Go, the Python interpreter doesn't complain, but I'm sure this is failing our CI.

def get_disk_space(ansible_mounts, is_containerized):
    if len(ansible_mounts):
        for mnt in ansible_mounts:
            if mnt.get("mount") == "/":
Member:

Technically the requirements are for the filesystem containing /var (https://docs.openshift.com/container-platform/3.4/install_config/install/prerequisites.html#hardware), so we should check whether /var is mounted separately before falling back to /.
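A minimal sketch of that precedence, assuming ansible_mounts entries shaped like {"mount": "/var", "size_available": ...}:

def volume_bytes_available(ansible_mounts):
    # Prefer the filesystem holding /var; fall back to / only when
    # /var is not a separate mount point.
    by_mount = {mnt.get("mount"): mnt for mnt in ansible_mounts}
    for path in ("/var", "/"):
        if path in by_mount:
            return by_mount[path].get("size_available", 0)
    return 0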


@staticmethod
def to_gigabytes(total_bytes):
    return total_bytes / 1073741824
Contributor:

Just a nit, but wouldn't this be gibibytes instead of gigabytes?

Contributor Author:

Thanks, fixed!


def run(self, tmp, task_vars):
    group_names = get_var(task_vars, "group_names", default=[])
    memoryfree = get_var(task_vars, "facter_memoryfree")
Contributor:

I'm not sure we can guarantee that facter will be available on the remote hosts. ansible_memfree_mb would be the equivalent. That said, I don't think you can trust either facter_memoryfree or ansible_memfree_mb here, since they both would include cached data.

Instead of trying to get fancy, I think it might be best to stick with comparing against ansible_memtotal_mb. There is also ansible_memory_mb that provides access to swap, physical memory info, and memory values adjusted for cached data.

Member:

👍 ansible_memtotal_mb
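A sketch of what that comparison could look like (assuming the get_var helper from openshift_checks; the required value is illustrative):

def check_total_memory(task_vars, required_mb=16 * 1000):
    # ansible_memtotal_mb is total physical memory gathered by Ansible's
    # setup module, so it does not fluctuate with cache usage.
    total_mb = get_var(task_vars, "ansible_memtotal_mb")
    if total_mb < required_mb:
        msg = "{} MB memory found, {} MB required".format(total_mb, required_mb)
        return {"failed": True, "msg": msg}
    return {"changed": False}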

if "nodes" in group_names:
recommended_memory_gb = self.recommended_node_memory_gb

if self.mem_to_float(memoryfree) < recommended_memory_gb:
Contributor:

Would it be better to convert the retrieved value to bytes and do an int comparison instead?


from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerizedMixin

import json
Contributor:

Another case of imported and not used.

tags = ["preflight"]

recommended_node_diskspace_gb = 15.0
recommended_master_diskspace_gb = 40.0
Contributor:

What about things like etcd?


@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch 2 times, most recently from 1fe382c to d2e2d58 (March 2, 2017 19:57)
@brenton changed the title from "[WIP] add ram and storage preflight check" to "add ram and storage preflight check" (Mar 9, 2017)
@juanvallejo (Contributor Author) commented Mar 9, 2017

@sdodson, @rhcarvalho, or @sosiouxme wondering if you could give this patch one last pass? Thanks



if __name__ == '__main__':
    main()
Contributor:

Would it make sense to make this available to other roles? I could see this being of value in the docker-related roles as well. It might also be nice to return at least some of the values as 'facts' instead of just a single info dict.

Contributor Author:

Sure, will update this to return its values under ansible_facts

Contributor Author:

hm, maybe it would be better to rename the module from docker_info to simply docker? That way, it makes sense for it to return not just info, but docker client info.

Contributor:

docker_facts would probably be better; docker would shadow the older (and deprecated) docker module upstream.

def main():
    client = AnsibleDockerClient(
        argument_spec=dict(
            name=dict(type='list'),
Contributor:

name doesn't appear to be used anywhere.

I would add supports_check_mode=True to params for AnsibleDockerClient.

@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch from e7bc690 to 43680e3 (March 10, 2017 23:40)

recommended_node_space_gb = 15.0
recommended_master_space_gb = 40.0
minimum_docker_space_percent = 5.0
Contributor:

These should be configurable at run time.

They may be defined as variables in the preflight/check.yml playbook, and their names should have some prefix to avoid name collisions...

diskfree = self.to_gibibytes(self.get_disk_space(ansible_mounts))

# if running containerized installation check
# that available space is not below 5% instead
Contributor:

Comments should not refer to specific values like "5%" that are defined in a variable elsewhere.

(multiple occurrences, check all)

info = docker_facts.get("docker_facts")
status = dict(info.get("DriverStatus"))

if info.get("Driver", None) == self.supported_docker_dev_driver:
Contributor:

The behavior of this check doesn't seem to be clearly defined when the driver is not what we consider to be "supported".
We need to clarify the intent.

status = dict(info.get("DriverStatus"))

if info.get("Driver", None) == self.supported_docker_dev_driver:
disktotal = status.get("Data Space Total", None)
Contributor:

What's the guarantee we have on these key names "Data Space Total", etc? I'm afraid those are not guaranteed APIs to rely on.

Contributor Author:

hm, they do not seem to be guaranteed as far as I have seen.

Another idea to retrieve these values would be to use the docker_facts module to get the value of DockerRootDir (/var/lib/docker by default), and perform df on it:

$ df --output=avail -B G  /var/lib/docker | grep "G" -m 1
 20G

Simpler yet, we could just use df to get the mountpoint for /var/lib/docker, then look for that in ansible_mounts under task_vars and get available and total values from there.
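The ansible_mounts variant could skip df entirely by matching /var/lib/docker against mount points by longest prefix; a hedged sketch:

def mount_for_path(ansible_mounts, path="/var/lib/docker"):
    # The mount whose mount point is the longest prefix of `path` is the
    # filesystem that actually holds it ("/var" beats "/" when both exist).
    best = None
    for mnt in ansible_mounts:
        mount = mnt.get("mount", "")
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best.get("mount", "")):
                best = mnt
    return best  # read size_available / size_total from the result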

Contributor Author:

@rhcarvalho
Upon looking at the documentation for devicemapper (https://github.com/docker/docker/tree/master/daemon/graphdriver/devmapper#information-on-docker-info), the values Data Space Total and Data Space Available appear in the linked readme. Although the values underneath the storage driver vary from driver to driver, since we first check that the host machine uses devicemapper, I would prefer to rely on these values.


if info.get("Driver", None) == self.supported_docker_dev_driver:
disktotal = status.get("Data Space Total", None)
diskavail = status.get("Data Space Available", None)
Contributor:

Unless in a context that would otherwise require None for clarity, dict.get in Python already defaults to None, so it is more common to write:

some_dict.get("some key")

However, in this case there is a problem with data types. I'd avoid mixing whatever type we expect to find in the dictionary (apparently str) with NoneType.

"pb": 1000 ** 5
}

segments = strfmt.lower().split(" ")
Contributor:

There can easily be an AttributeError exception here. We use strfmt as if it were a str, but in the code above we may clearly be calling this function with None as the first argument...

In a dynamically typed language, we have to keep our usage of data types sane.

Contributor Author:

Thanks, I now make sure to pass a string type to this method. Corrected this in other places as well.


recommended_memory_gb = max(self.recommended_memory_gb.get(group, 0) for group in group_names)

if self.to_gigabytes(memoryfree) < recommended_memory_gb:
Contributor:

Why don't we store the recommended values in the unit we want to compare to (MB)?

recommended_memory_gb = {
    "nodes": 8.0,
    "masters": 16.0,
    "etcd": 20.0
Contributor:

It is good practice to include a comma at the end of every line in multiline definitions like this.
It makes it easier to add / remove / move items without having to make changes to other lines.

recommended_memory_gb = max(self.recommended_memory_gb.get(group, 0) for group in group_names)

if self.to_gigabytes(memoryfree) < recommended_memory_gb:
    kind = "master" if "masters" in group_names else "node"
Contributor:

This is incorrect. There are many other possibilities for group_names, we can't call everything else a "node" (there are etcd, nfs, ...).

Contributor Author:

Thanks, will fix. I originally referenced this check to determine this logic.

Contributor:

The logic there is different, it is additive: start with an empty set and add items based on the group names. Here we are incorrectly assuming that everything that is not master is a node.
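For contrast, the additive pattern being described looks roughly like this (group names per the inventory; the kind labels are illustrative):

def host_kinds(group_names):
    # Start empty and add kinds per group membership, instead of assuming
    # that anything that is not a master must be a node.
    kinds = set()
    for group, kind in (("masters", "master"), ("nodes", "node"), ("etcd", "etcd")):
        if group in group_names:
            kinds.add(kind)
    return kinds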

if info.get("Driver", None) == self.supported_docker_dev_driver:
disktotal = status.get("Data Space Total", None)
diskavail = status.get("Data Space Available", None)
diskleft_percent = self.to_bytes(diskavail) / self.to_bytes(disktotal) * 100
Contributor:

There could be division by zero here...
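A guarded computation would cover both the zero total and the None values flagged earlier; a sketch:

def space_left_percent(diskavail_bytes, disktotal_bytes):
    # Treat a missing or zero total as "unknown" rather than dividing by it.
    if not disktotal_bytes:
        return None
    return 100.0 * diskavail_bytes / disktotal_bytes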

diskleft_percent = self.to_bytes(diskavail) / self.to_bytes(disktotal) * 100

# fail if less than 5% of available dataspace space left
if int(diskleft_percent) < self.minimum_docker_space_percent:
Contributor:

Be mindful of rounding problems here.

# fail if less than 5% of available dataspace space left
if int(diskleft_percent) < self.minimum_docker_space_percent:
    msg = "The amount of data space remaining (%s%%) for the docker storage driver is below %s%%" \
        % (int(diskleft_percent), self.minimum_docker_space_percent)
Contributor:

We should not use %s (formatting of strings) with int values.

And we should prefer using str.format instead of the % operator.
(multiple occurrences, check all)
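Applying that suggestion to the message above might look like this (values unchanged; the precision is illustrative):

msg = ("The amount of data space remaining ({:.1f}%) for the docker "
       "storage driver is below {:.1f}%").format(
           diskleft_percent, self.minimum_docker_space_percent)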

recommended_diskspace_gb = self.recommended_master_space_gb

if "nodes" in group_names:
    recommended_diskspace_gb = self.recommended_node_space_gb
Contributor:

This logic is wrong.

  • There are more group names than masters and nodes.
  • If "nodes" in group_names, we should not overwrite recommended_diskspace_gb -- what if "masters" is also in group_names?!

if "nodes" in group_names:
recommended_diskspace_gb = self.recommended_node_space_gb

if float(diskfree) < recommended_diskspace_gb:
Contributor:

Why the conversion to float?!


# if running containerized installation check
# that available space is not below 5% instead
if is_containerized:
Contributor:

For ease of reading and understanding what this check does, please make the "run" method short and clear on its intents.

For example, we could have run_containerized and run_non_containerized methods to which run delegates depending on is_containerized.
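A sketch of that delegation (assuming get_var supports nested keys, as elsewhere in openshift_checks, and that the two helper methods exist):

def run(self, tmp, task_vars):
    # Keep run() a thin dispatcher so each path reads as a single intent.
    is_containerized = get_var(task_vars, "openshift", "common", "is_containerized")
    if is_containerized:
        return self.run_containerized(task_vars)
    return self.run_non_containerized(task_vars)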

if mnt.get("mount") == "/":
root_size_available = mnt.get("size_available")
if mnt.get("mount") == "/var":
return mnt.get("size_available")
Contributor:

Why do we return early on /var, but keep iterating when we see /?!

We need to clarify the intent here, what is this function promising to return? "get disk space" is not a great name, we can try to improve it once the expected output is clear.

From the recommended available storage, is that 100% towards Docker storage or do we need to look at Docker storage + other mount points?

Contributor Author:

I return immediately if /var exists at all. If not, / is the fallback mount path. I updated this method to be a bit more legible.

Contributor Author:

From the recommended available storage, is that 100% towards Docker storage or do we need to look at Docker storage + other mount points?

I meant for this to be a check on the mount point for the docker volume. At least in my case, the mount point for its volume was /, so I checked the overall disk space remaining at that point.

return {"changed": False}

@staticmethod
def get_disk_space(ansible_mounts):
Contributor:

A docstring here would make it clear what the ansible_mounts parameter should be for the function to work properly. It's a list of hashes I guess?

ansible_mounts = []
ansible_mounts.append({'size_available': 12345, 'mount': '/'})
ansible_mounts.append({'size_available': 34512, 'mount': '/var'})

Right? (plus other possible params)

Contributor Author:

Thanks for the feedback. ansible_mounts is a map like you described. I updated the method to be more legible and added a docstring.

tags = ["preflight"]

recommended_memory_mb = {
"nodes": 8000.0,
Contributor:

8000 or 8 * 1024?

Contributor Author:

I think 8000 is fine, unless we prefer MiB instead?

Contributor:

Our docs use the decimal prefixes, let's go with powers of 10 then.
https://docs.openshift.org/latest/install_config/install/prerequisites.html

Contributor (@tbielawa, Mar 28, 2017):

I scanned over the output from docker info and saw that it reports different measurements in different prefix unit systems.

...
Storage Driver: devicemapper
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Data Space Used: 1.42 GB
 Data Space Total: 44.43 GB
 Data Space Available: 43.01 GB
 Metadata Space Used: 1.507 MB
 Metadata Space Total: 511.7 MB
 Metadata Space Available: 510.2 MB
 Thin Pool Minimum Free Space: 4.443 GB
...
Total Memory: 15.11 GiB
...

Sizes are reported using the SI base-10 system (kB, MB, GB) and memory is reported using the NIST base-2 system (KiB, GiB, etc).

As was pointed out before, I maintain a library, 'bitmath' (200 unit tests, packaged in RHEL/Fedora, etc.), that handles all of that automatically (and more).

Here's a quick gist I made that takes code from this PR and shows the bitmath equivalent.

I think that trying to get bitmath available as an RPM in the RHSM channels would be a big effort, and getting it into Atomic is nigh impossible. I'm OK w/ vendoring/slipping the bitmath library into this project.

But that's a bigger conversation we should probably have as a team.

edit#2: If you want to try bitmath, you can pip install bitmath or dnf/yum install python-bitmath (or, depending on your distribution version, python2-bitmath or python3-bitmath).
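For illustration, normalizing docker info's mixed-prefix strings with bitmath might look like this (a hedged sketch based on bitmath's parse_string API):

import bitmath

data_total = bitmath.parse_string("44.43 GB")    # SI, base-10
total_memory = bitmath.parse_string("15.11 GiB") # NIST, base-2

# Convert both to a common unit before comparing or reporting.
print(data_total.to_GiB(), total_memory.to_GiB())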

Member:

This is memory, though, which isn't coming from docker info... it comes from ansible setup. Whatever our docs say, nobody measures RAM in GB (only disk manufacturers get to pull that crap), so it seems to me MiB/GiB should be used.

return {"changed": False}

@staticmethod
def to_gigabytes(size_in_mb):
Contributor:

Dead code?

Contributor Author:

Thanks, will remove

recommended_memory_mb = max(self.recommended_memory_mb.get(group, 0) for group in group_names)

if memoryfree < recommended_memory_mb:
    msg = "Available memory ({available} GB) below recommended storage. Minimum required memory: {recommended} GB".format(available=memoryfree, recommended=recommended_memory_mb)
Contributor:

Too long line...

The reported units are wrong.

Contributor Author:

Will fix, taking care of linting issues towards the end


segments = strfmt.lower().split(" ")
if not units.get(segments[1], False):
    return None
Contributor:

Types: why does this function sometimes return a number / float, sometimes NoneType?

Contributor Author:

Thanks, will address this throughout the code

"tib": 1024 ** 4,
"tb": 1000 ** 4,
"pib": 1024 ** 5,
"pb": 1000 ** 5
Contributor:

I'm still puzzled about why we're doing these conversions this way...

But anyway, a nit but part of learning "the Python way":

1000**5,
  • The ** operator is typically not surrounded by spaces.
  • Add a trailing comma to this type of 'lists'.

Contributor Author (@juanvallejo, Mar 28, 2017):

I was not sure what specific units to expect, so this at least covers standard cases. @tbielawa has written a great python module that we could use in cases such as this one if we are willing to include it in our openshift-ansible image / make it a requirement on the host machine. wdyt?

Contributor:

I'd prefer not to do any conversions :-)
"the best code is no code at all"

Contributor Author:

While I am certainly for this, I am not sure to what extent we can guarantee that the units returned by devicemapper's Data Space Total and Data Space Available will be consistent in all cases.


@staticmethod
def to_gigabytes(total_bytes):
    return total_bytes / 1000000000
Contributor:

Here we also need some clarity on whether we should be using base 10^3 or 2^10.
https://en.wikipedia.org/wiki/Byte#Unit_multiples

Would be easier to read a formula than counting zeros:

total_bytes / 1000000000
total_bytes / (10**3)**3
total_bytes / 1000**3

def containerized_volume_check(self, task_vars):
    docker_facts = self.module_executor("docker_facts", {}, task_vars)
    info = docker_facts.get("docker_facts")
    status = dict(info.get("DriverStatus"))
Contributor:

>>> dict(None)
TypeError: 'NoneType' object is not iterable
>>> info = None
>>> info.get("DriverStatus")
AttributeError: 'NoneType' object has no attribute 'get'

failed, msg = self.containerized_volume_check(task_vars)
if failed:
    return {"failed": True, "msg": msg}
Contributor:

The way it stands now, we will call both containerized_volume_check and noncontainerized_volume_check when is_containerized is true and failed is false...

Contributor Author:

Thanks for catching that :)

if "masters" in group_names:
return self.recommended_master_space_gb

return self.recommended_node_space_gb
Contributor:

This is also assuming that everything that is not a master is a node. If we expect to run this check on non-master-non-node hosts, we should follow the docs and use the proper storage required for each type of host.

Contributor Author:

hm, will go ahead and use the info from https://docs.openshift.com/container-platform/3.4/install_config/install/prerequisites.html#hardware; however, it only contains requirements for masters, nodes, and etcd.

@tbielawa (Contributor):

I'm not sure if it is showing up anymore, due to the folded 'Hide Outdated' segments above, but I did post another comment with more information on bitmath and the problem of converting/parsing inconsistent units. Direct link to the comment above:

#3518 (comment)

from openshift_checks import OpenShiftCheck, get_var


class DiskAvailability(OpenShiftCheck):
Contributor:

The name of the file is inconsistent with the check name and class name.

def main():
    client = AnsibleDockerClient(
        argument_spec=dict(
        ),
Member:

This looks odd. Why not make it one line...

name = "disk_availability"
tags = ["preflight"]

recommended_diskspace = {
Member:

How about a comment about why it's base 10, and a link to origin docs on where this recommendation comes from?

}

minimum_docker_space_percent = 5.0
supported_docker_dev_driver = "devicemapper"
Member:

Seems conceptually like it should be a list (of one, currently).

Contributor Author:

Yeah, I was thinking of removing the check for the storage driver from this PR altogether, and just handling it as part of the docker_storage_check PR https://github.com/openshift/openshift-ansible/pull/3787/files#diff-c3d0f0c6a0d94ad7d6c5a62da695118dR17

Contributor Author:

Actually, remembering a very similar conversation I had with @rhcarvalho about this, it would be better to have the storage driver check be its own preflight check, allowing this PR and #3787 to focus on a single thing. WDYT?

Member:

Err, belatedly... yes.


if is_containerized:
    failed, msg = self.containerized_volume_check(task_vars)
    if failed:
Member:

Could this not be simplified to:

if is_containerized:
    failed, msg = self.containerized_volume_check(task_vars)
else:
    failed, msg = self.noncontainerized_volume_check(ansible_mounts, task_vars)
return {"failed": failed, "changed": False, "msg": msg}

Member:

Also, even in the case of non-containerized components, https://docs.openshift.org/latest/install_config/install/prerequisites.html indicates nodes should have 15 GB of free docker storage. Seems like the code could be reused for that.

failed, msg = self.noncontainerized_volume_check(ansible_mounts, task_vars)
return {"failed": failed, "msg": msg}

def containerized_volume_check(self, task_vars):
Member:

A docstring is in order here, e.g.

"""A contained component uses only docker storage, so check docker storage info against requirements."""

Contributor Author:

Thanks, per the comments on #3518 (comment), I have decided to leave all aspects of the containerized check to #3787. (I will further refine that PR, rename it so the name is consistent with the check in this PR, and split out the storage driver check from it into its own check). Any of your feedback on #3787 is definitely welcome :)

openshift_diskfree_bytes = self.get_openshift_disk_availability(ansible_mounts)

if openshift_diskfree_bytes < recommended_diskspace_bytes:
    msg = ("Available disk space ({diskfree} GB) for OpenShift volume below recommended storage. "
Member:

"for volume containing /var" - there is no "OpenShift volume"

if not units.get(segments[1]):
    return 0

byte_size = float(segments[0]) * float(units.get(segments[1]))
Member:

why float? bytes are integers...



class DiskAvailability(OpenShiftCheck):
    """Check that recommended disk space is available."""
Member:

Not sure how else we should annotate it, but it should be clear that this is disk space prior to an install. If running preflight checks before an upgrade, the space may already be in use, and that should be fine; we should probably not even run this check in that case. We would do a different kind of calculation for hosts that are already installed and running.

tags = ["preflight"]

recommended_memory_mb = {
"nodes": 8000.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is memory though which isn't coming from docker info... comes from ansible setup. Whatever our docs say, nobody measures RAM in GB (only disk manufacturers get to pull that crap), so it seems to me MiB/GiB should be used.


def run(self, tmp, task_vars):
    group_names = get_var(task_vars, "group_names", default=[])
    memoryfree = get_var(task_vars, "ansible_memtotal_mb")
Member:

nit, shouldn't memoryfree be named total_memory or similar?

@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch from 81fef87 to 1af86d7 (April 6, 2017 21:06)
@juanvallejo changed the title from "add ram and storage preflight check" to "add ram and non-containerized storage preflight check" (Apr 6, 2017)
@rhcarvalho (Contributor):

@ncdc thanks for taking a look! I have seen that, but I still do not understand it.

  • INFO: Error running cluster/log-dump.sh...: looks like an error but is not, according to @stevekuznetsov on email:

    This is a red-herring message that has been around for a long time. Some part of the Kubernetes e2e framework is trying to run a script that doesn't exist. It does not cause a failure and happens on every e2e run. I think the previous consensus was that we don't want to carry yet another UPSTREAM commit to get rid of the function in our vendored k8s code as it's benign. I can understand how this is confusing, though, and we can revisit that decision.

  • SUCCESS!, PASS, Test Suite Passed: that means we're good, right? :-)

  • [ERROR] PID 26302: hack/lib/cleanup.sh:74: [[ -n "${OS_DEBUG:-}" ]] exited with status 1.: what should we make of this?! Failed to dump container logs?

@stevekuznetsov (Contributor):

The stacktrace isn't being helpful here unfortunately, see openshift/origin#13759 -- I'll work on the stacktrace being better soon. The error may have been in container cleanup? Unclear, unfortunately.

juanvallejo and others added 5 commits April 17, 2017 13:29
- only support a fixed list of recommended values for now, no
overwriting via Ansible variables (keep it simple, add features as
needed).

- implement is_active: run this check only for hosts that have a
recommended disk space.

- test priority of mount paths / and /var.
- Expose only is_active and no other method.
- Move general comment to module docstring.
- Fix required memory for etcd hosts (10 -> 20 GB), as per
documentation.
- Some changes to make the code more similar to the similar
DiskAvailability check.
- Do not raise exception for hosts that do not have a recommended memory
value (those are ignored anyway through `is_active`, so that was
essentially dead code).
- Test that the required memory is the max of the recommended memories
for all groups assigned to a host. E.g. if a host is master and node, we
should check that it has enough memory to be a master, because the
memory requirement for a master is higher than for a node.
@juanvallejo force-pushed the jvallejo/ram_disk_space_checker branch from 9cb85c9 to 2aca8a4 (April 17, 2017 17:29)
@openshift-bot:

Evaluated for openshift ansible test up to 2aca8a4

@rhcarvalho (Contributor):

aos-ci-test

@openshift-bot:

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 2aca8a4 (logs)

@openshift-bot:

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 2aca8a4 (logs)

@openshift-bot:

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 2aca8a4 (logs)

@openshift-bot:

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 2aca8a4 (logs)

@rhcarvalho (Contributor):

[merge]

@juanvallejo (Contributor Author):

merge test flaked on openshift/origin#13271. re[merge]

@juanvallejo (Contributor Author) commented Apr 18, 2017

merge flaked on openshift/origin#13271
re[merge]

@openshift-bot:

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/49/) (Base Commit: 00095b2)

@rhcarvalho (Contributor):

Test failure is a bug in the CI job -- openshift/origin#13806

[merge]

@rhcarvalho (Contributor):

@sdodson do we have documented anywhere what the required checks are before a PR can be merged? Now in this PR I do not know if we need to re-trigger aos-ci-test.

It would be useful to operate like Origin -- where [merge] automatically runs the tests it needs, without requiring 2 different bot commands. Is the plan going in that direction?

@stevekuznetsov (Contributor):

The configuration for branches is here. We do aim to move more and more tests away from AOS CI and onto the Origin CI, and aid in that effort is greatly appreciated, but it's not trivial and will take some time.

@rhcarvalho (Contributor):

@stevekuznetsov thanks for the pointer. Which branch should we refer to for the latest version of that file / what is currently enforced in in-flight PRs?

@stevekuznetsov (Contributor):

@sdodson (Member) commented Apr 19, 2017

re[merge]
EXPOSE THE KUBECONFIG known issue

@openshift-bot:

Evaluated for openshift ansible merge up to 2aca8a4

@openshift-bot:

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/254/) (Base Commit: 9ace041)

@rhcarvalho (Contributor):

Last flake was openshift/origin#13814.

Since the code here is not related to installs or upgrades, is strictly contained in the openshift_health_checker role, the merge failed at least 8 times with known flakes linked above, and aos-ci-test and Travis are green, we'll use the temporary exception to merge it manually.

@rhcarvalho merged commit 3820dab into openshift:master Apr 20, 2017
@juanvallejo deleted the jvallejo/ram_disk_space_checker branch April 20, 2017 13:30