Change flock command argument #358

dmuldrew · 2020-12-15T01:08:00Z

Purpose

Use a more standard flag for flock (-x instead of -e), matching the flock installed on the ssh server container. Improve error handling of executed commands so that the error message includes the contents of stderr.

What the code is doing

Saves the stdout and stderr buffers to variables, then includes stderr in the IOError message. For the flock error, the result looks like the following:

 if len(command_error) != 0:
     raise IOError(err_message + '\n' + command_error)
E           OSError: Failed to generate id for new scenario
E           ['flock: unrecognized option: e\n', 'BusyBox v1.31.1 () multi-call binary.\n', '\n', 'Usage: flock [-sxun] FD|{FILE [-c] PROG ARGS}\n', '\n', '[Un]lock file descriptor, or lock FILE, run PROG\n', '\n', '\t-s\tShared lock\n', '\t-x\tExclusive lock (default)\n', '\t-u\tUnlock FD\n', '\t-n\tFail rather than wait\n']

powersimdata/data_access/csv_store.py:62: OSError

The provided error, "Failed to generate id for new scenario", wasn't enough information to diagnose the underlying issue.
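The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in csv_store.py: `FakeDataAccess` is a hypothetical stand-in that returns in-memory streams instead of running anything over ssh, and joining the stderr lines into one string is an assumption made here for readability.

```python
import io


class FakeDataAccess:
    """Hypothetical stand-in for the real data access object (assumption)."""

    def __init__(self, out="", err=""):
        self._out, self._err = out, err

    def execute_command(self, command):
        # Return (stdin, stdout, stderr) as file-like streams.
        return io.StringIO(), io.StringIO(self._out), io.StringIO(self._err)


def execute_and_check_err(data_access, command, err_message):
    """Run a command and raise IOError, including stderr, if stderr is non-empty."""
    _, stdout, stderr = data_access.execute_command(command)
    # Read each stream exactly once and keep the contents in variables.
    command_output = stdout.readlines()
    command_error = stderr.readlines()
    if len(command_error) != 0:
        # Joining the lines is a choice made for this sketch.
        raise IOError(err_message + "\n" + "".join(command_error))
    return command_output
```

With this shape, the caller sees both the high-level message and whatever the remote command wrote to stderr.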

Testing

Automated testing of dockerized framework. Checked to make sure flock on the compute server also has an -x flag. I'm actually not sure why flock -e works on the compute server given the following:

Time estimate

~15 min

rouille · 2020-12-15T01:24:02Z

Looking at the manual for the flock command via man flock, I see the following option:

       -e, -x, --exclusive
              Obtain an exclusive lock, sometimes called a write lock.  This is the default.
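As a rough sketch of why -x is the safer choice: the man page excerpt above lists both -e and -x, while the BusyBox usage text in the PR description lists only -x. The helper below is hypothetical (its name, the lock file path in the usage note, and the command layout are assumptions, not the code in this PR); it only shows building a command string with the portable flag.

```python
import shlex


def build_flock_command(lock_file, command, exclusive_flag="-x"):
    """Build a shell command that runs `command` while holding a lock on `lock_file`."""
    # -x appears in both the util-linux man page and the BusyBox usage text
    # quoted in this thread; -e appears only in the util-linux man page.
    return "flock {} {} -c {}".format(
        exclusive_flag, shlex.quote(lock_file), shlex.quote(command)
    )
```

For example, `build_flock_command("/tmp/scenario.lock", "echo hi")` produces a string that starts with `flock -x` rather than `flock -e`.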

danielolsen · 2020-12-15T01:36:01Z

What version of flock is installed in the ssh server container? Googling BusyBox leads me to believe this is something for embedded systems with very limited resources. Is this something we're choosing intentionally, or is this chosen for us by default?

powersimdata/data_access/csv_store.py

dmuldrew · 2020-12-15T20:39:09Z

@danielolsen I'm using an image based on Alpine which Docker currently recommends:
https://docs.docker.com/engine/examples/running_ssh_service/
Initially I started with a base image of Ubuntu but ran into an Ubuntu-related issue of failing builds when installing the ssh server. This, coupled with the maintenance benefits of leaving the ssh server configuration to experts, led me to use their recommended project as a base image. I suspect the Docker team became tired of updating their build instructions, and recognized the potential liability of accidentally recommending ssh configs which create a security risk.

danielolsen · 2020-12-15T21:50:15Z

@dmuldrew is the goal of the ssh server container to be a reproduction of our existing setup, something which we hope to distribute to external users, or both? This issue seems to be fairly easy to resolve, but there may be many other smaller issues (e.g. mawk/gawk) that arise because of choices we've made about which distros we're using in which containers.

powersimdata/data_access/csv_store.py

dmuldrew · 2020-12-15T22:26:45Z

@danielolsen I think the goal is to reduce testing with our production server and model more of our existing setup so that we can automate more integration tests with GitHub Actions. I don't think we want to distribute to external users a platform which uses ssh? It's nontrivial to set up and is specific to our infrastructure. It's possible to make a Docker build script that will make a container almost identical to our compute server; however, that would be a lot more work to get everything to build correctly and more complex to maintain.

jenhagg · 2020-12-15T22:36:29Z

Yep, the goal of the ssh container is purely for testing. For external use we'll provide a different container which uses the shared volume (so no ssh).

danielolsen · 2020-12-15T23:57:10Z

If we're trying to mock our production server setup for internal testing, do we know that all commands that will work on an Alpine build will also work on Ubuntu? Otherwise we may think something will work but it could fail unexpectedly as soon as we deploy.

What blocks us from using Ubuntu directly?

rouille · 2020-12-16T01:45:36Z

> If we're trying to mock our production server setup for internal testing, do we know that all commands that will work on an Alpine build will also work on Ubuntu? Otherwise we may think something will work but it could fail unexpectedly as soon as we deploy.
>
> What blocks us from using Ubuntu directly?

I agree with @danielolsen

jenhagg · 2020-12-17T20:13:16Z

powersimdata/data_access/csv_store.py

@@ -55,6 +55,13 @@ def _execute_and_check_err(self, command, err_message):
        :return: (*str*) -- standard output stream.
        """
        stdin, stdout, stderr = self.data_access.execute_command(command)
-        if len(stderr.readlines()) != 0:
+        command_output = stdout.readlines()


I know it's not part of your changes, but just noticed the return type in the docstring should actually be list

Yeah, I'm not a fan of passing stream references around. If you read the buffer and don't save the contents somewhere, you can potentially have later code that thinks there was no error.
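A small illustration of the hazard mentioned above, using an in-memory stream as a stand-in for the stderr stream returned by the remote command (the stream contents are copied from the error in the PR description; the variable names are made up for this sketch):

```python
import io

# Stand-in for the stderr stream of a failed command (assumption: any
# file-like object behaves this way once it has been read to the end).
stderr = io.StringIO("flock: unrecognized option: e\n")

first_read = stderr.readlines()   # consumes the stream
second_read = stderr.readlines()  # nothing left to read

# Code that only sees `second_read` would conclude there was no error.
```

Saving the result of the first read to a variable, as this PR does, avoids that situation.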

jenhagg · 2020-12-17T21:31:49Z

Mentioning a couple points about the docker image for continuity. The ideal situation would be using the exact same image in dev/test that we use in practice. So using ubuntu would be closer to reality but still different than the server (which is not a docker image) - it wouldn't necessarily have important aspects like the nfs mount, currently installed software which is managed manually, users, etc. This could actually lead to false positives, if we put too much trust in tests that aren't truly an exact replica of the production server.

I think the salient point here is we are sacrificing something, and as long as we know exactly what that is we can use the tools effectively. In this case, the docker test infra will essentially just test code that interacts with the filesystem (so any posix compliant image should work). Anything else that depends on os functionality is explicitly not tested by this setup, so we need to treat changes of that sort differently. I think that's ok based on the relative frequency (there are way more changes related to pure python than there are to the interaction with external dependencies).

Last point is that I'd like to not make assumptions about what the future architecture looks like. We do have some abstract long term goals but those are (hopefully) independent of this stuff, which is basically implementation details. What I mean is, we may or may not have a server at some point, or the server could be used solely to run containers, or we may have a variable number of servers but they are in the cloud, or we might use some "containers as a service" thing, etc. What we're doing here is addressing a known short term need with a reasonable amount of effort, while avoiding trying to address a possible longer term need which would be much harder, and with potentially less (or zero) payoff. Sorry this is kind of meta but I just want to make sure we collectively avoid falling into some kind of design trap (not sure exactly which one... something about planning involving unknowns).

dmuldrew added this to the Hello Darkness My Old Friend milestone Dec 15, 2020

dmuldrew requested review from danielolsen, rouille and jenhagg December 15, 2020 01:08

dmuldrew self-assigned this Dec 15, 2020

jenhagg reviewed Dec 15, 2020

dmuldrew force-pushed the dmuldrew/ssh_server_command_update branch from a493053 to 76736b4 Compare December 15, 2020 21:44

rouille reviewed Dec 15, 2020

rouille changed the title ~~Dmuldrew/ssh server command update~~ Change flock command argument Dec 17, 2020

refactor: use flock -x flag instead of -e

e3e97cc

dmuldrew force-pushed the dmuldrew/ssh_server_command_update branch from 9b1a1ed to e3e97cc Compare December 17, 2020 19:48

jenhagg reviewed Dec 17, 2020

jenhagg approved these changes Dec 17, 2020

docs: update execute server command doc strings

39b780e

dmuldrew merged commit 4ea408c into develop Dec 18, 2020

dmuldrew deleted the dmuldrew/ssh_server_command_update branch December 18, 2020 19:29

ahurli mentioned this pull request Mar 11, 2021

Develop into Master #410

Merged
