Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change flock command argument #358

Merged
merged 2 commits into from
Dec 18, 2020
Merged

Conversation

dmuldrew
Copy link
Collaborator

@dmuldrew dmuldrew commented Dec 15, 2020

Pull Request Etiquette doc

Purpose

Use more standard flag for flock, to correspond with the flock installed on the ssh server container. Improve error handling of executed commands to provide more information from stderr.

What the code is doing

Saves stdout and stderr buffers to variables, then prints strerr in the IOError message, which for flock error looks like the following:

 if len(command_error) != 0:
     raise IOError(err_message + '\n' + command_error)
E           OSError: Failed to generate id for new scenario
E           ['flock: unrecognized option: e\n', 'BusyBox v1.31.1 () multi-call binary.\n', '\n', 'Usage: flock [-sxun] FD|{FILE [-c] PROG ARGS}\n', '\n', '[Un]lock file descriptor, or lock FILE, run PROG\n', '\n', '\t-s\tShared lock\n', '\t-x\tExclusive lock (default)\n', '\t-u\tUnlock FD\n', '\t-n\tFail rather than wait\n']

powersimdata/data_access/csv_store.py:62: OSError

The provided error Failed to generate id for new scenario wasn't enough information diagnose the underlying issue.

Testing

Automated testing of dockerized framework. Checked to make sure flock on the compute server also has an -x flag. I'm actually not sure why flock -e works on the compute server given the following:

Screen Shot 2020-12-14 at 4 53 23 PM

Time estimate

~15 min

@dmuldrew dmuldrew added this to the Hello Darkness My Old Friend milestone Dec 15, 2020
@dmuldrew dmuldrew self-assigned this Dec 15, 2020
@rouille
Copy link
Collaborator

rouille commented Dec 15, 2020

When looking at the manual of the flock command via man flock, I have got the following option:

       -e, -x, --exclusive
              Obtain an exclusive lock, sometimes called a write lock.  This is the default.

@danielolsen
Copy link
Contributor

What version of flock is installed in the ssh server container? Googling BusyBox leads me to believe this is something for embedded systems with very limited resources. Is this something we're choosing intentionally, or is this chosen for us by default?

@dmuldrew
Copy link
Collaborator Author

dmuldrew commented Dec 15, 2020

@danielolsen I'm using an image based on Alpine which Docker currently recommends:
https://docs.docker.com/engine/examples/running_ssh_service/
Initially I started with a base image of Ubuntu but ran to an Ubuntu-related issue of failing builds when installing the ssh server. This coupled the maintenance benefits of leaving the ssh server configuration to experts, lead me to use their recommended project as a base image. I suspect the Docker team became tired of updating their build instructions, and recognized the potential liability of accidentally recommending ssh configs which create a security risk.

@dmuldrew dmuldrew force-pushed the dmuldrew/ssh_server_command_update branch from a493053 to 76736b4 Compare December 15, 2020 21:44
@danielolsen
Copy link
Contributor

@dmuldrew is the goal of the ssh server container to be a reproduction of our existing setup, something which we hope to distribute to externals users, or both? This issue seems to be fairly easy to resolve, but there may be many other smaller issues (e.g. mawk/gawk) that arise because of choices we've made about which distros we're using in which containers.

@dmuldrew
Copy link
Collaborator Author

dmuldrew commented Dec 15, 2020

@danielolsen I think the goal is to reduce testing with our production server and model more of our existing setup so that we can automate more integration tests with Github actions. I don't think we want to distribute to external users a platform which uses ssh? It's nontrivial to setup and is specific to our infrastructure. It's possible to make a Docker build script that will a make a container almost identical to our compute server, however that would be a lot more work to get everything to build correctly and more complex to maintain.

@jenhagg
Copy link
Collaborator

jenhagg commented Dec 15, 2020

Yep, the goal of the ssh container is purely for testing. For external use we'll provide a different container which uses the shared volume (so no ssh).

@danielolsen
Copy link
Contributor

danielolsen commented Dec 15, 2020

If we're trying to mock our production server setup for internal testing, do we know that all commands that will work on an Alpine build will also work on Ubuntu? Otherwise we may think something will work but it could fail unexpectedly as soon as we deploy.

What blocks us from using Ubuntu directly?

@rouille
Copy link
Collaborator

rouille commented Dec 16, 2020

If we're trying to mock our production server setup for internal testing, do we know that all commands that will work on an Alpine build will also work on Ubuntu? Otherwise we may think something will work but it could fail unexpectedly as soon as we deploy.

What blocks us from using Ubuntu directly?

I agree with @danielolsen

@rouille rouille changed the title Dmuldrew/ssh server command update Change flock command argument Dec 17, 2020
@dmuldrew dmuldrew force-pushed the dmuldrew/ssh_server_command_update branch from 9b1a1ed to e3e97cc Compare December 17, 2020 19:48
@@ -55,6 +55,13 @@ def _execute_and_check_err(self, command, err_message):
:return: (*str*) -- standard output stream.
"""
stdin, stdout, stderr = self.data_access.execute_command(command)
if len(stderr.readlines()) != 0:
command_output = stdout.readlines()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not part of your changes, but just noticed the return type in the docstring should actually be list

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not a fan of passing stream references around. If you read the buffer and don't save the contents somewhere, you can potentially have later code that thinks there was no error.

@jenhagg
Copy link
Collaborator

jenhagg commented Dec 17, 2020

Mentioning a couple points about the docker image for continuity. The ideal situation would be using the exact same image in dev/test that we use in practice. So using ubuntu would be closer to reality but still different than the server (which is not a docker image) - it wouldn't necessarily have important aspects like the nfs mount, currently installed software which is managed manually, users, etc. This could actually lead to false positives, if we put too much trust in tests that aren't truly an exact replica of the production server.

I think the salient point here is we are sacrificing something, and as long as we know exactly what that is we can use the tools effectively. In this case, the docker test infra will essentially just test code that interacts with the filesystem (so any posix compliant image should work). Anything else that depends on os functionality is explicitly not tested by this setup, so we need to treat changes of that sort differently. I think that's ok based on the relative frequency (there are way more changes related to pure python than there are to the interaction with external dependencies).

Last point is that I'd like to not make assumptions about what the future architecture looks like. We do have some abstract long term goals but those are (hopefully) independent of this stuff, which is basically implementation details. What I mean, is we may or may not have a server at some point, or the server could be used solely to run containers, or we may have a variable number of servers but they are in the cloud, or we might use some "containers as a service" thing, etc. What we're doing here is addressing a known short term need with a reasonable amount of effort, but avoiding trying to address a possible longer term need which would be much harder, and with potentially less (or zero) payoff. Sorry this is kind of meta but just want to make sure we collectively avoid falling into some kind of design trap (not sure exactly which one.. something about planning involving unknowns).

@dmuldrew dmuldrew merged commit 4ea408c into develop Dec 18, 2020
@dmuldrew dmuldrew deleted the dmuldrew/ssh_server_command_update branch December 18, 2020 19:29
@ahurli ahurli mentioned this pull request Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants