
Contributing

jmcfarland-swri edited this page Jun 1, 2020 · 8 revisions

This document outlines guidelines for contributing to para-atm.

If you are new to Git, we recommend reading Chapters 1-3 of the Git Book.

Forking Workflow

The general workflow for contributing to para-atm is to create a fork of the repository on GitHub, download (clone) a local copy of the fork, create a branch, and push your changes to the fork on GitHub. Finally, submit a pull request so that the changes can be integrated back into the original repository.

The steps are as follows:

  1. Create an account on GitHub.
  2. Create a fork of this repository (as simple as clicking the "Fork" button).
  3. Clone (download) your fork so that you can work on it on your local machine.
  4. Recommended: Prior to starting work on a new feature, open an Issue that outlines a plan for the feature. Describe the intended use case for the new feature. Outline a specific plan for the inputs and outputs of any public functions that the feature will provide. Note what types of data files, if any, the feature is intended to be used with. Opening the issue prior to implementing the feature will allow for discussion and feedback from the team, and may save time by determining the best interface for the function prior to implementation, as opposed to changing the code later at the pull request stage.
  5. Recommended: Create a feature branch for your work, instead of doing work on the master branch. This makes it possible to work on multiple features at once, and having your work on a branch also makes the pull request process easier.
  6. Once the work is ready, open a pull request for it to be integrated back into the main repository.

The following links provide more detail on these steps:

Organization of the code

To begin with, familiarize yourself with the following information on Python packages and modules, particularly Section 6.4 (Packages): https://docs.python.org/3/tutorial/modules.html.

All new source files should be created under a subdirectory (sub-package) of paraatm. If you are adding functionality that fits within one of the existing categories such as safety analysis (safety) or input/output (io), then you can simply create a new module under the appropriate existing directory.

Otherwise, if you are creating a new feature that does not fall into one of the existing categories, you will need to create a new subdirectory. Suppose that you are creating a new feature called foo, whose API is a single function named foo. You could create a subdirectory named foo containing a module file named foo.py, giving the path paraatm/foo/foo.py. However, this becomes confusing because (a) the sub-package, module, and function all have the same name, and (b) a cumbersome import statement is required when using your feature:

from paraatm.foo.foo import foo

Instead, a better way to organize the code for this new feature would be to create a private module _foo.py and to import the public function foo using the sub-package's __init__.py file. The directory structure looks like:

paraatm/foo/_foo.py
paraatm/foo/__init__.py

Inside _foo.py, define your public function named foo. Inside __init__.py, import your function:

from ._foo import foo

Now the foo function is available directly from the foo sub-package, and users do not need to know about the _foo.py module. Your function can be imported using:

from paraatm.foo import foo
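A minimal, runnable sketch of this layout follows. It builds a throwaway package in a temporary directory so that it can be executed on its own; the package name demo_pkg and the body of foo are hypothetical stand-ins for paraatm and your feature.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk mirroring the layout described above.
# "demo_pkg" is a hypothetical stand-in for paraatm.
root = Path(tempfile.mkdtemp())
sub = root / "demo_pkg" / "foo"
sub.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")

# Private module: holds the actual implementation.
(sub / "_foo.py").write_text(
    "def foo(x):\n"
    "    \"\"\"Hypothetical public function: doubles its input.\"\"\"\n"
    "    return 2 * x\n"
)

# Sub-package __init__.py: re-exports the public function.
(sub / "__init__.py").write_text("from ._foo import foo\n")

# Make the temporary directory importable and clear any stale finder caches.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Users import the function from the sub-package, not from the private module.
from demo_pkg.foo import foo

print(foo(21))  # 42
```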

When does it make sense to use a private module and do imports within __init__.py as opposed to having a public module within the sub-package? It depends on the level of organization needed for the sub-package. If the sub-package should directly provide access to functions and/or classes, then use the private module approach. If further organization is needed within the sub-package, then public modules can be used. Both examples exist in para-atm. The io sub-package uses several different public modules to organize its functionality. On the other hand, the plotting sub-package does not require this extra level of organization, so it uses a private module and imports the public functions in __init__.py.

Lastly, follow the Python convention of using only lowercase letters for directory (sub-package) and file (module) names. Additional information is given in the Style section.

Test cases

Test cases for para-atm are implemented using the unittest module. The tests can be found under the tests directory. Test cases can be run using python -m unittest from the base project directory.
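As a rough sketch, a minimal test for a hypothetical function foo might look like the following. In para-atm the function would be imported from the package rather than defined inline; it is defined here only so that the example runs on its own. Saved in a file whose name matches unittest's default test*.py discovery pattern (for example, a hypothetical tests/test_foo.py), it would typically be picked up by python -m unittest.

```python
import unittest


def foo(x):
    """Hypothetical function under test (stand-in for a new para-atm function)."""
    return 2 * x


class TestFoo(unittest.TestCase):
    def test_runs_and_returns_expected_value(self):
        # Minimal check: the function runs and produces the expected result.
        self.assertEqual(foo(21), 42)

    def test_negative_input(self):
        # A second, equally small check on a different input.
        self.assertEqual(foo(-3), -6)
```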

Any new function added to para-atm should be accompanied by a corresponding test case. The test case may be very simple, but at a minimum it should verify that the function runs and produces the expected results. If the test requires reading data from a file, consider the following:

  • Try to make the sample data file as simple and small as possible. Do not submit large (megabyte or gigabyte sized) data files for test cases.
  • Make sure that any sample data does not include sensitive information or data that cannot be publicly released.
  • Store the sample data file under the sample_data directory.
  • Follow the existing examples in test_para_atm.py for reading from sample data files.

Style

Generally speaking, code should follow the PEP 8 style guide. In particular, please follow the naming conventions:

  • Module (file) names should be all lowercase, with underscores if necessary (some older code in para-atm does not currently follow this convention, but it will be updated).
  • Class names should normally use the "CapWords" convention.
  • Function and variable names are lowercase, with words separated by underscores as necessary.
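A short illustration of these naming conventions, using hypothetical names only:

```python
# Module file name: all lowercase with underscores, e.g. flight_tracks.py (hypothetical).


class TrackSummary:
    """Class names use the CapWords convention."""

    def __init__(self, flight_id):
        # Attribute and variable names: lowercase_with_underscores.
        self.flight_id = flight_id


def compute_total_distance(leg_distances):
    """Function names: lowercase_with_underscores."""
    total_distance = sum(leg_distances)
    return total_distance
```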

Documenting the code

Any public function should be documented with a docstring. Function arguments should be documented using the NumPy or Google docstring style, which will facilitate automated generation of package documentation.
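A sketch of a NumPy-style docstring for a hypothetical function is shown below; the Parameters, Returns, and Examples section headings follow the numpydoc convention.

```python
def scale_altitudes(altitudes_ft, factor=1.0):
    """Scale a sequence of altitudes by a constant factor.

    Parameters
    ----------
    altitudes_ft : list of float
        Altitudes in feet.
    factor : float, optional
        Multiplicative scale factor. Default is 1.0.

    Returns
    -------
    list of float
        The scaled altitudes, in the same order as the input.

    Examples
    --------
    >>> scale_altitudes([1000.0, 2000.0], factor=0.5)
    [500.0, 1000.0]
    """
    return [a * factor for a in altitudes_ft]
```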

General recommendations

Functions that are integrated into para-atm are fundamentally different from functions that one may write as part of their own internal projects or analysis. The difference is that functions included in para-atm should be usable and understandable by users other than the developer of the function.

With this in mind, the most important consideration for functions that are included in para-atm is the interface (also known as the "API", or Application Programming Interface). In many cases, functions that are written for internal use will need some modification to the interface prior to being suitable within a more general framework.

The following lists some specific things to avoid when considering an interface:

  • The function should not read files from a hard-coded location on the system. For example, do not do something like pd.read_csv('data_files/' + filename). Instead, the function's input argument should completely specify where data are read from. For convenience, the developer might then create a "wrapper" function that builds filenames associated with a particular location on their system and passes them to the more general function, but this wrapper function would not be part of para-atm.
  • Do not write data to a hard-coded filename. When the interface will write data to a file, the filename should be provided as a function argument. As noted above, the location should not be hard-coded; it is completely specified by the filename argument, which may be a relative or absolute path.
  • Consider whether writing data to a file or returning data as a function output is more appropriate. Returning data in the form of, for example, a pandas DataFrame is more general than writing it to a file, as it allows the user to decide whether and how to write the data to a file.
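The guidelines above can be sketched as follows. The function names and the data_files directory are hypothetical. To keep the example self-contained it uses the standard-library csv module and returns a list of dicts; in practice a pandas DataFrame (for example, from pd.read_csv(filename)) would serve the same role.

```python
import csv
import os
import tempfile


def read_waypoints(filename):
    """Read waypoints from a CSV file and return them as a list of dicts.

    The caller supplies the full path (relative or absolute); nothing is
    read from a hard-coded directory.
    """
    with open(filename, newline="") as f:
        return list(csv.DictReader(f))


def write_waypoints(rows, filename):
    """Write waypoint dicts to the CSV file named by the caller."""
    if not rows:
        raise ValueError("rows must not be empty")
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Personal convenience wrapper: it knows about *your* directory layout,
# so it belongs in your own project, NOT in para-atm.
MY_DATA_DIR = "data_files"  # hypothetical local directory


def read_my_waypoints(name):
    return read_waypoints(os.path.join(MY_DATA_DIR, name))


# Usage: the caller decides where the file lives.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "waypoints.csv")
    write_waypoints([{"name": "WP1", "alt_ft": "3000"}], path)
    print(read_waypoints(path))
```

Because read_waypoints returns data rather than writing it, the caller remains free to inspect it, transform it, or pass it to write_waypoints with a filename of their choosing.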