How should we structure code / repositories? #3

campagnola · 2019-12-16T21:31:31Z

Some options (not mutually exclusive):

1. Wrappers around manufacturer API / protocols
   a. One package per wrapper
   b. One package per "device type" (e.g. all camera wrappers in one package)
   c. One package with all wrappers
2. High-level ABCs defining standard device interfaces
   a. One package per abstract "device type"
   b. One package with all ABCs
3. ABC implementations for each device type
   a. In the same location as (1)
   b. In the same location as (2)
   c. In their own packages

campagnola · 2019-12-16T21:38:21Z

I lean toward:

(1a) because it allows finer control over versioning -- it's possible, for example, for a pvcam implementation to break backward compatibility without causing a major version number change for all other devices. It also sounds like more work, but I think worth the extra effort.

(2b) initially because there is bound to be some interdependence between different device classes, although there need to be clear mechanisms for extending the model via external packages.

(3a) for simplicity -- I like that each device has two pieces of code: "how do you access the manufacturer API in python", and "how do you access the device using the standard API". The former ensures some level of access to all the idiosyncratic bells/whistles provided by the manufacturer, whereas the latter asks all devices to conform to a standard.
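
To make the "two pieces of code" idea concrete, here is a minimal sketch. Every name in it (`FakeVendorSDK`, `Camera`, `FakeVendorCamera`, the `EXP_TIME_US` parameter) is made up for illustration and does not correspond to any real manufacturer API; the first class stands in for a raw wrapper, and the last one adapts it to a standard interface.

```python
from abc import ABC, abstractmethod


class FakeVendorSDK:
    """Piece 1 (hypothetical): thin access to a manufacturer-style API.

    No real SDK is called; parameters are just held in memory.
    """

    def __init__(self):
        self._exposure_us = 10_000

    def set_param(self, name, value):
        if name != "EXP_TIME_US":
            raise KeyError(name)
        self._exposure_us = int(value)

    def get_param(self, name):
        if name != "EXP_TIME_US":
            raise KeyError(name)
        return self._exposure_us


class Camera(ABC):
    """Hypothetical standard interface that all cameras would conform to."""

    @abstractmethod
    def set_exposure(self, seconds):
        """Set the exposure time in seconds."""

    @abstractmethod
    def get_exposure(self):
        """Return the exposure time in seconds."""


class FakeVendorCamera(Camera):
    """Piece 2 (hypothetical): the standard API built on the raw wrapper."""

    def __init__(self, sdk=None):
        # The raw wrapper stays reachable for vendor-specific features.
        self.sdk = sdk or FakeVendorSDK()

    def set_exposure(self, seconds):
        # The vendor-style API works in integer microseconds.
        self.sdk.set_param("EXP_TIME_US", round(seconds * 1e6))

    def get_exposure(self):
        return self.sdk.get_param("EXP_TIME_US") / 1e6
```

Under (3a) both pieces would live in the same package, so code that needs an idiosyncratic feature can still reach `cam.sdk` directly while everything else talks to `Camera`.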

windelbouwman · 2019-12-17T10:52:50Z

Hi Luke,

This post might be of interest: https://danluu.com/monorepo/

It's good to keep in mind how easy somewhat larger repositories are to work with during development.

HazenBabcock · 2019-12-17T15:10:28Z

Another argument for a more monolithic repository is that it would be easier for the end user to install. If they have a setup with 7 different hardware elements they'd have to install 7 different packages.

campagnola · 2019-12-17T17:51:03Z

Thanks @windelbouwman, some interesting points in that article. Many of them boil down to "much easier to manage within-project dependencies with a single repo", and generally I agree with that (the main reason I suggest 2b). However, in the case of a collection of device wrappers, I would expect to have little or no interdependence between each wrapper; just the opposite, I want each wrapper to be independent. Example of where this becomes important: I need the latest version for device X, but an older version for device Y due to a bug / dropped support (I ran into exactly this problem with micromanager).

@HazenBabcock I agree that many-repos would be more work for many reasons, but hopefully installation can be made easy (for example, with a metapackage that depends on many others).
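
As a rough sketch of the metapackage idea: the metapackage ships no code of its own and only lists the wrapper packages as requirements. The package names below (`microscope-devices`, `microscope-pvcam`, and so on) are hypothetical and not published projects, and the optional `pins` argument is just one assumed way to hold a single wrapper at an older version.

```python
# Hypothetical wrapper distributions; none of these names are real projects.
WRAPPER_PACKAGES = [
    "microscope-pvcam",
    "microscope-device-x",
    "microscope-device-y",
]


def metapackage_config(pins=None):
    """Build setuptools keyword arguments for a code-free metapackage.

    `pins` maps a package name to a version specifier, for example
    {"microscope-device-y": "<2"} to stay on an older release.
    """
    pins = pins or {}
    requires = [pkg + pins.get(pkg, "") for pkg in WRAPPER_PACKAGES]
    return {
        "name": "microscope-devices",  # hypothetical metapackage name
        "version": "0.1.0",
        "packages": [],                # the metapackage contains no modules
        "install_requires": requires,
    }
```

In a `setup.py` this would be passed along as `setuptools.setup(**metapackage_config())`, so a single `pip install` of the metapackage pulls in every wrapper, while anyone who needs finer control can still install the wrappers individually.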

All that said, I am happy to try out either approach and see how it goes. If we did go with many-repos, I would definitely want just enough standardization between them that it would be possible to automate common things like installation, testing, pypi uploads, etc.

windelbouwman · 2019-12-17T18:22:25Z

Another interesting concept is that of a multi-package repository. This is often the case in the Rust world, where several related crates reside in the same git repository but are separately installable via cargo. The same could be done using python / pip. I assume that in your options above package equals git repository equals python package?

The argument for a monolithic approach also benefits library packagers. I recently put some effort into packaging ROS2, which consists of some 200 packages; this does not make the life of a packager easy :).

campagnola · 2019-12-17T18:57:41Z

Ooh, that's interesting, so:
1d) One package per device, but all contained in a single repo

Would this approach have made ROS2 any easier? It seems like most of the work would be in building / managing packages, for which having a single repo might only be minimally helpful.

windelbouwman · 2019-12-17T19:00:37Z

The approach makes development easier, but packaging is still a pain. Consider people having to write bitbake files for each python package. The same goes for ubuntu package maintainers, gentoo packagers, archlinux packagers, etc.

David-Baddeley · 2020-01-13T21:49:21Z

I'd argue strongly for a single, monolithic, package as I think it'll make life a lot easier, both from an organisational point of view and also from the perspective of keeping the structure of the different devices consistent. I don't think that the resulting package will be too big.

With git it's fairly easy to split a large repository up into smaller ones whilst still preserving history, so we could always choose to break it up at some point in the future (we're currently doing this with python-microscopy).

Before making a final decision, however, we should probably think about licensing. It might be that different hardware requires different licensing, which might make the mono-repo approach harder. The same concern applies if we want to package the manufacturer DLLs (ideally you'd be able to do pip install scikits-microscope or conda install scikits-microscope and have things just work, although this might be unrealistic in the short term).

HazenBabcock · 2020-01-22T17:47:30Z

I think it will be difficult to package manufacturer DLLs without running into licensing issues. Also some of them are really large and use special installers, like National Instruments. It would probably be easier to provide (up to date) links for the various download sites.

tacaswell · 2020-01-25T01:55:26Z

Having worked on a similar project (https://blueskyproject.io/; I will make a new issue with comments on that in a bit), my sense is that you probably want to put the ABCs all in the same package, independent of the implementations, and then have a mix of 1a and 1b for the implementations, depending on what makes sense for the given set of hardware, dependencies, and code re-use.

We have also found that entrypoints (https://pypi.org/project/entrypoints/) are super useful for the implementations to phone home to report that they exist.
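
A minimal sketch of the discovery side, with two assumptions: it uses the standard library's `importlib.metadata` rather than the `entrypoints` package linked above, and the group name `microscope.drivers` is made up for illustration. An implementation package would advertise itself in its own packaging metadata under that group, and the core package would look it up like this:

```python
from importlib.metadata import entry_points


def discover_drivers(group="microscope.drivers"):
    """Return {name: EntryPoint} for every installed driver in `group`.

    The default group name is hypothetical. A driver package would
    register under it in its packaging metadata, e.g. in setup.py:
        entry_points={"microscope.drivers": ["pvcam = some_pkg:SomeCamera"]}
    (the module and class names there are placeholders).
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    # If two packages register the same name, the later one wins.
    return {ep.name: ep for ep in selected}
```

Calling `.load()` on one of the returned entry points imports the implementation lazily, so installed-but-unused drivers cost nothing until they are requested.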

In addition to the versioning de-confliction @campagnola mentioned (which is super important), having many repositories means you can distribute maintenance burden / responsibilities much more freely: if you give someone the keys to commit to / release support for device X, you don't also have to give them rights to release the core parts of the code. This also helps with managing collaborations across many institutions, as the subset of institutions that care about a given subset of features can move quickly on them regardless of whether people from other institutions have effort to put in on the needed time frame.

If you go with many packages-per-repo then you can not use git tags to identify versions (well, I guess you could do some sort of prefix scheme projectA-v1, projectB-v3 etc, but that is a bit ad-hoc).
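
For what the prefix scheme could look like in practice, here is a small sketch. The `<package>-v<version>` tag format is only the ad-hoc convention mentioned above, not an established standard, and the helper names are invented for this example.

```python
import re

# Assumed tag format: "<package>-v<dotted integer version>", e.g. "projectA-v1".
TAG_PATTERN = re.compile(r"^(?P<package>.+)-v(?P<version>\d+(?:\.\d+)*)$")


def parse_tag(tag):
    """Split a prefixed tag into (package, version tuple)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a prefixed version tag: {tag!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("package"), version


def latest_versions(tags):
    """Return the highest version seen for each package prefix."""
    latest = {}
    for tag in tags:
        package, version = parse_tag(tag)
        if package not in latest or version > latest[package]:
            latest[package] = version
    return latest
```

Every release tool in the repository would need to agree on this parsing, which is part of why the scheme feels ad hoc compared with one tag namespace per repository.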

A major drawback of a single project for the implementations is that the dependencies you end up with (either third-party packages or DLLs etc.) are the union of every implementation's dependencies. This can lead to a problem that is just as annoying as "I have to install 7 things", namely "why do I have to install 70 things, of which I will only use 5!?" I think mono-repos make a lot more sense if what you are developing is an application rather than a library.
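
A toy illustration of the union problem, with invented device names and an invented dependency table (the entries are placeholders, not the requirements of any real driver):

```python
# Hypothetical per-device requirements, for illustration only.
DEPENDENCIES = {
    "camera_a": {"numpy", "vendor-a-sdk"},
    "stage_b": {"pyserial"},
    "daq_c": {"numpy", "vendor-c-sdk", "vendor-c-runtime"},
}


def monolithic_requirements(deps):
    """One package for everything: users install the union of all needs."""
    return set().union(*deps.values())


def selected_requirements(deps, devices):
    """Separate packages: users install only what their devices need."""
    return set().union(*(deps[name] for name in devices))
```

With this table a single package drags in five requirements, whereas a setup that only has `stage_b` needs one.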

With one of the projects under the bluesky umbrella we have been through a couple of merge / split / merge cycles (and I am not sure we have fully sorted it!). The rule of thumb I have in my head at the moment is to start with a core library and 2 concrete implementations. If you want something in both, it should go in core.

Another "advantage" of the multi-repository approach is that it makes changing the API in a non-backwards compatible way very annoying which helps keep you a bit more honest ;)

We have been using cookie cutters (for example https://github.com/bluesky/suitcase-cookiecutter for semi-templated export libraries) and have been very happy with them.

nvladimus · 2020-03-13T15:57:15Z

> I want each wrapper to be independent.

💯 This will give microscope developers more freedom to take only wrappers they need and match them in a custom, system-specific way.
I am very concerned about the banana-jungle problem:

> You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

campagnola changed the title ~~How should we structure core / repositories?~~ How should we structure code / repositories? Dec 16, 2019

ksunden mentioned this issue May 15, 2020

Discussion: must-have features for a data acquisition platform #15

Open
