start working on public-facing documentation
favyen2 committed Jan 13, 2025
1 parent 8a39971 commit 4c1c93d
Showing 10 changed files with 172 additions and 74 deletions.
100 changes: 26 additions & 74 deletions README.md
@@ -1,90 +1,42 @@
Overview
--------

rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
built on top of rslearn, as well as project-specific code and configuration files.


Tooling
-------

The additional tooling comes into play when training and deploying models. This is an
outline of the steps the tooling takes care of when training models:

1. User runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
2. Launcher uploads the code to a canonical path on Google Cloud Storage (GCS), based
on the project ID and experiment ID specified in `config.yaml`.
3. Launcher then starts a job, in this case on Beaker, to train the model.
4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
the code. The image contains a copy of the code too, but it is overwritten with the
latest code from the user's codebase.
5. It then saves the W&B run ID to GCS. It also configures rslearn to write checkpoints to
a canonical folder on GCS.
6. If the job is pre-empted and resumes, it will automatically load the latest
checkpoint and W&B run ID from GCS. It will also load these in calls to `model test`
or `model predict`.
rslearn_projects contains the training datasets, model weights, and corresponding code
for machine learning applications built on top of
[rslearn](https://github.com/allenai/rslearn/) at Ai2.


Setup
-----

rslp expects an environment variable specifying the GCS bucket where prepared
rslearn datasets, model checkpoints, etc. are written. The easiest way to set it is to
create a `.env` file:

RSLP_PREFIX=gs://rslearn-eai
RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai

You will also need to set up GCP credentials that have access to this bucket.

Training additionally depends on credentials for W&B. If you train directly using
`rslp.rslearn_main`, then you will need to set up these credentials. If you use a
launcher like `rslp.launch_beaker`, then this isn't needed since the credentials are
already configured as secrets on the platform, but you will need to set up your Beaker
or other platform credentials to be able to launch the jobs.

TODO: update GCP/W&B to use service accounts.

Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
files use the S3-compatible API to access GCS rather than accessing GCS directly.
Therefore, you need to set up environment variables to provide the appropriate
credentials:

S3_ACCESS_KEY_ID=GOOG...
S3_SECRET_ACCESS_KEY=...

You can create these credentials at
https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
under "Access keys for your user account".


Usage
-----

Create an environment for rslearn and set it up with the rslearn_projects requirements:
Install rslearn:

conda create -n rslearn python=3.12
conda activate rslearn
pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
pip install -r rslearn_projects/requirements.txt
git clone https://github.com/allenai/rslearn.git
cd rslearn
pip install .[extra]

For development, it is easier to use PYTHONPATH or to install rslearn and
rslearn_projects in editable mode, e.g.:
Install requirements:

export PYTHONPATH=.:/path/to/rslearn/rslearn
git clone https://github.com/allenai/rslearn_projects.git
cd rslearn_projects
pip install -r requirements.txt

Execute a data processing pipeline:
rslearn_projects includes tooling that expects model checkpoints and auxiliary files to
be stored in an `RSLP_PREFIX` directory. Create a file `.env` to set the `RSLP_PREFIX`
environment variable:

python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32
mkdir project_data
echo "RSLP_PREFIX=project_data/" > .env

Launch training on Beaker:

python -m rslp.main maldives_ecosystem_mapping train_maxar

Manually train locally:

python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml


Projects
--------
Applications
------------

- [Forest Loss Driver](rslp/forest_loss_driver/README.md)
- [Sentinel-2 Vessel Detection](docs/sentinel2_vessels.md)
- [Sentinel-2 Vessel Attribute Prediction](docs/sentinel2_vessel_attribute.md)
- [Landsat Vessel Detection](docs/landsat_vessels.md)
- [Satlas: Solar Farm Segmentation](docs/satlas_solar_farm.md)
- [Satlas: Wind Turbine Detection](docs/satlas_wind_turbine.md)
- [Satlas: Marine Infrastructure Detection](docs/satlas_marine_infra.md)
- [Forest Loss Driver Classification](docs/forest_loss_driver.md)
- [Maldives Ecosystem Mapping](docs/maldives_ecosystem_mapping.md)
90 changes: 90 additions & 0 deletions ai2_docs/README.md
@@ -0,0 +1,90 @@
Overview
--------

rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
built on top of rslearn, as well as project-specific code and configuration files.


Tooling
-------

The additional tooling comes into play when training and deploying models. This is an
outline of the steps the tooling takes care of when training models:

1. User runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
2. Launcher uploads the code to a canonical path on Google Cloud Storage (GCS), based
on the project ID and experiment ID specified in `config.yaml`.
3. Launcher then starts a job, in this case on Beaker, to train the model.
4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
the code. The image contains a copy of the code too, but it is overwritten with the
latest code from the user's codebase.
5. It then saves the W&B run ID to GCS. It also configures rslearn to write
checkpoints to a canonical folder on GCS (see the path-layout sketch after this list).
6. If the job is pre-empted and resumes, it will automatically load the latest
checkpoint and W&B run ID from GCS. It will also load these in calls to `model test`
or `model predict`.
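
For illustration, the checkpoint paths used elsewhere in these docs follow a
`{RSLP_PREFIX}/projects/{project_id}/{experiment_id}/` pattern. The helper below is a
hypothetical sketch of that convention, not code from rslp; the file names for the
code archive and W&B run ID are assumptions:

    import os

    def canonical_paths(project_id: str, experiment_id: str) -> dict[str, str]:
        """Sketch of the canonical GCS layout described above."""
        prefix = os.environ["RSLP_PREFIX"]  # e.g. gs://rslearn-eai
        base = f"{prefix}/projects/{project_id}/{experiment_id}"
        return {
            "code": f"{base}/code.zip",  # assumed name for the uploaded code archive
            "checkpoints": f"{base}/checkpoints/",  # matches paths used in these docs
            "wandb_id": f"{base}/wandb_id",  # assumed name for the saved W&B run ID
        }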


Setup
-----

rslp expects an environment variable specifying the GCS bucket where prepared
rslearn datasets, model checkpoints, etc. are written. The easiest way to set it is to
create a `.env` file:

RSLP_PREFIX=gs://rslearn-eai
RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai

You will also need to set up GCP credentials that have access to this bucket.

Training additionally depends on credentials for W&B. If you train directly using
`rslp.rslearn_main`, then you will need to set up these credentials. If you use a
launcher like `rslp.launch_beaker`, then this isn't needed since the credentials are
already configured as secrets on the platform, but you will need to set up your Beaker
or other platform credentials to be able to launch the jobs.

TODO: update GCP/W&B to use service accounts.

Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
files use the S3-compatible API to access GCS rather than accessing GCS directly.
Therefore, you need to set up environment variables to provide the appropriate
credentials:

S3_ACCESS_KEY_ID=GOOG...
S3_SECRET_ACCESS_KEY=...

You can create these credentials at
https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
under "Access keys for your user account".


Usage
-----

Create an environment for rslearn and set it up with the rslearn_projects requirements:

conda create -n rslearn python=3.12
conda activate rslearn
pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
pip install -r rslearn_projects/requirements.txt

For development, it is easier to use PYTHONPATH or to install rslearn and
rslearn_projects in editable mode, e.g.:

export PYTHONPATH=.:/path/to/rslearn/rslearn

Execute a data processing pipeline:

python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32

Launch training on Beaker:

python -m rslp.main maldives_ecosystem_mapping train_maxar

Manually train locally:

python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml


Projects
--------

- [Forest Loss Driver](rslp/forest_loss_driver/README.md)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
56 changes: 56 additions & 0 deletions docs/sentinel2_vessels.md
@@ -0,0 +1,56 @@
Sentinel-2 Vessel Detection
---------------------------

The Sentinel-2 vessel detection model detects ships in Sentinel-2 L1C scenes.

TODO: insert example image

It is trained on a dataset consisting of 43,443 image patches (ranging from 300x300
to 1000x1000 pixels) with 37,145 ship labels.


Inference
---------

First, download the model checkpoint to the `RSLP_PREFIX` directory.

cd rslearn_projects
mkdir -p project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/
wget XYZ -O project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/best.ckpt

The easiest way to apply the model is using the prediction pipeline in
`rslp/sentinel2_vessels/predict_pipeline.py`. It accepts a Sentinel-2 scene ID and
automatically downloads the scene images from a
[public Google Cloud Storage bucket](https://cloud.google.com/storage/docs/public-datasets/sentinel-2).

mkdir output_crops
mkdir scratch_dir
python -m rslp.main sentinel2_vessels predict '[{"scene_id": "S2A_MSIL1C_20180904T110621_N0206_R137_T30UYD_20180904T133425", "json_path": "out.json", "crop_path": "output_crops/"}]' scratch_dir/

Then, `out.json` will contain a JSON list of detected ships while `output_crops` will
contain corresponding crops centered around those ships (showing the RGB B4/B3/B2
bands).
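
The schema of the individual detections is not documented here, so a quick sanity
check is to load the list and inspect a few entries (a sketch; it assumes no
particular field names):

    import json

    with open("out.json") as f:
        detections = json.load(f)  # JSON list of detected ships

    print(f"{len(detections)} ships detected")
    for det in detections[:5]:
        print(det)  # inspect whatever fields the pipeline actually emits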


Training
--------

First, download the training dataset:

cd rslearn_projects
mkdir -p project_data/datasets/sentinel2_vessels/
wget XYZ -O project_data/datasets/sentinel2_vessels.tar
tar xvf project_data/datasets/sentinel2_vessels.tar --directory project_data/datasets/sentinel2_vessels/

It is an rslearn dataset consisting of window folders like
`windows/sargassum_train/1186117_1897173_158907/`. Inside each window folder:

- `layers/sentinel2/` contains different Sentinel-2 bands used by the model, such as
`layers/sentinel2/R_G_B/image.png`.
- `layers/label/data.geojson` contains the positions of ships. These positions are
  offset by the window's bounds, which are stored in `metadata.json`, so subtract the
  window's bounds to get pixel coordinates relative to the image (see the sketch
  after this list).
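
For example, here is a sketch of recovering image-relative ship positions. It assumes
`metadata.json` stores the window bounds under a `bounds` key as `[x1, y1, x2, y2]`
and that the labels are Point features; both are assumptions to verify against the
actual files:

    import json

    window_dir = (
        "project_data/datasets/sentinel2_vessels/"
        "windows/sargassum_train/1186117_1897173_158907"
    )

    # Assumption: bounds are [x1, y1, x2, y2] in the window's pixel coordinate system.
    with open(f"{window_dir}/metadata.json") as f:
        x1, y1, _, _ = json.load(f)["bounds"]

    with open(f"{window_dir}/layers/label/data.geojson") as f:
        labels = json.load(f)

    for feature in labels["features"]:
        x, y = feature["geometry"]["coordinates"][:2]  # assumes Point geometry
        print("ship at image pixel:", x - x1, y - y1)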

To train the model, run:

python -m rslp.rslearn_main model fit --config data/sentinel2_vessels/config.yaml --data.init_args.path project_data/datasets/sentinel2_vessels/
