Commit: start working on public-facing documentation

Showing 10 changed files with 172 additions and 74 deletions.

@@ -1,90 +1,42 @@

Overview
--------

rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
built on top of rslearn, as well as project-specific code and configuration files.

Tooling
-------

The additional tooling comes into play when training and deploying models. This is an
outline of the steps the tooling takes care of when training models:

1. The user runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
2. The launcher uploads the code to a canonical path on Google Cloud Storage (GCS),
   based on the project ID and experiment ID specified in `config.yaml`.
3. The launcher then starts a job, in this case on Beaker, to train the model.
4. `rslp.docker_entrypoint` is the entrypoint for the job; it starts by downloading the
   code. The image contains a copy of the code too, but it is overwritten with the
   latest code from the user's codebase.
5. It then saves the W&B run ID to GCS, and configures rslearn to write checkpoints to
   a canonical folder on GCS.
6. If the job is pre-empted and resumes, it automatically loads the latest checkpoint
   and W&B run ID from GCS. It also loads these in calls to `model test` or
   `model predict`.

rslearn_projects contains the training datasets, model weights, and corresponding code
for machine learning applications built on top of
[rslearn](https://github.com/allenai/rslearn/) at Ai2.

Setup
-----

rslp expects an environment variable specifying the GCS bucket to which prepared
rslearn datasets, model checkpoints, etc. are written. The easiest way is to create a
`.env` file:

    RSLP_PREFIX=gs://rslearn-eai
    RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai

You will also need to set up GCP credentials that have access to this bucket.

Training additionally depends on credentials for W&B. If you train directly using
`rslp.rslearn_main`, you will need to set up these credentials. If you use a launcher
like `rslp.launch_beaker`, they aren't needed, since the credentials are already
configured as secrets on the platform; however, you will need to set up your Beaker or
other platform credentials to be able to launch the jobs.

TODO: update GCP/W&B to use service accounts.

Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
files use the S3-compatible API to access GCS rather than accessing GCS directly.
Therefore, you need to set environment variables that provide the appropriate
credentials:

    S3_ACCESS_KEY_ID=GOOG...
    S3_SECRET_ACCESS_KEY=...

You can create these credentials at
https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
under "Access keys for your user account".

Usage
-----

Create an environment for rslearn and set it up with the rslearn_projects
requirements. Install rslearn:

    conda create -n rslearn python=3.12
    conda activate rslearn
    pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
    pip install -r rslearn_projects/requirements.txt
    git clone https://github.com/allenai/rslearn.git
    cd rslearn
    pip install .[extra]

For development it is easier to use PYTHONPATH, or to install rslearn and
rslearn_projects in editable mode. Install the rslearn_projects requirements:

    export PYTHONPATH=.:/path/to/rslearn/rslearn
    git clone https://github.com/allenai/rslearn_projects.git
    cd rslearn_projects
    pip install -r requirements.txt

Execute a data processing pipeline:

    python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32

rslearn_projects includes tooling that expects model checkpoints and auxiliary files to
be stored in an `RSLP_PREFIX` directory. Create a file `.env` to set the `RSLP_PREFIX`
environment variable:

    mkdir project_data
    echo "RSLP_PREFIX=project_data/" > .env

Launch training on Beaker:

    python -m rslp.main maldives_ecosystem_mapping train_maxar

Manually train locally:

    python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml

Projects
--------

- [Forest Loss Driver](rslp/forest_loss_driver/README.md)

Applications
------------

- [Sentinel-2 Vessel Detection](docs/sentinel2_vessels.md)
- [Sentinel-2 Vessel Attribute Prediction](docs/sentinel2_vessel_attribute.md)
- [Landsat Vessel Detection](docs/landsat_vessels.md)
- [Satlas: Solar Farm Segmentation](docs/satlas_solar_farm.md)
- [Satlas: Wind Turbine Detection](docs/satlas_wind_turbine.md)
- [Satlas: Marine Infrastructure Detection](docs/satlas_marine_infra.md)
- [Forest Loss Driver Classification](docs/forest_loss_driver.md)
- [Maldives Ecosystem Mapping](docs/maldives_ecosystem_mapping.md)

7 files renamed without changes.

@@ -0,0 +1,56 @@

Sentinel-2 Vessel Detection
---------------------------

The Sentinel-2 vessel detection model detects ships in Sentinel-2 L1C scenes.

TODO: insert example image

It is trained on a dataset consisting of 43,443 image patches (ranging from 300x300 to
1000x1000) with 37,145 ship labels.

Inference
---------

First, download the model checkpoint to the `RSLP_PREFIX` directory.

    cd rslearn_projects
    mkdir -p project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/
    wget XYZ -O project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/best.ckpt

The easiest way to apply the model is using the prediction pipeline in
`rslp/sentinel2_vessels/predict_pipeline.py`. It accepts a Sentinel-2 scene ID and
automatically downloads the scene images from a
[public Google Cloud Storage bucket](https://cloud.google.com/storage/docs/public-datasets/sentinel-2).

    mkdir output_crops
    mkdir scratch_dir
    python -m rslp.main sentinel2_vessels predict '[{"scene_id": "S2A_MSIL1C_20180904T110621_N0206_R137_T30UYD_20180904T133425", "json_path": "out.json", "crop_path": "output_crops/"}]' scratch_dir/

Then, `out.json` will contain a JSON list of detected ships, while `output_crops` will
contain corresponding crops centered around those ships (showing the RGB B4/B3/B2
bands).

Training
--------

First, download the training dataset:

    cd rslearn_projects
    mkdir -p project_data/datasets/sentinel2_vessels/
    wget XYZ -O project_data/datasets/sentinel2_vessels.tar
    tar xvf project_data/datasets/sentinel2_vessels.tar --directory project_data/datasets/sentinel2_vessels/

It is an rslearn dataset consisting of window folders like
`windows/sargassum_train/1186117_1897173_158907/`. Inside each window folder:

- `layers/sentinel2/` contains different Sentinel-2 bands used by the model, such as
  `layers/sentinel2/R_G_B/image.png`.
- `layers/label/data.geojson` contains the positions of ships. These are offset from
  the bounds of the window, which are in `metadata.json`, so subtract the window's
  bounds to get pixel coordinates relative to the image.

To train the model, run:

    python -m rslp.rslearn_main model fit --config data/sentinel2_vessels/config.yaml --data.init_args.path project_data/datasets/sentinel2_vessels/