start working on public-facing documentation
favyen2 committed Jan 13, 2025
1 parent 8a39971 commit 4c1c93d
Showing 10 changed files with 172 additions and 74 deletions.
100 changes: 26 additions & 74 deletions README.md
@@ -1,90 +1,42 @@
Overview
--------

rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
built on top of rslearn, as well as project-specific code and configuration files.


Tooling
-------

The additional tooling comes into play when training and deploying models. This is an
outline of the steps the tooling takes care of when training models:

1. User runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
2. Launcher uploads the code to a canonical path on Google Cloud Storage (GCS), based
on the project ID and experiment ID specified in `config.yaml`.
3. Launcher then starts a job, in this case on Beaker, to train the model.
4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
the code. The image contains a copy of the code too, but it is overwritten with the
latest code from the user's codebase.
5. It then saves the W&B run ID to GCS. It also configures rslearn to write checkpoints to
a canonical folder on GCS.
6. If the job is pre-empted and resumes, it will automatically load the latest
checkpoint and W&B run ID from GCS. It will also load these in calls to `model test`
or `model predict`.
rslearn_projects contains the training datasets, model weights, and corresponding code
for machine learning applications built on top of
[rslearn](https://github.com/allenai/rslearn/) at Ai2.


Setup
-----

rslp expects an environment variable specifying the GCS bucket where prepared
rslearn datasets, model checkpoints, etc. are written. The easiest way to set it is to
create a `.env` file:

RSLP_PREFIX=gs://rslearn-eai
RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai

You will also need to set up GCP credentials that have access to this bucket.

Training additionally depends on credentials for W&B. If you train directly using
`rslp.rslearn_main`, then you will need to set up these credentials. If you use a
launcher like `rslp.launch_beaker`, then this isn't needed since the credentials are
already configured as secrets on the platform, but you will need to set up your Beaker
or other platform credentials to be able to launch the jobs.

TODO: update GCP/W&B to use service accounts.

Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
files use the S3-compatible API to access GCS rather than accessing GCS directly.
Therefore, you need to set up environment variables to provide the appropriate
credentials:

S3_ACCESS_KEY_ID=GOOG...
S3_SECRET_ACCESS_KEY=...

You can create these credentials at
https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
under "Access keys for your user account".


Usage
-----

Create an environment for rslearn and set it up with the rslearn_projects requirements:
Install rslearn:

conda create -n rslearn python=3.12
conda activate rslearn
pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
pip install -r rslearn_projects/requirements.txt
git clone https://github.com/allenai/rslearn.git
cd rslearn
pip install .[extra]

For development, it is easier to use PYTHONPATH or to install rslearn and
rslearn_projects in editable mode, e.g.:
Install requirements:

export PYTHONPATH=.:/path/to/rslearn/rslearn
git clone https://github.com/allenai/rslearn_projects.git
cd rslearn_projects
pip install -r requirements.txt

Execute a data processing pipeline:
rslearn_projects includes tooling that expects model checkpoints and auxiliary files to
be stored in an `RSLP_PREFIX` directory. Create a file `.env` to set the `RSLP_PREFIX`
environment variable:

python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32
mkdir project_data
echo "RSLP_PREFIX=project_data/" > .env

Launch training on Beaker:

python -m rslp.main maldives_ecosystem_mapping train_maxar

Manually train locally:

python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml


Projects
--------
Applications
------------

- [Forest Loss Driver](rslp/forest_loss_driver/README.md)
- [Sentinel-2 Vessel Detection](docs/sentinel2_vessels.md)
- [Sentinel-2 Vessel Attribute Prediction](docs/sentinel2_vessel_attribute.md)
- [Landsat Vessel Detection](docs/landsat_vessels.md)
- [Satlas: Solar Farm Segmentation](docs/satlas_solar_farm.md)
- [Satlas: Wind Turbine Detection](docs/satlas_wind_turbine.md)
- [Satlas: Marine Infrastructure Detection](docs/satlas_marine_infra.md)
- [Forest Loss Driver Classification](docs/forest_loss_driver.md)
- [Maldives Ecosystem Mapping](docs/maldives_ecosystem_mapping.md)
90 changes: 90 additions & 0 deletions ai2_docs/README.md
@@ -0,0 +1,90 @@
Overview
--------

rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
built on top of rslearn, as well as project-specific code and configuration files.


Tooling
-------

The additional tooling comes into play when training and deploying models. This is an
outline of the steps the tooling takes care of when training models:

1. User runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
2. Launcher uploads the code to a canonical path on Google Cloud Storage (GCS), based
on the project ID and experiment ID specified in `config.yaml`.
3. Launcher then starts a job, in this case on Beaker, to train the model.
4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
the code. The image contains a copy of the code too, but it is overwritten with the
latest code from the user's codebase.
5. It then saves the W&B run ID to GCS. It also configures rslearn to write
checkpoints to a canonical folder on GCS (see the path-layout sketch after this list).
6. If the job is pre-empted and resumes, it will automatically load the latest
checkpoint and W&B run ID from GCS. It will also load these in calls to `model test`
or `model predict`.
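
For illustration, the checkpoint paths used elsewhere in these docs follow a
`{RSLP_PREFIX}/projects/{project_id}/{experiment_id}/` pattern. The helper below is a
hypothetical sketch of that convention, not code from rslp; the file names for the
code archive and W&B run ID are assumptions:

    import os

    def canonical_paths(project_id: str, experiment_id: str) -> dict[str, str]:
        """Sketch of the canonical GCS layout described above."""
        prefix = os.environ["RSLP_PREFIX"]  # e.g. gs://rslearn-eai
        base = f"{prefix}/projects/{project_id}/{experiment_id}"
        return {
            "code": f"{base}/code.zip",  # assumed name for the uploaded code archive
            "checkpoints": f"{base}/checkpoints/",  # matches paths used in these docs
            "wandb_id": f"{base}/wandb_id",  # assumed name for the saved W&B run ID
        }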


Setup
-----

rslp expects an environment variable specifying the GCS bucket where prepared
rslearn datasets, model checkpoints, etc. are written. The easiest way to set it is to
create a `.env` file:

RSLP_PREFIX=gs://rslearn-eai
RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai

You will also need to set up GCP credentials that have access to this bucket.

Training additionally depends on credentials for W&B. If you train directly using
`rslp.rslearn_main`, then you will need to set up these credentials. If you use a
launcher like `rslp.launch_beaker`, then this isn't needed since the credentials are
already configured as secrets on the platform, but you will need to set up your Beaker
or other platform credentials to be able to launch the jobs.

TODO: update GCP/W&B to use service accounts.

Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
files use the S3-compatible API to access GCS rather than accessing GCS directly.
Therefore, you need to set up environment variables to provide the appropriate
credentials:

S3_ACCESS_KEY_ID=GOOG...
S3_SECRET_ACCESS_KEY=...

You can create these credentials at
https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
under "Access keys for your user account".


Usage
-----

Create an environment for rslearn and set it up with the rslearn_projects requirements:

conda create -n rslearn python=3.12
conda activate rslearn
pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
pip install -r rslearn_projects/requirements.txt

For development, it is easier to use PYTHONPATH or to install rslearn and
rslearn_projects in editable mode, e.g.:

export PYTHONPATH=.:/path/to/rslearn/rslearn

Execute a data processing pipeline:

python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32

Launch training on Beaker:

python -m rslp.main maldives_ecosystem_mapping train_maxar

Manually train locally:

python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml


Projects
--------

- [Forest Loss Driver](rslp/forest_loss_driver/README.md)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
56 changes: 56 additions & 0 deletions docs/sentinel2_vessels.md
@@ -0,0 +1,56 @@
Sentinel-2 Vessel Detection
---------------------------

The Sentinel-2 vessel detection model detects ships in Sentinel-2 L1C scenes.

TODO: insert example image

It is trained on a dataset consisting of 43,443 image patches (ranging from 300x300
to 1000x1000 pixels) with 37,145 ship labels.


Inference
---------

First, download the model checkpoint to the `RSLP_PREFIX` directory.

cd rslearn_projects
mkdir -p project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/
wget XYZ -O project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/best.ckpt

The easiest way to apply the model is using the prediction pipeline in
`rslp/sentinel2_vessels/predict_pipeline.py`. It accepts a Sentinel-2 scene ID and
automatically downloads the scene images from a
[public Google Cloud Storage bucket](https://cloud.google.com/storage/docs/public-datasets/sentinel-2).

mkdir output_crops
mkdir scratch_dir
python -m rslp.main sentinel2_vessels predict '[{"scene_id": "S2A_MSIL1C_20180904T110621_N0206_R137_T30UYD_20180904T133425", "json_path": "out.json", "crop_path": "output_crops/"}]' scratch_dir/

Then, `out.json` will contain a JSON list of detected ships while `output_crops` will
contain corresponding crops centered around those ships (showing the RGB B4/B3/B2
bands).
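
The schema of the individual detections is not documented here, so a quick sanity
check is to load the list and inspect a few entries (a sketch; it assumes no
particular field names):

    import json

    with open("out.json") as f:
        detections = json.load(f)  # JSON list of detected ships

    print(f"{len(detections)} ships detected")
    for det in detections[:5]:
        print(det)  # inspect whatever fields the pipeline actually emits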


Training
--------

First, download the training dataset:

cd rslearn_projects
mkdir -p project_data/datasets/sentinel2_vessels/
wget XYZ -O project_data/datasets/sentinel2_vessels.tar
tar xvf project_data/datasets/sentinel2_vessels.tar --directory project_data/datasets/sentinel2_vessels/

It is an rslearn dataset consisting of window folders like
`windows/sargassum_train/1186117_1897173_158907/`. Inside each window folder:

- `layers/sentinel2/` contains different Sentinel-2 bands used by the model, such as
`layers/sentinel2/R_G_B/image.png`.
- `layers/label/data.geojson` contains the positions of ships. These positions are
  offset by the window's bounds, which are stored in `metadata.json`, so subtract the
  window's bounds to get pixel coordinates relative to the image (see the sketch
  after this list).
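
For example, here is a sketch of recovering image-relative ship positions. It assumes
`metadata.json` stores the window bounds under a `bounds` key as `[x1, y1, x2, y2]`
and that the labels are Point features; both are assumptions to verify against the
actual files:

    import json

    window_dir = (
        "project_data/datasets/sentinel2_vessels/"
        "windows/sargassum_train/1186117_1897173_158907"
    )

    # Assumption: bounds are [x1, y1, x2, y2] in the window's pixel coordinate system.
    with open(f"{window_dir}/metadata.json") as f:
        x1, y1, _, _ = json.load(f)["bounds"]

    with open(f"{window_dir}/layers/label/data.geojson") as f:
        labels = json.load(f)

    for feature in labels["features"]:
        x, y = feature["geometry"]["coordinates"][:2]  # assumes Point geometry
        print("ship at image pixel:", x - x1, y - y1)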

To train the model, run:

python -m rslp.rslearn_main model fit --config data/sentinel2_vessels/config.yaml --data.init_args.path project_data/datasets/sentinel2_vessels/
