diff --git a/README.md b/README.md
index 563fe89f..7e5643bc 100644
--- a/README.md
+++ b/README.md
@@ -1,90 +1,42 @@
 Overview
 --------
 
-rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
-built on top of rslearn, as well as project-specific code and configuration files.
-
-
-Tooling
--------
-
-The additional tooling comes into play when training and deploying models. This is an
-outline of the steps the tooling takes care of when training models:
-
-1. User runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
-2. Launcher uploads the code to a canonical path on Google Cloud Storage (GCS), based
-   on the project ID and experiment ID specified in `config.yaml`.
-3. Launcher then starts a job, in this case on Beaker, to train the model.
-4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
-   the code. The image contains a copy of the code too, but it is overwritten with the
-   latest code from the user's codebase.
-5. It then saves W&B run ID to GCS. It also configures rslearn to write checkpoints to
-   a canonical folder on GCS.
-6. If the job is pre-empted and resumes, it will automatically load the latest
-   checkpoint and W&B run ID from GCS. It will also load these in calls to `model test`
-   or `model predict`.
+rslearn_projects contains the training datasets, model weights, and corresponding code
+for machine learning applications built on top of
+[rslearn](https://github.com/allenai/rslearn/) at Ai2.
 
 
 Setup
 -----
 
-rslp expects an environment variable specifying the GCS bucket to write prepared
-rslearn datasets, model checkpoints, etc. The easiest way is to create a `.env` file.
-
-    RSLP_PREFIX=gs://rslearn-eai
-    RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai
-
-You will also need to setup GCP credentials that have access to this bucket.
-
-Training additionally depends on credentials for W&B. If you train directly using
-`rslp.rslearn_main`, then you will need to setup these credentials. If you use a
-launcher like `rslp.launch_beaker`, then it isn't needed since the credentials are
-already configured as secrets on the platform, but you would need to setup your Beaker
-or other platform credentials to be able to launch the jobs.
-
-TODO: update GCP/W&B to use service accounts.
-
-Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
-files use S3-compatable API to access GCS rather than GCS directly. Therefore, you need
-to set up environment variables to provide the appropriate credentials:
-
-    S3_ACCESS_KEY_ID=GOOG...
-    S3_SECRET_ACCESS_KEY=...
-
-You can create these credentials at
-https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
-under "Access keys for your user account".
-
-
-Usage
------
-
-Create an environment for rslearn and setup with rslearn_projects requirements:
+Install rslearn:
 
-    conda create -n rslearn python=3.12
-    conda activate rslearn
-    pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
-    pip install -r rslearn_projects/requirements.txt
+    git clone https://github.com/allenai/rslearn.git
+    cd rslearn
+    pip install .[extra]
 
-For development it is easier to use PYTHONPATH or install rslearn and rslearn_projects
-in editable mode, e.g.:
+Install the rslearn_projects requirements:
 
-    export PYTHONPATH=.:/path/to/rslearn/rslearn
+    git clone https://github.com/allenai/rslearn_projects.git
+    cd rslearn_projects
+    pip install -r requirements.txt
 
-Execute a data processing pipeline:
+rslearn_projects includes tooling that expects model checkpoints and auxiliary files
+to be stored under the directory specified by the `RSLP_PREFIX` environment variable.
+Create a `.env` file to set it:
 
-    python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32
+    mkdir project_data
+    echo "RSLP_PREFIX=project_data/" > .env
 
-Launch training on Beaker:
-
-    python -m rslp.main maldives_ecosystem_mapping train_maxar
-
-Manually train locally:
-
-    python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml
-
-
-Projects
---------
+Applications
+------------
 
-- [Forest Loss Driver](rslp/forest_loss_driver/README.md)
+- [Sentinel-2 Vessel Detection](docs/sentinel2_vessels.md)
+- [Sentinel-2 Vessel Attribute Prediction](docs/sentinel2_vessel_attribute.md)
+- [Landsat Vessel Detection](docs/landsat_vessels.md)
+- [Satlas: Solar Farm Segmentation](docs/satlas_solar_farm.md)
+- [Satlas: Wind Turbine Detection](docs/satlas_wind_turbine.md)
+- [Satlas: Marine Infrastructure Detection](docs/satlas_marine_infra.md)
+- [Forest Loss Driver Classification](docs/forest_loss_driver.md)
+- [Maldives Ecosystem Mapping](docs/maldives_ecosystem_mapping.md)
diff --git a/ai2_docs/README.md b/ai2_docs/README.md
new file mode 100644
index 00000000..563fe89f
--- /dev/null
+++ b/ai2_docs/README.md
@@ -0,0 +1,90 @@
+Overview
+--------
+
+rslearn_projects contains Ai2-specific tooling for managing remote sensing projects
+built on top of rslearn, as well as project-specific code and configuration files.
+
+
+Tooling
+-------
+
+The additional tooling comes into play when training and deploying models. This is an
+outline of the steps the tooling takes care of when training models:
+
+1. The user runs e.g. `python -m rslp.launch_beaker --config_path path/to/config.yaml`.
+2. The launcher uploads the code to a canonical path on Google Cloud Storage (GCS),
+   based on the project ID and experiment ID specified in `config.yaml`.
+3. The launcher then starts a job, in this case on Beaker, to train the model.
+4. `rslp.docker_entrypoint` is the entrypoint for the job, and starts by downloading
+   the code. The image contains a copy of the code too, but it is overwritten with the
+   latest code from the user's codebase.
+5. It then saves the W&B run ID to GCS. It also configures rslearn to write
+   checkpoints to a canonical folder on GCS.
+6. If the job is pre-empted and resumes, it will automatically load the latest
+   checkpoint and the W&B run ID from GCS. It will also load these in calls to
+   `model test` or `model predict`.
+
+
+Setup
+-----
+
+rslp expects an environment variable specifying the GCS bucket where prepared rslearn
+datasets, model checkpoints, etc. should be written. The easiest way to set it is to
+create a `.env` file.
+
+    RSLP_PREFIX=gs://rslearn-eai
+    RSLP_WEKA_PREFIX=weka://dfive-default/rslearn-eai
+
+You will also need to set up GCP credentials that have access to this bucket.
+
+Training additionally depends on credentials for W&B. If you train directly using
+`rslp.rslearn_main`, then you will need to set up these credentials. If you use a
+launcher like `rslp.launch_beaker`, then it isn't needed since the credentials are
+already configured as secrets on the platform, but you would need to set up your
+Beaker or other platform credentials to be able to launch the jobs.
+
+TODO: update GCP/W&B to use service accounts.
+
+Currently, until https://github.com/allenai/rslearn/issues/33 is resolved, model config
+files use the S3-compatible API to access GCS rather than accessing GCS directly.
+Therefore, you need to set up environment variables to provide the appropriate
+credentials:
+
+    S3_ACCESS_KEY_ID=GOOG...
+    S3_SECRET_ACCESS_KEY=...
+
+You can create these credentials at
+https://console.cloud.google.com/storage/settings;tab=interoperability?hl=en&project=skylight-proto-1
+under "Access keys for your user account".
+
+
+Usage
+-----
+
+Create an environment for rslearn and set it up with the rslearn_projects requirements:
+
+    conda create -n rslearn python=3.12
+    conda activate rslearn
+    pip install -r rslearn/requirements.txt -r rslearn/extra_requirements.txt
+    pip install -r rslearn_projects/requirements.txt
+
+For development it is easier to use PYTHONPATH or to install rslearn and
+rslearn_projects in editable mode, e.g.:
+
+    export PYTHONPATH=.:/path/to/rslearn/rslearn
+
+Execute a data processing pipeline:
+
+    python -m rslp.main maldives_ecosystem_mapping data --dp_config.workers 32
+
+Launch training on Beaker:
+
+    python -m rslp.main maldives_ecosystem_mapping train_maxar
+
+Manually train locally:
+
+    python -m rslp.rslearn_main model fit --config_path data/maldives_ecosystem_mapping/config.yaml
+
+
+Projects
+--------
+
+- [Forest Loss Driver](../rslp/forest_loss_driver/README.md)
diff --git a/docs/batch_inference.md b/ai2_docs/batch_inference.md
similarity index 100%
rename from docs/batch_inference.md
rename to ai2_docs/batch_inference.md
diff --git a/docs/coding_best_practices_brainstorm.md b/ai2_docs/coding_best_practices_brainstorm.md
similarity index 100%
rename from docs/coding_best_practices_brainstorm.md
rename to ai2_docs/coding_best_practices_brainstorm.md
diff --git a/docs/landsat_vessels/api_use.md b/ai2_docs/landsat_vessels/api_use.md
similarity index 100%
rename from docs/landsat_vessels/api_use.md
rename to ai2_docs/landsat_vessels/api_use.md
diff --git a/docs/landsat_vessels/images/missed_vessels_B8.png b/ai2_docs/landsat_vessels/images/missed_vessels_B8.png
similarity index 100%
rename from docs/landsat_vessels/images/missed_vessels_B8.png
rename to ai2_docs/landsat_vessels/images/missed_vessels_B8.png
diff --git a/docs/landsat_vessels/images/missed_vessels_RGB.png b/ai2_docs/landsat_vessels/images/missed_vessels_RGB.png
similarity index 100%
rename from docs/landsat_vessels/images/missed_vessels_RGB.png
rename to ai2_docs/landsat_vessels/images/missed_vessels_RGB.png
diff --git a/docs/landsat_vessels/model_summary.md b/ai2_docs/landsat_vessels/model_summary.md
similarity index 100%
rename from docs/landsat_vessels/model_summary.md
rename to ai2_docs/landsat_vessels/model_summary.md
diff --git a/docs/landsat_vessels/train_eval.md b/ai2_docs/landsat_vessels/train_eval.md
similarity index 100%
rename from docs/landsat_vessels/train_eval.md
rename to ai2_docs/landsat_vessels/train_eval.md
diff --git
a/docs/sentinel2_vessels.md b/docs/sentinel2_vessels.md
new file mode 100644
index 00000000..792119cc
--- /dev/null
+++ b/docs/sentinel2_vessels.md
@@ -0,0 +1,56 @@
+Sentinel-2 Vessel Detection
+---------------------------
+
+The Sentinel-2 vessel detection model detects ships in Sentinel-2 L1C scenes.
+
+TODO: insert example image
+
+It is trained on a dataset consisting of 43,443 image patches (ranging from 300x300 to
+1000x1000 pixels) with 37,145 ship labels.
+
+
+Inference
+---------
+
+First, download the model checkpoint to the `RSLP_PREFIX` directory:
+
+    cd rslearn_projects
+    mkdir -p project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/
+    wget XYZ -O project_data/projects/sentinel2_vessels/data_20240927_satlaspretrain_patch512_00/checkpoints/best.ckpt
+
+The easiest way to apply the model is using the prediction pipeline in
+`rslp/sentinel2_vessels/predict_pipeline.py`. It accepts a list of tasks, each
+specifying a Sentinel-2 scene ID, and automatically downloads the scene images from a
+[public Google Cloud Storage bucket](https://cloud.google.com/storage/docs/public-datasets/sentinel-2).
+
+    mkdir output_crops
+    mkdir scratch_dir
+    python -m rslp.main sentinel2_vessels predict '[{"scene_id": "S2A_MSIL1C_20180904T110621_N0206_R137_T30UYD_20180904T133425", "json_path": "out.json", "crop_path": "output_crops/"}]' scratch_dir/
+
+Then, `out.json` will contain a JSON list of detected ships, while `output_crops` will
+contain corresponding crops centered around those ships (showing the RGB B4/B3/B2
+bands).
+
+
+Training
+--------
+
+First, download the training dataset:
+
+    cd rslearn_projects
+    mkdir -p project_data/datasets/sentinel2_vessels/
+    wget XYZ -O project_data/datasets/sentinel2_vessels.tar
+    tar xvf project_data/datasets/sentinel2_vessels.tar --directory project_data/datasets/sentinel2_vessels/
+
+It is an rslearn dataset consisting of window folders like
+`windows/sargassum_train/1186117_1897173_158907/`. Inside each window folder:
+
+- `layers/sentinel2/` contains the different Sentinel-2 bands used by the model, such
+  as `layers/sentinel2/R_G_B/image.png`.
+- `layers/label/data.geojson` contains the positions of ships. These positions are
+  offset by the bounds of the window, which are stored in `metadata.json`, so subtract
+  the window's bounds to get pixel coordinates relative to the image (a sketch of this
+  conversion appears below the training command).
+
+To train the model, run:
+
+    python -m rslp.rslearn_main model fit --config data/sentinel2_vessels/config.yaml --data.init_args.path project_data/datasets/sentinel2_vessels/
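+
+The bounds arithmetic for the label layer is easy to get backwards, so here is a
+minimal sketch of the conversion in Python. It assumes, for illustration only, that
+`metadata.json` stores the window bounds as a `"bounds"` field of the form
+`[x1, y1, x2, y2]` in global pixel coordinates, and that each label feature is a
+GeoJSON Point in the same coordinate system; the exact field names are assumptions,
+not guaranteed by this document.
+
+    import json
+    from pathlib import Path
+
+    # Hypothetical window folder; substitute any window from the dataset.
+    window_dir = Path(
+        "project_data/datasets/sentinel2_vessels/windows/sargassum_train/1186117_1897173_158907"
+    )
+
+    # Assumed schema: "bounds" is [x1, y1, x2, y2] in global pixel coordinates.
+    metadata = json.loads((window_dir / "metadata.json").read_text())
+    x1, y1, _, _ = metadata["bounds"]
+
+    # Each label is assumed to be a Point in the same global coordinate system;
+    # subtracting the window's top-left corner gives image-relative pixels.
+    labels = json.loads((window_dir / "layers" / "label" / "data.geojson").read_text())
+    for feature in labels["features"]:
+        gx, gy = feature["geometry"]["coordinates"]
+        print(f"ship at pixel ({gx - x1}, {gy - y1})")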
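+
+Similarly, the JSON task argument used in the Inference section can be built
+programmatically, which avoids shell-quoting mistakes. The sketch below constructs the
+same task list shown above and shells out to the documented command; it assumes only
+what the text states, namely that `out.json` ends up containing a JSON list of
+detections.
+
+    import json
+    import subprocess
+
+    # Same task format as the shell example in the Inference section.
+    tasks = [{
+        "scene_id": "S2A_MSIL1C_20180904T110621_N0206_R137_T30UYD_20180904T133425",
+        "json_path": "out.json",
+        "crop_path": "output_crops/",
+    }]
+    subprocess.run(
+        ["python", "-m", "rslp.main", "sentinel2_vessels", "predict",
+         json.dumps(tasks), "scratch_dir/"],
+        check=True,
+    )
+
+    # The output is documented as a JSON list of detected ships.
+    with open("out.json") as f:
+        detections = json.load(f)
+    print(len(detections), "ships detected")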