
Add marine infrastructure dataset and model config #49

Merged: 69 commits, merged Feb 24, 2025
Changes from 1 commit
6f7ce55
wind turbine: compare new split vs old split performance
favyen2 Sep 26, 2024
c2ec853
try with 384x384 patches plus freeze the model for the first couple e…
favyen2 Sep 27, 2024
90b26fe
fix freezing code
favyen2 Sep 27, 2024
726ad8e
remove unused old config for wind turbine training
favyen2 Sep 30, 2024
9b0d4bb
add webmercator version of the wind turbine dataset
favyen2 Sep 30, 2024
91dd7f9
Add script to add the bounds metadata for layers using SingleImageRas…
favyen2 Sep 30, 2024
f03249c
update name for the 384x384 experiment
favyen2 Sep 30, 2024
036bf0b
Fix wind turbine webmercator training (labels were not being populated)
favyen2 Sep 30, 2024
d6f8b02
Use six Sentinel-2 images from diverse months of the year
uakfdotb Oct 16, 2024
8cc663a
Add marine infrastructure dataset config and model config.
favyen2 Oct 18, 2024
2b29949
add files needed by the model config for training
favyen2 Oct 18, 2024
82a9149
maybe close to same performance
favyen2 Oct 22, 2024
a5953a6
Merge remote-tracking branch 'origin/master' into favyen/turbine-2024…
favyen2 Oct 22, 2024
24e6093
json formatting
favyen2 Oct 22, 2024
cb04f2d
fix missing sections in config.json
favyen2 Oct 22, 2024
6a42948
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Oct 23, 2024
91274c1
Add job launcher and prediction pipeline for Satlas applications.
favyen2 Oct 23, 2024
5249d27
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Nov 6, 2024
ac8eb32
marine infra updates
favyen2 Nov 6, 2024
4fb0a69
gcp rtree index not working after august 2024 ...
favyen2 Nov 11, 2024
01deb1f
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Dec 5, 2024
7fccfb3
latest changes
favyen2 Dec 5, 2024
f6bbc78
sync
favyen2 Dec 10, 2024
07565da
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Dec 12, 2024
7085a2f
sync
favyen2 Dec 12, 2024
480473a
Merge branch 'favyen/turbine-20240926' into favyen/marine-infra
favyen2 Dec 12, 2024
cf57d21
sync
favyen2 Dec 19, 2024
c193308
remove unneeded marine infra model configs
favyen2 Dec 19, 2024
cba9484
clarify that convert_satlas_webmercator_to_rslearn is intended to be …
favyen2 Dec 19, 2024
9f94514
add readme for wind turbine configs
favyen2 Dec 19, 2024
0f1a7db
add readme for worker system
favyen2 Dec 19, 2024
31421ad
update readme
favyen2 Dec 19, 2024
cd07cfe
add azure configs
favyen2 Jan 7, 2025
f916d5c
upgrade solar farm ingestion
favyen2 Jan 9, 2025
463fa94
remove debug configs
favyen2 Jan 9, 2025
976def9
add documentation and test (wip)
favyen2 Jan 9, 2025
0bd47f5
fix test
favyen2 Jan 9, 2025
f8ee3ec
move convert_satlas_webmercator_to_rslean to one_off_porjects
favyen2 Jan 9, 2025
693f9a0
update readme
favyen2 Jan 9, 2025
522c210
Merge branch 'master' of github.com:allenai/rslearn_projects into fav…
favyen2 Jan 9, 2025
4d200c8
fix
favyen2 Jan 13, 2025
31576ab
fix
favyen2 Jan 14, 2025
298de6a
only start satlas jobs that weren't already completed
favyen2 Jan 17, 2025
3c9083c
enable satlas prediction pipeline to run on jupiter (using /data disk)
uakfdotb Jan 31, 2025
1efd544
add solar farm config
uakfdotb Jan 31, 2025
8be2dd9
Merge branch 'master' into favyen/marine-infra
favyen2 Feb 5, 2025
9a3d757
add documentation about viterbi smoothing step
favyen2 Feb 6, 2025
0d922d6
Merge branch 'favyen/marine-infra' of github.com:allenai/rslearn_proj…
favyen2 Feb 6, 2025
ec95814
add documentation
favyen2 Feb 6, 2025
ead3e22
remove unused launch_worker
favyen2 Feb 6, 2025
a71f3f6
add test for bkt
favyen2 Feb 7, 2025
1eb9d78
add test for rslp/common/worker.py
favyen2 Feb 7, 2025
c4af2a5
add doc string
favyen2 Feb 7, 2025
df45d4c
add tests
favyen2 Feb 7, 2025
731cd0c
add test for apply_nms
favyen2 Feb 7, 2025
98d0925
add merge_points test
favyen2 Feb 7, 2025
d7810bf
remove unused nms stuff
favyen2 Feb 7, 2025
fed1b51
fix test
favyen2 Feb 7, 2025
eeb63e4
add tests for smoothing
uakfdotb Feb 8, 2025
df3341d
refactor prediction pipeline
favyen2 Feb 10, 2025
9b86ca4
fix tests
favyen2 Feb 11, 2025
cbf1f1e
working tests
uakfdotb Feb 12, 2025
a4d4ebf
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Feb 13, 2025
a56c3f5
fix broken Dockerfile (missing wget)
favyen2 Feb 18, 2025
b23985b
fix import for updated planetary computer data source
favyen2 Feb 18, 2025
436bc00
fix tests
favyen2 Feb 20, 2025
8c713ca
fix tests x2
favyen2 Feb 20, 2025
f73761d
don't run bkt test in ci since bigtable instance is expensive to keep…
favyen2 Feb 24, 2025
4a251f8
Merge remote-tracking branch 'origin/master' into favyen/marine-infra
favyen2 Feb 24, 2025
only start satlas jobs that weren't already completed
favyen2 committed Jan 17, 2025
commit 298de6a399726d146cb7b63fc627ca886c1e1a5a
121 changes: 82 additions & 39 deletions rslp/satlas/write_jobs.py
@@ -2,6 +2,7 @@

import json
import random
from collections.abc import Generator
from datetime import datetime, timedelta, timezone

import shapely
@@ -11,10 +12,11 @@
from rslearn.const import WGS84_PROJECTION
from rslearn.utils.geometry import PixelBounds, Projection, STGeometry
from rslearn.utils.get_utm_ups_crs import get_proj_bounds
from upath import UPath

from rslp.log_utils import get_logger

from .predict_pipeline import Application, PredictTaskArgs
from .predict_pipeline import Application, PredictTaskArgs, get_output_fname

logger = get_logger(__name__)

@@ -54,6 +56,43 @@ def __init__(
self.time_range = time_range
self.out_path = out_path

def get_output_fname(self) -> UPath:
"""Get the output filename that will be used for this task."""
# The filename format is defined by get_output_fname in predict_pipeline.py.
return get_output_fname(
self.application, self.out_path, self.projection, self.bounds
)


def enumerate_tiles_in_zone(utm_zone: CRS) -> Generator[tuple[int, int], None, None]:
"""List all of the tiles in the zone where outputs should be computed.

The tiles are all TILE_SIZE x TILE_SIZE so only the column/row of the tile along
that grid are returned.

Args:
utm_zone: the CRS which must correspond to a UTM EPSG.

Returns:
generator of (column, row) of the tiles that are needed.
"""
# We use get_proj_bounds to get the bounds of the UTM zone in CRS units.
# We then convert to pixel units in order to determine the tiles that are needed.
crs_bbox = STGeometry(
Projection(utm_zone, 1, 1),
shapely.box(*get_proj_bounds(utm_zone)),
None,
)
projection = Projection(utm_zone, RESOLUTION, -RESOLUTION)
pixel_bbox = crs_bbox.to_projection(projection)

# Convert the resulting shape to integer bbox.
zone_bounds = tuple(int(value) for value in pixel_bbox.shp.bounds)

for col in range(zone_bounds[0] // TILE_SIZE, zone_bounds[2] // TILE_SIZE + 1):
for row in range(zone_bounds[1] // TILE_SIZE, zone_bounds[3] // TILE_SIZE + 1):
yield (col, row)
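The floor-division arithmetic above can be sketched standalone: given integer pixel bounds, dividing by the tile size gives the inclusive range of tile columns and rows that touch the zone. A minimal sketch with no rslearn dependencies (the `TILE_SIZE` value and bounds are made up for illustration):

```python
from collections.abc import Generator

TILE_SIZE = 2048  # hypothetical tile size, for illustration only


def tiles_covering(bounds: tuple[int, int, int, int]) -> Generator[tuple[int, int], None, None]:
    """Yield (col, row) for every TILE_SIZE grid cell touching the pixel bounds."""
    minx, miny, maxx, maxy = bounds
    # The "+ 1" makes the upper tile index inclusive, matching the loop above.
    for col in range(minx // TILE_SIZE, maxx // TILE_SIZE + 1):
        for row in range(miny // TILE_SIZE, maxy // TILE_SIZE + 1):
            yield (col, row)


# A 3000x3000-pixel zone starting at the origin touches a 2x2 block of tiles.
tiles = list(tiles_covering((0, 0, 3000, 3000)))
```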


def get_jobs(
application: Application,
@@ -66,6 +105,8 @@
) -> list[list[str]]:
"""Get batches of tasks for Satlas prediction.

Tasks where outputs have already been computed are excluded.

Args:
application: which application to run.
time_range: the time range to run within. Must have timezone.
@@ -91,17 +132,10 @@

tasks: list[Task] = []
for utm_zone in tqdm.tqdm(utm_zones, desc="Enumerating tasks across UTM zones"):
# get_proj_bounds returns bounds in CRS units so we need to convert to pixel
# units.
crs_bbox = STGeometry(
Projection(utm_zone, 1, 1),
shapely.box(*get_proj_bounds(utm_zone)),
None,
)
projection = Projection(utm_zone, RESOLUTION, -RESOLUTION)
pixel_bbox = crs_bbox.to_projection(projection)
zone_bounds = tuple(int(value) for value in pixel_bbox.shp.bounds)

# If the user provided WGS84 bounds, then we convert it to pixel coordinates so
# we can check each tile easily.
user_bounds_in_proj: PixelBounds | None = None
if wgs84_bounds is not None:
dst_geom = STGeometry(
@@ -114,42 +148,51 @@
int(dst_geom.shp.bounds[3]),
)

for col in range(zone_bounds[0] // TILE_SIZE, zone_bounds[2] // TILE_SIZE + 1):
for row in range(
zone_bounds[1] // TILE_SIZE, zone_bounds[3] // TILE_SIZE + 1
):
if user_bounds_in_proj is not None:
# Check if this task intersects the bounds specified by the user.
if (col + 1) * TILE_SIZE < user_bounds_in_proj[0]:
continue
if col * TILE_SIZE >= user_bounds_in_proj[2]:
continue
if (row + 1) * TILE_SIZE < user_bounds_in_proj[1]:
continue
if row * TILE_SIZE >= user_bounds_in_proj[3]:
continue

tasks.append(
Task(
application=application,
projection=projection,
bounds=(
col * TILE_SIZE,
row * TILE_SIZE,
(col + 1) * TILE_SIZE,
(row + 1) * TILE_SIZE,
),
time_range=time_range,
out_path=out_path,
)
for col, row in enumerate_tiles_in_zone(utm_zone):
if user_bounds_in_proj is not None:
# Check if this task intersects the bounds specified by the user.
if (col + 1) * TILE_SIZE < user_bounds_in_proj[0]:
continue
if col * TILE_SIZE >= user_bounds_in_proj[2]:
continue
if (row + 1) * TILE_SIZE < user_bounds_in_proj[1]:
continue
if row * TILE_SIZE >= user_bounds_in_proj[3]:
continue

tasks.append(
Task(
application=application,
projection=projection,
bounds=(
col * TILE_SIZE,
row * TILE_SIZE,
(col + 1) * TILE_SIZE,
(row + 1) * TILE_SIZE,
),
time_range=time_range,
out_path=out_path,
)
)
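The four `continue` conditions in the loop above are a standard axis-aligned rectangle overlap test with early exits: a tile is skipped if it lies entirely left of, right of, below, or above the user bounds. Restated as a standalone predicate (a sketch mirroring the same comparisons; the `TILE_SIZE` value is illustrative):

```python
TILE_SIZE = 2048  # hypothetical, for illustration only


def tile_intersects(col: int, row: int, bounds: tuple[int, int, int, int]) -> bool:
    """Return True if tile (col, row) overlaps the pixel bounds (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    if (col + 1) * TILE_SIZE < minx:  # tile entirely left of bounds
        return False
    if col * TILE_SIZE >= maxx:  # tile entirely right of bounds
        return False
    if (row + 1) * TILE_SIZE < miny:  # tile entirely below bounds
        return False
    if row * TILE_SIZE >= maxy:  # tile entirely above bounds
        return False
    return True
```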

logger.info("Got %d total tasks", len(tasks))

print(f"Got {len(tasks)} total tasks")
# Remove tasks where outputs are already computed.
existing_output_fnames = {out_fname.name for out_fname in UPath(out_path).iterdir()}
tasks = [
task
for task in tasks
if task.get_output_fname().name not in existing_output_fnames
]
logger.info("Got %d tasks that are uncompleted", len(tasks))
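The dedup step above lists the output directory once, builds a set of existing filenames, and keeps only tasks whose output name is absent, so each membership check is O(1) rather than one remote listing per task. The same pattern with plain strings (all names here are hypothetical):

```python
# One directory listing up front (hypothetical filenames).
existing = {"tile_0_0.geojson", "tile_0_1.geojson"}

all_outputs = [
    "tile_0_0.geojson",
    "tile_1_0.geojson",
    "tile_0_1.geojson",
    "tile_1_1.geojson",
]

# Keep only outputs that have not been computed yet.
pending = [name for name in all_outputs if name not in existing]
```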

# Sample tasks down to user-provided count (max # tasks to run), if provided.
if count is not None and len(tasks) > count:
tasks = random.sample(tasks, count)
logger.info("Randomly sampled %d tasks", len(tasks))

# Convert tasks to jobs for use with rslp.common.worker.
# This is what will be written to the Pub/Sub topic.
jobs = []
for i in range(0, len(tasks), batch_size):
cur_tasks = tasks[i : i + batch_size]
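The loop above chunks the task list into fixed-size batches by slicing; since Python slices past the end of a list are silently truncated, the final batch may simply be smaller. A minimal sketch of the same chunking:

```python
def batched(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


# Seven tasks with a batch size of three yield batches of 3, 3, and 1.
jobs = batched(list(range(7)), 3)
```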