Prepare satellite images and training data for use with deep learning models

tomwilsonsco/rs-chip

rschip


Split satellite images into smaller fixed-size tiles for input into convolutional neural networks (CNNs) or vision transformers (ViTs) such as Segment Anything.

Features

  • Tile Satellite Images: Split large satellite images into smaller chips of specified dimensions. Chips can optionally be min-max normalised or standard scaled before they are written.
  • Mask Segmentation: Generate segmentation mask images from geopackage or shapefile features for supervised segmentation, e.g. using U-Net.
  • Remove Background Chips: Filter out image chips containing only background. Useful when preparing training and testing datasets.

Installation

Install rschip with pip:

pip install rschip

Requires rasterio, numpy, geopandas, and shapely.

Usage

1. ImageChip Class

The ImageChip class provides functionality for creating tiles (also known as chips) from large satellite images.

from rschip import ImageChip

# Initialize the ImageChip instance for 128 by 128 tiles
image_chipper = ImageChip(
    input_image_path="path/to/large_image.tif",
    output_path="path/to/output_directory_image",
    pixel_dimensions=128,
    offset=64,
    output_format="tif",
)

# Set a min-max normaliser,
# e.g. for 16-bit Sentinel-2 RGB you might use
image_chipper.set_normaliser(min_val=500, max_val=3000)

# Generate chips
image_chipper.chip_image()

With the output_format parameter set to "tif", each resulting tile is named using a suffix that represents the bottom left (x, y) pixel coordinate position. If output_format is set to "npz", the resulting .npz zip file contains a dictionary of arrays, where the keys are the same as these tile names. By default, the prefix of each tile name is taken from the input image file name (input_image_path), unless you specify output_name.
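
To make the tiling parameters more concrete, here is a minimal sketch of how overlapping tile origins could be enumerated. It assumes that offset is the step in pixels between neighbouring tiles (so pixel_dimensions=128 with offset=64 gives 50% overlap) and that only full tiles are kept. The function tile_origins is hypothetical and is not part of rschip; it does not reproduce the library's edge handling or its bottom left naming convention.

```python
def tile_origins(width, height, tile_size, offset):
    """Return (x, y) pixel origins of full tiles, stepping by `offset`.

    Assumption: partial tiles at the right and bottom edges are skipped.
    rschip may treat image edges differently.
    """
    origins = []
    for y in range(0, height - tile_size + 1, offset):
        for x in range(0, width - tile_size + 1, offset):
            origins.append((x, y))
    return origins


# A 256 x 256 pixel image, 128 pixel tiles, stepping 64 pixels at a time
origins = tile_origins(width=256, height=256, tile_size=128, offset=64)
print(len(origins))   # 3 positions per axis, so 9 tiles
print(origins[:3])    # [(0, 0), (64, 0), (128, 0)]
```

With an offset smaller than the tile size, each pixel away from the image edges appears in more than one tile, which increases the number of training samples drawn from one image.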

Setting use_multiprocessing=True (the default) speeds up chipping by using multiple cores.
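
For reference, the min-max normalisation configured with set_normaliser(min_val=500, max_val=3000) in the example above can be illustrated in plain Python. This is only a sketch of the general technique: the helper min_max_normalise is hypothetical, and whether rschip clips out-of-range values or rescales to exactly 0 to 1 is an assumption not confirmed by this README.

```python
def min_max_normalise(values, min_val, max_val):
    """Clip each value to [min_val, max_val], then rescale to the range 0 to 1.

    Assumption: out-of-range values are clipped rather than left outside 0 to 1.
    """
    span = max_val - min_val
    if span <= 0:
        raise ValueError("max_val must be greater than min_val")
    return [(min(max(v, min_val), max_val) - min_val) / span for v in values]


# Example reflectance-like values for a single band
pixels = [400, 500, 1750, 3000, 4200]
scaled = min_max_normalise(pixels, min_val=500, max_val=3000)
print(scaled)  # [0.0, 0.0, 0.5, 1.0, 1.0]
```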

2. SegmentationMask Class

The SegmentationMask class creates a segmentation mask image from geopackage or shapefile features, using an input image as the extent and pixel size reference.

Once the segmentation mask has been created, the segmentation image can also be split into tiles. Some deep learning frameworks expect images and corresponding masks to have the same file name in separate directories. The output_name argument of ImageChip can ensure this is the case.

from rschip import SegmentationMask, ImageChip

# Initialize the SegmentationMask
seg_mask = SegmentationMask(
    input_image_path="path/to/large_image.tif",
    input_features_path="path/to/geopackage_features.gpkg",
    output_path="path/to/output_mask.tif",
    class_field="ml_class"
)

# Generate segmentation mask image
seg_mask.create_mask()

# Chip the segmentation image to match satellite image
image_chipper = ImageChip(
    input_image_path="path/to/output_mask.tif",
    output_path="path/to/output_directory_mask",
    output_name="large_image",
    pixel_dimensions=128,
    offset=64,
    output_format="tif",
    standard_scale=False,
)
image_chipper.chip_image()
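
Once both the image and the mask have been chipped, it can be worth checking that every image chip has a mask chip with the same file name. The sketch below uses only the standard library. The helper unmatched_chips and the example file names are hypothetical, and the demonstration writes empty placeholder files to a temporary directory rather than real chips.

```python
import tempfile
from pathlib import Path


def unmatched_chips(image_dir, mask_dir, pattern="*.tif"):
    """Return (images without a mask, masks without an image), matched by file name."""
    image_names = {p.name for p in Path(image_dir).glob(pattern)}
    mask_names = {p.name for p in Path(mask_dir).glob(pattern)}
    return sorted(image_names - mask_names), sorted(mask_names - image_names)


# Demonstration with placeholder files; the file names are illustrative only
with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp, "images")
    mask_dir = Path(tmp, "masks")
    image_dir.mkdir()
    mask_dir.mkdir()
    for name in ("large_image_0_0.tif", "large_image_64_0.tif"):
        (image_dir / name).touch()
    (mask_dir / "large_image_0_0.tif").touch()

    missing_masks, missing_images = unmatched_chips(image_dir, mask_dir)

print(missing_masks)   # ['large_image_64_0.tif']
print(missing_images)  # []
```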

3. RemoveBackgroundOnly Class

The RemoveBackgroundOnly class removes image chips (either tif files or numpy arrays inside an npz file) that contain only background. Filtering out background-only chips helps to prepare a dataset better suited to training models.

from rschip import RemoveBackgroundOnly

# Initialize the RemoveBackgroundOnly instance
remover = RemoveBackgroundOnly(background_val=0, non_background_min=100)

# Remove chips with only background
remover.remove_background_only_files(
    class_chips_dir="path/to/mask_directory",
    image_chips_dir="path/to/image_directory"
)
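
The decision made for each mask chip can be pictured with the following sketch. It assumes that non_background_min is the minimum number of non-background pixels a chip needs in order to be kept; this README does not define the parameter precisely, so treat both that interpretation and the strict comparison as assumptions. The function is_background_only is hypothetical and is not rschip's implementation.

```python
def is_background_only(mask_values, background_val=0, non_background_min=100):
    """Return True if the chip has fewer than `non_background_min` non-background pixels.

    Assumption: the threshold is a pixel count and the comparison is strict.
    """
    non_background = sum(1 for v in mask_values if v != background_val)
    return non_background < non_background_min


chip_pixels = 128 * 128

# A chip with only 50 labelled pixels would be treated as background only
sparse_chip = [1] * 50 + [0] * (chip_pixels - 50)
# A chip with 150 labelled pixels would be kept
labelled_chip = [1] * 150 + [0] * (chip_pixels - 150)

print(is_background_only(sparse_chip))    # True
print(is_background_only(labelled_chip))  # False
```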

By default, each image chip and its mask are assumed to share the same file name, as in example 2 above. If they do not, pass the masks_prefix and images_prefix arguments. These prefix strings are removed before image and mask chips are matched on the bottom left (x, y) indices in the file names generated by ImageChip.chip_image().
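
The idea behind prefix-based matching can be sketched as follows: strip the prefix from each name and pair chips whose remaining (x, y) suffixes agree. The helper match_key, the prefixes, and the file names are all hypothetical; the exact naming pattern that rschip produces is not reproduced here.

```python
def match_key(file_name, prefix=""):
    """Remove `prefix` from the start of `file_name` if present."""
    if prefix and file_name.startswith(prefix):
        return file_name[len(prefix):]
    return file_name


# Illustrative names only: prefix followed by x_y indices
masks = ["mask_10_20.tif", "mask_74_20.tif"]
images = ["image_10_20.tif", "image_138_20.tif"]

# Index the masks by their prefix-free key, then look up each image
mask_by_key = {match_key(m, "mask_"): m for m in masks}
pairs = [
    (img, mask_by_key[match_key(img, "image_")])
    for img in images
    if match_key(img, "image_") in mask_by_key
]

print(pairs)  # [('image_10_20.tif', 'mask_10_20.tif')]
```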

License

This project is licensed under the MIT License - see the LICENSE file for details.
