Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster Classes and Testing Implementation #688

Closed
wants to merge 2 commits into from

Conversation

vschaffn
Copy link
Contributor

Resolves #681.

Overview

This PR introduces a set of classes designed to handle the execution of tasks in either a basic or a multi-processing cluster. It also includes a series of tests that verify the correct behavior of the cluster functionalities.

Key changes

  • ClusterGenerator: A factory class that decides which type of cluster to create based on the given configuration (basic or multiprocessing). It can initialize a BasicCluster or MpCluster with a specified number of worker processes.
  • AbstractCluster: The base class for both BasicCluster and MpCluster. It defines methods to launch tasks, retrieve results, and manage resources such as opening and closing the cluster.
  • BasicCluster: A subclass of AbstractCluster where tasks are run synchronously (i.e., in a single process). The launch_task method simply executes the provided function immediately.
  • MpCluster: A subclass of AbstractCluster that utilizes Python’s multiprocessing module for parallel task execution. The cluster pool is initialized with a defined number of worker processes, and tasks are executed asynchronously using apply_async. The results of tasks can be retrieved via the get_res method.

Test Suite

TestClusterGenerator: A test suite implemented with pytest to validate the functionality of the cluster system.

  • test_basic_cluster: Verifies that tasks are executed synchronously in the BasicCluster.
  • test_mp_cluster_task: Confirms that tasks are launched asynchronously in the MpCluster.
  • test_mp_cluster_parallelism: Tests that multiple tasks can be executed concurrently with multiple workers in MpCluster.
  • test_mp_cluster_termination: Ensures that the cluster terminates correctly and no tasks can be launched after termination.

@rhugonnet
Copy link
Member

Looks good, thanks! 🙂

I only have two generic comments:

  1. Won't we need this in GeoUtils for delayed_multiprocessing?
  2. Adding short comments for each class method could make the code easier to read. For instance, I suppose the AbstractCluster methods with pass are meant to always be subclassed? This could be described in a short function description "Meant to be subclassed only", or by raising something like a NotImplementedError("This function should never be called, meant to be subclassed") instead of pass, which would also make it more directly readable.

@adebardo
Copy link
Contributor

@rhugonnet

  1. It's true that I hadn't extended the use of this cluster class to reprojection because I had really separated reprojection from the rest, but it is possible and makes sense to move it, yes.

@vschaffn could you propose the changes ? thks

@vschaffn
Copy link
Contributor Author

@adebardo @rhugonnet I added comments for each class method, and moved PR to geoutils

GlacioHack/geoutils#656

@adebardo adebardo closed this Feb 26, 2025
@vschaffn vschaffn deleted the 681-cluster branch February 26, 2025 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[POC] cluster module
3 participants