Major environment refactoring (draft version) #166

cbhua · 2024-04-25T17:07:09Z

Important

The merge of this pull request is postponed because it contains sensitive modifications to the environment logic, which may cause hidden bugs. We should be careful to update them. Therefore, this full version of environment refactoring will be kept as a draft. We opened another base version refactor pull request: #169, which only touches the environment structure and adds the generator without changing any logic for a safe refactor in the current state. In the future, we will based on this draft's full version, go further refactor environments step by step.

Description

Together with Major modeling refactoring #165, this PR is for major, long-due refactoring to the RL4CO environments codebase.

Motivation and Context

This refactoring is driven by following motivations:

New Feature Integration: We aim to support a data generator capable of producing various distributions for initialized instances.
Standardization of Environments: Before our environments were developed at different times, so there were inconsistencies in content, logic, and formatting. This refactoring tried to standardize these environments.
Code Cleanup: Our earlier versions included redundant code, functions, and calculation logic. This refactoring effort will clean up these elements, enhancing the codebase's readability and maintainability.

Changelog

Environment Structure Refactoring

The refactored structure for environments is as following:

rl4co
├── models/
└── envs/
    ├── eda/
    ├── scheduling/
    └── routing/
        ├── tsp/
        │   ├── env.py
        │   ├── generator.py
        │   └── render.py
        ├── cvrp/
        │   ├── env.py
        │   ├── generator.py
        │   └── render.py
        └── ...

We have restructured the organization of the environment files for improved modularity and clarity. Each environment has its own directory, comprising three components:

env.py: The core framework of the environment, managing functions such as _reset(), _step(), and others. For a comprehensive understanding, please refer to the documentation.
generator.py: Replace the previous generate_data() function; this module works for randomly initializing instances within the environment. The updated version now supports custom data distributions. See the following sections for more details.
render.py: For visualization of the solution. Its separation from the main environment file enhances overall code readability.

Data Generator Supporting

Each environment generator will be based on the base Generator() class with the following functions:

class Generator():
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __call__(self, batch_size) -> TensorDict:
        batch_size = [batch_size] if isinstance(batch_size, int) else batch_size
        return self._generate(batch_size)

    def _generate(self, batch_size, **kwargs) -> TensorDict:
        raise NotImplementedError

__init_() will record all the environment instance initialize parameters, for example, num_loc, min_loc, max_loc, etc.

Thus, you will see how the __init__() function for the environment (e.g. CVRPEnv.__init__(...)) only takes generator and generator_params as input. Now, the environment initialize example would be
```
env = CVRPEnv(generator_params={num_loc=20})

# Another way
generator = CVRPGenerator(num_loc=20)
env = CVRPEnv(generator)
```
Various samplers will be initialized here. We provide the get_sampler() function to based on the input variables to return a torch.distributions class. By default, we support distributions Uniform, Normal, Exponential, and Poisson for locations and center, corner, for depots. You can also pass your won distribution sampler. See the following sections for more details.
__call__() is a middle wrapper; at the moment, it is used to regularize the batch_size format supported by the TorchRL (i.e., in a list format). Note that in this refactor version, we would finalize the dimension of batch_size to be 1 for easier implementation and clearer understanding since even multi-batch-size dimensions can be easily transferred to a single dimension.
__generate() is the part you would like to implement for your own environment data generator.

New `get_sampler()` function

This implementation mainly refers to @ngastzepeda's code. In the current version, we support the following distributions:

center: For depots. All depots will be initialized in the center of the space.
corner: For depots. All depots will be initialized in the bottom left corner of the space.
Uniform: Takes min_val and max_val as input.
Exponential and Poisson: Take mean_val and std_val as input.

You can also use your own Callable function as the sampler. This function will take the batch_size: List[int] as input and return the sampled torch.Tensor.

Modification for `RL4COEnvBase()`

We move the checking for batch_size and device from every environment to the base class for clarity, as shown in

rl4co/rl4co/envs/common/base.py

Lines 130 to 138 in b70566b

    
           def reset(self, td: Optional[TensorDict] = None, batch_size=None) -> TensorDict: 
        
               """Reset function to call at the beginning of each episode""" 
        
               if batch_size is None: 
        
                   batch_size = self.batch_size if td is None else td.batch_size 
        
               if td is None or td.is_empty(): 
        
                   td = self.generator(batch_size=batch_size) 
        
               batch_size = [batch_size] if isinstance(batch_size, int) else batch_size 
        
               self.to(td.device) 
        
               return super().reset(td, batch_size=batch_size)

We added a new _get_reward() function aside from the original get_reward() function and moved the check_solution_validity() from every environment to the base class for clarity, as shown in

rl4co/rl4co/envs/common/base.py

Lines 175 to 187 in b70566b

    
               def get_reward(self, td, actions) -> TensorDict: 
        
                   """Function to compute the reward. Can be called by the agent to compute the reward of the current state 
        
                   This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks 
        
                   """ 
        
                   if self.check_solution: 
        
                       self.check_solution_validity(td, actions) 
        
                   return self._get_reward(td, actions) 
        
               def _get_reward(self, td, actions) -> TensorDict: 
        
                   """Function to compute the reward. Can be called by the agent to compute the reward of the current state 
        
                   This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks 
        
                   """ 
        
                   raise NotImplementedError

Standardization

We standardize the contents of env.py with the following functions:

class EnvName(RL4COEnvBase):
	name = "env_name"
	def __init__(self, generator: EnvGenerator, generator_params: dict): pass
	
	def _step(self, td: TensorDict) -> Tensordict: pass
	
	@staticmethod
	def get_action_mask(td: TensorDict) -> torch.Tensor: pass
	
	def _reset(self, td: Optional[TensorDict] = None, batch_size: Optional[list] = None) -> TensorDict: pass
	
	def _get_reward(self, td: TensorDict, actions: torch.Tensor) -> torch.Tensor: pass
	
	@staticmethod
	def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None: pass
	
	@staticmethod
	def render(td: TensorDict, actions: torch.Tensor = None, ax = None): pass
	
	def _make_spec(self, generator: EnvGenerator): pass

The order is considered to be natural and easy to follow, and we expected all environments to follow the same order for easier reference and matinees. In more detail, we have the following standardization:

We changed the variable name available to visited for more intuitive understanding. In the step() and get_action_mask() calculation, visited records which nodes are visited, and the action_mask is based on it with environment constraints (e.g., capacity, time window, etc.). Separating these two variables would be clearer for the calculation logic.
For some environments, change the _step() function to a nonstatic method. Follow the TorchRL style.
Standardize the get_action_mask() calculation logic, which generally contains three parts: (a) initialize the action_mask based on visited; (b) update cities action_mask based on the state; (c) update the depot action_mask finally. Based on experience, this logic would cause fewer conflicts and mass.
All 1-D features, e.g., i, capacity, used_capacity, etc., are initialized with the size of [*batch_size, 1] instead of [*batch_size, ]. The reason is that in many masking operations, we need to do logic calculations between this 1-D feature and 2-D features, e.g., capacity with demand. Also, stay consistent with TorchRL implementation.
Rewrite comments on environments with descriptions of observations, constraints, finish conditions, rewards, and args so that a user can better understand the environment. Also, move data-related parameters (e.g., num_loc, min_loc, max_loc) to the generator for clarity.
Add the cost variable to the get_reward function for an intuitive understanding. In this case, the return (reward) is -cost.

Other Fixes

In CVRP, change the variable name vehicle_capacity → capacity, capacity → unnorm_capacity to clarify.
[⚠️ Sensitive Change] Now, the demand variable will also contain the depot. For example, in the previous CVRPEnv(), given num_loc=50, the td[”locs”] has the size of [batch_size, 51, 2] (with the depot), and the td[”demand”] has the size of [batch_size, 50, 2]. This causes index shifting in the get_action_mask() function, which requires a few padding operations.
Fix the SDVRP environment action mask calculation bug.
Adding numerical calculation error bound (0 → 1e-5), for example, in SDVRP done = ~(demand > 0).any(-1) → done = ~(demand > 1e-5).any(-1) for better robustness to avoid edge cases.
In CVRP, OP, and PCTSP environments, getting variables from tables with num_loc, e.g., CVRP CAPACITIES, if the given num_loc is not in the table, we will find the closest num_loc as replace and raise a warning to increase the running robustness.
Fix the return type of get_reward().

Notes

In The current version, we don’t support the distribution of int values, e.g., num_depot, num_agents. These values are initialized by torch.randint().
In the reward calculation, for environments with the constraint starting and ending at the depot, actions should pad 0 to the start and end.
In the current version, only routing environments have been refactored. We will also refactor the EDA and Scheduling environments soon.

Here is the summary of the refractory status for each environment:

Decompose: decompose environments into folder with env.py, generator.py, render.py; fix the __init__() and _reset() functions;
Training Checking: checking the training of refactored environments;
Documentation: cleanup and fix environment documents and logic comments;
Solution Validity: check if the environment contains a check_solution_validity() function;
Clean up Logic: check if the _step() and get_action_maks() function are cleaned up with the standard pipeline.

	Decompose	Training Checking	Documentation	Solution Validity	Clean up Logic
TSP	✅	✅	✅	✅	✅
CVRP	✅	✅	✅	✅	✅
CVRPTW	✅	✅	✅	✅
PCTSP	✅	✅		✅	✅
OP	✅		✅	✅
SDVRP	✅	✅	✅	✅	✅
SVRP	✅		✅	✅
ATSP	✅	✅	✅	✅	✅
MTSP	✅	✅	✅
SPCTSP	✅	✅		✅	✅
PDP	✅	✅	✅	✅	✅
MPDP	✅	✅	✅
MDCPDP	✅	✅

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

Thanks, and need your help

Thanks for @ngastzepeda's base code for this refactoring!

If you have time, welcome to provide your ideas/feedback on this PR.
CC: @Furffico @henry-yeh @bokveizen @LTluttmann

There are quite a few remaining works for this PR, and I will actively update them here.

… pr)

ngastzepeda · 2024-04-26T10:47:03Z

rl4co/envs/common/base.py

+        if batch_size is None:
+            batch_size = self.batch_size if td is None else td.batch_size
+        if td is None or td.is_empty():
+            td = self.generator(batch_size=batch_size)


Couldn't we include the generator as a parameter in the base environment already and set it in __init__()?

It's just that we're calling it here (and further below), but in the base environment it doesn't actually exist.

Yes the reason why they are here is because they could get passed to the environment itself, as in TorchRL here. This is the function signature for EnvBase in TorchRL:

def __init__( self, *, device: DEVICE_TYPING = None, batch_size: Optional[torch.Size] = None, run_type_checks: bool = False, allow_done_after_reset: bool = False, ): ``` So I guess we should make the above explicit in `RL4COEnvBase` since ours is a child class!

@ngastzepeda I think it makes sense that now all environments have the generator and generator_params as the input in the __init__() function. We could move them to the RL4COEnvBase().

And also as @fedebotu said, it's better to show other useful parameters from torchrl.EnvBase in our RL4COEnvBase(). Easier for users to know the provided APIs not only from the documentation.

@ngastzepeda I rethink about adding generator and generator_params to the RL4COEnvBase class. I prefer to keep them for each environment at the current moment for two reasons:

We want users can initialize the environment with simply calling env = <EnvName>(), e.g. env = TSPEnv() without any parameters. In this case, there is a required generator initialization for each environment with respective generator class, e.g.

rl4co/rl4co/envs/routing/cvrp/env.py

Lines 59 to 61 in a9943c9

if generator is None:

generator = CVRPGenerator(**generator_params)

self.generator = generator

It would be hard or at least massive for users to understand if we implement this part in the base class;

Putting the generator initializing for each environment could be a hint for users to understand the "generate data" -> "reset instance as a tensordict" -> "step rollout, ..." pipeline.

What do you think about this? 🤔

rl4co/envs/common/utils.py

ngastzepeda · 2024-04-26T11:12:02Z

rl4co/envs/common/utils.py

+        assert kwargs.get("mean_"+val_name, None) is not None, "mean is required for Normal distribution"
+        assert kwargs.get(val_name+"_std", None) is not None, "std is required for Normal distribution"
+        return Normal(mean=kwargs[val_name+"_mean"], std=kwargs[val_name+"_std"])
+    elif distribution == Exponential or distribution == "exponential":
+        assert kwargs.get(val_name+"_rate", None) is not None, "rate is required for Exponential/Poisson distribution"
+        return Exponential(rate=kwargs[val_name+"_rate"])
+    elif distribution == Poisson or distribution == "poisson":
+        assert kwargs.get(val_name+"_rate", None) is not None, "rate is required for Exponential/Poisson distribution"
+        return Poisson(rate=kwargs[val_name+"_rate"])


Do I understand correctly we're assuming a specific format for these parameters then (i.e. we expect parameters val_name_mean, val_name_std, etc.?) and it's not enough to simply pass f.e. mean = 5, std = 2?

Good question! Your understand is correct.I have thought about this for some time. The thing is: this get_sampler() function will be called in the generator multiple times for different features, e.g. in the OPGenerator()

self.depot_sampler = get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs) self.prize_sampler = get_sampler("prize", prize_distribution, min_prize, max_prize, **kwargs)

If the user wants to init the location with a Normal distribution and the prize with a Poisson distribution, 3 parameters are required:

The mean of location;

The std of location;

The rate of prize.

We have two options to consider these parameters in the OPGenerator():

Adding all of them explicitly to the __init__() inputs;

def __init__(self, min_loc, max_loc, mean_loc, std_loc, rate_loc, loc_distribution,\ min_prize, max_prize, mean_prize, std_prize, rate_prize, prize_distribution)

Supported by the kwargs in the __init__() inputs, i.e. the user should follow the rule that: if you want to use the Normal distribution for <val_name>, you have to give extra parameters with exact the name mean_<val_name> and std_<val_name>.

Actually, both will work, but for clarity and flexibility, I chose the second way. However, I understand that this would be confusing for users, we should have a clear documentation for the standard rule for the parameter name.

If you have a better implementation, please tell me 🤔 I do think the current implementation may not be the optimal.

ngastzepeda

I commented at first that instead of doing the same things in every single environment (specifically getting the visited,current_node, donetensors, etc.) we should define that in the parent class function such that the child classes can simply call the parent's class method. Then I noticed that the base class is for all environments, not just routing. Maybe it would make sense, though, to have a base class for routing (and one for scheduling, etc.), even within the base.py file, which would inherit from RL4COEnvBase and could define these things that are the same for all routing envs so we don't have to do the same thing multiple times...
Apart from that I only left a few minor comments :)

rl4co/envs/routing/atsp/env.py

rl4co/envs/routing/atsp/generator.py

rl4co/envs/routing/cvrp/env.py

ngastzepeda · 2024-04-26T12:32:22Z

rl4co/envs/routing/spctsp/env.py

@@ -1,6 +1,6 @@
 from rl4co.utils.pylogger import get_pylogger

-from .pctsp import PCTSPEnv
+from ..pctsp.env import PCTSPEnv


Since the only difference between this class and PCTSP seems to be that this one is stochastic, but there is no additional logic implemented to PCTSP, why even have two separate environments and not just differentiate via the stochasticboolean parameter?

I agree but conceptually they are a bit different, so it might be worth to keep the difference. Technically you could call PCTSP with the stochastic parameter on too

ngastzepeda · 2024-04-26T12:33:07Z

rl4co/envs/routing/svrp/env.py

+        visited = td["visited"].scatter(
+            -1, current_node.expand_as(td["action_mask"]), 1
+        )
+        print(current_node)


Oops, I had forgotten to delete the print statement^^

ngastzepeda · 2024-04-26T12:36:27Z

rl4co/envs/routing/svrp/env.py

+log = get_pylogger(__name__)
+
+
+class SVRPEnv(RL4COEnvBase):


Now that we`re doing the refactoring anyways, we might as well rename this environment to SkillVRP to avoid confusion with the Stochastich VRP :)

ngastzepeda · 2024-04-26T12:36:41Z

rl4co/envs/routing/svrp/generator.py

+log = get_pylogger(__name__)
+
+
+class SVRPGenerator(Generator):


Rename to SkillVRPGenerator

ngastzepeda · 2024-04-26T12:37:54Z

rl4co/envs/routing/svrp/render.py

+log = get_pylogger(__name__)
+
+
+def render(td, actions=None, ax=None):


I have to admit, I've never actually rendered a Skill VRP problem, so no idea if this runs without problems

fedebotu

Great job!
Left some comments here and there. Additionally as @hyeok9855 is doing, there should be an additional (optional) file called local_search.py

fedebotu · 2024-04-26T13:53:27Z

rl4co/envs/routing/atsp/generator.py

+        dms[..., torch.arange(self.num_loc), torch.arange(self.num_loc)] = 0
+
+        log.info("Using TMAT class (triangle inequality): {}".format(self.tmat_class))
+        if self.tmat_class:


Shouldn't this be inside of the sampler itself?

fedebotu · 2024-04-26T13:57:49Z

rl4co/envs/routing/spctsp/env.py

@@ -1,6 +1,6 @@
 from rl4co.utils.pylogger import get_pylogger

-from .pctsp import PCTSPEnv
+from ..pctsp.env import PCTSPEnv


I agree but conceptually they are a bit different, so it might be worth to keep the difference. Technically you could call PCTSP with the stochastic parameter on too

fedebotu · 2024-04-26T14:03:58Z

tests/test_policy.py

@@ -8,7 +8,8 @@
 # Main autorergressive policy: rollout over multiple envs since it is the base
 @pytest.mark.parametrize(
    "env_name",
-    ["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp", "dpp", "mdpp", "smtwtp"],
+    # ["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp", "dpp", "mdpp", "smtwtp"],
+    ["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp"],


Why were tests from the above environments removed?

In the current refactoring version I just finished the routing environments part. Since we modified the RL4COEnvBase(), this will affect the running for EDA environments.

I will finish the refactoring for EDA environments in the coming commits, and put these checks back, don't worry.

fedebotu · 2024-04-30T08:17:58Z

Let's remember also to fix the shifts in the torch.roll distance calculation as @ngastzepeda noticed, e.g. here. These do not affect calculations in euclidean problems, but it's best to have it conceptually correct

fedebotu · 2024-05-01T00:23:56Z

Notice that we moved most of the above in here #169 (without modification to environment logic or variables)! We will address the comments and merge soon~

fedebotu · 2024-06-07T14:10:56Z

There have been too many changes to track recently, and it seems that several features have already been added.

I will be closing this for now and come back to this for a fresh PR if needed!

cbhua added 18 commits April 25, 2024 04:59

[Feat] modify rl4co env base class for generator

8ad85fe

[Feat] adding utils for data generator

9864a95

[Refactor] refactoring for TSP

d97a387

[Refactor] refactoring for CVRP

c280d7f

[Refactor] refactoring for CVRPTW

5f4eec6

[Refactor] refactoring for ATSP

1b2bc37

[Refactor] refactoring for mTSP

1507ba7

[Refactor] refactoring for OP

11c5619

[Refactor] refactoring for PDP

6f53c73

[Refactor] refactoring for mPDP

3f9d58f

[Refactor] refactoring for MDCPDP

cfbb0a5

[Refactor] refactoring for PCTSP

844e2c8

[Refactor] refactoring for SDVRP

fb88031

[Refactor] refactoring for SPCTSP

4243184

[Refactor] refactoring for SVRP

e302a30

[Refactor, Fix, Minor] embedding changes

50d873d

[BugReport, Temp] temp mute svrp and eda scheduling env (refer to the…

6450ac6

… pr)

[BugReport, Temp] temp mute eda envs for routing env refactory validity

b70566b

ngastzepeda reviewed Apr 26, 2024

View reviewed changes

fedebotu reviewed Apr 26, 2024

View reviewed changes

cbhua added 8 commits April 28, 2024 23:53

[Refactor] refactoring for DPP

784344b

[Refactor] refactoring for mDPP

943d3bd

[Refactor] refactoring for FFSP

7a4a174

[Refactor] refactoring for SMTWTP

6e28a52

[Refactor] fix relative path

a9943c9

[BugFix] fix bugs and finalize the refactoring

c0cd817

[Test] sync test script from main

904c2e6

[BugFix] fix rllout error for the atsp

f30dcd8

fedebotu changed the title ~~Major environment refactoring~~ Major environment refactoring (draft version) Apr 30, 2024

cbhua mentioned this pull request Apr 30, 2024

Major environment refactoring (base version) #169

Merged

8 tasks

cbhua added 2 commits April 30, 2024 23:23

[Merge] sync refactor-env with main

11afd40

[BugFix] update spec for the latest torchrl

d93ff0d

fedebotu closed this Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major environment refactoring (draft version) #166

Major environment refactoring (draft version) #166

cbhua commented Apr 25, 2024 •

edited

Loading

ngastzepeda Apr 26, 2024

ngastzepeda Apr 26, 2024

fedebotu Apr 26, 2024

cbhua Apr 27, 2024

cbhua Apr 29, 2024 •

edited

Loading

ngastzepeda Apr 26, 2024

cbhua Apr 27, 2024

ngastzepeda left a comment

ngastzepeda Apr 26, 2024

fedebotu Apr 26, 2024

ngastzepeda Apr 26, 2024

ngastzepeda Apr 26, 2024

ngastzepeda Apr 26, 2024

ngastzepeda Apr 26, 2024

fedebotu left a comment

fedebotu Apr 26, 2024

fedebotu Apr 26, 2024

fedebotu Apr 26, 2024

cbhua Apr 27, 2024

fedebotu commented Apr 30, 2024

fedebotu commented May 1, 2024

fedebotu commented Jun 7, 2024 •

edited

Loading

	def reset(self, td: Optional[TensorDict] = None, batch_size=None) -> TensorDict:
	"""Reset function to call at the beginning of each episode"""
	if batch_size is None:
	batch_size = self.batch_size if td is None else td.batch_size
	if td is None or td.is_empty():
	td = self.generator(batch_size=batch_size)
	batch_size = [batch_size] if isinstance(batch_size, int) else batch_size
	self.to(td.device)
	return super().reset(td, batch_size=batch_size)

	def get_reward(self, td, actions) -> TensorDict:
	"""Function to compute the reward. Can be called by the agent to compute the reward of the current state
	This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
	"""
	if self.check_solution:
	self.check_solution_validity(td, actions)
	return self._get_reward(td, actions)

	def _get_reward(self, td, actions) -> TensorDict:
	"""Function to compute the reward. Can be called by the agent to compute the reward of the current state
	This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
	"""
	raise NotImplementedError

	if generator is None:
	generator = CVRPGenerator(**generator_params)
	self.generator = generator

		log = get_pylogger(__name__)


		def render(td, actions=None, ax=None):

Major environment refactoring (draft version) #166

Major environment refactoring (draft version) #166

Conversation

cbhua commented Apr 25, 2024 • edited Loading

Description

Motivation and Context

Changelog

Environment Structure Refactoring

Data Generator Supporting

New get_sampler() function

Modification for RL4COEnvBase()

Standardization

Other Fixes

Notes

Types of changes

Checklist

Thanks, and need your help

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cbhua Apr 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ngastzepeda left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fedebotu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fedebotu commented Apr 30, 2024

fedebotu commented May 1, 2024

fedebotu commented Jun 7, 2024 • edited Loading

cbhua commented Apr 25, 2024 •

edited

Loading

New `get_sampler()` function

Modification for `RL4COEnvBase()`

cbhua Apr 29, 2024 •

edited

Loading

fedebotu commented Jun 7, 2024 •

edited

Loading