Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce POT for energy files #396

Open
wants to merge 25 commits into
base: staging
Choose a base branch
from
Open

Introduce POT for energy files #396

wants to merge 25 commits into from

Conversation

Sarkosos
Copy link
Contributor

No description provided.

Sarkosos added 6 commits March 8, 2025 14:09
- Implement validation of intermediate checkpoints during MD simulation
- Add new methods to get and validate intermediate checkpoints
- Update constants with checkpoint validation parameters
- Modify validator functions to support intermediate checkpoint validation
- Enhance error handling and logging for checkpoint validation process
…king

- Implement intermediate checkpoint submission method in FoldingMiner
- Add SequentialCheckpointReporter to track checkpoints across simulation states
- Update SimulationManager to calculate sequential checkpoint numbering
- Enable retrieval and submission of specific checkpoint files
- Replace MDOutput Pydantic model with a standard dictionary
- Update JobSubmissionSynapse to work with plain dictionary instead of Pydantic model
- Modify deserialization logic to handle dictionary-based md_output
- Remove unnecessary Pydantic import
@@ -78,12 +76,17 @@ def deserialize(self) -> int:
self.md_output = {}
else:
md_output = {}
for k, v in self.md_output.items():
try:
md_output[k] = base64.b64decode(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you just have done md_output[k] = base64.b64decode(v) if v is not None else None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I just copied what we had for the JobSubmissionSynapse

responses = validator.dendrite.query(
synapse=synapse, axons=[axon], deserialize=True
)
return responses[0].cpt_files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the code this looks like responses[0].cpt_files is a tuple... Is this true? If true, then I think it would be better to return each value individually to be clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a dictionary

def name(self) -> str:
return "SyntheticMD"

def validate_intermediate_checkpoints(self, validator, job_id, axon):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soooo much of this class is the same at the regular pipeline we have. Can we abstract the pipeline such that we don't have to iterate within this, but we give the pipeline a checkpoint file and whatever else it needs to do the validation? Then we would iterate at the higher level and just call the pipeline? this was my original idea when creating the registry, so you can just call

for data in X:
   evaluation_registry[task].validate(**data)

Sarkosos added 12 commits March 12, 2025 09:51
- Improve file attachment logic in attach_files_to_synapse
- Add robust log file handling with header validation and combination
- Update checkpoint saving in SimulationManager
- Fix checkpoint reporter filename generation
- Add more comprehensive error handling and logging
- Add traceback logging for intermediate checkpoint validation errors
- Add logging for participation and step execution in validators
- Update checkpoint selection logic to use the last step number
- Add axons parameter to get_energies function
Adjust the random checkpoint selection range to prevent potential index out of bounds error by subtracting 1 from the maximum checkpoint number
…e submission

- Refactor log file combination using pandas for more robust processing
- Add sorting of log files by step number
- Simplify file attachment logic
- Add logging for intermediate checkpoint submission
- Convert validation methods in SyntheticMDEvaluator to async
- Update method calls to use await for async operations
- Modify checkpoint validation to check for exact number of checkpoints
- Update error message for insufficient intermediate checkpoints
- Adjust forward method calls to use async syntax in validators
… FoldingMiner

- Extend BaseMinerNeuron with a new intermediate_submission_forward method
- Update FoldingMiner to base64 encode checkpoint files for submission
- Modify checkpoint file naming to use sequential numbering
- Add cleanup for temporary combined log file after processing
- Move the check for running simulations to a later point in the method
- Ensure the event is updated correctly before returning simulation status
- Maintain functionality for checking local storage for simulation data
@Sarkosos Sarkosos marked this pull request as ready for review March 17, 2025 10:46
…idation

- Added logging for validity checks on final and intermediate checkpoints.
- Updated checkpoint validation to include specific checkpoint numbers.
- Refactored energy extraction logic to handle final and intermediate checkpoints separately.
- Improved event dictionary structure to accommodate new energy data formats.
- Update the energy retrieval logic to use final miner energies instead of intermediate steps.
- Remove redundant code related to max_step calculations for improved clarity.
- Remove the creation of the state file during initialization and instead generate it dynamically based on the checkpoint number.
- Update the save and load state methods to use the newly constructed state file path.
- Clean up redundant comments related to state file management.
- Update the log file naming convention to use the checkpoint number instead of the current state for improved clarity and consistency.
- Remove the unused max_step variable to streamline the energy extraction logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants