docs: document the floating-point precision of the model #4240

njzjz · 2024-10-22T21:26:20Z

Summary by CodeRabbit

New Features
- Added a new section on precision in the documentation, enhancing navigation.
- Introduced detailed guidelines on floating-point precision settings for the model.
- Included structured instructions for creating models with the PyTorch backend.

Documentation
- Expanded troubleshooting documentation related to model precision issues, including data accuracy and training recommendations.
- Enhanced guidelines for integrating new components into user configurations and ensuring model integrity across different backends.

Signed-off-by: Jinzhe Zeng <[email protected]>

coderabbitai · 2024-10-22T21:28:33Z

📥 Commits

Files that changed from the base of the PR and between 401cfa4 and 5958e8f.

📝 Walkthrough

The changes include the addition of a new entry titled precision to the table of contents in doc/model/index.rst, enhancing documentation navigation. A new document, doc/model/precision.md, has been introduced, detailing floating-point precision settings for models, including environment variables and training parameters. Additionally, doc/troubleshooting/precision.md has been updated to clarify model precision issues, emphasizing the importance of data accuracy, model performance, and the trade-offs between precision and speed in neural networks. The document doc/development/create-a-model-pt.md has also been updated to include new sections on model creation and floating-point precision management.

Changes

| File Path | Change Summary |
| --- | --- |
| doc/model/index.rst | Added new entry `precision` to the table of contents (toctree). |
| doc/model/precision.md | Introduced detailed content on floating-point precision settings, including environment variables and training parameters. Highlighted the use of `float64` and `float32` in interfaces. |
| doc/troubleshooting/precision.md | Enhanced clarity on model precision issues, expanded sections on data accuracy, model performance, and the relationship between training and accuracy. Added references to floating-point precision. |
| doc/development/create-a-model-pt.md | Updated to include new sections on model creation, integration of PyTorch backend, and floating-point precision management. Added guidance on registering new components and unit testing. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Documentation
    participant Model

    User->>Documentation: Access precision section
    Documentation->>Model: Retrieve precision settings
    Model-->>Documentation: Provide precision details
    Documentation-->>User: Display precision information

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (9)

doc/model/precision.md (4)

5-7: Consider adding more details about precision controls.

To improve clarity, consider addressing these points:

What does the high value mean for DP_INTERFACE_PREC? Does it correspond to float64?

How do these two controls interact with each other?

Why is the reduced output always float64? Is this for numerical stability?
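
A minimal sketch of one possible answer to the first question, assuming that `DP_INTERFACE_PREC` accepts `high` and `low`, that these select float64 and float32 respectively, and that `high` is the default. The helper name `interface_precision` and the mapping table are hypothetical and are not taken from the DeePMD-kit source:

```python
import os

# Assumed mapping from the environment variable value to a dtype name.
_PRECISION_BY_LEVEL = {"high": "float64", "low": "float32"}


def interface_precision(environ=os.environ) -> str:
    """Return the dtype name selected by DP_INTERFACE_PREC (assumed default: high)."""
    level = environ.get("DP_INTERFACE_PREC", "high").lower()
    try:
        return _PRECISION_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"Unsupported DP_INTERFACE_PREC value: {level!r}") from None


print(interface_precision())
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to exercise without touching the real environment.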

9-12: Add context for the recommended configurations.

The recommendations would be more helpful if they included:

When to use each configuration (use cases)

The trade-offs between float64 and float32 in terms of:

Performance impact

Memory usage

Accuracy implications
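
The memory and accuracy sides of this trade-off can be illustrated with a small experiment. The sketch below is not part of the PR; it emulates float32 with the standard `struct` module so that it runs without NumPy or PyTorch, and the helper names `to_float32` and `accumulate` are made up for this illustration. It does not measure performance.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times, optionally rounding to float32."""
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_float32(total)  # emulate a float32 accumulator
    return total


N = 100_000
EXACT = 10_000.0  # 0.1 * 100_000 in exact arithmetic

err32 = abs(accumulate(0.1, N, single=True) - EXACT)
err64 = abs(accumulate(0.1, N, single=False) - EXACT)

print(f"float32: {struct.calcsize('f')} bytes per value, error {err32:.3e}")
print(f"float64: {struct.calcsize('d')} bytes per value, error {err64:.3e}")
```

A float32 value takes half the storage of a float64 value, but the rounding error of a long accumulation is many orders of magnitude larger.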

14-15: Enhance interface compatibility documentation.

Consider adding:

Code examples showing how to specify precision in Python and C++ interfaces

Explanation of why MD programs like LAMMPS typically use float64 (e.g., for numerical stability in long simulations)

1-15: Consider adding a troubleshooting section.

The document would be more complete with a section addressing common precision-related issues and their solutions, such as:

Signs of precision-related problems

How to debug precision issues

Common pitfalls when changing precision settings

doc/troubleshooting/precision.md (5)

Line range hint 8-8: Fix typo: "the enough" → "enough"

Change "whether the model has the enough accuracy" to "whether the model has enough accuracy"

🧰 Tools

🪛 LanguageTool

[style] ~61-~61: Consider using a different verb to strengthen your wording.
Context: ...e may want to use the FP32 precision to make the model faster. For some applications, FP32 is enough ...

(MAKE_XXX_FASTER)

Line range hint 15-15: Fix typo: "neccessary" → "necessary"

Correct the spelling of "neccessary" to "necessary"

Line range hint 31-31: Fix typo: "evaluting" → "evaluating"

Correct the spelling of "evaluting" to "evaluating"

Line range hint 67-70: Consider adding specific examples or metrics

The section about training steps could be more helpful with specific examples or metrics. Consider adding:

Typical number of training steps for different scenarios

How to determine if the model has reached sufficient convergence

Common indicators that suggest when to stop training

Line range hint 1-77: Consider adding a troubleshooting decision tree

The document covers various aspects that could affect model precision, but it might be helpful to add a decision tree or flowchart to guide users through the troubleshooting process systematically. This would help users identify and resolve precision issues more efficiently.

Example structure:
## Troubleshooting Decision Tree

1. Is the model performing poorly on training data?
   - Yes → Check [Data Accuracy](#scf-coverage-and-data-accuracy)
   - No → Go to step 2

2. Is the model performing poorly on test data?
   - Yes → Check [Enough Data](#enough-data)
   - No → Go to step 3
   
[...]

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b4701da and 9f9c3f6.

📒 Files selected for processing (3)

doc/model/index.rst (1 hunks)
doc/model/precision.md (1 hunks)
doc/troubleshooting/precision.md (1 hunks)

🧰 Additional context used

📓 Learnings (1)

doc/model/index.rst (1)

Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4239
File: doc/model/index.rst:27-27
Timestamp: 2024-10-22T21:00:41.348Z
Learning: When the `precision` documentation file is not present in `doc/model`, ensure that `precision` is not added to the toctree in `doc/model/index.rst`.

🔇 Additional comments (3)

doc/model/index.rst (1)

27-27: Verify the existence of the precision documentation file.

The addition to the toctree looks good, but we need to ensure that the corresponding documentation file exists.

Let's verify the existence of the precision documentation file:

✅ Verification successful

The precision documentation file exists as expected

The verification confirms that doc/model/precision.md is present in the doc/model/ directory, making the toctree addition valid and properly supported by its corresponding documentation file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check if the precision documentation file exists
# Expected: File should exist at doc/model/precision.md

if fd -t f "precision.md" "doc/model/"; then
    echo "✓ precision.md exists in doc/model/"
else
    echo "⨯ precision.md is missing in doc/model/"
    echo "Note: Based on past learnings, 'precision' should not be added to the toctree without the corresponding file."
fi
Length of output: 137

doc/model/precision.md (1)

1-4: LGTM! Clear and concise introduction.

doc/troubleshooting/precision.md (1)

63-63: LGTM: Good cross-reference to the new precision documentation

The added reference to the floating-point precision documentation is well-placed and provides a clear path for readers to find more detailed information about precision settings.

codecov · 2024-10-22T22:21:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (b4701da) to head (5958e8f).
Report is 185 commits behind head on devel.

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #4240      +/-   ##
==========================================
- Coverage   84.55%   84.34%   -0.21%     
==========================================
  Files         537      549      +12     
  Lines       51237    51463     +226     
  Branches     3047     3051       +4     
==========================================
+ Hits        43324    43408      +84     
- Misses       6965     7095     +130     
- Partials      948      960      +12

wanghan-iapcm

Could you please also write a precision guideline for developers?
For developers, shall we change all GLOBAL_XX_FLOAT_PRECISION to XX_INTERFACE_FLOAT_PRECISION?

doc/model/precision.md

njzjz · 2024-10-23T04:50:53Z

> Could you please also write a precision guideline for developers?

What should we discuss for developers in addition to the users?

wanghan-iapcm · 2024-10-23T23:52:24Z

> > Could you please also write a precision guideline for developers?
>
> What should we discuss for developers in addition to the users?

- Model parameters are stored in the model's precision, except for the bias and std.
- Each forward method should convert the input to the model's internal precision before any operation.
- It then converts the output to the interface precision, applies std and bias, and returns the result.
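
The three points above can be sketched as follows. This is only an illustration under stated assumptions: the class `ToyComponent` and the helper `to_float32` are hypothetical, float32 stands in for the model's internal precision, float64 stands in for the interface precision, and float32 is emulated with the standard `struct` module so the sketch runs without PyTorch. In a real PyTorch component the casts would be tensor dtype conversions instead.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ToyComponent:
    """Hypothetical component: float32 internals, float64 interface."""

    def __init__(self, weights, bias, std):
        # Parameters are stored in the internal (float32) precision ...
        self.weights = [to_float32(w) for w in weights]
        # ... except bias and std, which stay in interface (float64) precision.
        self.bias = float(bias)
        self.std = float(std)

    def forward(self, inputs):
        # 1. Convert the input to the internal precision before any operation.
        x = [to_float32(v) for v in inputs]
        # 2. Compute in the internal precision (round after every operation).
        acc = 0.0
        for w, v in zip(self.weights, x):
            acc = to_float32(acc + to_float32(w * v))
        # 3. Convert to the interface precision, then apply std and bias.
        out = float(acc)
        return out * self.std + self.bias


component = ToyComponent(weights=[0.5, 0.25], bias=1.0, std=2.0)
print(component.forward([2.0, 4.0]))
```

Applying std and bias only after the cast back keeps the final scaling and shift in the higher precision, which is the ordering described in the comment above.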

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

doc/development/create-a-model-pt.md (2)

140-148: Enhance the floating-point precision documentation with examples and clarifications.

While the guidelines align well with the requirements, consider these improvements:

Add a code example demonstrating the implementation of these guidelines

Define what "component precision" means (is it the same as model precision?)

Make the link more specific by pointing to the relevant section in precision.md

Here's a suggested example to add:
import torch

# BaseComponent, get_precision(), and GLOBAL_PT_FLOAT_PRECISION are assumed to be
# provided by the surrounding code base; this snippet does not define them.
class ExampleComponent(BaseComponent):
    def __init__(self):
        # Needed before assigning parameters if BaseComponent is a torch.nn.Module
        super().__init__()

        # Store parameters in component precision
        self.weights = torch.nn.Parameter(torch.empty(10, dtype=self.get_precision()))

        # Store normalization in global precision
        self.std = torch.nn.Parameter(
            torch.ones(1, dtype=GLOBAL_PT_FLOAT_PRECISION)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize first, then cast the input to component precision
        x = x / self.std
        x = x.to(dtype=self.get_precision())

        # Perform computation in component precision
        result = x @ self.weights

        # Cast back to global precision before undoing the normalization
        result = result.to(dtype=GLOBAL_PT_FLOAT_PRECISION)
        result = result * self.std
        return result

142-144: Add context about precision in the model architecture.

Consider adding a brief introduction explaining how these precision guidelines fit into the larger model architecture and why they're important. This would help developers understand the rationale behind these requirements and their impact on model consistency.

Add a paragraph like:
These precision guidelines ensure consistent behavior across the model's components and backends. Proper precision handling is crucial for:
- Maintaining numerical stability during training
- Ensuring consistent results across different hardware
- Optimizing memory usage and performance
- Facilitating model conversion between backends

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c524b17 and 401cfa4.

📒 Files selected for processing (1)

doc/development/create-a-model-pt.md (1 hunks)

doc/development/create-a-model-pt.md

docs: document the floating-point precision of the model

9f9c3f6

Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz requested review from iProzd and wanghan-iapcm October 22, 2024 21:26

github-actions bot added the Docs label Oct 22, 2024

coderabbitai bot reviewed Oct 22, 2024

njzjz linked an issue Oct 22, 2024 that may be closed by this pull request

what should be the convention of variable dtype in deepmd-kit modules. #4234

Closed

wanghan-iapcm reviewed Oct 23, 2024

iProzd reviewed Oct 23, 2024

doc/model/precision.md (outdated, resolved)

fix link

c524b17

Signed-off-by: Jinzhe Zeng <[email protected]>

add developing doc

401cfa4

Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz requested review from wanghan-iapcm and iProzd October 24, 2024 21:53

coderabbitai bot reviewed Oct 24, 2024

wanghan-iapcm reviewed Oct 25, 2024

doc/development/create-a-model-pt.md (outdated, resolved)

update

5958e8f

Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz requested a review from wanghan-iapcm October 28, 2024 19:40

wanghan-iapcm approved these changes Oct 28, 2024

wanghan-iapcm added this pull request to the merge queue Oct 28, 2024

Merged via the queue into deepmodeling:devel with commit 40b3ea1 Oct 29, 2024
60 checks passed

This was referenced Oct 29, 2024

feat(pt): train with energy Hessian #4169

Merged

feat(jax): SavedModel C++ interface (including DPA-2 supports) #4307

Merged

This was referenced Nov 12, 2024

docs: document plugin mechanisms and deepmd-gnn #4345

Merged

feat(pt): add universal test for loss #4354

Merged

docs: set precision explicitly in the DPA-2 example #4372

Merged

coderabbitai bot mentioned this pull request Nov 21, 2024

Chore(doc): merge multitask training doc #4395

Merged

This was referenced Dec 21, 2024

docs: update deepmd-gnn URL #4482

Merged

Fix: Modify docs of DPA models #4510

Merged
