docs: document the floating-point precision of the model #4240

Merged (4 commits) on Oct 29, 2024

Conversation

njzjz
Member

@njzjz njzjz commented Oct 22, 2024

Summary by CodeRabbit

  • New Features

    • Added a new section on precision in the documentation, enhancing navigation.
    • Introduced detailed guidelines on floating-point precision settings for the model.
    • Included structured instructions for creating models with the PyTorch backend.
  • Documentation

    • Expanded troubleshooting documentation related to model precision issues, including data accuracy and training recommendations.
    • Enhanced guidelines for integrating new components into user configurations and ensuring model integrity across different backends.

@njzjz njzjz requested review from iProzd and wanghan-iapcm October 22, 2024 21:26
@github-actions github-actions bot added the Docs label Oct 22, 2024
Contributor

coderabbitai bot commented Oct 22, 2024

📥 Commits

Files that changed from the base of the PR and between 401cfa4 and 5958e8f.

📝 Walkthrough

The changes include the addition of a new entry titled precision to the table of contents in doc/model/index.rst, enhancing documentation navigation. A new document, doc/model/precision.md, has been introduced, detailing floating-point precision settings for models, including environment variables and training parameters. Additionally, doc/troubleshooting/precision.md has been updated to clarify model precision issues, emphasizing the importance of data accuracy, model performance, and the trade-offs between precision and speed in neural networks. The document doc/development/create-a-model-pt.md has also been updated to include new sections on model creation and floating-point precision management.

Changes

| File Path | Change Summary |
| --- | --- |
| doc/model/index.rst | Added new entry precision to the table of contents (toctree). |
| doc/model/precision.md | Introduced detailed content on floating-point precision settings, including environment variables and training parameters. Highlighted the use of float64 and float32 in interfaces. |
| doc/troubleshooting/precision.md | Enhanced clarity on model precision issues, expanded sections on data accuracy, model performance, and the relationship between training and accuracy. Added references to floating-point precision. |
| doc/development/create-a-model-pt.md | Updated to include new sections on model creation, integration of PyTorch backend, and floating-point precision management. Added guidance on registering new components and unit testing. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Documentation
    participant Model

    User->>Documentation: Access precision section
    Documentation->>Model: Retrieve precision settings
    Model-->>Documentation: Provide precision details
    Documentation-->>User: Display precision information

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (9)
doc/model/precision.md (4)

5-7: Consider adding more details about precision controls.

To improve clarity, consider addressing these points:

  1. What does the high value mean for DP_INTERFACE_PREC? Does it correspond to float64?
  2. How do these two controls interact with each other?
  3. Why is the reduced output always float64? Is this for numerical stability?
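One way to picture the first question is as a lookup from the environment variable to a dtype name. The sketch below is illustrative only: the mapping of `high` to float64 and `low` to float32, the default of `high`, and the helper name `interface_dtype` are assumptions that this thread does not confirm.

```python
import os

# Assumed mapping; the review thread asks about it but does not confirm it.
_PREC_TO_DTYPE = {"high": "float64", "low": "float32"}


def interface_dtype(environ=os.environ):
    """Return the dtype name implied by DP_INTERFACE_PREC.

    Hypothetical helper: the default value "high" is an assumption.
    """
    value = environ.get("DP_INTERFACE_PREC", "high").lower()
    try:
        return _PREC_TO_DTYPE[value]
    except KeyError:
        raise ValueError(f"unsupported DP_INTERFACE_PREC: {value!r}") from None


if __name__ == "__main__":
    print(interface_dtype())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.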

9-12: Add context for the recommended configurations.

The recommendations would be more helpful if they included:

  1. When to use each configuration (use cases)
  2. The trade-offs between float64 and float32 in terms of:
    • Performance impact
    • Memory usage
    • Accuracy implications
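The memory and accuracy side of this trade-off can be shown with the standard library alone. This is a minimal sketch, not a benchmark: `to_float32` is a helper introduced here for illustration, and it emulates float32 storage by round-tripping a Python float (float64) through a 4-byte encoding.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage cost per value, in bytes.
bytes_f32 = struct.calcsize("<f")
bytes_f64 = struct.calcsize("<d")

# Relative rounding error introduced by storing 0.1 as float32,
# measured against the float64 representation of 0.1.
value = 0.1
rel_err_f32 = abs(to_float32(value) - value) / value

if __name__ == "__main__":
    print(f"float32: {bytes_f32} bytes, float64: {bytes_f64} bytes")
    print(f"relative rounding error of 0.1 in float32: {rel_err_f32:.3e}")
```

Halving the storage per value is where the memory saving comes from; any speed benefit depends on the hardware and is not measured here.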

14-15: Enhance interface compatibility documentation.

Consider adding:

  1. Code examples showing how to specify precision in Python and C++ interfaces
  2. Explanation of why MD programs like LAMMPS typically use float64 (e.g., for numerical stability in long simulations)
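The numerical-stability argument for long simulations can be illustrated by accumulating a small increment many times. The sketch below is a toy stand-in for repeated updates, not an MD integrator; `to_float32` and `accumulate` are names introduced here for illustration, and float32 arithmetic is emulated by rounding after every addition.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step: float, n_steps: int, single: bool) -> float:
    """Add `step` to a running total `n_steps` times.

    With single=True every intermediate result is rounded to float32,
    which emulates carrying the total in single precision.
    """
    increment = to_float32(step) if single else step
    total = 0.0
    for _ in range(n_steps):
        total += increment
        if single:
            total = to_float32(total)
    return total


n_steps = 10_000
exact = 1000.0  # 0.1 * 10_000 in exact arithmetic
err32 = abs(accumulate(0.1, n_steps, single=True) - exact)
err64 = abs(accumulate(0.1, n_steps, single=False) - exact)

if __name__ == "__main__":
    print(f"float32 accumulated error: {err32:.3e}")
    print(f"float64 accumulated error: {err64:.3e}")
```

The single-precision total drifts by many orders of magnitude more than the double-precision one, which is the kind of error growth a long-running simulation would want to avoid.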

1-15: Consider adding a troubleshooting section.

The document would be more complete with a section addressing common precision-related issues and their solutions, such as:

  1. Signs of precision-related problems
  2. How to debug precision issues
  3. Common pitfalls when changing precision settings
doc/troubleshooting/precision.md (5)

Line range hint 8-8: Fix typo: "the enough" → "enough"

Change "whether the model has the enough accuracy" to "whether the model has enough accuracy"

🧰 Tools
🪛 LanguageTool

[style] ~61-~61: Consider using a different verb to strengthen your wording.
Context: ...e may want to use the FP32 precision to make the model faster. For some applications, FP32 is enough ...

(MAKE_XXX_FASTER)


Line range hint 15-15: Fix typo: "neccessary" → "necessary"

Correct the spelling of "neccessary" to "necessary"


Line range hint 31-31: Fix typo: "evaluting" → "evaluating"

Correct the spelling of "evaluting" to "evaluating"


Line range hint 67-70: Consider adding specific examples or metrics

The section about training steps could be more helpful with specific examples or metrics. Consider adding:

  • Typical number of training steps for different scenarios
  • How to determine if the model has reached sufficient convergence
  • Common indicators that suggest when to stop training

Line range hint 1-77: Consider adding a troubleshooting decision tree

The document covers various aspects that could affect model precision, but it might be helpful to add a decision tree or flowchart to guide users through the troubleshooting process systematically. This would help users identify and resolve precision issues more efficiently.

Example structure:

## Troubleshooting Decision Tree

1. Is the model performing poorly on training data?
   - Yes → Check [Data Accuracy](#scf-coverage-and-data-accuracy)
   - No → Go to step 2

2. Is the model performing poorly on test data?
   - Yes → Check [Enough Data](#enough-data)
   - No → Go to step 3
   
[...]

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b4701da and 9f9c3f6.

📒 Files selected for processing (3)
  • doc/model/index.rst (1 hunks)
  • doc/model/precision.md (1 hunks)
  • doc/troubleshooting/precision.md (1 hunks)
🧰 Additional context used
📓 Learnings (1)
doc/model/index.rst (1)
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4239
File: doc/model/index.rst:27-27
Timestamp: 2024-10-22T21:00:41.348Z
Learning: When the `precision` documentation file is not present in `doc/model`, ensure that `precision` is not added to the toctree in `doc/model/index.rst`.
🔇 Additional comments (3)
doc/model/index.rst (1)

27-27: Verify the existence of the precision documentation file.

The addition to the toctree looks good, but we need to ensure that the corresponding documentation file exists.

Let's verify the existence of the precision documentation file:

✅ Verification successful

Since the verification script confirmed that doc/model/precision.md exists in the expected location, I can now generate the final response without needing additional verification.

The precision documentation file exists as expected

The verification confirms that doc/model/precision.md is present in the doc/model/ directory, making the toctree addition valid and properly supported by its corresponding documentation file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the precision documentation file exists
# Expected: File should exist at doc/model/precision.md

if fd -t f "precision.md" "doc/model/"; then
    echo "✓ precision.md exists in doc/model/"
else
    echo "⨯ precision.md is missing in doc/model/"
    echo "Note: Based on past learnings, 'precision' should not be added to the toctree without the corresponding file."
fi

Length of output: 137

doc/model/precision.md (1)

1-4: LGTM! Clear and concise introduction.

doc/troubleshooting/precision.md (1)

63-63: LGTM: Good cross-reference to the new precision documentation

The added reference to the floating-point precision documentation is well-placed and provides a clear path for readers to find more detailed information about precision settings.

@njzjz njzjz linked an issue Oct 22, 2024 that may be closed by this pull request
codecov bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (b4701da) to head (5958e8f).
Report is 185 commits behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4240      +/-   ##
==========================================
- Coverage   84.55%   84.34%   -0.21%     
==========================================
  Files         537      549      +12     
  Lines       51237    51463     +226     
  Branches     3047     3051       +4     
==========================================
+ Hits        43324    43408      +84     
- Misses       6965     7095     +130     
- Partials      948      960      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Collaborator

@wanghan-iapcm wanghan-iapcm left a comment

Could you please also write a precision guideline for developers?
For developers, shall we change all GLOBAL_XX_FLOAT_PRECISION to XX_INTERFACE_FLOAT_PRECISION?

Signed-off-by: Jinzhe Zeng <[email protected]>
@njzjz
Member Author

njzjz commented Oct 23, 2024

Could you please also write a precision guideline for developers?

What should we discuss for developers in addition to the users?

@wanghan-iapcm
Collaborator

Could you please also write a precision guideline for developers?

What should we discuss for developers in addition to the users?

  1. Model parameters are stored in the model's precision, except the bias and std.
  2. Each forward method should convert the input to the model's internal precision before any operation.
  3. The output is converted to the interface precision; std and bias are then applied before returning.
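The three points above can be sketched without any framework. This is a pure-Python emulation under stated assumptions: the model's internal precision is taken to be float32 and the interface precision float64, and `ScaledLinear` and `to_float32` are hypothetical names that do not come from the deepmd-kit code base.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


class ScaledLinear:
    """Hypothetical one-weight component following the three points above."""

    def __init__(self, weight: float, bias: float, std: float) -> None:
        # 1. The parameter is stored in the model's (float32) precision;
        #    bias and std stay in interface (float64) precision.
        self.weight = to_float32(weight)
        self.bias = bias
        self.std = std

    def forward(self, x: float) -> float:
        # 2. Convert the input to the internal precision before any operation.
        x32 = to_float32(x)
        y32 = to_float32(x32 * self.weight)
        # 3. Convert the output to interface precision, then apply std and bias.
        y64 = float(y32)
        return y64 * self.std + self.bias


if __name__ == "__main__":
    component = ScaledLinear(weight=0.5, bias=1.0, std=2.0)
    print(component.forward(3.0))
```

Keeping std and bias outside the reduced-precision path means the final rescaling does not inherit the rounding of the internal computation.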

Signed-off-by: Jinzhe Zeng <[email protected]>
@njzjz njzjz requested review from wanghan-iapcm and iProzd October 24, 2024 21:53
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
doc/development/create-a-model-pt.md (2)

140-148: Enhance the floating-point precision documentation with examples and clarifications.

While the guidelines align well with the requirements, consider these improvements:

  1. Add a code example demonstrating the implementation of these guidelines
  2. Define what "component precision" means (is it the same as model precision?)
  3. Make the link more specific by pointing to the relevant section in precision.md

Here's a suggested example to add:

import torch

# BaseComponent, get_precision(), and GLOBAL_PT_FLOAT_PRECISION are used as in the
# suggestion; they are assumed to be provided by the surrounding code base, with
# BaseComponent deriving from torch.nn.Module.
class ExampleComponent(BaseComponent):
    def __init__(self):
        # Module state must be initialized before parameters are assigned
        super().__init__()

        # Store parameters in component precision
        self.weights = torch.nn.Parameter(torch.empty(10, dtype=self.get_precision()))

        # Store normalization in global precision
        self.std = torch.nn.Parameter(
            torch.ones(1, dtype=GLOBAL_PT_FLOAT_PRECISION)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cast input to component precision after normalization
        x = x / self.std
        x = x.to(dtype=self.get_precision())

        # Perform computation in component precision
        result = x @ self.weights

        # Cast to global precision before normalization
        result = result.to(dtype=GLOBAL_PT_FLOAT_PRECISION)
        result = result * self.std
        return result

142-144: Add context about precision in the model architecture.

Consider adding a brief introduction explaining how these precision guidelines fit into the larger model architecture and why they're important. This would help developers understand the rationale behind these requirements and their impact on model consistency.

Add a paragraph like:

These precision guidelines ensure consistent behavior across the model's components and backends. Proper precision handling is crucial for:
- Maintaining numerical stability during training
- Ensuring consistent results across different hardware
- Optimizing memory usage and performance
- Facilitating model conversion between backends
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c524b17 and 401cfa4.

📒 Files selected for processing (1)
  • doc/development/create-a-model-pt.md (1 hunks)

Signed-off-by: Jinzhe Zeng <[email protected]>
@njzjz njzjz requested a review from wanghan-iapcm October 28, 2024 19:40
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Oct 28, 2024
Merged via the queue into deepmodeling:devel with commit 40b3ea1 Oct 29, 2024
60 checks passed
Successfully merging this pull request may close these issues.

what should be the convention of variable dtype in deepmd-kit modules.
3 participants