# New PyTorch Logging System

## **Summary**
Create a message logging system for PyTorch with the following requirements:

### Consistency 

* The C++ and Python APIs should match each other as closely as possible.

* All errors, warnings, and other messages generated by PyTorch should be
  emitted using the logging system API.


### Severity level and message classes

* Offer different message severity levels, including at least the following:

  - **Info**: Emits a message without creating a warning or error. By default,
    this gets printed to stdout.

  - **Warning**: Emits a message as a warning. If a warning is never caught,
    it gets printed to stderr by default.

  - **Error**: Emits a message as an error. If an error is never caught, the
    application will print the error to stderr and quit.

* Offer different message classes under each severity level.

  - Every message is emitted as an instance of a message class.

  - Each message class has both a C++ class and a Python class, and when a
    C++ message is propagated to Python, it is converted to its corresponding
    Python class.

  - Whenever it makes sense, the Python class should be one of the builtin
    Python error/warning classes. For instance, currently in PyTorch, the C++
    error class `c10::Error` gets converted to the Python `RuntimeError` class.

* Adding new message classes and severity levels should be easy

### Configurability and filtering

* Ability to turn warnings into errors. This is already possible with the
  Python `warnings` module filter, but the PyTorch docs should mention it and
  we should probably have unit tests for it.
  See [documentation](https://docs.python.org/3/library/warnings.html#the-warnings-filter)

* Settings to disable specific **Warning** or **Info** classes

  - Disabling warnings in Python is already possible with the `warnings`
    module filter. See [documentation](https://docs.python.org/3/library/warnings.html#the-warnings-filter).
    There is no similar system in C++ at the moment, and building one is probably
    low priority.

  - Filtering out **Info** messages would be nice to have because excessive
    printouts can degrade the user experience. Related to issue
    [#68768](https://github.com/pytorch/pytorch/issues/68768)

* Settings to enable/disable emitting duplicate messages generated by multiple
  `torch.distributed` ranks. Related to issue
  [#68768](https://github.com/pytorch/pytorch/issues/68768)

* Ability to make a particular **Warning** or **Info** message only emit once.
  Warn-once should be the default for most warnings.

  - Currently `TORCH_WARN_ONCE` does this in C++, but there is no Python
    equivalent

  - Offer a filter to override warn- and log-once, so that they always emit.
    The filter could work similarly to the Python `warnings` filter. This is
    a low priority feature.

  - TODO: `torch.set_warn_always()` currently controls some warnings (maybe
    only the ones from C++? I need to find out for sure.)

* Settings can be changed from Python, C++, or environment variables

  - Filtering warnings with Python command line arguments should
    remain possible. For instance, the following turns a `DeprecationWarning`
    into an error: `python -W error::DeprecationWarning your_script.py`
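
The filter behaviors listed above that already exist in Python (turning a warning class into an error, and silencing a class) can be exercised with the standard `warnings` module alone. The sketch below uses a hypothetical `deprecated_helper` function as a stand-in for any library call that emits a warning; it is not part of PyTorch.

```python
import warnings


def deprecated_helper():
    # Hypothetical stand-in for any library function that emits a warning.
    warnings.warn("deprecated_helper is deprecated", DeprecationWarning)
    return 42


# Turn DeprecationWarning into an error for the duration of the block,
# similar in effect to `python -W error::DeprecationWarning your_script.py`.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        deprecated_helper()
        raised = False
    except DeprecationWarning:
        raised = True

# Silence one warning class while recording everything else.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", DeprecationWarning)
    result = deprecated_helper()
```

Using `warnings.catch_warnings()` keeps the filter changes local, which is also a convenient shape for the unit tests mentioned above.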

### Compatibility

* Should integrate with Meta's internal logging system, which is
  [glog](https://github.com/google/glog)

  - TODO: What are all the requirements that define "integrating with glog"?

* Must be OSS-friendly, so it shouldn't require libraries (like glog) which may
  cause incompatibility issues for projects that use PyTorch

### Other requirements

* Continue using warning/error APIs and message classes that currently exist in
  PyTorch wherever possible. For instance, `TORCH_CHECK`, `TORCH_WARN`, and
  `TORCH_WARN_ONCE` should continue to be used in C++

* TODO: Determine the requirements for the following concepts:

  - Log files? (default behavior and any settings)


## **Motivation**

Original issue: [link](https://github.com/pytorch/pytorch/issues/72948)

Currently, it is challenging for PyTorch developers to provide messages that
act consistently between Python and C++.

It is also challenging for PyTorch users to manage the messages that PyTorch
emits. For instance, if a PyTorch user happens to be calling PyTorch functions
that emit lots of messages, it can be difficult for them to filter out those
messages so that their project's users don't get bombarded with warnings and
printouts that they don't need to see.


## **Proposed Implementation**

### Message classes

At least the following message classes should be available. The name of the
C++ class appears first in all the listed entries below, with the Python class
to the right of it.

Each severity level has a default class. All other classes within a given
severity level inherit from the corresponding default class.

NOTE: Most of the error classes below already exist in PyTorch. However,
info classes do not currently exist. Also, only one type of warning currently
exists in C++, and it is not implemented as a C++ class that can be inherited
(as far as I understand).

#### Error message classes:
  
* **`c10::Error`** - Python `RuntimeError`
  - Default error class. Other error classes inherit from it.

* **`c10::IndexError`** - Python `IndexError`
  - Emitted when attempting to access an element that is not present in
    a list-like object.

* **`c10::ValueError`** - Python `ValueError`
  - Emitted when a function receives an argument with correct type but
    incorrect value.

* **`c10::TypeError`** - Python `TypeError`
  - Emitted when a function receives an argument with incorrect type.

* **`c10::NotImplementedError`** - Python `NotImplementedError`
  - Emitted when a feature that is not implemented is called.

* **`c10::LinAlgError`** - Python `torch.linalg.LinAlgError`
  - Emitted from the `torch.linalg` module when there is a numerical error.

* **`c10::NondeterministicError`** - Python `torch.NondeterministicError`
  - Emitted when `torch.use_deterministic_algorithms(True)` and
    `torch.set_deterministic_debug_mode('error')` are set, and a
    nondeterministic operation is called.
  

#### Warning message classes:

* **`c10::UserWarning`** - Python `UserWarning`
  - Default warning class. Other warning classes inherit from it.

* **`c10::BetaWarning`** - Python `torch.BetaWarning`
  - Emitted when a beta feature is called. See
    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
  - TODO: This warning type might not be very useful--find out if we really
    want this

* **`c10::PrototypeWarning`** - Python `torch.PrototypeWarning`
  - Emitted when a prototype feature is called. See
    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
  - TODO: This warning type might not be very useful--find out if we really
    want this

* **`c10::NondeterministicWarning`** - Python `torch.NondeterministicWarning`
  - Emitted when `torch.use_deterministic_algorithms(True)` and
    `torch.set_deterministic_debug_mode('warn')` are set, and a
    nondeterministic operation is called.

* **`c10::DeprecationWarning`** - Python `DeprecationWarning`
  - Emitted when a deprecated function is called.
  - TODO: `DeprecationWarning`s are ignored by default in Python, so we may
    actually want to use a different Python class for this.


#### Info message classes:
  
* **`c10::Info`** - Python `torch.Info`
  - Default info class. Other info classes inherit from it.
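
A minimal sketch of how the Python side of this hierarchy might be declared is shown below. The class definitions are assumptions made for illustration (the real `torch.*` classes may be defined differently). One point worth noting: the rule that every class inherits from its severity level's default class cannot hold for all of the Python builtins, because `IndexError`, `ValueError`, and `TypeError` do not derive from `RuntimeError`, and `DeprecationWarning` does not derive from `UserWarning`.

```python
class Info:
    """Assumed default info class; deliberately not an Exception or Warning."""


class BetaWarning(UserWarning):
    """Assumed Python counterpart of `c10::BetaWarning`."""


class PrototypeWarning(UserWarning):
    """Assumed Python counterpart of `c10::PrototypeWarning`."""


class NondeterministicWarning(UserWarning):
    """Assumed Python counterpart of `c10::NondeterministicWarning`."""


class NondeterministicError(RuntimeError):
    """Assumed Python counterpart of `c10::NondeterministicError`."""


# Python error classes named in the list above (builtins plus one stand-in).
python_error_classes = [
    IndexError,
    ValueError,
    TypeError,
    NotImplementedError,
    NondeterministicError,
]

# Classes that a bare `except RuntimeError:` would NOT catch in Python.
outside_default = [
    cls.__name__
    for cls in python_error_classes
    if not issubclass(cls, RuntimeError)
]

# The builtin DeprecationWarning is a sibling of UserWarning, not a child.
deprecation_is_user_warning = issubclass(DeprecationWarning, UserWarning)
```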


### Message APIs

In order to emit messages, developers can use the APIs defined in this section.

These APIs all have a variable length argument list, `...` in C++ and `*args`
in Python. When a message is emitted, these arguments are concatenated into
a string, and the string becomes the body of the message.

In C++, the arguments in `...` must all have the `std::ostream& operator<<`
function defined so that they can be concatenated.

In Python, each element in `*args` must either have a `__str__` method or be a
callable that, when called, returns an object that has a `__str__` method.
Providing part of a message as a callable can improve performance in cases
where the message is never emitted. For example, in
`torch.check(cond, lambda: expensive_function())`, `expensive_function()` is
not called when `cond` is true.
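
The concatenation and lazy-evaluation rules described above could look roughly like the following in Python. This is only a sketch: `_build_message` is a hypothetical helper, and `check` here is a local stand-in for the proposed `torch.check`, not an existing PyTorch function.

```python
def _build_message(*args):
    # Hypothetical helper: call any callable argument, then concatenate the
    # string form of every piece. Note that classes are callable too, so a
    # real implementation might need a narrower test than `callable`.
    parts = []
    for arg in args:
        if callable(arg):
            arg = arg()
        parts.append(str(arg))
    return "".join(parts)


def check(cond, *args):
    # Stand-in for the proposed `torch.check`: the message is built only
    # when the condition fails.
    if not cond:
        raise RuntimeError(_build_message(*args))


calls = []


def expensive_function():
    calls.append("called")
    return "expensive details"


# Passing condition: the callable is never invoked.
check(True, "never shown: ", expensive_function)
calls_after_pass = list(calls)

# Failing condition: the callable is invoked exactly once.
try:
    check(False, "shape mismatch: ", expensive_function)
except RuntimeError as err:
    message = str(err)
```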


#### Error APIs

The APIs for raising errors all check a boolean condition, the `cond` argument
in the following signatures, and throw an error if that condition is false.

The error APIs are listed below, with the C++ signature on the left and the
corresponding Python signature on the right.

**`TORCH_CHECK(cond, ...)`** - `torch.check(cond, *args)`
  - C++ error: `c10::Error`
  - Python error: `RuntimeError`

**`TORCH_CHECK_INDEX(cond, ...)`** - `torch.check_index(cond, *args)`
  - C++ error: `c10::IndexError`
  - Python error: `IndexError`

**`TORCH_CHECK_VALUE(cond, ...)`** - `torch.check_value(cond, *args)`
  - C++ error: `c10::ValueError`
  - Python error: `ValueError`

**`TORCH_CHECK_TYPE(cond, ...)`** - `torch.check_type(cond, *args)`
  - C++ error: `c10::TypeError`
  - Python error: `TypeError`

**`TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)`** - `torch.check_not_implemented(cond, *args)`
  - C++ error: `c10::NotImplementedError`
  - Python error: `NotImplementedError`

**`TORCH_CHECK_WITH(error_t, cond, ...)`** - `torch.check_with(error_type, cond, *args)`
  - C++ error: Specified by `error_t` argument
  - Python error: Specified by `error_type` argument
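
As a rough illustration of the Python side of the list above, every check function can be expressed in terms of `check_with`. The functions below are local stand-ins for the proposed `torch.check*` APIs, and callable message arguments are left out for brevity.

```python
def check_with(error_type, cond, *args):
    # Raise `error_type` with the concatenated message if `cond` is false.
    if not cond:
        raise error_type("".join(str(arg) for arg in args))


def check(cond, *args):
    check_with(RuntimeError, cond, *args)


def check_index(cond, *args):
    check_with(IndexError, cond, *args)


def check_value(cond, *args):
    check_with(ValueError, cond, *args)


def check_type(cond, *args):
    check_with(TypeError, cond, *args)


def check_not_implemented(cond, *args):
    check_with(NotImplementedError, cond, *args)


size = -1
try:
    check_value(size >= 0, "expected a non-negative size, but got ", size)
except ValueError as err:
    message = str(err)
```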


#### Warning APIs

**`TORCH_WARN(...)`** - `torch.warn(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`

**`TORCH_WARN_ONCE(...)`** - `torch.warn_once(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`
  - For a given callsite, the warning is emitted only upon the first time it is
    called.

**`TORCH_WARN_DEPRECATION(...)`** - `torch.warn_deprecation(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`

**`TORCH_WARN_DEPRECATION_ONCE(...)`** - `torch.warn_deprecation_once(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`
  - For a given callsite, the warning is emitted only upon the first time it is
    called.

**`TORCH_WARN_WITH(warning_t, ...)`** - `torch.warn_with(warning_type, *args)`
  - C++ warning: Specified by `warning_t` argument
  - Python warning: Specified by `warning_type` argument

**`TORCH_WARN_ONCE_WITH(warning_t, ...)`** - `torch.warn_once_with(warning_type, *args)`
  - C++ warning: Specified by `warning_t` argument
  - Python warning: Specified by `warning_type` argument
  - For a given callsite, the warning is emitted only upon the first time it is
    called.

TODO: In C++, `TORCH_WARN_ONCE` is implemented as a macro that defines a local
static variable to track whether the warning has been emitted from each
callsite. It is not possible to implement it this way in Python, so we need to
think of some other way to do it. Of course the Python `warnings` module's
[`"default"` filter](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
prevents duplicate warnings from being emitted, but it acts a little
differently--if two warning messages emitted from the same location differ even
slightly (for instance, if the value of some variable is included in the
message and that value differs between two different `warnings.warn` calls),
then both warnings are emitted. `TORCH_WARN_ONCE` does not check whether
messages differ. But we could probably implement `torch.warn_once` in a similar
way to how the `warnings` module filter is implemented.
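
One possible approach, sketched below, is to key a module-level set on the caller's file name and line number, so that the message text plays no part in the decision. This is an assumption about how `torch.warn_once` could work, not a settled design; it relies on `sys._getframe`, which is a CPython implementation detail, and it is not thread-safe as written.

```python
import sys
import warnings

# Callsites (file name, line number) that have already emitted a warning.
_seen_callsites = set()


def warn_once(*args, category=UserWarning):
    # Identify the caller by location only, ignoring the message text.
    caller = sys._getframe(1)
    callsite = (caller.f_code.co_filename, caller.f_lineno)
    if callsite in _seen_callsites:
        return
    _seen_callsites.add(callsite)
    message = "".join(str(arg) for arg in args)
    # stacklevel=2 attributes the warning to the caller of warn_once.
    warnings.warn(message, category, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for step in range(3):
        # The message differs on every iteration, but the callsite does not.
        warn_once("step ", step, " took the slow path")
```

Only the first iteration emits a warning here, even though the filter is set to `"always"` and each message is different.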


#### Info APIs

Just like the error and warning APIs, the info APIs each have a variable length
argument list, `...` in C++ and `*args` in Python. These arguments are
concatenated into the info message.

**`TORCH_LOG_INFO(...)`** - `torch.log_info(*args)`
  - C++ info class: `c10::Info`
  - Python info class: `torch.Info`
  - TODO: Is there a better name than `log_info`? I didn't want to call it
    `torch.info`, because
    [`numpy.info`](https://numpy.org/doc/stable/reference/generated/numpy.info.html)
    has a completely different functionality. And obviously
    [`torch.log`](https://pytorch.org/docs/stable/generated/torch.log.html?highlight=torch%20log#torch.log)
    is already taken.

**`TORCH_LOG_INFO_WITH(info_t, ...)`** - `torch.log_info_with(info_type, *args)`
  - C++ info class: Specified by `info_t` argument
  - Python info class: Specified by `info_type` argument
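
A minimal sketch of how the Python info APIs might behave is given below, including a class-based switch for disabling a category of info messages. Everything here is hypothetical: `Info` stands in for the proposed `torch.Info`, `CompileInfo` and `disable_info` are invented for the example, and the `file` parameter exists only to make the output easy to capture.

```python
import io
import sys


class Info:
    """Stand-in for the proposed `torch.Info` default info class."""


class CompileInfo(Info):
    """Hypothetical subclass, used only to demonstrate filtering."""


# Info classes whose messages should be suppressed (subclasses included).
_disabled_info_types = set()


def disable_info(info_type):
    _disabled_info_types.add(info_type)


def log_info_with(info_type, *args, file=None):
    # Skip the message if its class, or any parent class, is disabled.
    if any(issubclass(info_type, d) for d in _disabled_info_types):
        return False
    out = sys.stdout if file is None else file
    print("".join(str(arg) for arg in args), file=out)
    return True


def log_info(*args, file=None):
    return log_info_with(Info, *args, file=file)


buffer = io.StringIO()
log_info("loaded ", 3, " kernels", file=buffer)
disable_info(CompileInfo)
emitted = log_info_with(CompileInfo, "compiling extension", file=buffer)
```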


### Multi-process messaging APIs

Currently, when running subprocesses that use PyTorch, some messages are
emitted by every running subprocess. See
[issue #68768](https://github.com/pytorch/pytorch/issues/68768) for specific
examples. Not emitting duplicate messages from every subprocess by default
would give a better user experience.

In issue #68768, the duplicate messages related to `cpp_extension.load` can be
modified to only be emitted by subprocess rank 0, simply by checking the node's
rank first. For instance, where there is a `warnings.warn(...)` call, we can
replace it with:

```python
if rank == 0:
    warnings.warn(...)
```

This successfully avoids duplicate warnings. A few concrete examples can be
seen in [this draft PR](https://github.com/pytorch/pytorch/pull/79288).

However, implementing the duplicate filter like this is not ideal. It would be
better to have dedicated message system API calls for this. In the case of
warnings, the following signature could be used:

**`torch.warn_rank(my_rank, *args, warn_rank=0)`**
  * Args:
    - `my_rank` - Rank of the subprocess calling this function
    - `args` - Warning message
    - `warn_rank` - Rank that should emit the message
  * The warning is only emitted if `my_rank == warn_rank`

TODO: Add APIs for the rest of the message classes, like
`torch.log_info_rank()`, etc.

TODO: There should also be a global setting to enable emitting the duplicates.
`torch.warn_rank` could check the setting, and if it's turned on, then it would
emit the warning for all ranks.

TODO: Should we have a `TORCH_WARN_RANK` (and others) in C++ as well? Is there
an existing use case for it?
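
The `torch.warn_rank` signature and the global setting discussed in this section could be sketched as follows. The function is a local stand-in rather than an existing PyTorch API, and `set_warn_all_ranks` is a hypothetical name for the global setting.

```python
import warnings

# Hypothetical global setting: when True, every rank emits the warning.
_warn_all_ranks = False


def set_warn_all_ranks(enabled):
    global _warn_all_ranks
    _warn_all_ranks = bool(enabled)


def warn_rank(my_rank, *args, warn_rank=0):
    # Emit only on the chosen rank unless duplicates are explicitly enabled.
    if not _warn_all_ranks and my_rank != warn_rank:
        return
    message = "".join(str(arg) for arg in args)
    warnings.warn(message, UserWarning, stacklevel=2)


# Simulate four ranks calling the same code path in one process.
with warnings.catch_warnings(record=True) as default_caught:
    warnings.simplefilter("always")
    for rank in range(4):
        warn_rank(rank, "extension is being rebuilt")

set_warn_all_ranks(True)
with warnings.catch_warnings(record=True) as all_caught:
    warnings.simplefilter("always")
    for rank in range(4):
        warn_rank(rank, "extension is being rebuilt")
```

With the default setting only rank 0 emits the warning; with the setting enabled, all four simulated ranks do.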


# PyTorch's current messaging API

The rest of this document contains details about the current messaging API in
PyTorch. This is included to give better context about what will change and
what will stay the same in the new messaging system.

At the moment, PyTorch has some APIs in place to make a lot of aspects of
message logging easy, from the perspective of a developer working on PyTorch.
Messages can be either printouts, warnings, or errors.

Errors are created with the standard `raise` statement in Python
([documentation](https://docs.python.org/3/tutorial/errors.html#raising-exceptions)).
In C++, PyTorch offers macros for creating errors (which are listed later in
this document). When errors propagate from a C++ function to Python, they are
converted to Python errors.

Warnings are created with `warnings.warn` in Python
([documentation](https://docs.python.org/3/library/warnings.html)). In C++,
PyTorch offers macros for creating warnings (which are listed later in this
document). When warnings propagate from a C++ function to Python, they are
converted to Python warnings.

Printouts (or what is called "Info" severity messages in the new system) are
created with just `print` in Python and `std::cout` in C++.

PyTorch's C++ warning/error macros are declared in
[`c10/util/Exception.h`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h).

## PyTorch C++ Errors

In C++, there are several different types of errors that can be used, but
PyTorch developers typically don't deal with these error classes directly.
Instead, they use macros that offer a concise interface for raising different
error classes.

### C++ error macros

Each of the error macros evaluates a boolean conditional expression, `cond`. If
the condition is false, the error is raised, and whatever extra arguments are
in `...` get concatenated into the error message with `operator<<`.

| Macro                                    | C++ Error class                |
| ---------------------------------------- | ------------------------------ |
| `TORCH_CHECK(cond, ...)`                 | `c10::Error`                   |
| `TORCH_CHECK_WITH(error_t, cond, ...)`   | caller specifies `error_t` arg |
| `TORCH_CHECK_LINALG(cond, ...)`          | `c10::LinAlgError`             |
| `TORCH_CHECK_INDEX(cond, ...)`           | `c10::IndexError`              |
| `TORCH_CHECK_VALUE(cond, ...)`           | `c10::ValueError`              |
| `TORCH_CHECK_TYPE(cond, ...)`            | `c10::TypeError`               |
| `TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)` | `c10::NotImplementedError`     |

There is some documentation on error macros [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L344-L362)

The reason why C++ preprocessor macros are used, rather than function calls, is
to ensure that the compiler can optimize for the `cond == true` branch. In
other words, if an error does not get raised, overhead is minimized.

### C++ error classes

The primary error class in C++ is `c10::Error`. Documentation and declaration
are
[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L21-L28).
`c10::Error` is a subclass of `std::exception`.

There are other error classes which are child classes of `c10::Error`, defined
[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L195-L236).

When these errors propagate to Python, they are each converted to a different
Python error class:

| C++ error class                 | Python error class         |
| ------------------------------- | -------------------------- |
| `std::exception`                | `RuntimeError`             |
| `c10::Error`                    | `RuntimeError`             |
| `c10::IndexError`               | `IndexError`               |
| `c10::ValueError`               | `ValueError`               |
| `c10::TypeError`                | `TypeError`                |
| `c10::NotImplementedError`      | `NotImplementedError`      |
| `c10::EnforceFiniteError`       | `ExitException`            |
| `c10::OnnxfiBackendSystemError` | `ExitException`            |
| `c10::LinAlgError`              | `torch.linalg.LinAlgError` |


## PyTorch C++ Warnings

When warnings propagate from C++ to Python, they are converted to a Python
`UserWarning`. Whatever is in `...` will get concatenated into the warning
message using `operator<<`.

* `TORCH_WARN(...)`
  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L515-L530)

* `TORCH_WARN_ONCE(...)`
  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L557-L562)
  - This macro only generates a warning the first time it is encountered during
    run time.


## Implementation details

### C++ to Python Error Translation

`c10::Error` and its subclasses are translated into their corresponding Python
errors [in `CATCH_CORE_ERRORS`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L54-L100).

However, not all of the `c10::Error` subclasses in the table above appear here,
which could just be an oversight.

`CATCH_CORE_ERRORS` is included within the `END_HANDLE_TH_ERRORS` macro that
most Python-bound C++ functions use for handling errors. For instance,
`THPVariable__is_view` uses the error handling macro
[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/tools/autograd/templates/python_variable_methods.cpp#L76).
There is also a similar `END_HANDLE_TH_ERRORS_PYBIND` macro that is used for
pybind-based bindings.


#### `torch::PyTorchError`

There's also an extra error class in `CATCH_CORE_ERRORS`,
`torch::PyTorchError`. I'm not sure yet why it exists and how it differs from
`c10::Error`. `torch::PyTorchError` has several subclasses:

* `torch::IndexError`
* `torch::TypeError`
* `torch::ValueError`
* `torch::NotImplementedError`
* `torch::AttributeError`
* `torch::LinAlgError`


### C++ to Python Warning Translation

The conversion of warnings from C++ to Python is described [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L25-L48)


## Misc Notes

[PyTorch Developer Podcast - Python exceptions](https://pytorch-dev-podcast.simplecast.com/episodes/python-exceptions)
explains how C++ errors/warnings are converted to Python. TODO: listen to it
again and take notes.