From 05307795286f0669b2f05d34e003cdeb246c8e9d Mon Sep 17 00:00:00 2001
From: Tomer Kaftan
Date: Thu, 21 Feb 2019 07:06:47 -0800
Subject: [PATCH] RFC: New tf.print (#14)

* New tf.print proposal

* Attempt to fix table of contents

* Removed not-working TOC label

* Minor updates to the doc.

* Update tf.print to be accepted
---
 rfcs/20180824-tf-print-v2.md | 551 +++++++++++++++++++++++++++++++++++
 1 file changed, 551 insertions(+)
 create mode 100644 rfcs/20180824-tf-print-v2.md

diff --git a/rfcs/20180824-tf-print-v2.md b/rfcs/20180824-tf-print-v2.md
new file mode 100644
index 000000000..6645a6702
--- /dev/null
+++ b/rfcs/20180824-tf-print-v2.md
@@ -0,0 +1,551 @@
# New tf.print

Status        | Accepted
:------------ | :---------------------------------
**Author(s)** | Tomer Kaftan (Google)
**Sponsor**   | Asim Shankar (Google)
**Updated**   | 2018-08-24

## Background

Printing is a core component of any language or system. From their first hello
world application to in-depth debugging of complex workloads, developers rely on
printing and logging as some of their most important tools. Unfortunately,
setting up printing of Tensors when building TensorFlow graphs doesn't align
with the natural usage of the print primitives most programmers are used to.
This has led to various bugs, questions on online forums, and entire blog posts
explaining tf.Print (e.g.
[blog post 1](https://towardsdatascience.com/using-tf-print-in-tensorflow-aa26e1cff11e),
[blog post 2](http://www.heyuhang.com/blog/2018/01/31/the-secret-of-tf-dot-print-in-tensorflow/),
[Github question](https://github.com/tensorflow/tensorflow/issues/1988),
[Quora question](https://www.quora.com/How-does-the-tf-Print-statement-work-for-TensorFlow),
[Stack Overflow question](https://stackoverflow.com/questions/33633370/how-to-print-the-value-of-a-tensor-object-in-tensorflow),
...). Much of the confusion comes from the existing `tf.Print` operator being an
identity operator with a side effect of printing. This is a very non-standard
API for printing, and the operator graph must be carefully set up to ensure the
operator gets executed. This is at odds with the printing/logging users know
from elsewhere, where a print method produces no outputs and immediately prints
the desired value.

Eager execution promises a more interactive, easier-to-debug experience that
works with Python's `print` method. However, even with eager execution enabled,
there is still a need to wrap parts of execution in optimized graph functions
(currently `tf.contrib.eager.defun`). A Python `print` placed in these graph
functions will run at graph construction time, not graph execution time,
creating confusion.

This doc proposes a new `tf.print` TensorFlow printing approach that closely
mirrors the standard python `print` API whether or not execution is eager. It
also provides long-requested functionality for both eager and session-based
execution, such as more meaningful Tensor summarization, support for printing
nested data structures that contain tensors, and controllable logging levels.
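To make the contrast concrete, here is a minimal sketch of legacy vs. proposed
usage (TF 1.x-era API; the tensor values and messages are illustrative):

```python
import tensorflow as tf

x = tf.range(10)

# Legacy tf.Print: an identity op whose *side effect* is printing. It only
# fires if its output feeds something that actually executes, so it is easy
# to accidentally prune it out of the graph.
x = tf.Print(x, [x], message="x is: ")

# Proposed tf.print: behaves like Python's print. Nothing has to be threaded
# through the graph when executing eagerly or inside a graph function.
tf.print("x is:", x)
```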
## Overview

We introduce two methods to the public Python API: `tf.strings.format` and
`tf.print`. (We may have to use `tf.strings.Format` or `tf.strings.fmt` instead
of `tf.strings.format` if the built-in python `format` method causes issues with
the naming.) These are internally backed by two C++ operators: `StringFormat`
and `PrintV2`. The method headers & docstrings are as follows:

### Python Public API Methods

````python
@tf_export("print")
def print_v2(*inputs, **kwargs):
  """Print the specified inputs.

  Returns an operator that prints the specified inputs to a desired
  output stream or logging level. The inputs may be dense or sparse Tensors,
  primitive python objects, data structures that contain Tensors, and printable
  python objects. Printed tensors will recursively show the first and last
  `summarize` elements of each dimension.

  With eager execution enabled and/or inside a `tf.contrib.eager.defun` this
  operator will automatically execute, and users only need to call `tf.print`
  without using the return value. When constructing graphs outside of a
  `tf.contrib.eager.defun`, one must either include the returned op
  in the input to `session.run`, or use the operator as a control dependency for
  executed ops by specifying `with tf.control_dependencies([print_op])`.

  @compatibility(python2)
  In python 2.7, make sure to import the following:
  `from __future__ import print_function`
  @end_compatibility

  Example:
    Single-input usage:
    ```python
    tf.enable_eager_execution()
    tensor = tf.range(10)
    tf.print(tensor, output_stream=sys.stderr)
    ```
    (This prints "[0 1 2 ... 7 8 9]" to sys.stderr)

    Multi-input usage:
    ```python
    tf.enable_eager_execution()
    tensor = tf.range(10)
    tf.print("tensors:", tensor, {2: tensor * 2}, output_stream=sys.stdout)
    ```
    (This prints "tensors: [0 1 2 ... 7 8 9] {2: [0 2 4 ... 14 16 18]}" to
    sys.stdout)

    Usage when constructing graphs:
    ```python
    sess = tf.Session()
    with sess.as_default():
      tensor = tf.range(10)
      print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                          output_stream=sys.stdout)
      with tf.control_dependencies([print_op]):
        doubled_tensor = tensor * 2
      sess.run(doubled_tensor)
    ```
    (This prints "tensors: [0 1 2 ... 7 8 9] {2: [0 2 4 ... 14 16 18]}" to
    sys.stdout)

  Note: This op is only partially compatible with Jupyter notebooks and colabs.
  Because it prints to the C++ standard out / standard error, this will go
  in the notebook kernel's console output, not in the notebook cell output.

  Args:
    *inputs: Positional arguments that are the inputs to print. Inputs in the
      printed output will be separated by spaces. Inputs may be python
      primitives, tensors, data structures such as dicts and lists that
      may contain tensors (with the data structures possibly nested in
      arbitrary ways), and printable python objects.

    output_stream: The output stream or logging level to print to. Defaults to
      sys.stderr, but sys.stdout, tf.logging.info, tf.logging.warning,
      tf.logging.error, and tf.logging.fatal are also supported.

    summarize: The first and last `summarize` elements within each dimension
      are recursively printed per Tensor. If None, then the first 3 and last 3
      elements of each dimension are printed for each tensor. If set to -1, it
      will print all elements of every tensor.

    name: A name for the operation (optional).

  Returns:
    A print operator that prints the specified inputs in the specified output
    stream or logging level.

  Raises:
    ValueError: If an unsupported output stream is specified.
  """
````
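The graph-function behavior described in the docstring is the crux of the
proposal, so a side-by-side sketch may help (TF 1.x contrib API; the decorator
spelling is an assumption):

```python
import tensorflow as tf

tf.enable_eager_execution()

@tf.contrib.eager.defun
def double(x):
  print("tracing double")    # Python print: runs once, at graph construction
  tf.print("doubling:", x)   # tf.print: runs every time the function executes
  return x * 2

double(tf.range(4))  # prints "tracing double" and "doubling: [0 1 2 3]"
double(tf.range(4))  # prints only "doubling: [0 1 2 3]"
```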
````python
@tf_export("strings.format")
def string_format(template, inputs, placeholder="{}", summarize=3, name=None):
  r"""Formats a string template using a list of tensors.

  Formats a string template using a list of tensors, abbreviating tensors by
  only printing the first and last `summarize` elements of each dimension
  (recursively). If formatting only one tensor into a template, the tensor does
  not have to be wrapped in a list.

  Example:
    Formatting a single-tensor template:
    ```python
    sess = tf.Session()
    with sess.as_default():
      tensor = tf.range(10)
      formatted = tf.strings.format("tensor: {}, suffix", tensor)
      out = sess.run(formatted)
      expected = "tensor: [0 1 2 ... 7 8 9], suffix"
      assert(out.decode() == expected)
    ```

    Formatting a multi-tensor template:
    ```python
    sess = tf.Session()
    with sess.as_default():
      tensor_one = tf.reshape(tf.range(100), [10, 10])
      tensor_two = tf.range(10)
      formatted = tf.strings.format("first: {}, second: {}, suffix",
                                    (tensor_one, tensor_two))
      out = sess.run(formatted)
      expected = ("first: [[0 1 2 ... 7 8 9]\n"
                  " [10 11 12 ... 17 18 19]\n"
                  " [20 21 22 ... 27 28 29]\n"
                  " ...\n"
                  " [70 71 72 ... 77 78 79]\n"
                  " [80 81 82 ... 87 88 89]\n"
                  " [90 91 92 ... 97 98 99]], second: [0 1 2 ... 7 8 9], suffix")
      assert(out.decode() == expected)
    ```

  Args:
    template: A string template to format tensor values into.

    inputs: A list of `Tensor` objects, or a single Tensor.
      The list of tensors to format into the template string. If a solitary
      tensor is passed in, the input tensor will automatically be wrapped as a
      list.

    placeholder: An optional `string`. Defaults to `"{}"`.
      At each placeholder occurring in the template, a subsequent tensor
      will be inserted.

    summarize: An optional `int`. Defaults to `3`.
      When formatting the tensors, show the first and last `summarize`
      entries of each tensor dimension (recursively). If set to -1, all
      elements of the tensor will be shown.

    name: A name for the operation (optional).

  Returns:
    A scalar `Tensor` of type `string`.

  Raises:
    ValueError: if the number of placeholders does not match the number of
      inputs.
  """
````

### C++ Ops & Implementation Overview

These two python methods are backed by two new C++ operators: `StringFormat` and
`PrintV2`.

StringFormat takes a template string attr, a placeholder string attr, and a list
of tensor inputs, and formats the tensor summarizations into the template string
where the placeholders are. It outputs a string scalar tensor.

```
REGISTER_OP("StringFormat")
    .Input("inputs: T")
    .Output("output: string")
    .Attr("T: list(type) >= 0")
    .Attr("template: string = '%s'")
    .Attr("placeholder: string = '%s'")
    .Attr("summarize: int = 3")
    .SetShapeFn(...)
```

PrintV2 takes an input string scalar tensor and a string attribute specifying an
output stream/logging level, and produces no outputs. When it executes, it
prints the string scalar to the specified output stream / logging level. It is
marked as stateful to ensure it gets executed in graph functions without being
pruned, and to ensure prints get executed in program order in eager mode and
graph functions.

```
REGISTER_OP("PrintV2")
    .Input("input: string")
    .SetIsStateful()
    .Attr("output_stream: string")
    .SetShapeFn(...)
```

The `tf.strings.format` method calls directly into the StringFormat operator
with a bit of extra syntactic sugar to wrap a single tensor input as a list
automatically.

The `tf.print` method maps python logging level strings, logging methods, or
standard out/err streams to the appropriate output stream string constant. If
there is only one string scalar tensor as the input, it directly calls into
`PrintV2`. Otherwise, it (see the sketch after this list):

1.  Uses the TensorFlow `nest` utilities to build a placeholder-less template
    for the inputs to print and to extract a flattened list of tensors.
    *   Also detects sparse tensors and extracts their components / sets up the
        template appropriately.
1.  Generates a placeholder that won't conflict with the template.
1.  Rebuilds the template with the placeholder inserted where the tensors
    should go.
1.  Creates a StringFormat op to convert the tensor inputs into an appropriate
    string scalar.
1.  Passes the output of the StringFormat op to a PrintV2 op (that it returns).
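A simplified sketch of that lowering, using Python 3 syntax for brevity. The
`nest` and `gen_logging_ops` import paths are assumptions about internal
modules, and the reconstruction of dict/list brackets and sparse-tensor
components in the template is elided:

```python
import tensorflow as tf
from tensorflow.python.ops import gen_logging_ops  # generated op bindings (assumed)
from tensorflow.python.util import nest            # nest utilities (assumed path)


def _print_sketch(*inputs, output_stream="stderr", summarize=3):
  # 1. Flatten the inputs: tensors get extracted, everything else becomes
  #    literal text in the template.
  flat = nest.flatten(inputs)
  is_tensor = [isinstance(x, tf.Tensor) for x in flat]
  literals = [str(x) for x, t in zip(flat, is_tensor) if not t]
  # 2. Grow a placeholder until it cannot collide with the literal text.
  placeholder = "{}"
  while any(placeholder in lit for lit in literals):
    placeholder = "{" + placeholder + "}"
  # 3. Rebuild the template with the placeholder wherever a tensor goes.
  template = " ".join(placeholder if t else str(x)
                      for x, t in zip(flat, is_tensor))
  # 4. A single StringFormat op renders all tensors into one string scalar.
  tensors = [x for x, t in zip(flat, is_tensor) if t]
  formatted = tf.strings.format(template, tensors, placeholder=placeholder,
                                summarize=summarize)
  # 5. PrintV2 consumes the formatted scalar and performs the actual print.
  return gen_logging_ops.print_v2(formatted, output_stream=output_stream)
```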
## Major Design Alternatives

### Device Locations

The location at which tensors actually get printed (& where the formatting of
large tensors happens) plays a major role in the user experience.

In the current approach, printing and formatting happen on whatever device is
currently specified in the device scope when entering `tf.print` /
`tf.strings.format`.

UPDATE: Following the design review we have decided to *not* specify cpu:0 by
default (a usage sketch follows the list below). Outdated: Because the new
operators only have CPU kernels, we nest `with tf.device('/CPU:0'):` inside of
`tf.print` and `tf.strings.format` to avoid crashes when the device scope is
referring to a non-CPU device.

Alternatives to this are:

*   Don't automatically specify cpu:0 for tf.print and tf.strings.format.
    *   This provides full user control over devices & which cpu device prints
        (if they want different cpu devices to print different things).
    *   Will cause potentially hard-to-interpret errors if users try printing
        in non-cpu device scopes as-is (w/o explicitly changing the device
        scope before printing).
*   Automatically specify a client device for print & format according to
    context parameters.
    *   Users may want all printing & logging to happen on a single client
        device, without having to muck with device scopes around each
        `tf.print` call.
    *   This would most closely match the behavior programmers are used to
        with printing & logging.
    *   Lowers user flexibility if they want printing to happen on other
        devices.
    *   Would have to either automatically detect the client device, or
        require users to specify it in configuration when starting TensorFlow
        (which increases the required user effort).
*   Wrap each input tensor in a separate `tf.strings.format` before formatting
    tensors into the main template, then co-locate those formatting ops w/ the
    tensors.
    *   This would add extra formatting ops to execute, but it would have the
        benefit of never transferring tensors over the network just to print;
        it would only ever transfer formatted strings.
    *   This would still have to force a cpu:0 device for the formatting ops,
        just on the worker machines that the tensors appear on.
    *   If there are risky subtleties to ops.colocate_with and ops.device,
        they could pose issues w/ this approach.
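Under the accepted behavior (no automatic cpu:0), placement stays entirely in
the user's hands. A sketch, where the job/task device names and `big_tensor`
are purely illustrative:

```python
# Format next to the (possibly large) tensor so that only the formatted
# string, not the tensor itself, crosses the network...
with tf.device("/job:worker/task:0/device:CPU:0"):
  formatted = tf.strings.format("activations: {}", big_tensor)

# ...then print from wherever the output should surface.
with tf.device("/job:chief/task:0/device:CPU:0"):
  print_op = tf.print(formatted)
```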
### Functionality to Operator Breakdown

We use two public python methods and two C++ ops (StringFormat & PrintV2). Some
alternatives to how we break down this functionality are:

*   Use a single C++ op / a single public API method for both formatting and
    printing.
    *   However, it is good to keep the two pieces of functionality separate,
        and a single op would cause issues if users want to format co-located
        with a giant tensor but print on a different machine.
*   In the StringFormat op, do not take a template and only format one tensor
    nicely. Then make tf.print build a separate format op for each extracted
    input tensor, and construct the final printed string using a TensorFlow
    string join/concat operator.
    *   This would encourage writing the `tf.print` op in a way that
        co-locates tensor formatting with the tensor locations (though it is
        not necessary for doing that).
    *   However, it may be nice for users to be able to format using explicit
        string templates instead of having to rely on the print method.
*   Allow the tf.strings.format python method to support formatting python
    objects & data structures (that may contain tensors) as opposed to just
    formatting tensors. tf.print would then feed almost straight into
    tf.strings.format.
    *   This would move a lot of the complexity into the tf.strings.format
        method, and force the format method to re-generate a different
        template for the StringFormat op than the user-supplied template
        (which seems somewhat unnatural).

### Update legacy `tf.Print` or Not

We are specifically choosing to make no changes to the legacy `tf.Print`, and to
deprecate it for removal in TF 2.0. Although we could apply some of this
improved functionality to `tf.Print` (or even make `tf.Print` call into the new
print/format operators), doing so comes with many backwards-compatibility
concerns, especially if people are already relying on the exact print format of
`tf.Print`. We also want to encourage people to move over to the newer, more
natural print methods, so that we don't have to maintain two printing APIs going
forward.

## Caveats & Open User Experience Risks

1.  To use the lower-case name `tf.print` in python 2.7, users have to import
    `from __future__ import print_function`.
    *   Because Python 2 is on the way out, this might not be a major issue.
    *   Alternatively, we could provide an additional python2-safe alias.
1.  The relative order of prints will not be guaranteed inside of session-based
    graph mode (ordering should be correct during eager execution / inside
    `tf.contrib.eager.defun` graph functions). See the sketch after this list.
    *   When session-based graphs are compiled/executed, execution order may be
        changed and is not guaranteed.
1.  When using the various logging levels, the logged line will capture the
    PrintV2 op kernel call-site in the C++ code, not the python line at which
    `tf.print` actually appeared.
    *   We could solve this by capturing the python call-site in `tf.print`
        when the various `tf.logging` streams are used, and including it as
        part of the template passed to the StringFormat op.
1.  In Colab/Jupyter notebooks, printing to the C++/OS standard out & standard
    error goes to the notebook kernel's console output, not the notebook cell
    output. (This is an issue w/ the legacy tf.Print as well.)
    *   This is a known issue with python notebooks.
    *   It requires complex capturing logic to send output to notebook cells,
        and the solutions are often not totally portable across operating
        systems / C++ runtimes.
    *   If we want to do something about this, a possible approach could be to
        have the PrintV2 kernel execute a python print if we detect a
        jupyter/colab notebook environment and the current device has a python
        runtime, rather than using the C++ logging/printing methods.
    *   Alternatively, we could provide utilities to capture the C++
        stdout/stderr in Jupyter notebook outputs as a part of TensorFlow.
    *   We would have to be very careful w/ device placement in distributed
        multi-machine/multi-device settings to ensure that the print device is
        the notebook kernel CPU.
1.  Users might get too comfortable w/ support for python data structure
    printing, then run into various nest utility quirks like OrderedDicts being
    re-ordered, or tensors not being extracted correctly from sets. (See
    Supported Input Types in Extra Details for more info.)
1.  In graph functions & session-based run mode, the python values printed
    won't change w/ subsequent executions of that PrintV2. (See Supported Input
    Types in Extra Details for more info.)
1.  Because printing & formatting follow the current device context, printing
    may happen on a different machine than users intend, leaving them with a
    hard time finding the printed data. Or, it may unintentionally transfer a
    large tensor over the network before formatting it.
    *   Developers may be more inclined to have printing/logging happen in
        their client as opposed to on various machines in a distributed
        system.
1.  To format & print on different devices, users must explicitly call both
    `tf.strings.format` and `tf.print`.
1.  Someone using the normal python print in eager-mode python notebooks or a
    repl might get used to it working, and not realize that there's a separate
    `tf.print`. They may then try using the standard python `print` in a graph
    function and find that their printing does not work. Alternatively, in a
    colab they might switch to tf.print and then be confused about why nothing
    appears to print (see the notebook caveat above).
    *   It may make sense for eager graph functions to automatically hook the
        python print method and replace it with calls to `tf.print`. This
        would still be problematic in python notebooks, though (for the above
        reasons).
1.  Various user code could still end up calling EagerTensor `__str__`
    unintentionally, which calls .numpy() on the tensor and then prints it.
    This will copy the full tensor over and then format it w/ numpy, which has
    similar but slightly mismatched formatting from tf.print.
    *   For consistency, maybe make `__str__` and `__repr__` in EagerTensors
        use the tf.strings.format op on whatever device the tensor is on? This
        could avoid transferring giant tensors over the network
        unintentionally.
1.  The `tf.print` functionality for dealing w/ arbitrary nested structures
    will be python-specific. Other language bindings will have to manually
    chain format & print.
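For the ordering caveat above, the docstring's control-dependency pattern can be
used to restore a deterministic order between prints in session-based graphs; a
minimal sketch:

```python
# Without a control dependency, a session may execute these in either order.
print_a = tf.print("step A finished")
with tf.control_dependencies([print_a]):
  print_b = tf.print("step B finished")  # now guaranteed to print after A
```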
## Extra Details

### Supported Input Types

Any Python object can be passed as input to `tf.print`, and it will be printed.

Tensors will automatically be extracted from python lists, dicts, and other data
structures supported by the TensorFlow `nest` utilities. These extracted tensors
will print with their summarization format whenever `tf.print` executes.
TensorFlow variables also work correctly inside of `tf.print` (even when nested
inside data structures). A short example follows the caveat list below.

However, there are the following caveats!

*   In session mode and inside graph functions (and possibly in eager loops as
    well), the printed value of non-tensor python objects is fixed at the time
    a given `tf.print` is built into the graph. It will not update when that
    specific `tf.print` is executed again.
*   The TensorFlow nest utilities do not support *all* standard python data
    structures. For example, tensors nested inside of python sets will not be
    properly extracted, and their values will not be shown/formatted when the
    `tf.print` kernels run.
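A brief eager sketch of the nested-structure support described above (the exact
rendering of the dict entries is illustrative):

```python
tf.enable_eager_execution()
tensor = tf.range(10)
tf.print("state:", {"weights": tensor, "biases": [tensor * 2, "frozen"]})
# Prints something like:
# state: {'biases': [[0 2 4 ... 14 16 18], frozen], 'weights': [0 1 2 ... 7 8 9]}
```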
*   Tensors also can't be extracted from arbitrary python objects, even ones
    that define a `__str__` or `__repr__` method that tries to print the
    contained tensor via the normal python `str`/`print` methods.

### Summarization Format

The strategy StringFormat uses to format tensors is heavily inspired by numpy.
The first and last `summarize` entries in each tensor dimension will recursively
be printed, separated by a separator string (in this case `...`).

Each dimension is bordered by open and close square brackets `[` and `]`. The
inner-most dimension separates entries using just spaces, and other dimensions
separate entries using new-lines (with the number of new-lines matching the
number of dimensions nested inside the current dimension). Spaces are used to
match indentation after new-lines.

Example of how a 2d tensor may be printed:

```
[[0 1 2 ... 7 8 9]
 [10 11 12 ... 17 18 19]
 [20 21 22 ... 27 28 29]
 ...
 [70 71 72 ... 77 78 79]
 [80 81 82 ... 87 88 89]
 [90 91 92 ... 97 98 99]]
```

Example of how a 3d tensor may be printed:

```
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
```

### Sparse Tensor Printing

The `tf.print` method will detect when SparseTensors are provided as inputs or
are nested inside of lists/dicts, and convert them to the template
appropriately. The following is an eager example:

```python
ind = [[0, 0], [1, 0], [1, 3], [4, 1], [1, 4], [3, 2], [3, 3]]
val = [0, 10, 13, 4, 14, 32, 33]
shape = [5, 6]
sparse = tf.SparseTensor(
    tf.constant(ind, tf.int64),
    tf.constant(val, tf.int64),
    tf.constant(shape, tf.int64))

tf.print(sparse)
```

This will print:

```
SparseTensor(indices=[[0 0]
 [1 0]
 [1 3]
 ...
 [1 4]
 [3 2]
 [3 3]], values=[0 10 13 ... 14 32 33], shape=[5 6])
```

### Output Streams & Logging Levels

The output streams passed into `tf.print` are converted to string constants
supported by the C++ PrintV2 op as follows:

```python
{
    sys.stdout: "stdout",
    sys.stderr: "stderr",
    tf.logging.INFO: "log(info)",
    tf.logging.info: "log(info)",
    tf.logging.WARN: "log(warning)",
    tf.logging.warning: "log(warning)",
    tf.logging.warn: "log(warning)",
    tf.logging.ERROR: "log(error)",
    tf.logging.error: "log(error)",
    tf.logging.FATAL: "log(fatal)",
    tf.logging.fatal: "log(fatal)",
}
```

### Device Locations

The new C++ operators only have CPU kernels. So, `tf.print` and
`tf.strings.format` will place the operators wherever the current device scope
is set, while also nesting `with tf.device('/CPU:0'):` inside of that scope.
This leaves the ops on the currently specified machine/worker, but avoids
crashes when the device context is a non-CPU device such as a GPU. It may cause
issues if users want to print on different CPU devices, or if the current worker
machine has no available CPUs. (As noted in the UPDATE under Device Locations in
Major Design Alternatives, the design review decided not to force cpu:0 by
default.)

## Minor Design Alternatives / Possible Extra Features

*   Print to a single line (as opposed to multi-line output), or make it
    configurable.
*   Make print a context manager for extra syntactic sugar in session-based
    graph mode.
*   Change the default output stream / logging level choices.
*   Support writing to arbitrary file descriptors?
*   Don't have defaults for template or placeholder in the C++ StringFormat op.
*   Support a debug flag to include shape & device & type in the printed
    format.
*   In the format op: make template and placeholder inputs as opposed to
    attributes.
*   In the format op: support more complex absl-style templates (e.g.
    positional substitutions).