Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31415: Add -X importtime option. #3490

Merged
merged 9 commits into from
Oct 3, 2017
Merged

Conversation

methane
Copy link
Member

@methane methane commented Sep 11, 2017

It shows show import time of each module.
It's useful for optimizing startup time.

Sample (full version):

$ ./python -X importtime -c 'import traceback'
...
import time:         - enum 6922 us (self 6922 us)
import time:           - _sre 227 us (self 227 us)
import time:             - sre_constants 1018 us (self 1018 us)
import time:           - sre_parse 2004 us (self 987 us)
import time:         - sre_compile 3218 us (self 988 us)
import time:         - _locale 416 us (self 416 us)
import time:         - copyreg 1194 us (self 1194 us)
import time:       - re 16589 us (self 4842 us)
import time:       - token 548 us (self 548 us)
import time:     - tokenize 21429 us (self 4293 us)
import time:   - linecache 29349 us (self 533 us)
import time: - traceback 37196 us (self 896 us)

https://bugs.python.org/issue31415

It shows show import time of each module.
It's useful for optimizing startup time.
@brettcannon
Copy link
Member

This should probably be added to the equivalent code in importlib as some other interpreters use that code as their direct import implementation.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea! First review.

@@ -407,6 +407,8 @@ Miscellaneous options
* ``-X showalloccount`` to output the total count of allocated objects for
each type when the program finishes. This only works when Python was built with
``COUNT_ALLOCS`` defined.
* ``-X importprofile`` to show how long each import takes. Nested imports are
indented. It can be used to optimize application's startup time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, maybe rename the option to "-X importime"? I don't know.

Maybe explain that the displayed time is the cumulative time, including import time of sub-imports.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed importprofile to importtime.
And self time is shown now, as Serhiy's suggestion.

Python/import.c Outdated
fprintf(stderr, "%*s- %s %ld [us]\n",
import_level*2, "",
PyUnicode_AsUTF8(abs_name),
(long)(t2 - t1) / 1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the ms unit if us >= 10 000? It would make the output easier to read, no?

I don't understand the "123 [us]" format. Why not "123 us"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for "123 us" too, or "[123 us]".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Square brackets is used to make it clear it's not variable (123 * u * s), but physical unit (123 in μs).
In English, it's rarely used, only when it can be confusing.

In Japan, some community use "use square bracket always for physical unit" habit.
And I studied in such college.
I just know it's local habit and abuse in general!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the ms unit if us >= 10 000? It would make the output easier to read, no?

I prefer fixed unit. It makes post analysis (e.g. generate Flame Graph) easy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @methane. Mixed units are confusing when values are meant to be compared.

Python/import.c Outdated

if (import_profile) {
import_level--;
t2 = _PyTime_GetMonotonicClock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyTime_t has no unit. You cannot use directly t2-t1.

I suggest you to use _PyTime_AsMicroseconds(t2 - t1, _PyTime_ROUND_CEILING). Moreover, it will round "correctly" ;-) (towards +inf, rather than towards zero).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Python/import.c Outdated
if (import_profile) {
import_level--;
t2 = _PyTime_GetMonotonicClock();
fprintf(stderr, "%*s- %s %ld [us]\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your sample, the output looks nice. But imports can emit warnings, importlib can log extrat messages (try python3 -v), etc. It would help parsing to prefix messages. For example: "import_time: (...)."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated format, and output example in pull request comment.

@methane
Copy link
Member Author

methane commented Sep 24, 2017

This should probably be added to the equivalent code in importlib as some other interpreters use that code as their direct import implementation.

I don't know about importlib much. Do you mean importlib/_bootstrap.py?
Can I use time.perf_counter() and sys._xoptions in it?

Anyway, I prefer C implementation because I don't want to worry about performance overhead of this feature.

@vstinner
Copy link
Member

Crazy idea: would it be possible to display also a timestamp related to Python startup time? Get monotonic time at startup, then compute the delta.

[ 0.000123] import sys: 23 us
[ 0.000250 ] import os: 124 ms

@methane methane changed the title bpo-31415: Add -X importprofile option. bpo-31415: Add -X importtime option. Sep 24, 2017
@methane
Copy link
Member Author

methane commented Sep 24, 2017

Crazy idea: would it be possible to display also a timestamp related to Python startup time? Get monotonic time at startup, then compute the delta.

It is possible, but is it worth enough to increase output size?

@pitrou
Copy link
Member

pitrou commented Sep 24, 2017

It is possible, but is it worth enough to increase output size?

At least it would be interesting to know the impact of interpreter initialization excluding stdlib imports.

@serhiy-storchaka serhiy-storchaka added skip news type-bug An unexpected behavior, bug, or error and removed skip news type-bug An unexpected behavior, bug, or error labels Sep 24, 2017
@vstinner
Copy link
Member

It is possible, but is it worth enough to increase output size?
At least it would be interesting to know the impact of interpreter initialization excluding stdlib imports.

Yeah, I expect that Python startup is made of many things, not only imports. I expect that timestamps would help to understand what happens.

@methane
Copy link
Member Author

methane commented Sep 25, 2017

Travis failed because suspicious check. "Inline literal start-string without end-string."
Currently, I wrote "It can be used to optimize application's startup time."
How should I fix it in correct English?

  • Add it to susp-ignored.csv
  • "application startup time"
  • "startup time of application"
  • Remove or rewrite this sentence completely.

@methane
Copy link
Member Author

methane commented Sep 25, 2017

It is possible, but is it worth enough to increase output size?

At least it would be interesting to know the impact of interpreter initialization excluding stdlib imports.

Yeah, I expect that Python startup is made of many things, not only imports. I expect that timestamps would help to understand what happens.

OK. Then, adding t0 = _PyTime_GetMonotonicClock() in this function is bad idea.
There are many things done before first import. (e.g. Many PyType_Read() call for builtin types)

I think _PyTime_Init() is better place.

cpython/Python/pytime.c

Lines 791 to 793 in b1d1c42

/* ensure that the operating system provides a monotonic clock */
if (_PyTime_GetMonotonicClockWithInfo(&t, NULL) < 0)
return -1;

How do you think adding private API like this?

// Returns time after _PyTime_Init() in micro seconds.
long long _PyTime_MicrosFromStart(void);

// usage
fprintf(stderr, "[%12lld us] some message from interpreter\n",
        _PyTime_MicrosFromStart());

@methane
Copy link
Member Author

methane commented Sep 25, 2017

After considering it about several minutes, I think it should be added as separated issue.
It can be traces like

  • "start importing xxx"
  • "complete importing xxx"
  • "start initexternalimport()", "start initfsencoding()",
  • "start initsite()", "end initsite()", ...

Or, perhaps, can we use DTrace or other existing tools for it?

@vstinner
Copy link
Member

After considering it about several minutes, I think it should be added as separated issue.
It can be traces like "start importing xxx" "complete importing xxx"

I modified your PR to play with it. I added start/complete logs, and IMHO the indented output is less surprising. Rather that hacking your PR, I created a new one, just to show my proposed format: PR #3765.

Example:

haypo@selma$ ./python -X importtime -c pass
[0.007768] import time: import encodings
[0.009267] import time:   import codecs
[0.010331] import time:     import _codecs
[0.010565] import time:     import _codecs took 235 us
[0.011226] import time:   import codecs took 1725 us (cumulative: 1960 us)
[0.011294] import time:   import encodings.aliases
[0.012286] import time:   import encodings.aliases took 992 us
[0.012433] import time: import encodings took 1715 us (cumulative: 4666 us)
[0.012487] import time: import encodings.utf_8
[0.013126] import time: import encodings.utf_8 took 640 us
[0.013188] import time: import _signal
[0.013504] import time: import _signal took 317 us
[0.013553] import time: import encodings.latin_1
[0.014288] import time: import encodings.latin_1 took 736 us
[0.014318] import time: import io
[0.014789] import time:   import abc
[0.015366] import time:     import _weakrefset
[0.016324] import time:     import _weakrefset took 958 us
[0.016727] import time:   import abc took 981 us (cumulative: 1939 us)
[0.017637] import time: import io took 1381 us (cumulative: 3320 us)
[0.017888] import time: import site
[0.018758] import time:   import os
[0.019818] import time:     import errno
[0.020151] import time:     import errno took 333 us
[0.020164] import time:     import stat
[0.020890] import time:       import _stat
[0.021161] import time:       import _stat took 271 us
[0.021284] import time:     import stat took 849 us (cumulative: 1120 us)
[0.021471] import time:     import posixpath
[0.022126] import time:       import genericpath
[0.022702] import time:       import genericpath took 576 us
[0.022798] import time:     import posixpath took 752 us (cumulative: 1328 us)
[0.023306] import time:     import _collections_abc
[0.029017] import time:     import _collections_abc took 5712 us
[0.029551] import time:   import os took 2302 us (cumulative: 10793 us)
[0.029584] import time:   import _sitebuiltins
[0.030299] import time:   import _sitebuiltins took 716 us
[0.032091] import time:   import sitecustomize
[0.033053] import time:   import sitecustomize took 962 us
[0.033086] import time:   import usercustomize
[0.033740] import time:   import usercustomize took 654 us
[0.033820] import time: import site took 2811 us (cumulative: 15933 us)

Don't look at the exact implementation. I mostly created a PR to discuss the logs format.

@methane
Copy link
Member Author

methane commented Sep 26, 2017

Yeah, I expect that Python startup is made of many things, not only imports. I expect that timestamps would help to understand what happens.

But who cares?
I think this option is for application / library developers. Most of them are working on very large import tree, and it takes more than 100ms.

Even if few developers (including me and you) cares about it, some logs other than imports should be
logged in consistent format with timestamp.
That's why I don't think it's good idea for this time.

@ncoghlan
Copy link
Contributor

I agree with @methane on the timestamps question: while I do think a "time since the timing infrastructure was initialised" could be a useful thing to consistently add to internal log messages, I don't think it makes sense to consider it as part of this PR.

@ncoghlan ncoghlan closed this Sep 27, 2017
@vstinner
Copy link
Member

@ncoghlan: "ncoghlan closed this 6 hours ago"

Why did you close @methane PR?

@ncoghlan ncoghlan reopened this Sep 27, 2017
@ncoghlan
Copy link
Contributor

Oops, hit the wrong button - sorry.

@methane
Copy link
Member Author

methane commented Oct 1, 2017

Some other formats.

Replace bullet to imported and insert in before cumulative time:

import time:   imported collections.abc in 207 us (self 207 us)
import time:     imported _string in 67 us (self 67 us)
import time:         imported types in 332 us (self 332 us)
import time:       imported enum in 1189 us (self 857 us)
import time:         imported _sre in 96 us (self 96 us)
import time:           imported sre_constants in 384 us (self 384 us)
import time:         imported sre_parse in 837 us (self 454 us)
import time:       imported sre_compile in 1356 us (self 423 us)
import time:         imported _functools in 77 us (self 77 us)
import time:       imported functools in 803 us (self 726 us)
import time:       imported _locale in 98 us (self 98 us)
import time:       imported copyreg in 227 us (self 227 us)
import time:     imported re in 4553 us (self 882 us)
import time:   imported string in 6872 us (self 2253 us)
import time:   imported threading in 724 us (self 724 us)
import time:   imported atexit in 57 us (self 57 us)
import time: imported logging in 13756 us (self 1172 us)

It seems too dense. "import time" and "imported" looks duplicated.
Then, remove imported and replace in with took.

import time:   collections.abc took 211 us (self 211 us)
import time:     _string took 52 us (self 52 us)
import time:         types took 328 us (self 328 us)
import time:       enum took 1191 us (self 863 us)
import time:         _sre took 96 us (self 96 us)
import time:           sre_constants took 383 us (self 383 us)
import time:         sre_parse took 838 us (self 456 us)
import time:       sre_compile took 1356 us (self 423 us)
import time:         _functools took 82 us (self 82 us)
import time:       functools took 808 us (self 726 us)
import time:       _locale took 107 us (self 107 us)
import time:       copyreg took 237 us (self 237 us)
import time:     re took 4571 us (self 875 us)
import time:   string took 6865 us (self 2243 us)
import time:   threading took 714 us (self 714 us)
import time:   atexit took 56 us (self 56 us)
import time: logging took 13802 us (self 1178 us)

No in or took:

import time:   collections.abc (self: 204 us, cumulative: 204 us)
import time:     _string (self: 70 us, cumulative: 70 us)
import time:         types (self: 327 us, cumulative: 327 us)
import time:       enum (self: 849 us, cumulative: 1176 us)
import time:         _sre (self: 94 us, cumulative: 94 us)
import time:           sre_constants (self: 385 us, cumulative: 385 us)
import time:         sre_parse (self: 452 us, cumulative: 837 us)
import time:       sre_compile (self: 424 us, cumulative: 1355 us)
import time:         _functools (self: 78 us, cumulative: 78 us)
import time:       functools (self: 729 us, cumulative: 806 us)
import time:       _locale (self: 103 us, cumulative: 103 us)
import time:       copyreg (self: 227 us, cumulative: 227 us)
import time:     re (self: 855 us, cumulative: 4519 us)
import time:   string (self: 2236 us, cumulative: 6825 us)
import time:   threading (self: 719 us, cumulative: 719 us)
import time:   atexit (self: 56 us, cumulative: 56 us)
import time: logging (self: 1175 us, cumulative: 13721 us)

@pitrou
Copy link
Member

pitrou commented Oct 1, 2017 via email

@methane
Copy link
Member Author

methane commented Oct 1, 2017

Table format:

import time: self [us] | cumulative | imported package
import time:        73 |         73 |     _codecs
import time:       606 |        679 |   codecs       
import time:       416 |        416 |   encodings.aliases
import time:       614 |       1707 | encodings         
import time:       262 |        262 | encodings.utf_8
...
import time:       197 |        197 |   collections.abc
import time:        53 |         53 |     _string
import time:       331 |        331 |         types
import time:       835 |       1166 |       enum
import time:        98 |         98 |         _sre
import time:       387 |        387 |           sre_constants
import time:       428 |        814 |         sre_parse
import time:       400 |       1311 |       sre_compile
import time:        78 |         78 |         _functools
import time:       645 |        722 |       functools
import time:        98 |         98 |       _locale
import time:       230 |        230 |       copyreg
import time:       796 |       4321 |     re
import time:      2211 |       6584 |   string
import time:       698 |        698 |   threading
import time:        56 |         56 |   atexit
import time:      1127 |      13119 | logging

@ncoghlan
Copy link
Contributor

ncoghlan commented Oct 2, 2017

My favourite so far is the one-line nested option:

import time:   atexit (self: 56 us, cumulative: 56 us)
import time: logging (self: 1175 us, cumulative: 13721 us)

My rationale for that preference is that it means a simple grep filter can select the top-level imports:

grep "import time: \S"

Reporting depth selection is then just a matter of varying the amount of whitespace that the regex allows.

I also think that no matter what we do here, converting any single-line-per-module format into a proper display for humans is going to involve post-processing to allow the indirect imports to be reported after the top level import that initiated them.

@vstinner
Copy link
Member

vstinner commented Oct 2, 2017

I like the table output with the header.

@methane
Copy link
Member Author

methane commented Oct 2, 2017

I'll merge this PR in next day (12+ hours later), if other format gain more votes.

@methane methane merged commit 1a87de7 into python:master Oct 3, 2017
@methane methane deleted the ximportprofile branch October 3, 2017 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants