Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log_record_id runtime increases quadratically with duplicate logs #28400

Closed
iamed2 opened this issue Aug 1, 2018 · 4 comments · Fixed by #29878
Closed

log_record_id runtime increases quadratically with duplicate logs #28400

iamed2 opened this issue Aug 1, 2018 · 4 comments · Fixed by #29878
Assignees
Labels
logging The logging framework performance Must go faster

Comments

@iamed2
Copy link
Contributor

iamed2 commented Aug 1, 2018

I've copied the code in question from logging.jl here:

_log_record_ids = Set{Symbol}()
# Generate a unique, stable, short, human readable identifier for a logging
# statement.  The idea here is to have a key against which log records can be
# filtered and otherwise manipulated. The key should uniquely identify the
# source location in the originating module, but should be stable across
# versions of the originating module, provided the log generating statement
# itself doesn't change.
function log_record_id(_module, level, message_ex)
    modname = _module === nothing ?  "" : join(fullname(_module), "_")
    # Use (1<<31) to fit well within an (arbitriraly chosen) eight hex digits,
    # as we increment h to resolve any collisions.
    h = hash(string(modname, level, message_ex)) % (1<<31)
    while true
        id = Symbol(modname, '_', string(h, base = 16, pad = 8))
        # _log_record_ids is a registry of log record ids for use during
        # compilation, to ensure uniqueness of ids.  Note that this state will
        # only persist during module compilation so it will be empty when a
        # precompiled module is loaded.
        if !(id in _log_record_ids)
            push!(_log_record_ids, id)
            return id
        end
        h += 1
    end
end

Note that it will loop the number of times the log has previously been logged. This makes benchmarking logging very difficult, as this function takes up almost the entire runtime.

julia> function xparse4(str)
           @debug "xparse"
           Base.parse(Int, str)
       end

julia> @benchmark @suppress xparse4("0")
BenchmarkTools.Trial:
  memory estimate:  1.20 MiB
  allocs estimate:  22514
  --------------
  minimum time:     2.856 ms (0.00% GC)
  median time:      3.752 ms (0.00% GC)
  mean time:        3.979 ms (4.19% GC)
  maximum time:     51.694 ms (92.27% GC)
  --------------
  samples:          1255
  evals/sample:     1
julia> results = @benchmark Base.CoreLogging.log_record_id(@__MODULE__, :Debug, :("xparse"))
BenchmarkTools.Trial:
  memory estimate:  1.04 MiB
  allocs estimate:  24700
  --------------
  minimum time:     2.852 ms (0.00% GC)
  median time:      4.141 ms (0.00% GC)
  mean time:        4.366 ms (2.64% GC)
  maximum time:     46.329 ms (90.08% GC)
  --------------
  samples:          1143
  evals/sample:     1
julia> lineplot(StatsBase.trim(results.times; count=10))
           ┌────────────────────────────────────────┐
   8000000 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡼⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡞⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠞⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠴⠚⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠴⠚⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⠀⠀⠀⠀⣀⡤⠴⠒⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           │⢀⡤⠖⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
   3000000 │⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
           └────────────────────────────────────────┘
           0                                     2000

Do we even need this function? Could we just have logs not have ids unless they're explicitly provided? Where are these ids being used?

@ararslan ararslan added performance Must go faster logging The logging framework labels Aug 1, 2018
@iamed2
Copy link
Contributor Author

iamed2 commented Aug 2, 2018

@StefanKarpinski I don't think you're going to get any response from Chris, he hasn't been active for half a year now

@StefanKarpinski
Copy link
Member

Fair enough. I’m still giving the responsibility to him :)

@c42f
Copy link
Member

c42f commented Oct 31, 2018

Root cause is the same as #28786 (bug introduced in #28017)

Analysis: #29355 (comment)

It's concerning that nobody seems to know what id is for, but reading back on https://docs.julialang.org/en/v1/stdlib/Logging/index.html#Log-event-structure-1 I guess I can see why; the docs don't explain in depth.

c42f added a commit that referenced this issue Nov 1, 2018
Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

Fixes #28786, #28400; replaces #29355.
fredrikekre pushed a commit that referenced this issue Nov 1, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`
KristofferC pushed a commit that referenced this issue Nov 1, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
KristofferC pushed a commit that referenced this issue Nov 2, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
@oxinabox
Copy link
Contributor

oxinabox commented Nov 4, 2018

Log id's are useful. They are used in OhMyREPL's logger to allow repeated log messages e.g. in s loop to be overwritten.

With that said because of #28786 OhMyREPL's logger also reassigned all ids to be concatenation of file name and line number.

But that is fixed now.

KristofferC pushed a commit that referenced this issue Feb 11, 2019
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
KristofferC pushed a commit that referenced this issue Feb 20, 2020
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logging The logging framework performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants