Backport JuliaLang 50144 #21

Closed
wants to merge 1 commit
27 changes: 3 additions & 24 deletions NEWS.md
@@ -38,30 +38,9 @@ Language changes
Compiler/Runtime improvements
-----------------------------

* Bootstrapping time has been improved by about 25% ([#41794]).
* The LLVM-based compiler has been separated from the run-time library into a new library,
`libjulia-codegen`. It is loaded by default, so normal usage should see no changes.
In deployments that do not need the compiler (e.g. system images where all needed code
is precompiled), this library (and its LLVM dependency) can simply be excluded ([#41936]).
* Conditional type constraints can now be forwarded interprocedurally (i.e. propagated from caller to callee).
This allows inference to understand e.g. that `Base.ifelse(isa(x, Int), x, 0)` returns an `::Int` value
even if the type of `x` is not known ([#42529]).
* Julia-level SROA (Scalar Replacement of Aggregates) has been improved: allowing elimination of
`getfield` calls with constant global fields ([#42355]), enabling elimination of mutable structs with
uninitialized fields ([#43208]), improving performance ([#43232]), and handling more nested `getfield`
calls ([#43239]).
* Abstract call sites can now be inlined or statically resolved as long as the call site has a single
matching method ([#43113]).
* Inference now tracks various effects such as side-effectful-ness and nothrow-ness on a per-specialization basis.
Code heavily dependent on constant propagation should see significant compile-time performance improvements and
certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance
improvements. Effects may be overwritten manually with the `Base.@assume_effects` macro ([#43852]).
* Precompilation (with explicit `precompile` directives or representative workloads) now saves more type-inferred code,
resulting in reduced time-to-first task for packages that use precompilation. This change also eliminates the
runtime performance degradation occasionally triggered by precompilation on older Julia versions. More specifically,
any newly-inferred method/type combinations needed by your package--regardless of where those methods were
defined--can now be cached in the precompile file, as long as they are inferrably called by a method owned by
your package ([#43990]).
* The `@pure` macro is now deprecated. Use `Base.@assume_effects :foldable` instead ([#48682]).
* The mark phase of the Garbage Collector is now multi-threaded ([#48600]).
* Updated GC heuristics to count allocated pages instead of individual objects ([#50144]).

Command-line option changes
---------------------------
2 changes: 1 addition & 1 deletion contrib/generate_precompile.jl
@@ -428,7 +428,7 @@ function generate_precompile_statements()
print("Total ─────── "); Base.time_print(tot_time); println()
print("Generation ── "); Base.time_print(gen_time); print(" "); show(IOContext(stdout, :compact=>true), gen_time / tot_time * 100); println("%")
print("Execution ─── "); Base.time_print(include_time); print(" "); show(IOContext(stdout, :compact=>true), include_time / tot_time * 100); println("%")

GC.gc(true)
return
end

76 changes: 76 additions & 0 deletions doc/src/devdocs/gc.md
@@ -0,0 +1,76 @@
# Garbage Collection in Julia

## Introduction

Julia has a serial, stop-the-world, generational, non-moving mark-sweep garbage collector.
Native objects are precisely scanned and foreign ones are conservatively marked.

## Memory layout of objects and GC bits

An opaque tag is stored at the start of GC-managed objects, and its lowest two bits are
used for garbage collection. The lowest bit is set for marked objects and the second
lowest bit stores age information (i.e. it is only set for old objects).

Objects are aligned to a multiple of 4 bytes to ensure this pointer tagging is legal.
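
As a concrete illustration of this bit layout, the sketch below masks out and tests the
two low tag bits. It is only a sketch: the macro names, the bare `uintptr_t` tag and the
helper functions are assumptions made for illustration, not the runtime's actual definitions.

```c
#include <stdint.h>

#define GC_MARKED ((uintptr_t)1)  /* lowest bit: the object has been marked  */
#define GC_OLD    ((uintptr_t)2)  /* second-lowest bit: the object is "old"  */

static inline int is_marked(uintptr_t tag) { return (tag & GC_MARKED) != 0; }
static inline int is_old(uintptr_t tag)    { return (tag & GC_OLD) != 0; }

/* 4-byte alignment guarantees the two low bits of the tag are free for GC use,
 * so the rest of the tag can be recovered by masking them off. */
static inline uintptr_t tag_without_gc_bits(uintptr_t tag) { return tag & ~(uintptr_t)3; }
```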

## Pool allocation

Sufficiently small objects (up to 2032 bytes) are allocated on per-thread object
pools.

A three-level tree (analogous to a three-level page table) is used to keep metadata
(e.g. whether a page has been allocated, whether it contains marked objects, the number of free objects, etc.)
about address ranges spanning at least one page.
Sweeping a pool-allocated object consists of inserting it back into the free list
maintained by its pool.
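
The sweep step for pools can be pictured with the following simplified sketch, which walks
the fixed-size slots of a page and threads dead slots back onto a free list. The `pool_t`
and `free_cell_t` types and the `slot_is_marked` callback are made up for illustration; the
real page metadata and free-list layout are more involved.

```c
#include <stddef.h>

typedef struct free_cell { struct free_cell *next; } free_cell_t;

typedef struct {
    free_cell_t *freelist;   /* head of this pool's free list   */
    size_t osize;            /* object size served by this pool */
} pool_t;

/* Simplified page sweep: dead slots go back onto the free list, live slots are left alone. */
static void sweep_page(pool_t *p, char *page_begin, char *page_end,
                       int (*slot_is_marked)(char *slot))
{
    for (char *slot = page_begin; slot + p->osize <= page_end; slot += p->osize) {
        if (!slot_is_marked(slot)) {
            free_cell_t *cell = (free_cell_t *)slot;
            cell->next = p->freelist;   /* insert the dead slot back into the free list */
            p->freelist = cell;
        }
    }
}
```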

## Malloc'd arrays and big objects

Two lists are used to keep track of the remaining allocated objects:
one for sufficiently large malloc'd arrays (`mallocarray_t`) and one for
sufficiently large objects (`bigval_t`).

Sweeping these objects consists of unlinking them from their list and calling `free` on the
corresponding address.
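
A minimal sketch of that unlink-and-free step, using a hypothetical singly linked list
rather than the runtime's actual `bigval_t`/`mallocarray_t` structures:

```c
#include <stdlib.h>

typedef struct big_obj {
    struct big_obj *next;
    int marked;              /* stand-in for the real GC mark bit */
    /* payload follows */
} big_obj_t;

/* Sweep: unlink every unmarked object and free it; clear the mark on survivors. */
static void sweep_big_list(big_obj_t **head)
{
    big_obj_t **pp = head;
    while (*pp != NULL) {
        big_obj_t *v = *pp;
        if (v->marked) {
            v->marked = 0;   /* survivor: clear the mark for the next cycle */
            pp = &v->next;
        }
        else {
            *pp = v->next;   /* unlink the dead object ... */
            free(v);         /* ... and release its memory  */
        }
    }
}
```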

## Generational and remembered sets

Field writes into old objects trigger a write barrier if the written field
points to a young object and if a write barrier has not been triggered on the old object yet.
In this case, the old object being written to is enqueued into a remembered set, and
its mark bit is set to indicate that a write barrier has already been triggered on it.
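
Schematically, the barrier check described above looks like the following sketch. The
`obj_t` layout, tag bits and fixed-size remembered set are illustrative assumptions, not
the runtime's actual write-barrier implementation.

```c
#include <stddef.h>

#define TAG_MARKED 1u  /* "write barrier already triggered" / mark bit */
#define TAG_OLD    2u  /* object survived a previous collection        */

typedef struct obj {
    unsigned tag;
    struct obj *field;   /* one pointer field, for illustration */
} obj_t;

typedef struct {
    obj_t *items[1024];  /* toy remembered set (no overflow handling) */
    size_t len;
} remset_t;

static void write_barrier(remset_t *rs, obj_t *parent, obj_t *child)
{
    int parent_old_unremembered =
        (parent->tag & TAG_OLD) && !(parent->tag & TAG_MARKED);
    int child_young = !(child->tag & TAG_OLD);
    if (parent_old_unremembered && child_young) {
        rs->items[rs->len++] = parent;  /* remember the old->young reference     */
        parent->tag |= TAG_MARKED;      /* don't enqueue the same parent twice   */
    }
}

/* The barrier runs on field writes into objects, e.g.: */
static void setfield(remset_t *rs, obj_t *parent, obj_t *child)
{
    write_barrier(rs, parent, child);
    parent->field = child;
}
```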

There is no explicit flag to determine whether a marking pass will scan the
entire heap or only young objects and the remembered set.
The mark bits of the objects themselves are used to determine whether a full mark happens.
The mark-sweep algorithm follows this sequence of steps:

- Objects in the remembered set have their GC mark bits reset
(these are set once the write barrier is triggered, as described above) and are enqueued.

- Roots (e.g. thread locals) are enqueued.

- The object graph is traversed and mark bits are set.

- Object pools, malloc'd arrays and big objects are swept. On a full sweep,
the mark bits of all marked objects are reset. On a generational sweep,
only the mark bits of marked young objects are reset.

- Mark bits of objects in the remembered set are set,
so we don't trigger the write barrier on them again.

After these stages, old objects are left with their mark bits set,
so that references from them are not explored in a subsequent generational collection.
This scheme eliminates the need to explicitly keep a flag indicating a full mark
(though a flag to indicate a full sweep is necessary).
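
Putting the phases together, a high-level sketch of the collection sequence might look as
follows; every helper here is an empty placeholder standing in for the phase of the same
name, not a real runtime function.

```c
static void remset_reset_marks_and_enqueue(void) { /* phase 1 */ }
static void enqueue_roots(void)                   { /* phase 2 */ }
static void mark_object_graph(void)               { /* phase 3 */ }
static void sweep_heap(int full_sweep)            { (void)full_sweep; /* phase 4 */ }
static void remset_remark_old_objects(void)       { /* phase 5 */ }

static void collect(int full_sweep)
{
    remset_reset_marks_and_enqueue();  /* remembered objects: clear marks, enqueue them */
    enqueue_roots();                   /* thread locals and other roots                 */
    mark_object_graph();               /* traverse reachable objects, set mark bits     */
    sweep_heap(full_sweep);            /* full: reset all marks; otherwise young only   */
    remset_remark_old_objects();       /* keep remembered objects "marked" so the write
                                          barrier does not fire on them again           */
}
```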

## Heuristics

GC heuristics tune the GC by changing the size of the allocation interval between garbage collections.

The GC heuristics measure the heap size after a collection and schedule the next collection
for when the heap has grown to twice that size, capped at the maximum heap size.
The heap size is measured by counting the number of pages in use plus the memory allocated through malloc. Previously the heap size was measured by counting
live objects, which did not account for fragmentation and could lead to bad decisions; it also meant that thread-local information (allocations) was used to make
a process-wide decision (when to GC). Counting pages makes the decision global.

The GC will do full collections when the heap size reaches 80% of the maximum allowed size.
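
A toy model of that policy, using the 2x growth factor and the 80% threshold from the text
above (the function names and signatures are illustrative, not the runtime's):

```c
#include <stdint.h>

/* Grow the collection target to twice the measured heap size, clamped to the maximum. */
static uint64_t next_heap_target(uint64_t heap_size, uint64_t max_heap)
{
    uint64_t target = 2 * heap_size;          /* collect again once the heap doubles */
    return target < max_heap ? target : max_heap;
}

/* Force a full collection once the heap reaches 80% of the allowed maximum. */
static int should_run_full_collection(uint64_t heap_size, uint64_t max_heap)
{
    return heap_size >= (max_heap / 10) * 8;
}
```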
19 changes: 18 additions & 1 deletion src/gc-debug.c
@@ -1,7 +1,10 @@
// This file is a part of Julia. License is MIT: https://julialang.org/license

#include "gc.h"
#include "julia.h"
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// re-include assert.h without NDEBUG,
@@ -1403,7 +1406,7 @@ JL_DLLEXPORT void jl_enable_gc_logging(int enable) {
gc_logging_enabled = enable;
}

void _report_gc_finished(uint64_t pause, uint64_t freed, int full, int recollect) JL_NOTSAFEPOINT {
void _report_gc_finished(uint64_t pause, uint64_t freed, int full, int recollect, int64_t live_bytes) JL_NOTSAFEPOINT {
if (!gc_logging_enabled) {
return;
}
@@ -1412,6 +1415,20 @@ void _report_gc_finished(uint64_t pause, uint64_t freed, int full, int recollect
full ? "full" : "incr",
recollect ? "recollect" : ""
);
jl_safe_printf("Heap stats: bytes_mapped %.1f, bytes_decomitted %.1f, bytes_allocd %.1f\nbytes_freed %.1f, bytes_mallocd %.1f, malloc_bytes_freed %.1f\npages_perm_allocd %zu, heap_size %.1f, heap_target %.1f, live_bytes %1.f\n",
jl_atomic_load_relaxed(&gc_heap_stats.bytes_mapped)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.bytes_decomitted)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.bytes_allocd)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.bytes_freed)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.bytes_mallocd)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.malloc_bytes_freed)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.pages_perm_allocd),
jl_atomic_load_relaxed(&gc_heap_stats.heap_size)/1e6,
jl_atomic_load_relaxed(&gc_heap_stats.heap_target)/1e6,
live_bytes/1e6

);
jl_safe_printf("Fragmentation %1.f\n", (double)live_bytes/(double)jl_atomic_load_relaxed(&gc_heap_stats.heap_size));
}

#ifdef __cplusplus
7 changes: 6 additions & 1 deletion src/gc-pages.c
@@ -79,6 +79,7 @@ static char *jl_gc_try_alloc_pages(int pg_cnt) JL_NOTSAFEPOINT
// round data pointer up to the nearest gc_page_data-aligned
// boundary if mmap didn't already do so.
mem = (char*)gc_page_data(mem + GC_PAGE_SZ - 1);
jl_atomic_fetch_add_relaxed(&gc_heap_stats.bytes_mapped, pages_sz);
return mem;
}

@@ -284,6 +285,8 @@ NOINLINE jl_gc_pagemeta_t *jl_gc_alloc_page(void) JL_NOTSAFEPOINT
errno = last_errno;
current_pg_count++;
gc_final_count_page(current_pg_count);
jl_atomic_fetch_add_relaxed(&gc_heap_stats.bytes_allocd, GC_PAGE_SZ);
jl_atomic_fetch_add_relaxed(&gc_heap_stats.heap_size, GC_PAGE_SZ);
uv_mutex_unlock(&gc_perm_lock);
return info.meta;
}
@@ -334,7 +337,7 @@ void jl_gc_free_page(void *p) JL_NOTSAFEPOINT
#else
madvise(p, decommit_size, MADV_DONTNEED);
#endif

jl_atomic_fetch_add_relaxed(&gc_heap_stats.bytes_decomitted, GC_PAGE_SZ);
no_decommit:
// new pages are now available starting at max of lb and pagetable_i32
if (memory_map.lb > info.pagetable_i32)
@@ -344,6 +347,8 @@ void jl_gc_free_page(void *p) JL_NOTSAFEPOINT
if (info.pagetable0->lb > info.pagetable0_i32)
info.pagetable0->lb = info.pagetable0_i32;
current_pg_count--;
jl_atomic_fetch_add_relaxed(&gc_heap_stats.bytes_freed, GC_PAGE_SZ);
jl_atomic_fetch_add_relaxed(&gc_heap_stats.heap_size, -GC_PAGE_SZ);
}

#ifdef __cplusplus