Compilation memory spikes during LTO #65431

Closed
tiagolam opened this issue Oct 15, 2019 · 4 comments
Labels
A-LLVM  Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-bug  Category: This is a bug.
I-compilemem  Issue: Problems and improvements with respect to memory usage during compilation.
T-compiler  Relevant to the compiler team, which will review and decide on the PR/issue.

tiagolam commented Oct 15, 2019

Test case:

static FILLER: [u8; 60 * 1024 * 1024] = [1; 60 * 1024 * 1024];

fn main() {
    println!("Hello, world! {}", FILLER[0]);

    tokio::runtime::current_thread::Runtime::new().unwrap();
}

(with the dependency tokio = "^0.2.0-alpha.3" added to Cargo.toml)

The FILLER is there so we can simulate a binary of a certain size.

Compiling with:

cargo build --release

This generates memory spikes of around 12 GiB, and the compilation sometimes fails because the OS kills the compiler (signal: 9, SIGKILL: kill).

Further analysis suggests it has to do with LTO. Output of cargo rustc --release -p spike -- -Ztime-passes:

time: 0.000; rss: 51MB    parsing
  time: 0.000; rss: 51MB    attributes injection
  time: 0.000; rss: 51MB    recursion limit
  time: 0.000; rss: 51MB    plugin loading
  time: 0.000; rss: 51MB    plugin registration
  time: 0.000; rss: 51MB    pre-AST-expansion lint checks
  time: 0.000; rss: 54MB    crate injection
    time: 0.009; rss: 67MB  expand crate
    time: 0.000; rss: 67MB  check unused macros
  time: 0.009; rss: 67MB    expansion
  time: 0.000; rss: 67MB    maybe building test harness
  time: 0.000; rss: 67MB    AST validation
  time: 0.000; rss: 67MB    maybe creating a macro crate
  time: 0.013; rss: 85MB    name resolution
  time: 0.000; rss: 85MB    complete gated feature checking
  time: 0.000; rss: 87MB    lowering AST -> HIR
  time: 0.000; rss: 87MB    early lint checks
    time: 0.000; rss: 87MB  validate HIR map
  time: 0.000; rss: 87MB    indexing HIR
  time: 0.000; rss: 87MB    load query result cache
  time: 0.000; rss: 89MB    dep graph tcx init
    time: 0.000; rss: 89MB  looking for entry point
    time: 0.000; rss: 89MB  looking for plugin registrar
    time: 0.000; rss: 89MB  looking for derive registrar
  time: 0.000; rss: 89MB    misc checking 1
  time: 0.000; rss: 92MB    type collecting
  time: 0.000; rss: 92MB    impl wf inference
    time: 0.000; rss: 92MB  unsafety checking
    time: 0.000; rss: 92MB  orphan checking
  time: 0.000; rss: 92MB    coherence checking
  time: 0.009; rss: 112MB   wf checking
  time: 0.000; rss: 112MB   item-types checking
  time: 0.013; rss: 127MB   item-bodies checking
    time: 0.000; rss: 127MB match checking
    time: 0.000; rss: 127MB liveness checking + intrinsic checking
  time: 0.000; rss: 127MB   misc checking 2
  time: 0.002; rss: 127MB   MIR borrow checking
  time: 0.000; rss: 127MB   dumping Chalk-like clauses
  time: 0.000; rss: 127MB   MIR effect checking
  time: 0.000; rss: 127MB   layout testing
    time: 0.000; rss: 127MB privacy access levels
    time: 0.000; rss: 127MB private in public
    time: 0.000; rss: 127MB death checking
    time: 0.000; rss: 127MB unused lib feature checking
      time: 3.072; rss: 199MB   crate lints
      time: 0.000; rss: 199MB   module lints
    time: 3.072; rss: 199MB lint checking
    time: 0.000; rss: 199MB privacy checking modules
  time: 3.073; rss: 199MB   misc checking 3
  time: 0.000; rss: 199MB   metadata encoding and writing
      time: 0.000; rss: 199MB   collecting roots
      time: 0.169; rss: 214MB   collecting mono items
    time: 0.169; rss: 214MB monomorphization collection
    time: 0.001; rss: 214MB codegen unit partitioning
    time: 0.000; rss: 215MB write allocator module
    time: 0.003; rss: 221MB llvm function passes [what.i4xg1d4t-cgu.0]
    time: 0.003; rss: 225MB llvm function passes [what.i4xg1d4t-cgu.3]
    time: 0.003; rss: 227MB llvm function passes [what.i4xg1d4t-cgu.8]
    time: 0.002; rss: 228MB llvm function passes [what.i4xg1d4t-cgu.15]
    time: 0.002; rss: 229MB llvm function passes [what.i4xg1d4t-cgu.9]
    time: 0.002; rss: 231MB llvm function passes [what.i4xg1d4t-cgu.4]
    time: 0.003; rss: 233MB llvm function passes [what.i4xg1d4t-cgu.2]
    time: 0.023; rss: 233MB llvm module passes [what.i4xg1d4t-cgu.15]
    time: 0.002; rss: 234MB llvm function passes [what.i4xg1d4t-cgu.6]
    time: 0.002; rss: 235MB llvm function passes [what.i4xg1d4t-cgu.13]
    time: 0.023; rss: 235MB llvm module passes [what.i4xg1d4t-cgu.4]
    time: 0.039; rss: 235MB llvm module passes [what.i4xg1d4t-cgu.8]
    time: 0.031; rss: 235MB llvm module passes [what.i4xg1d4t-cgu.9]
    time: 0.060; rss: 236MB llvm module passes [what.i4xg1d4t-cgu.0]
    time: 0.055; rss: 237MB llvm module passes [what.i4xg1d4t-cgu.3]
    time: 0.026; rss: 237MB llvm module passes [what.i4xg1d4t-cgu.2]
    time: 0.014; rss: 237MB llvm module passes [what.i4xg1d4t-cgu.13]
    time: 0.025; rss: 238MB llvm module passes [what.i4xg1d4t-cgu.6]
    time: 0.002; rss: 301MB llvm function passes [what.i4xg1d4t-cgu.12]
    time: 0.001; rss: 301MB llvm function passes [what.i4xg1d4t-cgu.11]
    time: 0.001; rss: 301MB llvm function passes [what.i4xg1d4t-cgu.1]
    time: 0.001; rss: 301MB llvm function passes [what.i4xg1d4t-cgu.5]
    time: 0.001; rss: 301MB llvm function passes [what.i4xg1d4t-cgu.7]
    time: 0.001; rss: 300MB llvm function passes [what.i4xg1d4t-cgu.14]
    time: 0.160; rss: 300MB codegen to LLVM IR
    time: 0.000; rss: 300MB assert dep graph
    time: 0.000; rss: 300MB serialize dep graph
  time: 0.335; rss: 300MB   codegen
    time: 0.008; rss: 300MB llvm module passes [what.i4xg1d4t-cgu.11]
    time: 0.001; rss: 300MB llvm function passes [what.i4xg1d4t-cgu.10]
    time: 0.006; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.7]
    time: 0.011; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.1]
    time: 0.004; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.10]
    time: 0.007; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.14]
    time: 0.017; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.5]
    time: 0.027; rss: 301MB llvm module passes [what.i4xg1d4t-cgu.12]
    time: 0.001; rss: 1359MB    LTO passes
    time: 0.002; rss: 1369MB    LTO passes
    time: 0.001; rss: 1371MB    codegen passes [what.i4xg1d4t-cgu.1]
    time: 0.003; rss: 1386MB    codegen passes [what.i4xg1d4t-cgu.10]
    time: 0.007; rss: 1397MB    LTO passes
    time: 0.004; rss: 1408MB    LTO passes
    time: 0.010; rss: 1416MB    LTO passes
    time: 0.008; rss: 1463MB    codegen passes [what.i4xg1d4t-cgu.14]
    time: 0.013; rss: 1480MB    codegen passes [what.i4xg1d4t-cgu.7]
    time: 0.015; rss: 1515MB    codegen passes [what.i4xg1d4t-cgu.4]
    time: 0.009; rss: 1561MB    LTO passes
    time: 0.001; rss: 1569MB    LTO passes
    time: 0.001; rss: 1575MB    codegen passes [what.i4xg1d4t-cgu.11]
    time: 0.006; rss: 1586MB    codegen passes [what.i4xg1d4t-cgu.5]
    time: 0.065; rss: 7193MB    LTO passes
    time: 0.222; rss: 8929MB    codegen passes [what.i4xg1d4t-cgu.12]
    time: 0.032; rss: 9334MB    LTO passes
    time: 0.034; rss: 9430MB    codegen passes [what.i4xg1d4t-cgu.0]
    time: 0.025; rss: 9572MB    LTO passes
    time: 0.021; rss: 9604MB    LTO passes
    time: 0.021; rss: 9620MB    codegen passes [what.i4xg1d4t-cgu.2]
    time: 0.028; rss: 9622MB    LTO passes
    time: 0.017; rss: 9668MB    codegen passes [what.i4xg1d4t-cgu.8]
    time: 0.026; rss: 9669MB    codegen passes [what.i4xg1d4t-cgu.3]
    time: 0.006; rss: 9756MB    LTO passes
    time: 0.007; rss: 9756MB    codegen passes [what.i4xg1d4t-cgu.13]
    time: 0.035; rss: 9756MB    LTO passes
    time: 0.019; rss: 9756MB    LTO passes
    time: 0.040; rss: 9756MB    LTO passes
    time: 0.013; rss: 9756MB    codegen passes [what.i4xg1d4t-cgu.9]
    time: 0.016; rss: 9756MB    codegen passes [what.i4xg1d4t-cgu.15]
    time: 0.048; rss: 9757MB    codegen passes [what.i4xg1d4t-cgu.6]
  time: 2.697; rss: 9757MB  LLVM passes
  time: 0.000; rss: 9757MB  serialize work products
    time: 0.493; rss: 9758MB    running linker
  time: 0.500; rss: 9758MB  linking
time: 6.535; rss: 9713MB        total
    Finished release [optimized] target(s) in 6.82s
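
To locate the spike in a log like the one above without reading it line by line, the `rss` column can be scanned for the largest increase between consecutive lines. The following is a minimal sketch, not part of rustc or cargo: the names `PassSample`, `parse_line`, and `largest_jump` are hypothetical, and it assumes every relevant line has the shape `time: <secs>; rss: <n>MB <label>`. Because RSS is a process-wide figure and passes run on several threads, the label on the line where the jump appears is only an approximate attribution.

```rust
/// One `-Ztime-passes` line: resident set size in MB plus the pass label.
#[derive(Debug, PartialEq)]
struct PassSample {
    rss_mb: u64,
    label: String,
}

/// Parses a line such as `time: 0.065; rss: 7193MB    LTO passes`.
/// Returns `None` for lines that do not match (e.g. cargo status lines).
fn parse_line(line: &str) -> Option<PassSample> {
    let rest = line.trim().strip_prefix("time:")?;
    let (_time, rest) = rest.split_once(';')?;
    let rest = rest.trim().strip_prefix("rss:")?;
    let (rss, label) = rest.trim_start().split_once("MB")?;
    Some(PassSample {
        rss_mb: rss.trim().parse().ok()?,
        label: label.trim().to_string(),
    })
}

/// Returns the largest RSS increase between consecutive samples,
/// together with the label of the line on which it was reported.
fn largest_jump(log: &str) -> Option<(u64, String)> {
    let samples: Vec<PassSample> = log.lines().filter_map(parse_line).collect();
    samples
        .windows(2)
        .map(|w| (w[1].rss_mb.saturating_sub(w[0].rss_mb), w[1].label.clone()))
        .max_by_key(|(delta, _)| *delta)
}

fn main() {
    // Four consecutive lines copied from the log above.
    let log = "\
    time: 0.001; rss: 1575MB    codegen passes [what.i4xg1d4t-cgu.11]
    time: 0.006; rss: 1586MB    codegen passes [what.i4xg1d4t-cgu.5]
    time: 0.065; rss: 7193MB    LTO passes
    time: 0.222; rss: 8929MB    codegen passes [what.i4xg1d4t-cgu.12]";
    if let Some((delta, label)) = largest_jump(log) {
        println!("largest jump: +{}MB at '{}'", delta, label);
    }
}
```

On the four sample lines this prints `largest jump: +5607MB at 'LTO passes'`, matching the step from 1586MB to 7193MB in the log.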

The issue can be worked around either by disabling LTO with RUSTFLAGS='-C lto=no' or by changing FILLER to a static array of zeros:

static FILLER: [u8; 60 * 1024 * 1024] = [0; 60 * 1024 * 1024];
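
A possible explanation for why the zero-filled variant behaves differently (an assumption, not confirmed anywhere in this thread): on ELF targets such as x86_64-unknown-linux-gnu, an all-zero static can typically be placed in `.bss` and needs no stored initializer, while a non-zero one has to be carried as real data through codegen and LTO. The sketch below puts the two variants side by side. `SIZE`, `FILLER_ONES`, and `FILLER_ZEROS` are hypothetical names, and the size is scaled down to 1 MiB so the sketch builds quickly.

```rust
// Hypothetical, scaled-down size; the report above uses 60 * 1024 * 1024.
const SIZE: usize = 1024 * 1024;

// Non-zero initializer: every byte must be stored as data in the
// compiled artifact (assumed to be what LTO has to process).
static FILLER_ONES: [u8; SIZE] = [1; SIZE];

// All-zero initializer: can typically be represented without storing
// the bytes (for example in `.bss` on ELF targets).
static FILLER_ZEROS: [u8; SIZE] = [0; SIZE];

fn main() {
    // Read both statics so neither is discarded as unused.
    println!("{} {}", FILLER_ONES[0], FILLER_ZEROS[0]);
    // Both arrays occupy the same amount of memory at run time.
    println!("{}", std::mem::size_of_val(&FILLER_ONES));
}
```

Comparing the sizes of the two resulting binaries, or their section headers, is one way to check whether this assumption holds for a given target.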

@jonas-schievink added the A-LLVM, C-bug, I-compilemem, and T-compiler labels on Oct 15, 2019

lrbalt commented Oct 23, 2019

With the recent nightly compiler, memory usage in the linking stage dropped drastically for me. The GCC upgrade for ARM may have caused this improvement. No more OOM there!

#65302

mati865 commented Oct 26, 2019

@tiagolam could you post your Rust version (rustc -vV)?

@tiagolam

@mati865 Here:

$ rustc -vV
rustc 1.40.0-nightly (e413dc36a 2019-10-14)
binary: rustc
commit-hash: e413dc36a83a5aad3ab6270373000693a917e92b
commit-date: 2019-10-14
host: x86_64-unknown-linux-gnu
release: 1.40.0-nightly
LLVM version: 9.0

tiagolam commented Oct 1, 2020

Closing, as I can no longer reproduce this on the latest nightly with tokio = { version = "0.2" }:

$ rustc -vV
rustc 1.48.0-nightly (7f7a1cbfd 2020-09-27)
binary: rustc
commit-hash: 7f7a1cbfd3b55daee191247770627afab09eece2
commit-date: 2020-09-27
host: x86_64-unknown-linux-gnu
release: 1.48.0-nightly
LLVM version: 11.0

$ cargo rustc --release -p oom_compiler -- -Ztime-passes
   Compiling bytes v0.5.6
   Compiling pin-project-lite v0.1.10
   Compiling tokio v0.2.22
   Compiling oom_compiler v0.1.0 (/home/tiago/Devel/examples/rust/oom_compiler)
time: 0.000; rss: 55MB	parse_crate
time: 0.000; rss: 55MB	attributes_injection
time: 0.000; rss: 55MB	recursion_limit
time: 0.000; rss: 55MB	plugin_loading
time: 0.000; rss: 55MB	plugin_registration
time: 0.000; rss: 55MB	pre_AST_expansion_lint_checks
time: 0.000; rss: 59MB	crate_injection
time: 0.003; rss: 72MB	expand_crate
time: 0.000; rss: 72MB	check_unused_macros
time: 0.003; rss: 72MB	macro_expand_crate
time: 0.000; rss: 72MB	maybe_building_test_harness
time: 0.000; rss: 72MB	AST_validation
time: 0.000; rss: 72MB	maybe_create_a_macro_crate
time: 0.000; rss: 78MB	complete_gated_feature_checking
time: 0.004; rss: 78MB	configure_and_expand
time: 0.000; rss: 78MB	prepare_outputs
time: 0.000; rss: 78MB	hir_lowering
time: 0.000; rss: 78MB	early_lint_checks
time: 0.000; rss: 82MB	setup_global_ctxt
time: 0.000; rss: 82MB	dep_graph_tcx_init
time: 0.000; rss: 82MB	create_global_ctxt
time: 0.000; rss: 85MB	looking_for_entry_point
time: 0.000; rss: 85MB	looking_for_plugin_registrar
time: 0.000; rss: 85MB	looking_for_derive_registrar
time: 0.000; rss: 85MB	misc_checking_1
time: 0.000; rss: 89MB	type_collecting
time: 0.000; rss: 89MB	impl_wf_inference
time: 0.000; rss: 89MB	unsafety_checking
time: 0.000; rss: 89MB	orphan_checking
time: 0.000; rss: 89MB	coherence_checking
time: 0.002; rss: 110MB	wf_checking
time: 0.000; rss: 110MB	item_types_checking
time: 0.003; rss: 118MB	item_bodies_checking
time: 0.006; rss: 118MB	type_check_crate
time: 0.000; rss: 118MB	match_checking
time: 0.000; rss: 118MB	liveness_and_intrinsic_checking
time: 0.000; rss: 118MB	misc_checking_2
time: 0.001; rss: 118MB	MIR_borrow_checking
time: 0.000; rss: 118MB	MIR_effect_checking
time: 0.000; rss: 118MB	layout_testing
time: 0.000; rss: 120MB	death_checking
time: 0.000; rss: 120MB	unused_lib_feature_checking
time: 2.767; rss: 262MB	crate_lints
time: 0.000; rss: 262MB	module_lints
time: 2.767; rss: 262MB	lint_checking
time: 0.000; rss: 262MB	privacy_checking_modules
time: 2.767; rss: 262MB	misc_checking_3
time: 0.000; rss: 262MB	monomorphization_collector_root_collections
time: 0.043; rss: 299MB	monomorphization_collector_graph_walk
time: 0.001; rss: 299MB	partition_and_assert_distinct_symbols
time: 0.000; rss: 300MB	write_allocator_module
time: 0.000; rss: 300MB	find_cgu_reuse
time: 0.001; rss: 307MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.3)
time: 0.012; rss: 310MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.3)
time: 0.000; rss: 311MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.5)
time: 0.001; rss: 311MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.4)
time: 0.000; rss: 312MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.1)
time: 0.076; rss: 313MB	codegen_to_LLVM_IR
time: 0.001; rss: 313MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.5)
time: 0.000; rss: 313MB	assert_dep_graph
time: 0.000; rss: 313MB	serialize_dep_graph
time: 0.122; rss: 313MB	codegen_crate
time: 0.000; rss: 311MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.2)
time: 0.001; rss: 272MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.6)
time: 0.001; rss: 259MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.0)
time: 0.002; rss: 259MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.2)
time: 0.000; rss: 259MB	LLVM_module_optimize_function_passes(oom_compiler.6c7yekoo-cgu.7)
time: 0.001; rss: 260MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.6)
time: 0.000; rss: 260MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.7)
time: 0.002; rss: 260MB	free_global_ctxt
time: 0.002; rss: 260MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.1)
time: 0.001; rss: 260MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.0)
time: 0.011; rss: 261MB	LLVM_module_optimize_module_passes(oom_compiler.6c7yekoo-cgu.4)
time: 0.002; rss: 1312MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.7)
time: 0.002; rss: 1312MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.6)
time: 0.002; rss: 1312MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.2)
time: 0.002; rss: 1312MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.0)
time: 0.003; rss: 1312MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.5)
time: 0.003; rss: 1313MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.1)
time: 0.004; rss: 1314MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.3)
time: 0.005; rss: 1317MB	LLVM_lto_optimize(oom_compiler.6c7yekoo-cgu.4)
time: 1.637; rss: 1333MB	LLVM_passes(crate)
time: 0.000; rss: 1333MB	join_worker_thread
time: 1.575; rss: 1333MB	finish_ongoing_codegen
time: 0.000; rss: 1333MB	serialize_work_products
time: 0.000; rss: 1333MB	link_binary_check_files_are_writeable
time: 0.279; rss: 1333MB	run_linker
time: 0.006; rss: 1333MB	link_binary_remove_temps
time: 0.285; rss: 1333MB	link_binary
time: 0.285; rss: 1333MB	link_crate
time: 0.000; rss: 1333MB	llvm_dump_timing_file
time: 1.860; rss: 1333MB	link
time: 4.768; rss: 1333MB		total
    Finished release [optimized] target(s) in 5.68s

@tiagolam closed this as completed on Oct 1, 2020