Kernel #43

Closed
DawallAZllon91 opened this issue Apr 29, 2024 · 0 comments
	ftrace - Function Tracer
	========================

Copyright 2008 Red Hat Inc.
Author: Steven Rostedt [email protected]
License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
John Kacur, and David Teigland.
Written for: 2.6.28-rc2
Updated for: 3.10
Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt

Introduction

Ftrace is an internal tracer designed to help developers and
system designers find out what is going on inside the kernel.
It can be used for debugging or analyzing latencies and
performance issues that take place outside of user-space.

Although ftrace is typically considered the function tracer, it
is really a framework of several assorted tracing utilities.
There's latency tracing to examine what occurs between interrupts
being disabled and enabled, as well as for preemption, and from the
time a task is woken to the time the task is actually scheduled in.

One of the most common uses of ftrace is event tracing.
Throughout the kernel are hundreds of static event points that
can be enabled via the tracefs file system to see what is
going on in certain parts of the kernel.

See events.txt for more information.

Implementation Details

See ftrace-design.txt for details for arch porters and such.

The File System

Ftrace uses the tracefs file system to hold the control files as
well as the files to display output.

When tracefs is configured into the kernel (which selecting any ftrace
option will do) the directory /sys/kernel/tracing will be created. To mount
this directory, you can add to your /etc/fstab file:

tracefs /sys/kernel/tracing tracefs defaults 0 0

Or you can mount it at run time with:

mount -t tracefs nodev /sys/kernel/tracing

For quicker access to that directory you may want to make a soft link to
it:

ln -s /sys/kernel/tracing /tracing

 *** NOTICE ***

Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:

/sys/kernel/debug/tracing

All files located in the tracefs file system will be located in that
debugfs file system directory as well.

 *** NOTICE ***

Any selected ftrace option will also create the tracefs file system.
The rest of the document will assume that you are in the ftrace directory
(cd /sys/kernel/tracing) and will only concentrate on the files within that
directory and not distract from the content with the extended
"/sys/kernel/tracing" path name.

That's it! (assuming that you have ftrace configured into your kernel)
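
Because the control files may be reachable through either the tracefs mount or the debugfs compatibility path, a script may want to probe for the directory before using it. The following is a minimal sketch, not part of ftrace itself: the helper name find_tracing_dir is hypothetical, and it assumes that the presence of a "current_tracer" file is a good enough sign that a directory is the tracing directory.

```shell
# find_tracing_dir [DIR...]: print the first candidate directory that
# contains a "current_tracer" file. With no arguments, the two standard
# locations named above are tried (tracefs first, then debugfs).
find_tracing_dir() {
	if [ "$#" -eq 0 ]; then
		set -- /sys/kernel/tracing /sys/kernel/debug/tracing
	fi
	for dir in "$@"; do
		if [ -e "$dir/current_tracer" ]; then
			echo "$dir"
			return 0
		fi
	done
	return 1
}
```

A caller could then write something like: cd "$(find_tracing_dir)" before touching any of the files listed below.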

After mounting tracefs you will have access to the control and output files
of ftrace. Here is a list of some of the key files:

Note: all time values are in microseconds.

current_tracer:

This is used to set or display the current tracer
that is configured.

available_tracers:

This holds the different types of tracers that
have been compiled into the kernel. The
tracers listed here can be configured by
echoing their name into current_tracer.
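
As an illustration of how current_tracer and available_tracers work together, the sketch below only selects a tracer if it is listed as available. The helper name set_tracer and its DIR argument (the tracing directory) are assumptions made for this example.

```shell
# set_tracer DIR NAME: write NAME into DIR/current_tracer, but only if
# NAME appears in DIR/available_tracers.
set_tracer() {
	dir=$1
	name=$2
	for t in $(cat "$dir/available_tracers"); do
		if [ "$t" = "$name" ]; then
			echo "$name" > "$dir/current_tracer"
			return 0
		fi
	done
	echo "tracer '$name' is not available" >&2
	return 1
}
```

For example, "set_tracer /sys/kernel/tracing function" would select the function tracer when it has been compiled in.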

tracing_on:

This sets or displays whether writing to the trace
ring buffer is enabled. Echo 0 into this file to disable
the tracer or 1 to enable it. Note, this only disables
writing to the ring buffer; the tracing overhead may
still be occurring.

The kernel function tracing_off() can be used within the
kernel to disable writing to the ring buffer, which will
set this file to "0". User space can re-enable tracing by
echoing "1" into the file.

Note, the function and event trigger "traceoff" will also
set this file to zero and stop tracing. Tracing can then
be re-enabled by user space using this file.
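
One way to use tracing_on from user space is to enable ring buffer writes only around a command of interest. This is a sketch under assumptions: the helper name trace_window is hypothetical and DIR is the tracing directory.

```shell
# trace_window DIR CMD [ARGS...]: enable writes to the ring buffer, run
# CMD, then disable writes again so the buffer mostly holds what happened
# while CMD ran. The exit status of CMD is returned.
trace_window() {
	dir=$1
	shift
	echo 1 > "$dir/tracing_on"
	"$@"
	status=$?
	echo 0 > "$dir/tracing_on"
	return "$status"
}
```

Since only writing to the buffer is switched off, the selected tracer stays configured between calls.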

trace:

This file holds the output of the trace in a human
readable format (described below). Note, tracing is temporarily
disabled while this file is being read (opened).

trace_pipe:

The output is the same as the "trace" file but this
file is meant to be streamed with live tracing.
Reads from this file will block until new data is
retrieved.  Unlike the "trace" file, this file is a
consumer. This means reading from this file causes
sequential reads to display more current data. Once
data is read from this file, it is consumed, and
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data, it will display the same
information every time it is read. This file will not
disable tracing while being read.

trace_options:

This file lets the user control the amount of data
that is displayed in one of the above output
files. Options also exist to modify how a tracer
or events work (stack traces, timestamps, etc).

options:

This is a directory that has a file for every available
trace option (also in trace_options). Options may also be set
or cleared by writing a "1" or "0" respectively into the
corresponding file with the option name.
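
The per-option files can be driven from a small helper. The sketch below assumes a hypothetical helper name, set_trace_option, and takes the tracing directory as DIR; it refuses values other than 0 and 1 and options that have no file.

```shell
# set_trace_option DIR NAME VALUE: set (1) or clear (0) the trace option
# NAME by writing VALUE into DIR/options/NAME.
set_trace_option() {
	dir=$1
	name=$2
	value=$3
	case $value in
	0|1) ;;
	*) echo "value must be 0 or 1" >&2; return 2 ;;
	esac
	if [ ! -e "$dir/options/$name" ]; then
		echo "unknown option '$name'" >&2
		return 1
	fi
	echo "$value" > "$dir/options/$name"
}
```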

tracing_max_latency:

Some of the tracers record the max latency.
For example, the maximum time that interrupts are disabled.
The maximum time is saved in this file. The max trace will also be
stored and displayed by "trace". A new max trace will only be
recorded if the latency is greater than the value in this file
(in microseconds).

By echoing a time into this file, no latency will be recorded
unless it is greater than the time in this file.

tracing_thresh:

Some latency tracers will record a trace whenever the
latency is greater than the number in this file.
Only active when the file contains a number greater than 0.
(in microseconds)

buffer_size_kb:

This sets or displays the number of kilobytes each CPU
buffer holds. By default, the trace buffers are the same size
for each CPU. The displayed number is the size of the
CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
than requested, the rest of the page will be used,
making the actual allocation bigger than requested or shown.
( Note, the size may not be a multiple of the page size
  due to buffer management meta-data. )

Buffer sizes for individual CPUs may vary
(see "per_cpu/cpu0/buffer_size_kb" below), and if they do
this file will show "X".

buffer_total_size_kb:

This displays the total combined size of all the trace buffers.
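
To see how the per-CPU sizes relate to the combined total, the per-CPU files can be summed by hand. This is only a sketch: the helper name sum_cpu_buffers_kb is hypothetical, DIR is the tracing directory, and it assumes that the first field of each per_cpu/cpuN/buffer_size_kb file is the size in kilobytes (anything after the first field is ignored). The result may not match buffer_total_size_kb exactly.

```shell
# sum_cpu_buffers_kb DIR: add up the first field of every
# DIR/per_cpu/cpu*/buffer_size_kb file and print the sum in kilobytes.
sum_cpu_buffers_kb() {
	dir=$1
	cat "$dir"/per_cpu/cpu*/buffer_size_kb |
		awk '{ sum += $1 } END { print sum + 0 }'
}
```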

free_buffer:

If a process is performing tracing, and the ring buffer should be
shrunk ("freed") when the process is finished, even if it were to be
killed by a signal, this file can be used for that purpose. On close
of this file, the ring buffer will be resized to its minimum size.
If the tracing process also holds this file open, then when the process
exits its file descriptor for this file will be closed, and in doing so,
the ring buffer will be "freed".

It may also stop tracing if disable_on_free option is set.

tracing_cpumask:

This is a mask that lets the user only trace on specified CPUs.
The format is a hex string representing the CPUs.
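
In the hex string, bit N stands for CPU N, so CPUs 0 and 2 give binary 101, which is hex 5. The sketch below builds such a string from a list of CPU numbers. The helper name cpus_to_mask is hypothetical, and the sketch assumes CPU numbers below 63 so that the mask fits in the shell's integer arithmetic.

```shell
# cpus_to_mask CPU...: print a hex mask with one bit set per CPU number
# given on the command line (bit N corresponds to CPU N).
cpus_to_mask() {
	mask=0
	for cpu in "$@"; do
		mask=$(( mask | (1 << cpu) ))
	done
	printf '%x\n' "$mask"
}
```

The result could then be written with something like: cpus_to_mask 0 2 > tracing_cpumask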

set_ftrace_filter:

When dynamic ftrace is configured in (see the
section below "dynamic ftrace"), the code is dynamically
modified (code text rewrite) to disable calling of the
function profiler (mcount). This lets tracing be configured
in with practically no overhead in performance.  This also
has a side effect of enabling or disabling specific functions
to be traced. Echoing names of functions into this file
will limit the trace to only those functions.

The functions listed in "available_filter_functions" are what
can be written into this file.
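
Since only names from "available_filter_functions" are accepted, a script can check each name before adding it to the filter. This is a minimal sketch with a hypothetical helper name, add_ftrace_filter; DIR is the tracing directory. It assumes the function name is the first field on each line of available_filter_functions, and it uses ">>" so that each name is appended rather than replacing the current filter.

```shell
# add_ftrace_filter DIR FUNC...: append each FUNC to DIR/set_ftrace_filter
# if it is listed in DIR/available_filter_functions; skip it otherwise.
add_ftrace_filter() {
	dir=$1
	shift
	for fn in "$@"; do
		if awk -v f="$fn" '$1 == f { found = 1 } END { exit !found }' \
			"$dir/available_filter_functions"; then
			echo "$fn" >> "$dir/set_ftrace_filter"
		else
			echo "skipping '$fn': not in available_filter_functions" >&2
		fi
	done
}
```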

This interface also allows for commands to be used. See the
"Filter commands" section for more details.

set_ftrace_notrace:

This has an effect opposite to that of
set_ftrace_filter. Any function that is added here will not
be traced. If a function exists in both set_ftrace_filter
and set_ftrace_notrace, the function will _not_ be traced.

set_ftrace_pid:

Have the function tracer only trace the threads whose PIDs are
listed in this file.

If the "function-fork" option is set, then when a task whose
PID is listed in this file forks, the child's PID will
automatically be added to this file, and the child will be
traced by the function tracer as well. This option will also
cause PIDs of tasks that exit to be removed from the file.
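
A common pattern is to trace a single command by writing the PID of a small wrapper shell into this file and then exec'ing the command, so the command keeps that PID. The sketch below shows the idea; the helper name run_with_ftrace_pid is hypothetical and DIR is the tracing directory.

```shell
# run_with_ftrace_pid DIR CMD [ARGS...]: start a child shell, write that
# shell's PID into DIR/set_ftrace_pid, then exec CMD so that CMD runs
# with the PID that was just written.
run_with_ftrace_pid() {
	dir=$1
	shift
	sh -c 'echo $$ > "$0/set_ftrace_pid" && exec "$@"' "$dir" "$@"
}
```

Children forked by the command are only followed if the "function-fork" option described above is set.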

set_event_pid:

Have the events only trace a task with a PID listed in this file.
Note, the sched_switch and sched_wakeup events will also be traced
when they involve a task whose PID is listed in this file.

To have the PIDs of children of tasks with their PID in this file
added on fork, enable the "event-fork" option. That option will also
cause the PIDs of tasks to be removed from this file when the task
exits.

set_graph_function:

Functions listed in this file will cause the function graph
tracer to only trace these functions and the functions that
they call. (See the section "dynamic ftrace" for more details).

set_graph_notrace:

Similar to set_graph_function, but will disable function graph
tracing when the function is hit until it exits the function.
This makes it possible to ignore tracing functions that are called
by a specific function.

available_filter_functions:

This lists the functions that ftrace has processed and can trace.
These are the function names that you can pass to
"set_ftrace_filter" or "set_ftrace_notrace".
(See the section "dynamic ftrace" below for more details.)

dyn_ftrace_total_info:

This file is for debugging purposes. It shows the number of functions
that have been converted to nops and are available to be traced.

enabled_functions:

This file is more for debugging ftrace, but can also be useful
in seeing if any function has a callback attached to it.
Not only does the trace infrastructure use ftrace function
trace utility, but other subsystems might too. This file
displays all functions that have a callback attached to them
as well as the number of callbacks that have been attached.
Note, a callback may also call multiple functions which will
not be listed in this count.

If a callback registered on the function has the
"save regs" attribute (thus even more overhead), an 'R'
will be displayed on the same line as the function that
is returning registers.

If a callback registered on the function has the
"ip modify" attribute (thus the regs->ip can be changed),
an 'I' will be displayed on the same line as the function that
can be overridden.

If the architecture supports it, it will also show what callback
is being directly called by the function. If the count is greater
than 1 it most likely will be ftrace_ops_list_func().

If the callback of the function jumps to a trampoline that is
specific to the callback and not the standard trampoline,
its address will be printed as well as the function that the
trampoline calls.

function_profile_enabled:

When set it will enable all functions with either the function
tracer, or if configured, the function graph tracer. It will
keep a histogram of the number of functions that were called
and if the function graph tracer was configured, it will also keep
track of the time spent in those functions. The histogram
content can be displayed in the files:

trace_stat/function<cpu> (function0, function1, etc.).

trace_stat:

A directory that holds different tracing stats.

kprobe_events:

Enable dynamic trace points. See kprobetrace.txt.

kprobe_profile:

Dynamic trace points stats. See kprobetrace.txt.

max_graph_depth:

Used with the function graph tracer. This is the max depth
it will trace into a function. Setting this to a value of
one will show only the first kernel function that is called
from user space.

printk_formats:

This is for tools that read the raw format files. If an event in
the ring buffer references a string, only a pointer to the string
is recorded into the buffer and not the string itself. This prevents
tools from knowing what that string was. This file displays the string
and address for the string, allowing tools to map the pointers to what
the strings were.

saved_cmdlines:

Only the pid of the task is recorded in a trace event unless
the event specifically saves the task comm as well. Ftrace
makes a cache of pid mappings to comms to try to display
comms for events. If a pid for a comm is not listed, then
"<...>" is displayed in the output.

If the option "record-cmd" is set to "0", then comms of tasks
will not be saved during recording. By default, it is enabled.
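
The lookup that the output code performs can be imitated from user space. This sketch assumes that each line of saved_cmdlines has the form "PID comm"; the helper name comm_for_pid is hypothetical and DIR is the tracing directory.

```shell
# comm_for_pid DIR PID: print the cached comm for PID from
# DIR/saved_cmdlines, or "<...>" when the PID is not in the cache.
comm_for_pid() {
	dir=$1
	pid=$2
	awk -v pid="$pid" '
		$1 == pid { print $2; found = 1; exit }
		END { if (!found) print "<...>" }
	' "$dir/saved_cmdlines"
}
```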

saved_cmdlines_size:

By default, 128 comms are saved (see "saved_cmdlines" above). To
increase or decrease the number of comms that are cached, echo
the number of comms to cache into this file.

saved_tgids:

If the option "record-tgid" is set, on each scheduling context switch
the Thread Group ID of a task is saved in a table mapping the PID of
the thread to its TGID. By default, the "record-tgid" option is
disabled.

snapshot:

This displays the "snapshot" buffer and also lets the user
take a snapshot of the current running trace.
See the "Snapshot" section below for more details.

stack_max_size:

When the stack tracer is activated, this will display the
maximum stack size it has encountered.
See the "Stack Trace" section below.

stack_trace:

This displays the stack back trace of the largest stack
that was encountered when the stack tracer is activated.
See the "Stack Trace" section below.

stack_trace_filter:

This is similar to "set_ftrace_filter" but it limits what
functions the stack tracer will check.

trace_clock:

Whenever an event is recorded into the ring buffer, a
"timestamp" is added. This stamp comes from a specified
clock. By default, ftrace uses the "local" clock. This
clock is very fast and strictly per cpu, but on some
systems it may not be monotonic with respect to other
CPUs. In other words, the local clocks may not be in sync
with local clocks on other CPUs.

Usual clocks for tracing:

  # cat trace_clock
  [local] global counter x86-tsc

  The clock with the square brackets around it is the one
  in effect.

  local: Default clock, but may not be in sync across CPUs

  global: This clock is in sync with all CPUs but may
  	  be a bit slower than the local clock.

  counter: This is not a clock at all, but literally an atomic
  	   counter. It counts up one by one, but is in sync
	   with all CPUs. This is useful when you need to
	   know exactly the order events occurred with respect to
	   each other on different CPUs.

  uptime: This uses the jiffies counter and the time stamp
  	  is relative to the time since boot up.

  perf: This makes ftrace use the same clock that perf uses.
  	Eventually perf will be able to read ftrace buffers
	and this will help out in interleaving the data.

  x86-tsc: Architectures may define their own clocks. For
  	   example, x86 uses its own TSC cycle clock here.

  ppc-tb: This uses the powerpc timebase register value.
	  This is in sync across CPUs and can also be used
	  to correlate events across hypervisor/guest if
	  tb_offset is known.

  mono: This uses the fast monotonic clock (CLOCK_MONOTONIC)
	which is monotonic and is subject to NTP rate adjustments.

  mono_raw:
	This is the raw monotonic clock (CLOCK_MONOTONIC_RAW)
	which is monotonic but is not subject to any rate adjustments
	and ticks at the same rate as the hardware clocksource.

  boot: This is the boot clock (CLOCK_BOOTTIME) and is based on the
	fast monotonic clock, but also accounts for time spent in
	suspend. Since the clock access is designed for use in
	tracing in the suspend path, some side effects are possible
	if clock is accessed after the suspend time is accounted before
	the fast mono clock is updated. In this case, the clock update
	appears to happen slightly sooner than it normally would have.
	Also on 32-bit systems, it's possible that the 64-bit boot offset
	sees a partial update. These effects are rare and post
	processing should be able to handle them. See comments in the
	ktime_get_boot_fast_ns() function for more information.

To set a clock, simply echo the clock name into this file.

  echo global > trace_clock
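
A script that needs to know which clock is in effect can pick out the bracketed entry. The following is a small sketch with a hypothetical helper name, current_trace_clock; DIR is the tracing directory, and the file is assumed to hold space-separated clock names on one line, as in the example output above.

```shell
# current_trace_clock DIR: print the clock name that is shown inside
# square brackets in DIR/trace_clock.
current_trace_clock() {
	dir=$1
	tr ' ' '\n' < "$dir/trace_clock" | sed -n 's/^\[\(.*\)\]$/\1/p'
}
```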

trace_marker:

This is a very useful file for synchronizing user space
with events happening in the kernel. Writing strings into
this file will be written into the ftrace buffer.

It is useful in applications to open this file at the start
of the application and just reference the file descriptor
for the file.

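#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* File descriptor for trace_marker; stays -1 until opened at start. */
static int trace_fd = -1;

void trace_write(const char *fmt, ...)
{
	va_list ap;
	char buf[256];
	int n;

	if (trace_fd < 0)
		return;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);

	if (n < 0)
		return;
	/* vsnprintf() returns the untruncated length; clamp to the buffer. */
	if (n >= (int)sizeof(buf))
		n = sizeof(buf) - 1;

	write(trace_fd, buf, n);
}

start:

	trace_fd = open("trace_marker", O_WRONLY);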

trace_marker_raw:

This is similar to trace_marker above, but is meant for binary data
to be written to it, where a tool can be used to parse the data
from trace_pipe_raw.

uprobe_events:

Add dynamic tracepoints in programs.
See uprobetracer.txt

uprobe_profile:

Uprobe statistics. See uprobetracer.txt

instances:

This is a way to make multiple trace buffers where different
events can be recorded in different buffers.
See "Instances" section below.

events:

This is the trace event directory. It holds event tracepoints
(also known as static tracepoints) that have been compiled
into the kernel. It shows what event tracepoints exist
and how they are grouped by system. There are "enable"
files at various levels that can enable the tracepoints
when a "1" is written to them.

See events.txt for more information.
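
As a sketch of how the per-event "enable" files can be used, the helper below writes to the enable file of one event. The helper name set_event_enabled is hypothetical, DIR is the tracing directory, and the layout events/SYSTEM/EVENT/enable is assumed from the grouping described above.

```shell
# set_event_enabled DIR SYSTEM EVENT [VALUE]: write VALUE (default 1)
# into DIR/events/SYSTEM/EVENT/enable to switch a single event on or off.
set_event_enabled() {
	dir=$1
	system=$2
	event=$3
	value=${4:-1}
	file="$dir/events/$system/$event/enable"
	if [ ! -e "$file" ]; then
		echo "no such event: $system:$event" >&2
		return 1
	fi
	echo "$value" > "$file"
}
```

For example, "set_event_enabled /sys/kernel/tracing sched sched_switch" would enable the sched_switch event, and passing 0 as the last argument would disable it again.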

set_event:

Echoing an event name into this file will enable that event.

See events.txt for more information.

available_events:

A list of events that can be enabled in tracing.

See events.txt for more information.

hwlat_detector:

Directory for the Hardware Latency Detector.
See "Hardware Latency Detector" section below.

per_cpu:

This is a directory that contains the trace per_cpu information.

per_cpu/cpu0/buffer_size_kb:

The ftrace buffer is defined per_cpu. That is, there's a separate
buffer for each CPU to allow writes to be done atomically,
and free from cache bouncing. These buffers may have different
sizes. This file is similar to the buffer_size_kb
file, but it only displays or sets the buffer size for the
specific CPU (here cpu0).

per_cpu/cpu0/trace:

This is similar to the "trace" file, but it will only display
the data specific for the CPU. If written to, it only clears
the specific CPU buffer.

per_cpu/cpu0/trace_pipe:

This is similar to the "trace_pipe" file, and is a consuming
read, but it will only display (and consume) the data specific
for the CPU.

per_cpu/cpu0/trace_pipe_raw:

For tools that can parse the ftrace ring buffer binary format,
the trace_pipe_raw file can be used to extract the data
from the ring buffer directly. With the use of the splice()
system call, the buffer data can be quickly transferred to
a file or to the network where a server is collecting the
data.

Like trace_pipe, this is a consuming reader, where multiple
reads will always produce different data.

per_cpu/cpu0/snapshot:

This is similar to the main "snapshot" file, but will only
snapshot the current CPU (if supported). It only displays
the content of the snapshot for a given CPU, and if
written to, only clears this CPU buffer.

per_cpu/cpu0/snapshot_raw:

Similar to the trace_pipe_raw, but will read the binary format
from the snapshot buffer for t
Repository owner locked and limited conversation to collaborators Apr 29, 2024
@DawallAZllon91 DawallAZllon91 linked a pull request May 2, 2024 that will close this issue
@DawallAZllon91 DawallAZllon91 linked a pull request May 10, 2024 that will close this issue