libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc #1919

Merged
17 changes: 15 additions & 2 deletions libcontainer/SPEC.md
@@ -167,7 +167,8 @@ service (CLOS) and each CLOS has a capacity bitmask (CBM).

Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth.
indicating the percentage of maximum memory bandwidth or memory bandwidth limit
in MBps unit if MBA Software Controller is enabled.

It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.
@@ -236,7 +237,7 @@ set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.

Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth percentage.
L3 cache id and memory bandwidth.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
@@ -249,6 +250,18 @@ that is allocated is also dependent on the CPU model and can be looked up at
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.

If MBA Software Controller is enabled through mount option "-o mba_MBps"
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
instead of "percentages". The kernel underneath would use a software feedback
mechanism or a "Software Controller" which reads the actual bandwidth using
MBM counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.

For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

4 changes: 3 additions & 1 deletion libcontainer/configs/intelrdt.go
@@ -5,7 +5,9 @@ type IntelRdt struct {
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
L3CacheSchema string `json:"l3_cache_schema,omitempty"`

// The schema of memory bandwidth percentage per L3 cache id
// The schema of memory bandwidth per L3 cache id
// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
// The unit of memory bandwidth is specified in "percentages" by
// default, and in "MBps" if MBA Software Controller is enabled.
MemBwSchema string `json:"memBwSchema,omitempty"`
}
51 changes: 47 additions & 4 deletions libcontainer/intelrdt/intelrdt.go
@@ -28,7 +28,8 @@ import (
*
* Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
* over memory bandwidth for the software. A user controls the resource by
* indicating the percentage of maximum memory bandwidth.
* indicating the percentage of maximum memory bandwidth or memory bandwidth
* limit in MBps unit if MBA Software Controller is enabled.
*
* More details about Intel RDT CAT and MBA can be found in the section 17.18
* of Intel Software Developer Manual:
@@ -95,7 +96,7 @@ import (
*
* Memory bandwidth schema:
* It has allocation values for memory bandwidth on each socket, which contains
* L3 cache id and memory bandwidth percentage.
* L3 cache id and memory bandwidth.
* Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
* For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
*
@@ -106,6 +107,18 @@ import (
* min_bw + N * bw_gran. Intermediate values are rounded to the next control
* step available on the hardware.
*
* If MBA Software Controller is enabled through mount option "-o mba_MBps":
* mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
* We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
* instead of "percentages". The kernel underneath would use a software feedback
* mechanism or a "Software Controller" which reads the actual bandwidth using
* MBM counters and adjust the memory bandwidth percentages to ensure:
* "actual memory bandwidth < user specified memory bandwidth".
*
* For example, on a two-socket machine, the schema line could be
* "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
* and 7000 MBps memory bandwidth limit on socket 1.
*
* For more information about Intel RDT kernel interface:
* https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
*
@@ -165,6 +178,8 @@ var (
isCatEnabled bool
// The flag to indicate if Intel RDT/MBA is enabled
isMbaEnabled bool
// The flag to indicate if Intel RDT/MBA Software Controller is enabled
isMbaScEnabled bool
)

type intelRdtData struct {
@@ -197,7 +212,12 @@ func init() {
isCatEnabled = true
}
}
if isMbaFlagSet {
if isMbaScEnabled {
// We confirm MBA Software Controller is enabled in step 2,
// MBA should be enabled because MBA Software Controller
// depends on MBA
isMbaEnabled = true
} else if isMbaFlagSet {
if _, err := os.Stat(filepath.Join(intelRdtRoot, "info", "MB")); err == nil {
isMbaEnabled = true
}
@@ -232,6 +252,11 @@ func findIntelRdtMountpointDir() (string, error) {
return "", fmt.Errorf("Error found less than 3 fields post '-' in %q", text)
}

// Check if MBA Software Controller is enabled through mount option "-o mba_MBps"
if strings.Contains(postSeparatorFields[2], "mba_MBps") {
isMbaScEnabled = true
Contributor

I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?
Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.

Contributor Author

@hqhq Thank you for the kind code review.

> I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?

The name "MBA Software Controller" refers to a kernel enhancement of the MBA feature. Its motivation is to mitigate some MBA limitations described in the kernel documentation (search for "Software Controller" in https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt for more details) and to make the interface more user-friendly: we can specify the memory bandwidth limit in "MBps" (megabytes per second), which is more straightforward from the end user's perspective than a limit in "percentages".

Underneath, the kernel uses a software feedback mechanism, the "Software Controller", which reads the actual bandwidth using Memory Bandwidth Monitoring (MBM, an Intel RDT monitoring feature) counters and adjusts the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth". In other words, MBA Software Controller still relies on the hardware percentage limit control internally.
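
The feedback idea can be pictured with a toy model. This is only a sketch of the concept described above, not the kernel's actual algorithm: `nextThrottlePercent`, its parameters, and the one-step-per-interval policy are assumptions made for illustration, with `minBw` and `gran` standing in for the hardware's min_bw and bw_gran values.

```go
package main

import "fmt"

// nextThrottlePercent is a hypothetical, simplified feedback step: given the
// current hardware throttle percentage and the bandwidth measured over the
// last interval, it moves the percentage one granularity step toward the
// user-specified MBps limit. The real kernel logic is more involved.
func nextThrottlePercent(curPct, measuredMBps, limitMBps, minBw, gran uint64) uint64 {
	switch {
	case measuredMBps > limitMBps && curPct >= minBw+gran:
		// Over the limit: throttle harder, but never below minBw.
		return curPct - gran
	case measuredMBps < limitMBps && curPct+gran <= 100:
		// Under the limit: relax the throttle, but never above 100%.
		return curPct + gran
	}
	return curPct
}

func main() {
	pct := uint64(100)
	// Made-up measurements (MBps) for a group limited to 5000 MBps.
	for _, measured := range []uint64{8000, 7000, 6000, 4500} {
		pct = nextThrottlePercent(pct, measured, 5000, 10, 10)
		fmt.Printf("measured=%d MBps -> throttle=%d%%\n", measured, pct)
	}
}
```

The point of the sketch is only that the user-facing value is an MBps target while the knob being turned is still the hardware percentage.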

MBA Software Controller depends on the MBA and MBM hardware capabilities. If both MBA and MBM are enabled by hardware, we can enable MBA Software Controller through the mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

> Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.

In this PR, we add a new flag "isMbaScEnabled" to indicate whether MBA Software Controller is enabled. In step 2 of intelrdt.init(), this flag is set if the mount option "mba_MBps" for the resctrl filesystem is found while parsing /proc/self/mountinfo. In step 3 of intelrdt.init(), if "isMbaScEnabled" is true, we can mark "isMbaEnabled" as true immediately; there is no need to double-check MBA by testing whether /sys/fs/resctrl/info/MB/ is available.
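
The two steps can be sketched in isolation roughly as follows. This is a standalone illustration, not the code in this PR: `parseResctrlMountinfo`, `resctrlMount`, and `mbaEnabled` are hypothetical names, the sample mountinfo line is made up, and the option is matched exactly here whereas the PR uses `strings.Contains`.

```go
package main

import (
	"fmt"
	"strings"
)

// resctrlMount is a hypothetical summary of one resctrl mountinfo entry.
type resctrlMount struct {
	MountPoint string
	MbaSc      bool // true if mounted with "-o mba_MBps"
}

// parseResctrlMountinfo inspects a single /proc/self/mountinfo line.
// ok is false if the line does not describe a resctrl mount.
func parseResctrlMountinfo(line string) (m resctrlMount, ok bool) {
	parts := strings.SplitN(line, " - ", 2)
	if len(parts) != 2 {
		return m, false
	}
	pre := strings.Fields(parts[0])
	post := strings.Fields(parts[1]) // fstype, source, super options
	if len(pre) < 5 || len(post) < 3 || post[0] != "resctrl" {
		return m, false
	}
	m.MountPoint = pre[4]
	for _, opt := range strings.Split(post[2], ",") {
		if opt == "mba_MBps" {
			m.MbaSc = true
		}
	}
	return m, true
}

// mbaEnabled mirrors the step 3 decision: MBA Software Controller implies
// MBA; otherwise MBA needs the CPU flag and the info/MB directory.
func mbaEnabled(mbaSc, mbaFlagSet, infoMBExists bool) bool {
	if mbaSc {
		return true
	}
	return mbaFlagSet && infoMBExists
}

func main() {
	// Made-up sample line in /proc/self/mountinfo format.
	line := "25 19 0:21 / /sys/fs/resctrl rw,relatime shared:10 - resctrl resctrl rw,mba_MBps"
	if m, ok := parseResctrlMountinfo(line); ok {
		fmt.Printf("mount point %s, MBA Software Controller: %v\n", m.MountPoint, m.MbaSc)
		fmt.Println("MBA enabled:", mbaEnabled(m.MbaSc, false, false))
	}
}
```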

In runc, the memory bandwidth schemata of the original MBA and of MBA Software Controller share a unified format: MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;... The only difference is that the unit of memory bandwidth is "percentages" by default and "MBps" if MBA Software Controller is enabled. So runc handles both the original MBA and the MBA Software Controller cases with exactly the same logic.
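
To illustrate why one code path suffices, here is a minimal sketch of a parser for that format. `parseMemBwSchema` is a hypothetical helper and not part of runc, which writes the schema string to the schemata file as is; the parser never needs to know which unit the numbers are in.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema splits "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
// into a map from L3 cache id to bandwidth value. The values are returned
// without a unit: percentages by default, MBps under MBA Software Controller.
func parseMemBwSchema(schema string) (map[int]uint64, error) {
	s := strings.TrimSpace(schema)
	if !strings.HasPrefix(s, "MB:") {
		return nil, fmt.Errorf("schema %q does not start with \"MB:\"", schema)
	}
	out := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(s, "MB:"), ";") {
		kv := strings.SplitN(entry, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(kv[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		bw, err := strconv.ParseUint(strings.TrimSpace(kv[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad bandwidth in %q: %v", entry, err)
		}
		out[id] = bw
	}
	return out, nil
}

func main() {
	// The same parser accepts both the percentage and the MBps example.
	for _, schema := range []string{"MB:0=20;1=70", "MB:0=5000;1=7000"} {
		values, err := parseMemBwSchema(schema)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(schema, "->", values)
	}
}
```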

In addition, we export the function IsMbaScEnabled() to report whether the flag "isMbaScEnabled" is set. Currently it is only called by TestIntelRdtSetMemBwScSchema() in the MBA Software Controller unit test in runc, but it will be useful to callers in upper-layer software (e.g., docker) in the future. For example, a caller could check whether MBA Software Controller is enabled: if yes, the memory bandwidth is passed in "MBps"; otherwise it is passed in "percentages".
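
A caller-side sketch of that idea might look like the following. Everything here is hypothetical (`memBwLimit`, `buildMemBwSchema`, and the sample limits); in a real caller the boolean would presumably come from `intelrdt.IsMbaScEnabled()` rather than being passed in by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// memBwLimit is a hypothetical per-cache-id limit expressed in both units,
// so the caller can pick whichever one the resctrl mount expects.
type memBwLimit struct {
	Percent uint64
	MBps    uint64
}

// buildMemBwSchema renders a schema line, choosing MBps values when MBA
// Software Controller is enabled and percentages otherwise.
func buildMemBwSchema(limits map[int]memBwLimit, mbaScEnabled bool) string {
	ids := make([]int, 0, len(limits))
	for id := range limits {
		ids = append(ids, id)
	}
	sort.Ints(ids) // stable output order by cache id

	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		bw := limits[id].Percent
		if mbaScEnabled {
			bw = limits[id].MBps
		}
		parts = append(parts, fmt.Sprintf("%d=%d", id, bw))
	}
	return "MB:" + strings.Join(parts, ";")
}

func main() {
	limits := map[int]memBwLimit{
		0: {Percent: 20, MBps: 5000},
		1: {Percent: 70, MBps: 7000},
	}
	fmt.Println(buildMemBwSchema(limits, false))
	fmt.Println(buildMemBwSchema(limits, true))
}
```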

}

return fields[4], nil
}
}
@@ -480,6 +505,11 @@ func IsMbaEnabled() bool {
return isMbaEnabled
}

// Check if Intel RDT/MBA Software Controller is enabled
func IsMbaScEnabled() bool {
return isMbaScEnabled
}

// Get the 'container_id' path in Intel RDT "resource control" filesystem
func GetIntelRdtPath(id string) (string, error) {
rootPath, err := getIntelRdtRoot()
@@ -633,7 +663,7 @@ func (m *IntelRdtManager) Set(container *configs.Config) error {
//
// About memory bandwidth schema:
// It has allocation values for memory bandwidth on each socket, which
// contains L3 cache id and memory bandwidth percentage.
// contains L3 cache id and memory bandwidth.
// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
// For example, on a two-socket machine, the schema line could be:
// "MB:0=20;1=70"
@@ -645,6 +675,19 @@ func (m *IntelRdtManager) Set(container *configs.Config) error {
// The available bandwidth control steps are: min_bw + N * bw_gran.
// Intermediate values are rounded to the next control step available
// on the hardware.
//
// If MBA Software Controller is enabled through mount option
// "-o mba_MBps": mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
// We could specify memory bandwidth in "MBps" (Mega Bytes per second)
// unit instead of "percentages". The kernel underneath would use a
// software feedback mechanism or a "Software Controller" which reads
// the actual bandwidth using MBM counters and adjust the memory
// bandwidth percentages to ensure:
// "actual memory bandwidth < user specified memory bandwidth".
//
// For example, on a two-socket machine, the schema line could be
// "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on
// socket 0 and 7000 MBps memory bandwidth limit on socket 1.
if container.IntelRdt != nil {
path := m.GetPath()
l3CacheSchema := container.IntelRdt.L3CacheSchema
38 changes: 38 additions & 0 deletions libcontainer/intelrdt/intelrdt_test.go
@@ -82,3 +82,41 @@ func TestIntelRdtSetMemBwSchema(t *testing.T) {
t.Fatal("Got the wrong value, set 'schemata' failed.")
}
}

func TestIntelRdtSetMemBwScSchema(t *testing.T) {
if !IsMbaScEnabled() {
return
}

helper := NewIntelRdtTestUtil(t)
defer helper.cleanup()

const (
memBwScSchemaBefore = "MB:0=5000;1=7000"
memBwScSchemeAfter = "MB:0=9000;1=4000"
)

helper.writeFileContents(map[string]string{
"schemata": memBwScSchemaBefore + "\n",
})

helper.IntelRdtData.config.IntelRdt.MemBwSchema = memBwScSchemeAfter
intelrdt := &IntelRdtManager{
Config: helper.IntelRdtData.config,
Path: helper.IntelRdtPath,
}
if err := intelrdt.Set(helper.IntelRdtData.config); err != nil {
t.Fatal(err)
}

tmpStrings, err := getIntelRdtParamString(helper.IntelRdtPath, "schemata")
if err != nil {
t.Fatalf("Failed to parse file 'schemata' - %s", err)
}
values := strings.Split(tmpStrings, "\n")
value := values[0]

if value != memBwScSchemeAfter {
t.Fatal("Got the wrong value, set 'schemata' failed.")
}
}