---
title: Troubleshooting Topology Management
weight: 60
content_type: task
---

<!-- overview -->

Kubernetes abstracts many aspects of how Pods execute on nodes away from the user. This is by design.
However, some workloads need stronger latency or performance guarantees to operate acceptably.
The `kubelet` provides ways to enable more complex workload placement policies
while keeping the abstraction free from explicit placement directives.

You can manage topology within nodes. This means helping the kubelet configure the host operating system so that
Pods and containers are placed on the correct side of inner boundaries, such as _NUMA domains_.
(NUMA is an abbreviation of _non-uniform memory access_ and refers to the idea that CPUs might be topologically
closer to specific regions of memory, due to the physical layout of the hardware components and the way they are connected.)

## Sources of troubleshooting information  {#sources-of-troubleshooting-information}

In the context of topology management, you can use the following sources to find out
why a Pod could not be deployed or was rejected by a node:

- _Pod status_ - indicates topology affinity errors
- _System logs_ - include valuable debugging information, for example about generated hints
- _kubelet state file_ - a dump of the internal state of the Memory Manager
  (including the _node map_ and _memory maps_)
- [Device plugin resource API](#device-plugin-resource-api) - lets you retrieve information
  about the memory reserved for containers

## Troubleshoot `TopologyAffinityError` {#TopologyAffinityError}

This error typically occurs in the following situations:

* a node does not have enough resources available to satisfy the Pod's request
* the Pod's request is rejected due to particular Topology Manager policy constraints

The error appears in the status of a Pod:

```shell
kubectl get pods
```

```none
NAME         READY   STATUS                  RESTARTS   AGE
guaranteed   0/1     TopologyAffinityError   0          113s
```
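
When many Pods are listed, the affected ones can be picked out programmatically. The following is a minimal sketch, not part of any Kubernetes tool: `pods_with_status` is a hypothetical helper that scans the plain-text table printed by `kubectl get pods`, assuming the default column layout shown above (NAME first, STATUS third).

```python
def pods_with_status(table_text, status="TopologyAffinityError"):
    """Return the names of Pods whose STATUS column equals `status`.

    Assumes the default `kubectl get pods` layout, where NAME is the
    first whitespace-separated column and STATUS is the third.
    """
    names = []
    lines = table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 3 and columns[2] == status:
            names.append(columns[0])
    return names


# The table below is copied from the example output above.
sample = """\
NAME         READY   STATUS                  RESTARTS   AGE
guaranteed   0/1     TopologyAffinityError   0          113s
"""
print(pods_with_status(sample))  # ['guaranteed']
```

The helper only reads text you pass in, so it can be tried on saved command output without access to a cluster.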

Use `kubectl describe pod <id>` or `kubectl events` to obtain a detailed error message:

```none
Warning  TopologyAffinityError  10m   kubelet, dell8  Resources cannot be allocated with Topology locality
```

## Examine system logs  {#examine-system-logs}

Search the system logs for entries that relate to a particular Pod.

The logs should contain the set of hints generated by the CPU Manager,
as well as the set of hints that the Memory Manager generated for the Pod.

The Topology Manager merges these hints to calculate a single best hint,
which should also be present in the logs.

The best hint indicates where to allocate all the resources.
The Topology Manager tests this hint against its current policy and, based on the verdict,
either admits the Pod to the node or rejects it.

Also search the logs for entries associated with the Memory Manager,
for example to find information about `cgroups` and `cpuset.mems` updates.
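
Narrowing a large log down to one Pod can be scripted. The sketch below is one possible approach under stated assumptions: `filter_pod_log_lines` is a hypothetical helper, the default keyword list is a guess at useful search terms rather than the exact wording the kubelet logs, and the demo lines are invented placeholders, not real kubelet output.

```python
def filter_pod_log_lines(lines, pod_name, keywords=("hint", "cpuset.mems", "cgroup")):
    """Keep log lines that mention the Pod and at least one keyword.

    Matching is a case-insensitive substring search. The keyword list is
    an assumption for illustration; adjust it to the wording that your
    kubelet version actually logs.
    """
    needle = pod_name.lower()
    selected = []
    for line in lines:
        lowered = line.lower()
        if needle in lowered and any(k.lower() in lowered for k in keywords):
            selected.append(line.rstrip("\n"))
    return selected


# Invented placeholder lines, used only to exercise the filter.
demo_lines = [
    'example: pod="guaranteed" topology hint computed',
    'example: pod="other" topology hint computed',
    'example: pod="guaranteed" container started',
]
for line in filter_pod_log_lines(demo_lines, "guaranteed"):
    print(line)
```

Because the Pod name is matched as a substring, a name such as `guaranteed` also matches `guaranteed-2`; pass a more specific string (for example, the Pod UID) if that matters.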

## Examples  {#examples}

### Examine the Memory Manager state on a node

First, deploy a sample `Guaranteed` Pod whose specification is as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed
spec:
  containers:
  - name: guaranteed
    image: consumer
    imagePullPolicy: Never
    resources:
      limits:
        cpu: "2"
        memory: 150Gi
      requests:
        cpu: "2"
        memory: 150Gi
    command: ["sleep","infinity"]
```

Next, log in to the node where the Pod was deployed and examine the state file at
`/var/lib/kubelet/memory_manager_state`:

```json
{
   "policyName":"Static",
   "machineState":{
      "0":{
         "numberOfAssignments":1,
         "memoryMap":{
            "hugepages-1Gi":{
               "total":0,
               "systemReserved":0,
               "allocatable":0,
               "reserved":0,
               "free":0
            },
            "memory":{
               "total":134987354112,
               "systemReserved":3221225472,
               "allocatable":131766128640,
               "reserved":131766128640,
               "free":0
            }
         },
         "nodes":[
            0,
            1
         ]
      },
      "1":{
         "numberOfAssignments":1,
         "memoryMap":{
            "hugepages-1Gi":{
               "total":0,
               "systemReserved":0,
               "allocatable":0,
               "reserved":0,
               "free":0
            },
            "memory":{
               "total":135286722560,
               "systemReserved":2252341248,
               "allocatable":133034381312,
               "reserved":29295144960,
               "free":103739236352
            }
         },
         "nodes":[
            0,
            1
         ]
      }
   },
   "entries":{
      "fa9bdd38-6df9-4cf9-aa67-8c4814da37a8":{
         "guaranteed":[
            {
               "numaAffinity":[
                  0,
                  1
               ],
               "type":"memory",
               "size":161061273600
            }
         ]
      }
   },
   "checksum":4142013182
}
```

It can be deduced from the state file that the Pod was pinned to both NUMA nodes:

```json
"numaAffinity":[
   0,
   1
],
```

The term _pinned_ means that the Pod's memory consumption is constrained (through `cgroups` configuration)
to these NUMA nodes.

This automatically implies that the Memory Manager instantiated a new group that
comprises these two NUMA nodes, that is, the NUMA nodes with indexes `0` and `1`.

To analyse the memory resources available in a group, add up the corresponding entries
from the NUMA nodes that belong to the group.

For example, the total amount of free "conventional" memory in the group can be computed
by adding up the free memory available at every NUMA node in the group,
that is, the `"memory"` section of NUMA node `0` (`"free":0`) and NUMA node `1` (`"free":103739236352`).
So the total amount of free "conventional" memory in this group is `0 + 103739236352` bytes.
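
The same addition can be done for every field of the memory table. The following is a minimal sketch, assuming the state file has the JSON layout shown above; `group_memory_totals` is a hypothetical helper name, and the `state` dictionary is a trimmed copy of the example file (hugepages entries and the checksum are omitted).

```python
def group_memory_totals(state, pod_uid, container, resource="memory"):
    """Sum the memory-table fields over the NUMA nodes a container is pinned to."""
    blocks = state["entries"][pod_uid][container]
    block = next(b for b in blocks if b["type"] == resource)
    totals = {"total": 0, "systemReserved": 0, "allocatable": 0, "reserved": 0, "free": 0}
    for numa_id in block["numaAffinity"]:
        # machineState is keyed by the NUMA node index as a string.
        table = state["machineState"][str(numa_id)]["memoryMap"][resource]
        for field in totals:
            totals[field] += table[field]
    return totals


# Trimmed copy of the example state file shown earlier.
state = {
    "machineState": {
        "0": {"memoryMap": {"memory": {
            "total": 134987354112, "systemReserved": 3221225472,
            "allocatable": 131766128640, "reserved": 131766128640, "free": 0}}},
        "1": {"memoryMap": {"memory": {
            "total": 135286722560, "systemReserved": 2252341248,
            "allocatable": 133034381312, "reserved": 29295144960, "free": 103739236352}}},
    },
    "entries": {
        "fa9bdd38-6df9-4cf9-aa67-8c4814da37a8": {
            "guaranteed": [{"numaAffinity": [0, 1], "type": "memory", "size": 161061273600}]
        }
    },
}

totals = group_memory_totals(state, "fa9bdd38-6df9-4cf9-aa67-8c4814da37a8", "guaranteed")
print(totals["free"])      # 103739236352
print(totals["reserved"])  # 161061273600
```

In this example the reserved memory summed over the group, `161061273600` bytes, equals the `size` recorded for the container, which is the `150Gi` the Pod requested. To run the helper against a real node, load the file with `json.load` instead of using the inline dictionary.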

The line `"systemReserved":3221225472` indicates that the administrator of this node reserved
`3221225472` bytes (that is, `3Gi`) for the kubelet and system processes at NUMA node `0`
by using the `--reserved-memory` flag.
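
The byte counts in the state file are easier to read once converted to binary units. The sketch below converts a value to `Gi` and checks two relations that hold for the sample data above: `allocatable = total - systemReserved` and `free = allocatable - reserved`. These relations are an observation about this example rather than a documented guarantee, and `to_gi` and `table_is_consistent` are hypothetical helper names.

```python
GI = 1024 ** 3  # bytes in one Gi


def to_gi(num_bytes):
    """Convert a byte count to Gi as a float."""
    return num_bytes / GI


def table_is_consistent(table):
    """Check the two relations observed in the sample memory tables."""
    return (
        table["allocatable"] == table["total"] - table["systemReserved"]
        and table["free"] == table["allocatable"] - table["reserved"]
    )


# The "memory" table of NUMA node 0, copied from the example state file.
numa0_memory = {
    "total": 134987354112,
    "systemReserved": 3221225472,
    "allocatable": 131766128640,
    "reserved": 131766128640,
    "free": 0,
}

print(to_gi(numa0_memory["systemReserved"]))  # 3.0
print(table_is_consistent(numa0_memory))      # True
```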

## Check the device plugin resource API {#device-plugin-resource-api}

The kubelet provides a `PodResourceLister` gRPC service to enable discovery of resources and associated metadata.
By using its [List gRPC endpoint](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#grpc-endpoint-list),
you can retrieve information about the memory reserved for each container, which is contained
in the protobuf `ContainerMemory` message.

This information can be retrieved only for Pods in the Guaranteed QoS class.
