Description
I'm writing a PMDA, and it works as expected, but instance label propagation in PCP seems inconsistent. This affects my PMDA, but also the standard ones. Somewhere along the pipeline, instance labels are dropped, so the REST API does not return them, which in turn causes errors in grafana-pcp and other downstream consumers.
System info:
- PCP 7.0.3 (installed from packagecloud repo)
- OS: ubuntu/noble
- pmproxy: Running as systemd service on port 44322
Can't share the PMDA code (it's platform-specific anyway), but I'll show the behavior and also how to repro with a bundled PMDA.
The gist is that the following tools show all expected instance labels:
- pminfo --labels
- dbpmda with the label subcommand
- the /metrics endpoint
But they're missing from:
- REST API (PMAPI)
- pmseries and downstream grafana-pcp
Working through this with my own PMDA, which exposes low-level AMD EPYC metrics:
static int
esmi_labelCallBack(pmInDom indom, unsigned int inst, pmLabelSet **lp)
{
int serial;
if (indom == PM_INDOM_NULL)
return 0;
serial = pmInDom_serial(indom);
/* Add disp_instance label for socket indom */
if (serial == SOCKET_INDOM) {
if (inst < num_sockets && socket_names[inst] != NULL) {
return pmdaAddLabels(lp, "{\"disp_instance\":\"%s\"}",
socket_names[inst]);
}
}
/* Add disp_instance, die_id, and socket_id labels for core indom */
if (serial == CORE_INDOM) {
if (inst < num_cores && core_names[inst] != NULL) {
return pmdaAddLabels(lp, "{\"disp_instance\":\"%s\",\"die_id\":%d,\"socket_id\":%d}",
core_names[inst], core_die_id[inst], core_socket_id[inst]);
}
}
return 0;
}
All core-scope metrics get three instance labels to localize them on the chiplets and sockets. The PMDA installs and runs without warnings or errors.
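For context, the callback above is registered in the PMDA's init routine. A minimal sketch, assuming the standard PMDA_INTERFACE_7 setup (esmi_init, indomtab, and metrictab are illustrative names here, not the actual code):
void
esmi_init(pmdaInterface *dp)
{
    if (dp->status != 0)
        return;
    /* register the per-instance label callback with libpcp_pmda */
    pmdaSetLabelCallBack(dp, esmi_labelCallBack);
    pmdaInit(dp, indomtab, sizeof(indomtab) / sizeof(indomtab[0]),
             metrictab, sizeof(metrictab) / sizeof(metrictab[0]));
}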
Poking via dbpmda:
echo 'open pipe /var/lib/pcp/pmdas/esmi/pmdaesmi -d 470
label instances 470.1' | sudo dbpmda
Output:
Start pmdaesmi PMDA: /var/lib/pcp/pmdas/esmi/pmdaesmi -d 470
Instances of pmInDom: 470.1
[ 0] Labels inst: 0
die_id=0
disp_instance="core0"
socket_id=0
[ 1] Labels inst: 1
die_id=0
disp_instance="core1"
socket_id=0
[ 2] Labels inst: 2
die_id=0
disp_instance="core2"
socket_id=0
...
Then pminfo:
pminfo --labels esmi.energy.core | head -n5
Output:
esmi.energy.core
labels {"agent":"esmi","device_type":"cpu_core","domainname":"localdomain","groupid":1000,"hostname":"aitop","indom_name":"per core","machineid":"4542127d93e1480e823cf51ba57d25a3","userid":1000}
inst [0 or "core0"] labels {"agent":"esmi","device_type":"cpu_core","die_id":0,"disp_instance":"core0","domainname":"localdomain","groupid":1000,"hostname":"aitop","indom_name":"per core","machineid":"4542127d93e1480e823cf51ba57d25a3","socket_id":0,"userid":1000}
inst [1 or "core1"] labels {"agent":"esmi","device_type":"cpu_core","die_id":0,"disp_instance":"core1","domainname":"localdomain","groupid":1000,"hostname":"aitop","indom_name":"per core","machineid":"4542127d93e1480e823cf51ba57d25a3","socket_id":0,"userid":1000}
Note: die_id, disp_instance, and socket_id are present in instance labels.
Metrics endpoint:
curl -s http://localhost:44322/metrics | grep esmi_energy_core | head -5
Output:
# HELP esmi_energy_core Cumulative core energy consumption in Joules
# TYPE esmi_energy_core counter
esmi_energy_core{disp_instance="core0",agent="esmi",indom_name="per core",hostname="aitop",instid="0",instname="core0",machineid="4542127d93e1480e823cf51ba57d25a3",domainname="localdomain",die_id="0",socket_id="0",device_type="cpu_core"} 383191.321868
esmi_energy_core{disp_instance="core1",agent="esmi",indom_name="per core",hostname="aitop",instid="1",instname="core1",machineid="4542127d93e1480e823cf51ba57d25a3",domainname="localdomain",die_id="0",socket_id="0",device_type="cpu_core"} 339390.282028
esmi_energy_core{disp_instance="core2",agent="esmi",indom_name="per core",hostname="aitop",instid="2",instname="core2",machineid="4542127d93e1480e823cf51ba57d25a3",domainname="localdomain",die_id="0",socket_id="0",device_type="cpu_core"} 75834.41345199999
Still there ...
Now the instance domain REST API:
curl -s 'http://localhost:44322/pmapi/indom?indom=470.1' | jq | head -n30
Output:
{
"context": 184337792,
"indom": "470.1",
"labels": {
"device_type": "cpu_core",
"domainname": "localdomain",
"hostname": "aitop",
"indom_name": "per core",
"machineid": "4542127d93e1480e823cf51ba57d25a3"
},
"text-oneline": "Instance domain \"core\" for ESMI PMDA",
"text-help": "One instance per physical CPU core detected by the ESMI library.\nInstances are named \"core0\", \"core1\", etc.\nNote: With SMT enabled, sibling threads share the same core.",
"instances": [
{
"instance": 11,
"name": "core11",
"labels": {
"domainname": "localdomain",
"hostname": "aitop",
"machineid": "4542127d93e1480e823cf51ba57d25a3"
}
},
{
"instance": 23,
"name": "core23",
"labels": {
"domainname": "localdomain",
"hostname": "aitop",
"machineid": "4542127d93e1480e823cf51ba57d25a3"
}
Instance labels are missing.
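A quick sanity check, collapsing the per-instance label keys from the same response with jq:
curl -s 'http://localhost:44322/pmapi/indom?indom=470.1' | jq '[.instances[].labels | keys[]] | unique'
Output:
[
  "domainname",
  "hostname",
  "machineid"
]
Only the context-level labels survive; die_id, disp_instance, and socket_id are gone.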
Now the metric endpoint:
curl -s 'http://localhost:44322/pmapi/metric?names=esmi.energy.core' | jq
Output:
{
"context": 1243538900,
"metrics": [
{
"name": "esmi.energy.core",
"series": "a3f7748a41b1aca855f561e934ab092b2a78428f",
"pmid": "470.5.0",
"indom": "470.1",
"type": "double",
"sem": "counter",
"units": "none",
"labels": {
"agent": "esmi",
"device_type": "cpu_core",
"domainname": "localdomain",
"hostname": "aitop",
"indom_name": "per core",
"machineid": "4542127d93e1480e823cf51ba57d25a3"
},
"text-oneline": "Cumulative core energy consumption in Joules",
"text-help": "The cumulative energy consumption of each physical core in Joules.[...]"
}
]
}
Now, this isn't specific to my PMDA: I noticed the same missing labels for the NVIDIA PMDA, disk metrics, and others, so dashboards and pmproxy queries lack important context.
For example, there's a device_name here:
pminfo --labels disk.dev.read 2>/dev/null | grep device_name | head -n1
# inst [0 or "nvme3n1"] labels {"agent":"linux","device_name":"nvme3n1","device_type":"block","domainname":"localdomain","groupid":1000,"hostname":"aitop","indom_name":"per disk","machineid":"4542127d93e1480e823cf51ba57d25a3","userid":1000}
But not here:
pmseries -l `pmseries 'disk.dev.read'` 2>&1 | grep nvme3n1
# inst [0 or "nvme3n1"] labels {"agent":"linux","device_type":"block","domainname":"localdomain","groupid":986,"hostname":"aitop","indom_name":"per disk","machineid":"4542127d93e1480e823cf51ba57d25a3","userid":997}
Nor here:
curl -s 'http://localhost:44322/pmapi/indom?indom=60.1' | jq '.instances[] | select(.name == "nvme3n1")'
# {
# "instance": 0,
# "name": "nvme3n1",
# "labels": {
# "domainname": "localdomain",
# "hostname": "aitop",
# "machineid": "4542127d93e1480e823cf51ba57d25a3"
# }
# }
And another example, from NVIDIA:
pminfo --labels nvidia.power
# nvidia.power
# labels {"agent":"nvidia","device_type":"gpu","domainname":"localdomain","groupid":1000,"hostname":"aitop","indom_name":"per gpu","machineid":"4542127d93e1480e823cf51ba57d25a3","units":"milliwatts","userid":1000}
# inst [0 or "gpu0"] labels {"agent":"nvidia","device_type":"gpu","domainname":"localdomain","gpu":0,"groupid":1000,"hostname":"aitop","indom_name":"per gpu","machineid":"4542127d93e1480e823cf51ba57d25a3","units":"milliwatts","userid":1000,"uuid":"GPU-7235b8ec-cfc6-c44b-967f-c404e4564320"}
# inst [1 or "gpu1"] labels {"agent":"nvidia","device_type":"gpu","domainname":"localdomain","gpu":1,"groupid":1000,"hostname":"aitop","indom_name":"per gpu","machineid":"4542127d93e1480e823cf51ba57d25a3","units":"milliwatts","userid":1000,"uuid":"GPU-4db1918f-1bb1-7200-983c-0ceb1ae0c7b3"}
# inst [2 or "gpu2"] labels {"agent":"nvidia","device_type":"gpu","domainname":"localdomain","gpu":2,"groupid":1000,"hostname":"aitop","indom_name":"per gpu","machineid":"4542127d93e1480e823cf51ba57d25a3","units":"milliwatts","userid":1000,"uuid":"GPU-bb8efb89-eacc-8ef4-fd19-4ca522115940"}
gpu exists as a label here, but not in the API, the proxy, Redis, or the other downstream consumers.