
Agent hangs if it doesn't start with the right permissions #111

@mwringe

When the agent is started without the proper permissions, it logs the following error and then hangs:

1 node_event_consumer.go:72] Error obtaining information about the agent pod [openshift-infra/hawkular-openshift-agent-qzg21]. err=User "system:serviceaccount:openshift-infra:hawkular-openshift-agent" cannot get pods in project "openshift-infra"

If the service account (SA) is then granted the proper permissions, the pod still hangs. It only starts up properly after the pod is restarted.

Hanging like this leaves the pod reporting that it is ready and running properly (status 1/1). At the very least, if the agent cannot continue, it should exit so that a new pod can be started in its place.
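
A minimal sketch of the "exit instead of hang" behaviour, assuming the pod lookup in `node_event_consumer.go` can be changed to return its error instead of logging it and blocking. `podLookup`, `startConsumer`, `run` and `errNoPermission` are hypothetical names used only for illustration, not the agent's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoPermission mimics the API server's "cannot get pods" error.
// Hypothetical: the real agent receives this from its Kubernetes client.
var errNoPermission = errors.New(`User "system:serviceaccount:openshift-infra:hawkular-openshift-agent" cannot get pods in project "openshift-infra"`)

// podLookup is a hypothetical stand-in for the call in node_event_consumer.go
// that fetches information about the agent's own pod.
type podLookup func(namespace, name string) error

// startConsumer hands the lookup error back to its caller instead of
// logging it and blocking forever.
func startConsumer(lookup podLookup, namespace, name string) error {
	if err := lookup(namespace, name); err != nil {
		return fmt.Errorf("error obtaining information about the agent pod [%s/%s]: %w", namespace, name, err)
	}
	return nil
}

// run returns the process exit code: 0 if startup succeeded, 1 otherwise.
func run(lookup podLookup) int {
	if err := startConsumer(lookup, "openshift-infra", "hawkular-openshift-agent-qzg21"); err != nil {
		fmt.Fprintln(os.Stderr, "fatal:", err)
		return 1
	}
	return 0
}

func main() {
	// Simulate the permission failure. A real agent would call
	// os.Exit(run(realLookup)) so the container terminates and gets restarted.
	code := run(func(string, string) error { return errNoPermission })
	fmt.Println("exit code:", code)
}
```

With a non-zero exit and the pod's default `restartPolicy: Always`, the kubelet restarts the container. Once the SA has the missing permission, a later attempt starts up normally instead of sitting at 1/1 doing nothing.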

In this case, I believe the agent should wait and retry a few more times after a delay. We could also use a readiness probe to determine when the agent has actually reached a ready state.
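
One possible shape for "retry after a delay, and only report ready once startup succeeds" is sketched below. The retry count, the doubling delay, the `/readyz` path and the helper names (`retry`, `readyzHandler`, `probe`) are all assumptions for illustration. The handler would be wired to the container's `readinessProbe.httpGet`, so the pod shows 0/1 until the lookup has succeeded:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// errNoPermission is a hypothetical stand-in for the "cannot get pods" error.
var errNoPermission = errors.New(`cannot get pods in project "openshift-infra"`)

// ready is set to 1 only after startup has succeeded.
var ready int32

// retry calls fn up to attempts times, sleeping between failures and doubling
// the delay each time. It returns nil on the first success, else the last error.
func retry(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// readyzHandler answers a readiness probe: 200 once ready, 503 before that.
func readyzHandler(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&ready) == 1 {
		fmt.Fprintln(w, "ok")
		return
	}
	http.Error(w, "not ready", http.StatusServiceUnavailable)
}

// probe invokes the handler in-process and returns the status code it wrote.
func probe() int {
	rec := httptest.NewRecorder()
	readyzHandler(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	calls := 0
	// Hypothetical pod lookup: forbidden twice, then succeeds once the
	// service account has been granted the missing permission.
	lookup := func() error {
		calls++
		if calls < 3 {
			return errNoPermission
		}
		return nil
	}

	fmt.Println("before startup:", probe()) // 503
	if err := retry(5, 10*time.Millisecond, lookup); err != nil {
		fmt.Println(err) // a real agent would exit non-zero here
		return
	}
	atomic.StoreInt32(&ready, 1)
	fmt.Println("after startup:", probe(), "attempts:", calls) // 200 attempts: 3
}
```

The two ideas complement each other. The bounded retry covers the case where permissions are granted shortly after startup. If all attempts fail, exiting (rather than hanging) lets the kubelet restart the container. Meanwhile, the readiness probe keeps the pod from claiming 1/1 while it is still retrying.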
