session-manager-plugin ResumeSession fails with HTTP 403 about 60 minutes after successful StartSession when using temporary scoped credentials via aws ssm start-session #136

@rackerchandra

Description

Summary

Long-lived SSH tunnels disconnect every ~60 minutes when using temporary scoped credentials. The Session Manager plugin establishes the initial StartSession successfully, but around the one-hour mark it attempts to reconnect/resume the data channel and fails, with ResumeSession returning HTTP 403.

This is reproducible with newer session-manager-plugin versions, but not with 1.2.553.0.


Environment

  • OS: macOS darwin/arm64
  • AWS CLI: 2.15.30
  • session-manager-plugin versions:
    • 1.2.553.0 = stable ✓
    • 1.2.792.0 = fails ✗
    • 1.2.804.0 = fails ✗
  • Region: us-east-2
  • Session Type: SSH tunneling / local port forwarding to RDS through EC2 over Session Manager

Impact

  • Long-lived SSH tunnels disconnect every ~60 minutes
  • RDS client sessions drop mid-work
  • Keepalive settings on SSH do not prevent the disconnect

Exact Behavior

  1. Temporary AWS credentials are obtained and exported into the environment:

    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_SESSION_TOKEN
    • AWS_DEFAULT_REGION
  2. aws ssm start-session is launched from an OpenSSH ProxyCommand

  3. Session starts successfully and data channel opens

  4. Around 60 minutes later, plugin attempts session resume/reconnect

  5. ResumeSession returns 403

  6. Plugin retries several times, then websocket closes and SSH tunnel is dropped
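For reference, the OpenSSH side of this setup looks roughly like the following. The host alias, instance ID, and forwarded RDS endpoint are illustrative placeholders; the ProxyCommand pattern is the standard one from the AWS Session Manager documentation:

```
Host rds-tunnel
    HostName i-0123456789abcdef0
    User ec2-user
    # Route the SSH connection through Session Manager instead of direct TCP
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
    # Forward a local port to the RDS endpoint through the instance
    LocalForward 5432 mydb.example.us-east-2.rds.amazonaws.com:5432
    # Keepalives (already confirmed below not to prevent the disconnect)
    ServerAliveInterval 60
    ServerAliveCountMax 3
```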


Minimal Logic

tempCreds = getScopedTemporaryCredentials(duration≈3600s)

export AWS_ACCESS_KEY_ID=tempCreds.accessKeyId
export AWS_SECRET_ACCESS_KEY=tempCreds.secretAccessKey
export AWS_SESSION_TOKEN=tempCreds.sessionToken
export AWS_DEFAULT_REGION=us-east-2

exec aws ssm start-session \
  --region us-east-2 \
  --target <instance-id> \
  --document-name AWS-StartSSHSession \
  --parameters portNumber=22

Results:

  • Initial session: ✓ success
  • After ~60 min:
    • Plugin tries ResumeSession
    • ResumeSession → HTTP 403
    • Plugin retries
    • Websocket closes
    • SSH tunnel disconnects

Log Evidence

Sanitized plugin logs show:

INFO Opening websocket connection ...
INFO Successfully opened websocket connection ...
INFO Connected to instance[...] on port: 22

ERROR Reach the retry limit 5 for receive messages.
ERROR Trying to reconnect the session ...
ERROR Resume Session failed: operation error SSM: ResumeSession, https response error StatusCode: 403
ERROR Failed to get token: operation error SSM: ResumeSession, https response error StatusCode: 403
ERROR Error sending stream data message websocket: close sent

More specific failing locations from plugin logs:

  • websocketchannel.go:245
  • sessionhandler.go:95
  • sessionhandler.go:171
  • sessionhandler.go:190
  • streaming.go:315

Timing

Most recent repro:

  • Session established: 2026-04-09 09:37:49–09:37:57 (Pacific)
  • Failure begins: 2026-04-09 10:37:50 (Pacific)

This one-hour timing is consistent across reports.
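The gap between session establishment and first failure matches the ~3600 s credential lifetime exactly. A quick check using the timestamps above (GNU date; the established time is rounded to 09:37:50):

```shell
# Timestamps copied from the repro above (established time rounded to :50)
start=$(date -u -d "2026-04-09 09:37:50" +%s)
fail=$(date -u -d "2026-04-09 10:37:50" +%s)
echo "$((fail - start)) seconds"   # 3600 seconds, i.e. the ~3600 s credential duration
```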


Tests Already Done

  • ✓ SSH keepalives enabled:
    • ServerAliveInterval=60
    • ServerAliveCountMax=3
    • Same failure still occurs
  • ✓ SSM_PLUGIN_SKIP_CLIENT_CONFIGURE=true tested
    • Same failure still occurs
  • ✓ Downgrading plugin to 1.2.553.0 removes the issue

  • ✓ Upgrading back to 1.2.792.0 / 1.2.804.0 reproduces the issue


Questions for AWS Clarification

  1. Was there a behavioral change in session-manager-plugin after 1.2.553.0 affecting reconnect/resume?
  2. Is ResumeSession now expected to require fresh locally available AWS credentials at reconnect time, even when the original StartSession succeeded?
  3. Is there a known regression in 1.2.764.0+, 1.2.792.0, or 1.2.804.0 around websocket rotation / resume?
  4. What exact IAM permissions and resource patterns are now required for successful ResumeSession and ssmmessages:OpenDataChannel during reconnect?
  5. Is this expected when using short-lived scoped STS credentials exported via environment variables for aws ssm start-session?
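One quick way to probe the hypothesis in question 5 is to check, from the same environment the plugin inherits, whether the exported temporary credentials are still accepted at the moment the reconnect fails. A sketch, assuming the AWS CLI is on PATH:

```shell
# Check whether the exported temporary credentials are still valid
# (run in the same environment the ProxyCommand/plugin inherits)
if aws sts get-caller-identity >/dev/null 2>&1; then
  echo "credentials still valid"
else
  echo "credentials expired or invalid"
fi
```

If this reports expired credentials right when ResumeSession starts returning 403, that would point at the plugin requiring fresh credentials on the resume path rather than at an IAM policy problem.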

Important Observation

This does not look like an initial session establishment problem. StartSession succeeds. The failure is specifically in the plugin's reconnect/resume path about one hour later.
