
Issue: Cookies Not Persisting Between Workflows in Persistent Browser Sessions #3897

@fillipe-revido

Problem Description

When using persistent browser sessions, cookies do not persist between workflow runs. After a login workflow completes successfully, subsequent workflows that use the same browser_session_id do not reuse its cookies and fail with "NOT_LOGGED_IN" errors.

Expected Behavior

When a persistent browser session is created and a login workflow is executed:

  1. Cookies should be saved to a persistent user_data_dir (e.g., /app/skyvern/persistent_browser_sessions/{org_id}/{session_id})
  2. When a subsequent workflow uses the same browser_session_id, it should:
    • Connect to the existing browser via CDP using the stored browser_address
    • Use the same persistent user_data_dir to load cookies
    • Preserve authentication state between workflows

Actual Behavior

  1. Session Creation: The persistent session is created correctly with the right user_data_dir:

    [persistent_sessions_manager.py:453] Using persistent user_data_dir for session
    browser_session_id=pbs_457827170959729696
    user_data_dir=/app/skyvern/persistent_browser_sessions/o_454165536444531154/pbs_457827170959729696
    browser_address=None
    
  2. Browser Address Persisted: The browser address is correctly stored in the database:

    [persistent_sessions_manager.py:376] Persisted browser address for session
    browser_session_id=pbs_457827170959729696
    browser_address=http://127.0.0.1:9222
    
  3. Workflow Execution Issue: When a workflow run is created for this session, it creates a new browser state with a temporary user_data_dir instead of reusing the persistent one:

    [browser_manager.py:215] Creating browser state for workflow run
    workflow_run_id=wr_457827265449010210
    
    [browser_factory.py:557] Using temporary user_data_dir
    user_data_dir=./temp/skyvern_browser_3gaxhz4_
    

Root Cause Analysis

The issue appears to be in get_or_create_for_workflow_run() in browser_manager.py:

  1. get_browser_state() from PersistentSessionsManager returns None because the browser state is not in memory.
  2. The code then creates a new browser state via _create_browser_state().
  3. At this point workflow_run.browser_address is None, and the code does not fall back to the browser_address persisted in the database.
  4. Without a browser_address, it launches a new browser instead of connecting to the existing one via CDP.
  5. Because browser_session_id is not explicitly propagated along some code paths, user_data_dir cannot be derived from it, so a temporary directory is used instead.
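
For discussion, the resolution order we expected get_or_create_for_workflow_run() to follow can be written as a small pure function. This is a hypothetical sketch, not Skyvern code: WorkflowRun, BrowserPlan, plan_browser_for_run, and the persisted_addresses dict (a stand-in for the database lookup) are illustrative names, and the directory layout is copied from the logs above. Whether an address already stored on the workflow run should take precedence over the session's persisted address is still open (question 3 below); in this sketch the run's address wins.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkflowRun:
    # Hypothetical, trimmed-down model of a workflow run.
    workflow_run_id: str
    browser_session_id: Optional[str] = None
    browser_address: Optional[str] = None


@dataclass(frozen=True)
class BrowserPlan:
    # mode is one of: "connect_cdp", "launch_persistent", "launch_temporary"
    mode: str
    browser_address: Optional[str] = None
    user_data_dir: Optional[str] = None


def plan_browser_for_run(
    run: WorkflowRun,
    org_id: str,
    persisted_addresses: Dict[str, str],
    sessions_root: str = "/app/skyvern/persistent_browser_sessions",
) -> BrowserPlan:
    """Decide how a workflow run should obtain its browser (illustrative only)."""
    # 1. An address already stored on the run is used as-is.
    if run.browser_address:
        return BrowserPlan("connect_cdp", browser_address=run.browser_address)

    if run.browser_session_id:
        # 2. Otherwise look up the address persisted for the session
        #    (the dict stands in for the database query).
        address = persisted_addresses.get(run.browser_session_id)
        if address:
            return BrowserPlan("connect_cdp", browser_address=address)
        # 3. No live browser: relaunch against the session's persistent profile
        #    so the existing cookie store is reused.
        return BrowserPlan(
            "launch_persistent",
            user_data_dir=f"{sessions_root}/{org_id}/{run.browser_session_id}",
        )

    # 4. Only runs with no session at all should get a throwaway profile.
    return BrowserPlan("launch_temporary")
```

With the session from the logs above and its persisted address http://127.0.0.1:9222, this returns a connect_cdp plan, which is the path the failing run should have taken instead of "Using temporary user_data_dir".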

Attempted Fixes

We've tried several approaches:

  1. Modified _create_headless_chromium and _create_headful_chromium to calculate user_data_dir from browser_session_id when not explicitly provided
  2. Added logic to retrieve browser_address from database when browser_session_id is present but workflow_run.browser_address is None
  3. Added retry logic in _connect_to_cdp_browser to find existing contexts

However, we're still seeing the "Using temporary user_data_dir" log, indicating the issue persists.
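
On the CDP path, the detail that matters for cookies is which context gets used. With Playwright's Python API, `browser = await playwright.chromium.connect_over_cdp(address)` exposes the browser's existing contexts through `browser.contexts`; the session's cookies live in that existing default context, whereas `browser.new_context()` starts with an empty cookie jar. Below is a minimal sketch of the retry idea from the third attempted fix, assuming contexts may take a moment to appear after connecting. wait_for_existing_context is a hypothetical helper, and list_contexts would be something like `lambda: browser.contexts`.

```python
import asyncio
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


async def wait_for_existing_context(
    list_contexts: Callable[[], Sequence[T]],
    attempts: int = 5,
    delay_s: float = 0.5,
) -> Optional[T]:
    """Poll until the CDP-connected browser reports at least one context.

    Returns the first context (typically the default one for a CDP
    connection), or None after `attempts` polls. Callers should treat None
    as an error rather than silently creating a new, empty context, since
    a new context would not carry the session's cookies.
    """
    for attempt in range(attempts):
        contexts = list_contexts()
        if contexts:
            return contexts[0]
        # Sleep between polls, but not after the final one.
        if attempt < attempts - 1:
            await asyncio.sleep(delay_s)
    return None
```

Failing loudly when this returns None makes a missing context visible in the logs, instead of it surfacing later as a "NOT_LOGGED_IN" error.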

Relevant Logs

Session Creation (Correct):

[persistent_sessions_manager.py:453] Using persistent user_data_dir for session
browser_session_id=pbs_457827170959729696
user_data_dir=/app/skyvern/persistent_browser_sessions/o_454165536444531154/pbs_457827170959729696

[browser_factory.py:536] Using provided user_data_dir for persistent session
user_data_dir=/app/skyvern/persistent_browser_sessions/o_454165536444531154/pbs_457827170959729696

[persistent_sessions_manager.py:376] Persisted browser address for session
browser_session_id=pbs_457827170959729696
browser_address=http://127.0.0.1:9222

Workflow Execution (Problem):

[browser_manager.py:215] Creating browser state for workflow run
workflow_run_id=wr_457827265449010210

[browser_factory.py:557] Using temporary user_data_dir
user_data_dir=./temp/skyvern_browser_3gaxhz4_

Questions

  1. Has anyone else encountered this cookie persistence issue with persistent browser sessions?
  2. Is there a recommended pattern for ensuring browser_session_id is correctly propagated throughout the browser creation chain?
  3. Should we always retrieve browser_address from the database when browser_session_id is present, even if workflow_run.browser_address exists?
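
On question 2, one pattern worth considering (an assumption, not something Skyvern is known to do) is to bind the session id once at the top of the run with a contextvars.ContextVar, so helpers deep in the browser creation chain can read it without threading it through every signature. asyncio tasks copy the current context when they are created, so tasks spawned after the id is bound see it as well. current_browser_session_id, bound_browser_session, and resolve_user_data_dir are hypothetical names; a return value of None means the caller may legitimately use a temporary directory.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical context variable bound once per workflow run (not a Skyvern API).
current_browser_session_id = contextvars.ContextVar(
    "current_browser_session_id", default=None
)

# Layout copied from the session-creation logs in this issue.
PERSISTENT_ROOT = "/app/skyvern/persistent_browser_sessions"


@contextmanager
def bound_browser_session(session_id: str) -> Iterator[None]:
    """Bind the session id for everything called inside the with-block."""
    token = current_browser_session_id.set(session_id)
    try:
        yield
    finally:
        # Restore the previous value so the binding never leaks.
        current_browser_session_id.reset(token)


def resolve_user_data_dir(org_id: str, explicit_dir: Optional[str] = None) -> Optional[str]:
    """Return the profile dir to launch with, or None if a temp dir is acceptable."""
    if explicit_dir:
        # An explicitly passed directory always wins.
        return explicit_dir
    session_id = current_browser_session_id.get()
    if session_id:
        # A bound session maps to its persistent profile, never a temp dir.
        return f"{PERSISTENT_ROOT}/{org_id}/{session_id}"
    # No session bound anywhere up the call chain.
    return None
```

Wrapping the whole workflow run in `with bound_browser_session(run.browser_session_id):` would make a session-bound run that still reaches the temporary-directory branch a detectable bug, rather than a silent fallback.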

Environment

  • Skyvern version: Latest (Docker deployment)
  • Python version: As per Docker image
  • Playwright version: As per Skyvern dependencies

We're willing to pay someone hourly to help us resolve this.
