
fix(context): always return inner dict from process_sources #54

Open
0xghost42 wants to merge 1 commit into sentient-agi:main from 0xghost42:fix/process-sources-return-type

@0xghost42

Summary

SourceProcessor.process_sources is invoked with a SearchResult from serp_search.get_sources and its return value is fed straight into build_context, which reads .get('organic') / .get('topStories') / .get('answerBox') on a plain dict (context_building/build_context.py:55-90).

Three of the five return paths inside process_sources already unwrap the inner dict:

    return sources.data                 # two paths, including the wiki branch
    return self._update_sources_with_content(sources.data, valid_sources, html_contents, query)

The other two leak the SearchResult wrapper:

    if not valid_sources:
        return sources                 # <-- SearchResult
    except Exception as e:
        ...
        return sources                 # <-- SearchResult

Whenever the SERP call returns no organic results (empty query, rate limiting, quota exhaustion, etc.) or the scrape/rerank step raises, build_context then crashes with:

    AttributeError: 'SearchResult' object has no attribute 'get'

This is the same error surfaced in #15. The reporter's root cause turned out to be an install issue (No module named 'src') that landed in the except branch, but the underlying inconsistency is real: any future exception inside process_sources will surface downstream as the same misleading AttributeError instead of being logged and recovered from.
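A minimal reproduction of the failure mode. Nothing here is the real code: `SearchResult` is a hypothetical dataclass stand-in that mirrors only the `data`/`error` behaviour described in this PR, and `build_context_stub` and `feed_consumer` are hypothetical helpers that mimic just the `.get('organic', [])` access in build_context.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SearchResult:
    """Hypothetical stand-in for the real wrapper: the parsed SERP dict lives in
    `data`, which stays None (with `error` set) when the search failed."""
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


def build_context_stub(sources_result: Dict[str, Any]) -> str:
    """Simplified stand-in for build_context: it assumes a plain dict."""
    organic = sources_result.get('organic', [])
    return "\n".join(item.get('snippet', '') for item in organic)


def feed_consumer(value: Any) -> str:
    """Pass a process_sources-style return value to the consumer and report the outcome."""
    try:
        return build_context_stub(value)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


# Leaking path: the wrapper itself reaches the consumer.
print(feed_consumer(SearchResult(error="quota exhausted")))
# prints: AttributeError: 'SearchResult' object has no attribute 'get'

# Fixed path: the inner dict (empty here) reaches the consumer.
print(repr(feed_consumer({})))
# prints: ''
```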

Change

src/opendeepsearch/context_building/process_sources_pro.py:

  • Unwrap sources.data in both leaking branches, falling back to {} when data is None (the error-result case: SearchResult.__init__ leaves data=None when constructed with error=...).
  • build_context already tolerates an empty dict (its extract_information(sources_result.get('organic', [])) chain returns '' cleanly), so the agent now degrades gracefully instead of crashing.
  • Return annotation updated from List[dict] to dict to match the actual contract. The sources parameter annotation is dropped: it was typed List[dict] but actually receives a SearchResult, and importing SearchResult into this module just for the hint is more churn than the fix warrants.
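The shape of the change can be sketched as follows. This is a simplified, synchronous stand-in, not the real method: `SearchResult`, `unwrap`, and `process_sources_sketch` are hypothetical names, and the two callables stand in for `_get_valid_sources` and `_update_sources_with_content`, whose real signatures are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SearchResult:
    """Hypothetical stand-in for the SERP wrapper (data is None on error)."""
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


def unwrap(sources: SearchResult) -> Dict[str, Any]:
    """Hypothetical helper: inner SERP dict, or {} for an error result whose data is None."""
    return sources.data if sources.data is not None else {}


def process_sources_sketch(
    sources: SearchResult,
    get_valid_sources: Callable[[Dict[str, Any]], List[dict]],
    update_with_content: Callable[[Dict[str, Any], List[dict]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Simplified shape of the fixed method: every exit returns a plain dict."""
    try:
        valid_sources = get_valid_sources(unwrap(sources))
        if not valid_sources:
            return unwrap(sources)  # previously: return sources
        return update_with_content(unwrap(sources), valid_sources)
    except Exception:
        # The real handler's body is elided in the excerpt above; the sketch
        # only shows that it now returns the dict instead of the wrapper.
        return unwrap(sources)  # previously: return sources
```

Routing every exit through one helper keeps the data-or-{} rule in a single place, so a later early return is less likely to leak the wrapper again.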

Verification

  • python -m py_compile src/opendeepsearch/context_building/process_sources_pro.py passes cleanly.
  • Traced both leaking paths to build_context and confirmed they previously triggered AttributeError: 'SearchResult' object has no attribute 'get'; after the fix, build_context receives {} and returns ''.
  • The three already-correct paths (return sources.data, return sources.data via the wiki branch, return self._update_sources_with_content(sources.data, ...)) are unchanged.
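The manual trace could be pinned down as a regression test. A sketch using the standard-library unittest module, assuming hypothetical stand-ins (`SearchResult`, `unwrap`, `build_context_stub`) in place of the real classes; a real test would import the actual SourceProcessor and build_context instead. The stub reads the same three keys the summary lists for build_context, but its formatting is invented.

```python
import unittest
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SearchResult:
    """Hypothetical stand-in for the SERP wrapper."""
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


def unwrap(sources: SearchResult) -> Dict[str, Any]:
    """Hypothetical helper mirroring the fix: inner dict, or {} when data is None."""
    return sources.data if sources.data is not None else {}


def build_context_stub(sources_result: Dict[str, Any]) -> str:
    """Stand-in consumer: reads organic, topStories and answerBox from a plain dict."""
    parts = []
    for key in ('organic', 'topStories'):
        for item in sources_result.get(key, []):
            parts.append(item.get('snippet', ''))
    answer = sources_result.get('answerBox')
    if answer:
        parts.insert(0, answer.get('answer', ''))
    return "\n".join(parts)


class FormerLeakPathsTest(unittest.TestCase):
    def test_no_organic_results(self):
        # Former `if not valid_sources: return sources` path.
        result = unwrap(SearchResult(data={'organic': []}))
        self.assertEqual(build_context_stub(result), "")

    def test_error_result(self):
        # Former `except ...: return sources` path, where data is None.
        result = unwrap(SearchResult(error="rate limited"))
        self.assertEqual(build_context_stub(result), "")
```

Run it with `python -m unittest <module>`; both tests exercise the two branches that previously leaked the wrapper.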

Out of scope

The wider type-annotation mismatch (_get_valid_sources / _update_sources_with_content declare List[dict] but operate on the inner SERP dict) is left for a separate cleanup PR; this PR addresses only the runtime bug.

process_sources() is fed a SearchResult from serp_search.get_sources but
its consumer build_context() reads .get('organic') / .get('topStories')
on a plain dict (build_context.py:55-90). Three of the five return paths
already unwrap the inner dict via sources.data:

    return sources.data
    return self._update_sources_with_content(sources.data, ...)

The remaining two paths return the SearchResult wrapper instead:

    if not valid_sources:
        return sources                 # <-- SearchResult
    except Exception as e:
        ...
        return sources                 # <-- SearchResult

Whenever the SERP response has no organic results (empty query, rate
limit, quota exhaustion) or scraping/reranking raises, build_context()
hits the wrapper and crashes with

    AttributeError: 'SearchResult' object has no attribute 'get'

mirroring the report in issue sentient-agi#15. The reporter's root cause turned out
to be an install issue ('No module named src') but the underlying
inconsistency is real: any exception inside process_sources surfaces as
this misleading AttributeError downstream.

Unwrap to sources.data in both branches, falling back to {} when data is
None (the error-result case from SerperAPI/SearXNG). build_context()
already tolerates an empty dict (returns ''), so the agent now degrades
gracefully instead of crashing.

Return annotation updated from List[dict] to dict to match the actual
contract; sources parameter annotation dropped (was List[dict], actually
SearchResult — leaving it untyped rather than importing SearchResult
into this module just for the hint).