feat(parse): add multimodal parser semantic artifact MVP by yeyitech · Pull Request #1555 · volcengine/OpenViking

yeyitech · 2026-04-17T18:11:16Z

Summary

upgrade image/audio/video parsers to write semantic sidecars instead of metadata-only placeholder roots
reuse lightweight OCR/VLM/ASR-style helpers when available and emit graceful fallback text when they are not
add focused parser tests covering semantic artifacts and fallback outputs for multimodal media

Testing

python -m py_compile openviking/parse/parsers/media/image.py openviking/parse/parsers/media/audio.py openviking/parse/parsers/media/video.py tests/parse/test_media_parser_semantics.py
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest --noconftest -q -o addopts= tests/parse/test_media_parser_semantics.py

CLAassistant · 2026-04-17T18:11:23Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Codex seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

github-actions · 2026-04-17T18:12:25Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 88
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add semantic artifacts for image parser Relevant files: openviking/parse/parsers/media/image.py tests/parse/test_media_parser_semantics.py Sub-PR theme: Add semantic artifacts for audio parser Relevant files: openviking/parse/parsers/media/audio.py tests/parse/test_media_parser_semantics.py Sub-PR theme: Add semantic artifacts for video parser Relevant files: openviking/parse/parsers/media/video.py tests/parse/test_media_parser_semantics.py
⚡ Recommended focus areas for review Private API Usage VideoParser calls private methods (_asr_transcribe_with_timestamps, _asr_transcribe, _vlm_describe) from AudioParser and ImageParser. This creates tight coupling and may break if those internal implementations change. audio_parser = AudioParser() timestamped = await audio_parser._asr_transcribe_with_timestamps(audio_bytes, None) if timestamped: return "timestamped", timestamped plain = await audio_parser._asr_transcribe(audio_bytes, None) if plain: return ("plain" if "unavailable" not in plain.lower() and "failed" not in plain.lower() else "unavailable"), plain return "unavailable", "Audio transcription unavailable: extracted audio track produced no transcript." async def _extract_audio_track(self, file_path: Path) -> Optional[bytes]: """Extract a WAV audio track using ffmpeg when available.""" ffmpeg_path = shutil.which("ffmpeg") if not ffmpeg_path: return None def _extract() -> Optional[bytes]: result = subprocess.run( [ ffmpeg_path, "-y", "-i", str(file_path), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-f", "wav", "pipe:1", ], capture_output=True, check=True, ) return result.stdout or None try: return await asyncio.to_thread(_extract) except Exception as exc: logger.debug("ffmpeg audio extraction failed for %s: %s", file_path, exc) return None async def _describe_preview_frame( self, preview_bytes: bytes, instruction: str = "", ) -> Optional[str]: """Describe the preview frame through the existing image VLM helper.""" image_parser = ImageParser() if not image_parser._is_vlm_available(): return None description = await image_parser._vlm_describe( preview_bytes, model=None, instruction=instruction, ) Unvalidated Magic Bytes Audio signature validation uses b"\x00\x00\x00" as a magic byte for .m4a, which will incorrectly match any file starting with three null bytes, regardless of actual format. ".m4a": [b"\x00\x00\x00", b"ftypM4A", b"ftypisom"], ".opus": [b"OggS"], } ext_lower = file_path.suffix.lower() magic_list = audio_magic_bytes.get(ext_lower, []) if not any( len(audio_bytes) >= len(magic) and audio_bytes.startswith(magic) for magic in magic_list ): raise ValueError( f"Invalid audio file: {file_path}. " f"File signature does not match expected format {ext_lower}" ) Unvalidated Magic Bytes Video signature validation uses b"\x00\x00\x00" as a magic byte for .mp4/.mov, which will incorrectly match any file starting with three null bytes, regardless of actual format. video_magic_bytes = { ".mp4": [b"\x00\x00\x00", b"ftyp"], ".avi": [b"RIFF"], ".mov": [b"\x00\x00\x00", b"ftyp"], ".mkv": [b"\x1a\x45\xdf\xa3"], ".webm": [b"\x1a\x45\xdf\xa3"], ".flv": [b"FLV"], ".wmv": [b"\x30\x26\xb2\x75\x8e\x66\xcf\x11"], } ext_lower = file_path.suffix.lower() magic_list = video_magic_bytes.get(ext_lower, []) if not any( len(video_bytes) >= len(magic) and video_bytes.startswith(magic) for magic in magic_list ): raise ValueError( f"Invalid video file: {file_path}. " f"File signature does not match expected format {ext_lower}" )

github-actions · 2026-04-17T18:13:38Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Api resilience	Add timeouts to OpenAI ASR requests Add explicit timeout configuration to the OpenAI client and transcription requests to avoid hanging indefinitely. The OpenAI Python client supports timeouts at both the client and request levels. openviking/parse/parsers/media/audio.py [310-319] +client_kwargs["timeout"] = 60.0 # Add default timeout for all client requests client = openai.OpenAI(**client_kwargs) with tempfile.NamedTemporaryFile(mode="wb", suffix=".wav", delete=False) as temp_file: temp_file.write(audio_bytes) temp_file_path = temp_file.name with open(temp_file_path, "rb") as audio_file: response = client.audio.transcriptions.create( model=model_name, file=audio_file, language=self.config.language, + timeout=120.0, # Add explicit request-level timeout for transcription ) Suggestion importance[1-10]: 5 __ Why: Adding timeouts to OpenAI client and requests improves resilience by avoiding indefinite hangs. This is a moderate robustness improvement with clear value.	Low

feat(parse): add multimodal parser semantic artifact MVP

4307ad2

github-project-automation bot added this to OpenViking project Apr 17, 2026

github-project-automation bot moved this to Backlog in OpenViking project Apr 17, 2026

github-actions bot added the Review effort 3/5 label Apr 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(parse): add multimodal parser semantic artifact MVP#1555

feat(parse): add multimodal parser semantic artifact MVP#1555
yeyitech wants to merge 1 commit intovolcengine:mainfrom
yeyitech:feat/issue-372-multimodal-parser-mvp

yeyitech commented Apr 17, 2026

Uh oh!

CLAassistant commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yeyitech commented Apr 17, 2026

Summary

Testing

Uh oh!

CLAassistant commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Apr 17, 2026

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants