This repository was archived by the owner on Jun 30, 2025. It is now read-only.
Draft
10 changes: 5 additions & 5 deletions fixtures/007.expected
@@ -1,20 +1,20 @@
-at his touch of a certain icy pang along my blood. “Come, sir, said I.
+at his touch ofa certain icy pang along my blood. “Come, sir, said I.
“You forget that I have not yet the pleasure of your acquaintance. Be
seated, if you please.” And I showed him an example, and sat down
myself in my customary seat and with as fair an imitation of my or-
dinary manner to a patient, as the lateness of the hour, the nature of
my preoccupations, and the horror I had of my visitor, would suffer
me to muster.

-“I beg your pardon, Dr. Lanyon, he replied civilly enough. “What
+“I beg your pardon, Dr. Lanyon, he replied civilly enough. “What
you say is very well founded; and my impatience has shown its heels
to my politeness. I come here at the instance of your colleague, Dr.
Henry Jekyll, on a piece of business of some moment; and I under-
stood...” He paused and put his hand to his throat, and I could see,
in spite of his collected manner, that he was wrestling against the
-approaches of the hysteria—“T understood, a drawer...”
+approaches of the hysteria—“I understood, a drawer...”

-But here I took pity on my visitor's suspense, and some perhaps
+But here I took pity on my visitors suspense, and some perhaps
on my own growing curiosity.

“There it is, sir,” said I, pointing to the drawer, where it lay on the
@@ -25,7 +25,7 @@ heart: I could hear his teeth grate with the convulsive action of his
jaws; and his face was so ghastly to see that I grew alarmed both for
his life and reason.

-“Compose yourself, said I.
+“Compose yourself, said I.

He turned a dreadful smile to me, and as if with the decision of
despair, plucked away the sheet. At sight of the contents, he uttered
1 change: 0 additions & 1 deletion package.json
@@ -46,7 +46,6 @@
"release": "bumpp --commit --tag --push"
},
"dependencies": {
-"tesseract.js": "^6.0.0",
"unpdf": "^0.12.1"
},
"devDependencies": {
69 changes: 6 additions & 63 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

23 changes: 17 additions & 6 deletions src/extractors/img.extractor.ts
@@ -1,5 +1,6 @@
import { Buffer } from 'node:buffer';
-import { createWorker } from 'tesseract.js';
+import { exec } from 'node:child_process';
+import { env } from 'node:process';
import { defineTextExtractor } from '../extractors.models';

export const imageExtractorDefinition = defineTextExtractor({
@@ -11,13 +12,23 @@
'image/gif',
],
extract: async ({ arrayBuffer }) => {
-const buffer = Buffer.from(arrayBuffer);
+const binary = env.LECTURE_TESSERACT_BINARY ?? 'tesseract';

-const worker = await createWorker();
+const { stdout } = await new Promise<{ stdout: string }>((resolve, reject) => {
+const child = exec(`${binary} stdin stdout`, (error, stdout) => {
+if (error) {
+reject(error);
+} else {
+resolve({ stdout });
+}
+});

-const { data: { text } } = await worker.recognize(buffer);
-await worker.terminate();
+child.stdin.write(Buffer.from(arrayBuffer));

Check failure on line 26 in src/extractors/img.extractor.ts


GitHub Actions / CI - Lib

Unhandled error

Error: write EPIPE
 ❯ afterWriteDispatched node:internal/stream_base_commons:159:15
 ❯ writeGeneric node:internal/stream_base_commons:150:3
 ❯ Socket._writeGeneric node:net:964:11
 ❯ Socket._write node:net:976:8
 ❯ writeOrBuffer node:internal/streams/writable:572:12
 ❯ _write node:internal/streams/writable:501:10
 ❯ Socket.Writable.write node:internal/streams/writable:510:10
 ❯ content src/extractors/img.extractor.ts:26:19
 ❯ Object.extract src/extractors/img.extractor.ts:17:30

Serialized Error: { errno: -32, code: 'EPIPE', syscall: 'write' }

This error originated in "src/extractors.usecases.test.ts" test file. It doesn't mean the error was thrown inside the file itself, but while it was running.
The latest test that might've caused the error is "fixture fixtures/006.png". It might mean one of the following:
- The error was thrown, while Vitest was running this test.
- If the error occurred after the test had been completed, this was the last documented test before it was thrown.

Check failure on line 26 in src/extractors/img.extractor.ts


GitHub Actions / CI - Lib

Unhandled error

Error: write EPIPE
 ❯ afterWriteDispatched node:internal/stream_base_commons:159:15
 ❯ writeGeneric node:internal/stream_base_commons:150:3
 ❯ Socket._writeGeneric node:net:964:11
 ❯ Socket._write node:net:976:8
 ❯ writeOrBuffer node:internal/streams/writable:572:12
 ❯ _write node:internal/streams/writable:501:10
 ❯ Socket.Writable.write node:internal/streams/writable:510:10
 ❯ content src/extractors/img.extractor.ts:26:19
 ❯ Object.extract src/extractors/img.extractor.ts:17:30

Serialized Error: { errno: -32, code: 'EPIPE', syscall: 'write' }

This error originated in "src/extractors.usecases.test.ts" test file. It doesn't mean the error was thrown inside the file itself, but while it was running.
The latest test that might've caused the error is "fixture fixtures/007.jpg". It might mean one of the following:
- The error was thrown, while Vitest was running this test.
- If the error occurred after the test had been completed, this was the last documented test before it was thrown.
+child.stdin.end();
+});

-return { content: text };
+return {
+content: stdout,
+};
},
});
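
One possible way to make the stdin piping in this diff more robust, offered as a sketch and not as the PR's actual fix. The EPIPE failures reported by CI are consistent with the child process closing its stdin before the write finished (for example if the binary is not installed on the runner; that is an assumption, the annotations do not confirm it). The helper name `runWithStdin` is hypothetical. It uses `spawn` instead of `exec` so the binary path is not interpolated into a shell string, listens for `error` on `child.stdin` so a failed write does not surface as an unhandled error, and rejects with the exit code and stderr.

```typescript
import { Buffer } from 'node:buffer';
import { spawn } from 'node:child_process';

// Hypothetical helper (not part of the PR): runs a command, feeds `input`
// to its stdin, and resolves with whatever it wrote to stdout.
export function runWithStdin(command: string, args: string[], input: Buffer): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const stdoutChunks: Buffer[] = [];
    const stderrChunks: Buffer[] = [];

    child.stdout.on('data', (chunk: Buffer) => stdoutChunks.push(chunk));
    child.stderr.on('data', (chunk: Buffer) => stderrChunks.push(chunk));

    // Spawn failures (e.g. the binary does not exist) are reported here.
    child.on('error', reject);

    // If the child exits before consuming its input, the write fails with
    // EPIPE. Ignore it here; the 'close' handler below reports the failure.
    child.stdin.on('error', () => {});

    child.on('close', (code) => {
      if (code === 0) {
        resolve(Buffer.concat(stdoutChunks).toString('utf8'));
      } else {
        const stderr = Buffer.concat(stderrChunks).toString('utf8');
        reject(new Error(`${command} exited with code ${code}: ${stderr}`));
      }
    });

    // Write the whole payload and close stdin in one call.
    child.stdin.end(input);
  });
}
```

In the extractor, the call could then look like `runWithStdin(binary, ['stdin', 'stdout'], Buffer.from(arrayBuffer))`, reusing the same `stdin stdout` arguments as the diff above.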