DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON #1052
-
I'm building a web app that transcribes the audio of YouTube videos to text with Deepgram SDK 3.9. I'm using Next.js 14.2, React 18.2, Node 20.17 and TypeScript. The user enters a YouTube URL, yt-dlp downloads the audio, and the audio is then streamed directly to Deepgram for transcription. The problem is that transcriptions are failing with this error:

`DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON`
Full error log and file code below. Let me know if you need any more info. I've been stuck on this for a while, so any help you can offer would be appreciated. Thanks in advance.

Here's the full response from the terminal:

```
Starting Deepgram transcription...
yt-dlp error: [youtube] Extracting URL: https://www.youtube.com/watch?v=GOXiqUKiGQc&pp=ygUMcG9kY2FzdCBjbGlw
[youtube] GOXiqUKiGQc: Downloading webpage
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading ios player API JSON
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading tv player API JSON
Deepgram transcription error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
Transcription processing error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading player dd017f77
Background processing error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading m3u8 information
yt-dlp error: [info] GOXiqUKiGQc: Downloading 1 format(s): 251
yt-dlp error: [download] Destination: -
yt-dlp error: [download] Download completed
```

Here's the transcription process flow I'm using:

Initial Request (route.ts):
```ts
// 1. User sends POST request with YouTube URL
// 2. Verify user authentication with Clerk
// 3. Call startTranscriptionProcess with:
//    - YouTube URL
//    - User ID
//    - Supabase client
const transcriptionId = await startTranscriptionProcess(youtubeUrl, userId, supabase);
```
Start Process (utils.ts):
```ts
// 1. Generate unique UUID for transcription
// 2. Create initial database record (status: "pending")
// 3. Start background processing using Promise.resolve()
export async function startTranscriptionProcess(youtubeUrl: string, userId: string, supabase: SupabaseClient) {
  const transcriptionId = uuidv4();
  await createTranscriptionRecord(...);
  // Non-blocking background process
  Promise.resolve().then(() => {
    processTranscription(transcriptionId, youtubeUrl, userId, supabase);
  });
  return transcriptionId; // Return immediately to user
}
```
Process Transcription (utils.ts → ytdlp.ts → deepgram.ts):
```ts
async function processTranscription(id: string, youtubeUrl: string, userId: string, supabase: SupabaseClient) {
  // 1. Get video metadata (title)
  const metadata = await extractVideoMetadata(youtubeUrl);
  await updateTranscriptionTitle(...);
  // 2. Download audio stream
  const audioStream = await downloadAudioStream(youtubeUrl);
  // 3. Send to Deepgram for transcription
  const result = await transcribeAudio(audioStream);
  // 4. Update database with transcript
  await updateTranscriptionStatus(...);
}
```
Audio Download (ytdlp.ts):
```ts
export async function downloadAudioStream(youtubeUrl: string) {
  // 1. Spawn yt-dlp process to download YouTube audio
  // 2. Convert to MP3 format
  // 3. Stream the audio data using TransformStream
  // 4. Return a ReadableStream of audio data
}
```
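ytdlp.ts itself isn't shown in the post, so here is a minimal sketch of how `downloadAudioStream` could hand yt-dlp's output to the rest of the pipeline. Assumptions: yt-dlp is on the PATH, and the audio is passed through in the format yt-dlp selects (the log shows format 251) instead of being converted to MP3, so steps 2 and 3 of the summary are replaced by a plain `PassThrough`. `ytDlpAudioArgs` and `spawnStdoutAsWebStream` are hypothetical names. Two details matter for the log above: when the output is `-`, yt-dlp writes its ordinary `[youtube] ...` status lines to stderr, which is presumably why they appear prefixed with "yt-dlp error:"; and a failed download should surface as a stream error rather than as a silently truncated stream.

```typescript
import { spawn } from "node:child_process";
import { PassThrough, Readable } from "node:stream";
import type { ReadableStream } from "node:stream/web";

// Assumed yt-dlp arguments: best audio-only format, written to stdout ("-o -").
// No MP3 conversion; the audio stays in whatever container yt-dlp selects.
export function ytDlpAudioArgs(youtubeUrl: string): string[] {
  return ["-f", "bestaudio", "--no-progress", "-o", "-", youtubeUrl];
}

// Spawn a process and expose its stdout as a Web ReadableStream.
// stderr is forwarded to `log` as diagnostics, and the stream errors only if
// the process cannot start or exits with a non-zero code.
export function spawnStdoutAsWebStream(
  command: string,
  args: string[],
  log: (line: string) => void = (line) => console.info(line),
): ReadableStream<Uint8Array> {
  const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
  const out = new PassThrough();

  // Do not end `out` when stdout ends; wait for the exit code first.
  child.stdout.pipe(out, { end: false });
  child.stderr.setEncoding("utf8");
  child.stderr.on("data", (chunk: string) => log(chunk.trimEnd()));

  child.on("error", (err) => out.destroy(err));
  child.on("close", (code) => {
    if (code === 0) out.end();
    else out.destroy(new Error(`${command} exited with code ${code}`));
  });

  // If the consumer cancels early, stop the child process.
  out.on("close", () => {
    if (child.exitCode === null && child.signalCode === null) child.kill();
  });

  return Readable.toWeb(out) as ReadableStream<Uint8Array>;
}
```

In ytdlp.ts this would be called as `spawnStdoutAsWebStream('yt-dlp', ytDlpAudioArgs(youtubeUrl))`. The `close` handler waits for the exit code before ending the stream, so a non-zero exit becomes an error instead of a short, valid-looking stream.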
Transcription (deepgram.ts):
```ts
export async function transcribeAudio(audioStream: ReadableStream<Uint8Array>) {
  // 1. Convert Web stream to Node stream
  // 2. Buffer the audio data
  // 3. Send to Deepgram API
  // 4. Return transcription result
}
```
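Step 2 above ("Buffer the audio data") doesn't actually happen in the deepgram.ts below, which passes the Node stream straight to `transcribeFile`. The log also prints the Deepgram error before yt-dlp reports `Download completed`, so the request appears to have been answered before all of the audio was sent. Here is a minimal sketch of collecting the Web stream into one `Buffer` first; the helper name `webStreamToBuffer` and its optional size limit are hypothetical. `transcribeFile` in SDK v3 accepts a `Buffer` as well as a stream. The trade-off is that the whole file is held in memory, which may be fine for clips but not for long videos.

```typescript
import { ReadableStream } from "node:stream/web";

// Read a Web ReadableStream to completion and return its bytes as one Buffer.
// `maxBytes` is an optional, hypothetical safety limit so a very long video
// cannot exhaust memory; by default there is no limit.
export async function webStreamToBuffer(
  stream: ReadableStream<Uint8Array>,
  maxBytes = Number.POSITIVE_INFINITY,
): Promise<Buffer> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  const reader = stream.getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength;
      if (total > maxBytes) {
        await reader.cancel("audio too large");
        throw new Error(`Audio stream exceeded ${maxBytes} bytes`);
      }
      chunks.push(value);
    }
  } finally {
    reader.releaseLock();
  }
  // An empty upload can never produce a transcript, so fail early.
  if (total === 0) throw new Error("Audio stream was empty");
  return Buffer.concat(chunks, total);
}
```

The call in `transcribeAudio` would then become `deepgram.listen.prerecorded.transcribeFile(await webStreamToBuffer(audioStream), { model: 'nova-2', ... })`. This is not guaranteed to remove the HTML response, but it helps rule out a partial or empty upload as the cause.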
Here's my deepgram.ts file:
```ts
import { createClient } from '@deepgram/sdk';
import { Readable } from 'stream';
import { ReadableStream } from 'stream/web';

export async function transcribeAudio(audioStream: ReadableStream<Uint8Array>): Promise<any> {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    throw new Error('DEEPGRAM_API_KEY not found in .env.local');
  }
  const deepgram = createClient(apiKey);
  try {
    console.log('Starting Deepgram transcription...');
    // @ts-ignore - Type mismatch between Web and Node streams
    const nodeStream = Readable.fromWeb(audioStream);
    const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
      nodeStream,
      {
        model: 'nova-2',
        language: 'en',
        smart_format: true,
        punctuate: true,
        paragraphs: true,
        utterances: true,
        diarize: true,
      }
    );
    if (error) {
      throw error;
    }
    if (!result) {
      throw new Error('No result from Deepgram');
    }
    return result;
  } catch (error) {
    console.error('Deepgram transcription error:', error);
    throw error;
  }
}
```
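The `Unexpected token '<'` message means the HTTP response body started with `<html>`: something returned an HTML page instead of Deepgram's JSON, and the JSON parsing inside the SDK (the undici `parseJSONFromBytes` frames in the trace) failed on it, which hides the HTTP status. While debugging, calling the REST endpoint directly can make that status visible. A minimal sketch, assuming the `POST https://api.deepgram.com/v1/listen` endpoint with an `Authorization: Token <key>` header and options passed as query parameters; `buildListenUrl`, `parseJsonOrExplain` and `transcribeBufferRaw` are hypothetical names, not SDK functions.

```typescript
// Hypothetical debugging helpers; these are not part of @deepgram/sdk.
export function buildListenUrl(options: Record<string, string | boolean>): string {
  const url = new URL("https://api.deepgram.com/v1/listen");
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Parse the body as JSON only when the response is a successful JSON response;
// otherwise throw an error that includes the HTTP status and the start of the body.
export async function parseJsonOrExplain(res: Response): Promise<unknown> {
  const contentType = res.headers.get("content-type") ?? "";
  const body = await res.text();
  if (!res.ok || !contentType.includes("application/json")) {
    throw new Error(
      `HTTP ${res.status} ${res.statusText} (content-type: ${contentType || "none"}): ${body.slice(0, 200)}`,
    );
  }
  return JSON.parse(body);
}

// Send an in-memory audio buffer straight to the REST endpoint (no SDK),
// so a non-JSON reply shows its real status instead of a JSON.parse error.
export async function transcribeBufferRaw(
  apiKey: string,
  audio: Uint8Array,
  mimeType: string,
  options: Record<string, string | boolean>,
): Promise<unknown> {
  const res = await fetch(buildListenUrl(options), {
    method: "POST",
    headers: { Authorization: `Token ${apiKey}`, "Content-Type": mimeType },
    body: new Uint8Array(audio),
  });
  return parseJsonOrExplain(res);
}
```

For the format 251 audio in the log, `mimeType` would presumably be `audio/webm` (assuming a WebM/Opus stream). If the thrown message shows a 4xx or 5xx status with an HTML body, that status is the real error to chase.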
Here's my utils.ts file:
```ts
import { SupabaseClient } from '@supabase/supabase-js';
import { ReadableStream } from 'stream/web';
import { v4 as uuidv4 } from 'uuid';
import { transcribeAudio } from '@/app/lib/deepgram';
import { downloadAudioStream, extractVideoMetadata } from '@/app/lib/ytdlp';

interface TranscriptionMetadata {
  id: string;
  userid: string;
  youtubeurl: string;
  title: string | null;
  transcript: string | null;
  status: 'pending' | 'completed' | 'failed';
  created_at: string;
  updated_at: string;
  jobid: number | null;
  version: number;
}

export async function startTranscriptionProcess(youtubeUrl: string, userId: string, supabase: SupabaseClient): Promise<string> {
  const transcriptionId = uuidv4();
  try {
    await createTranscriptionRecord(transcriptionId, userId, youtubeUrl, null, supabase);
    Promise.resolve().then(() => {
      processTranscription(transcriptionId, youtubeUrl, userId, supabase).catch(error => {
        console.error('Background processing error:', error);
        updateTranscriptionStatus(transcriptionId, 'failed', supabase).catch(console.error);
      });
    });
    return transcriptionId;
  } catch (error) {
    console.error('Failed to start transcription:', error);
    throw error;
  }
}

async function processTranscription(id: string, youtubeUrl: string, userId: string, supabase: SupabaseClient) {
  try {
    const metadata = await extractVideoMetadata(youtubeUrl);
    await updateTranscriptionTitle(id, metadata.title, supabase);
    const audioStream: ReadableStream = await downloadAudioStream(youtubeUrl);
    const result = await transcribeAudio(audioStream);
    const transcript = result.results?.channels[0]?.alternatives[0]?.transcript;
    if (!transcript) {
      throw new Error('No transcript found in Deepgram response');
    }
    await updateTranscriptionStatus(id, 'completed', supabase, transcript);
  } catch (error) {
    console.error('Transcription processing error:', error);
    await updateTranscriptionStatus(id, 'failed', supabase);
    throw error;
  }
}

async function createTranscriptionRecord(
  id: string,
  userId: string,
  youtubeUrl: string,
  title: string | null,
  supabase: SupabaseClient
): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .upsert({
      id,
      userid: userId,
      youtubeurl: youtubeUrl,
      title,
      transcript: null,
      status: "pending",
      version: 1,
      created_at: new Date().toISOString(),
      updated_at: new Date().toISOString(),
      jobid: null
    });
  if (error) throw new Error("Failed to create transcription record");
}

async function updateTranscriptionStatus(
  id: string,
  status: TranscriptionMetadata['status'],
  supabase: SupabaseClient,
  transcript?: string
): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .update({
      status,
      transcript,
      updated_at: new Date().toISOString()
    })
    .eq("id", id);
  if (error) {
    console.error("Failed to update transcription status:", error);
    throw new Error("Failed to update transcription status");
  }
}

async function updateTranscriptionTitle(id: string, title: string, supabase: SupabaseClient): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .update({ title })
    .eq("id", id);
  if (error) {
    console.error("Failed to update transcription title:", error);
    throw new Error("Failed to update transcription title");
  }
}
```
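The log prints the same failure three times (`Deepgram transcription error`, `Transcription processing error`, `Background processing error`) because `transcribeAudio`, `processTranscription` and the `.catch` in `startTranscriptionProcess` each log and rethrow, and `updateTranscriptionStatus(..., 'failed', ...)` also runs twice for the same failure. Logging once at the outermost handler is usually enough if that single line carries the nested cause. Here is a minimal sketch of a hypothetical `describeError` helper that follows the `originalError` field visible on the `DeepgramUnknownError` in the log, as well as the standard `cause` property.

```typescript
// Hypothetical helper: flatten an error and its nested causes into one readable line.
// Follows `originalError` (seen on DeepgramUnknownError in the log) and the standard `cause`.
export function describeError(error: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = error;
  for (let depth = 0; current !== undefined && current !== null && depth < maxDepth; depth++) {
    if (current instanceof Error) {
      parts.push(`${current.name}: ${current.message}`);
      const nested = current as Error & { originalError?: unknown; cause?: unknown };
      // Prefer the SDK's originalError, then fall back to the standard cause.
      current = nested.originalError ?? nested.cause;
    } else {
      // Non-Error values (strings, objects) are stringified and end the chain.
      parts.push(String(current));
      current = undefined;
    }
  }
  return parts.join(" <- caused by ");
}
```

The background `.catch` could then log `describeError(error)` once, and the inner `console.error` calls could be dropped.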
And my route.ts file:
```ts
import { NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { startTranscriptionProcess } from "@/app/lib/utils";
import createServerSupabase from "@/app/lib/supabase-server";

export async function POST(request: Request) {
  const { userId } = await auth();
  if (!userId) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  try {
    const { youtubeUrl } = await request.json();
    const supabase = createServerSupabase();
    const transcriptionId = await startTranscriptionProcess(youtubeUrl, userId, supabase);
    return NextResponse.json({
      message: "Transcription started successfully",
      transcriptionId
    });
  } catch (error) {
    console.error("Transcription error:", error);
    return NextResponse.json(
      { error: "Failed to start transcription" },
      { status: 500 }
    );
  }
}
```
Replies: 4 comments 3 replies
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
The error trace you provided suggests several potential issues related to your app's workflow for processing YouTube videos and sending data to Deepgram for transcription.
Here are some things to consider:
1. Deepgram Error: Unexpected Token '<'
2. yt-dlp Errors
From the yt-dlp logs, we can see it successfully downloads the YouTube video, but there are severa…