DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON #1052

jrellisai · 2025-01-17T00:40:56Z

I'm building a web app to transcribe audio from YouTube videos to text with Deepgram SDK 3.9.

I'm using Next.js 14.2, React 18.2, Node 20.17 and TypeScript.

I'm using yt-dlp to download audio from YouTube (the user enters the URL), which is then streamed directly to Deepgram for transcription.

The problem I'm having is that transcriptions are failing and I'm getting this error:

"DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON"

Full error log and file code below. Let me know if you need any more info.

I've been stuck on this for a while so any help you can offer would be appreciated. Thanks in advance.

Here's the full response from the terminal:

Starting Deepgram transcription...
yt-dlp error: [youtube] Extracting URL: https://www.youtube.com/watch?v=GOXiqUKiGQc&pp=ygUMcG9kY2FzdCBjbGlw
[youtube] GOXiqUKiGQc: Downloading webpage

yt-dlp error: [youtube] GOXiqUKiGQc: Downloading ios player API JSON

yt-dlp error: [youtube] GOXiqUKiGQc: Downloading tv player API JSON

Deepgram transcription error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
Transcription processing error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading player dd017f77

Background processing error: DeepgramUnknownError: Unexpected token '<', "<html><bod"... is not valid JSON
    at eval (webpack-internal:///(rsc)/./node_modules/@deepgram/sdk/dist/module/packages/AbstractRestClient.js:76:28)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  __dgError: true,
  originalError: SyntaxError: Unexpected token '<', "<html><bod"... is not valid JSON
      at JSON.parse (<anonymous>)
      at parseJSONFromBytes (node:internal/deps/undici/undici:5472:19)
      at successSteps (node:internal/deps/undici/undici:5454:27)
      at fullyReadBody (node:internal/deps/undici/undici:4381:9)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async consumeBody (node:internal/deps/undici/undici:5463:7)
}
yt-dlp error: [youtube] GOXiqUKiGQc: Downloading m3u8 information

yt-dlp error: [info] GOXiqUKiGQc: Downloading 1 format(s): 251

yt-dlp error: [download] Destination: -

yt-dlp error: [download] Download completed

Here's the transcription process flow I'm using:

Initial Request (route.ts):
// 1. User sends POST request with YouTube URL
// 2. Verify user authentication with Clerk
// 3. Call startTranscriptionProcess with:
//    - YouTube URL
//    - User ID
//    - Supabase client
const transcriptionId = await startTranscriptionProcess(youtubeUrl, userId, supabase);

Start Process (utils.ts):
// 1. Generate unique UUID for transcription
// 2. Create initial database record (status: "pending")
// 3. Start background processing using Promise.resolve()
export async function startTranscriptionProcess(youtubeUrl: string, userId: string, supabase: SupabaseClient) {
    const transcriptionId = uuidv4();
    await createTranscriptionRecord(...);
    
    // Non-blocking background process
    Promise.resolve().then(() => {
        processTranscription(transcriptionId, youtubeUrl, userId, supabase);
    });

    return transcriptionId; // Return immediately to user
}

Process Transcription (utils.ts → ytdlp.ts → deepgram.ts):
async function processTranscription() {
    // 1. Get video metadata (title)
    const metadata = await extractVideoMetadata(youtubeUrl);
    await updateTranscriptionTitle(...);

    // 2. Download audio stream
    const audioStream = await downloadAudioStream(youtubeUrl);
    
    // 3. Send to Deepgram for transcription
    const result = await transcribeAudio(audioStream);
    
    // 4. Update database with transcript
    await updateTranscriptionStatus(...);
}

Audio Download (ytdlp.ts):
export async function downloadAudioStream() {
    // 1. Spawn yt-dlp process to download YouTube audio
    // 2. Convert to MP3 format
    // 3. Stream the audio data using TransformStream
    // 4. Return a ReadableStream of audio data
}

Transcription (deepgram.ts):
export async function transcribeAudio() {
    // 1. Convert Web stream to Node stream
    // 2. Buffer the audio data
    // 3. Send to Deepgram API
    // 4. Return transcription result
}
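
Step 2 of the transcription flow above (buffering) is where the stream turns into a single byte array that can be handed to the API. A minimal sketch of that step, assuming the chunks have already been read from the stream into an array; `concatChunks` is a hypothetical name and is not taken from the files below:

```typescript
// Hypothetical helper: joins the chunks read from the audio stream
// into one contiguous byte array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  // Work out the final size first so only one allocation is needed.
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);

  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}
```

In Node, `Buffer.concat(chunks)` does the same thing in one call.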

Here's my deepgram.ts file:

import { createClient } from '@deepgram/sdk';
import { Readable } from 'stream';
import { ReadableStream } from 'stream/web';

export async function transcribeAudio(audioStream: ReadableStream<Uint8Array>): Promise<any> {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    throw new Error('DEEPGRAM_API_KEY not found in .env.local');
  }

  const deepgram = createClient(apiKey);

  try {
    console.log('Starting Deepgram transcription...');
    // @ts-ignore - Type mismatch between Web and Node streams
    const nodeStream = Readable.fromWeb(audioStream);

    const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
      nodeStream,
      {
        model: 'nova-2',
        language: 'en',
        smart_format: true,
        punctuate: true,
        paragraphs: true,
        utterances: true,
        diarize: true,
      }
    );

    if (error) {
      throw error;
    }

    if (!result) {
      throw new Error('No result from Deepgram');
    }

    return result;
  } catch (error) {
    console.error('Deepgram transcription error:', error);
    throw error;
  }
}


Here's my utils.ts file:

import { SupabaseClient } from '@supabase/supabase-js';
import { ReadableStream } from 'stream/web';
import { v4 as uuidv4 } from 'uuid';
import { transcribeAudio } from '@/app/lib/deepgram';
import { downloadAudioStream, extractVideoMetadata } from '@/app/lib/ytdlp';

interface TranscriptionMetadata {
  id: string;
  userid: string;
  youtubeurl: string;
  title: string | null;
  transcript: string | null;
  status: 'pending' | 'completed' | 'failed';
  created_at: string;
  updated_at: string;
  jobid: number | null;
  version: number;
}

export async function startTranscriptionProcess(youtubeUrl: string, userId: string, supabase: SupabaseClient): Promise<string> {
  const transcriptionId = uuidv4();
  
  try {
    await createTranscriptionRecord(transcriptionId, userId, youtubeUrl, null, supabase);
    
    Promise.resolve().then(() => {
      processTranscription(transcriptionId, youtubeUrl, userId, supabase).catch(error => {
        console.error('Background processing error:', error);
        updateTranscriptionStatus(transcriptionId, 'failed', supabase).catch(console.error);
      });
    });

    return transcriptionId;
  } catch (error) {
    console.error('Failed to start transcription:', error);
    throw error;
  }
}

async function processTranscription(id: string, youtubeUrl: string, userId: string, supabase: SupabaseClient) {
  try {
    const metadata = await extractVideoMetadata(youtubeUrl);
    await updateTranscriptionTitle(id, metadata.title, supabase);

    const audioStream: ReadableStream = await downloadAudioStream(youtubeUrl);
    const result = await transcribeAudio(audioStream);
    
    const transcript = result.results?.channels[0]?.alternatives[0]?.transcript;
    
    if (!transcript) {
      throw new Error('No transcript found in Deepgram response');
    }

    await updateTranscriptionStatus(id, 'completed', supabase, transcript);
  } catch (error) {
    console.error('Transcription processing error:', error);
    await updateTranscriptionStatus(id, 'failed', supabase);
    throw error;
  }
}

async function createTranscriptionRecord(
  id: string, 
  userId: string, 
  youtubeUrl: string, 
  title: string | null,
  supabase: SupabaseClient
): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .upsert({
      id,
      userid: userId,
      youtubeurl: youtubeUrl,
      title,
      transcript: null,
      status: "pending",
      version: 1,
      created_at: new Date().toISOString(),
      updated_at: new Date().toISOString(),
      jobid: null
    });

  if (error) throw new Error("Failed to create transcription record");
}

async function updateTranscriptionStatus(
  id: string, 
  status: TranscriptionMetadata['status'], 
  supabase: SupabaseClient,
  transcript?: string
): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .update({ 
      status,
      transcript,
      updated_at: new Date().toISOString()
    })
    .eq("id", id);

  if (error) {
    console.error("Failed to update transcription status:", error);
    throw new Error("Failed to update transcription status");
  }
}

async function updateTranscriptionTitle(id: string, title: string, supabase: SupabaseClient): Promise<void> {
  const { error } = await supabase
    .from("transcriptions")
    .update({ title })
    .eq("id", id);

  if (error) {
    console.error("Failed to update transcription title:", error);
    throw new Error("Failed to update transcription title");
  }
}


And my route.ts file:

import { NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { startTranscriptionProcess } from "@/app/lib/utils";
import createServerSupabase from "@/app/lib/supabase-server";

export async function POST(request: Request) {
  const { userId } = await auth();
  
  if (!userId) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  try {
    const { youtubeUrl } = await request.json();
    const supabase = createServerSupabase();
    const transcriptionId = await startTranscriptionProcess(youtubeUrl, userId, supabase);
    
    return NextResponse.json({ 
      message: "Transcription started successfully",
      transcriptionId 
    });
  } catch (error) {
    console.error("Transcription error:", error);
    return NextResponse.json(
      { error: "Failed to start transcription" },
      { status: 500 }
    );
  }
}

2025-01-17T00:40:58Z

deepgram-community[bot]
bot Jan 17, 2025

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Consider joining our Discord community for more opportunities to engage with your fellow Deepgram users. You can earn points, which can be redeemed for cool stuff, by being active in our communities!

0 replies

2025-01-17T00:42:24Z

deepgram-community[bot]
bot Jan 17, 2025

Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.

0 replies

2025-01-17T00:42:29Z

deepgram-community[bot]
bot Jan 17, 2025

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

A request ID that triggered your error or issue.

0 replies

jrellisai · 2025-01-18T15:53:29Z

deepgram-community[bot]
bot Jan 18, 2025

Could you provide a code sample from your app, specifically where you've configured the API calls to Deepgram?

Also, if your calls are actually reaching Deepgram and we are returning an error, please provide those request IDs as well if you can.

This message was sent by John Vajda from Deepgram, via our community automation.

3 replies

jrellisai Jan 19, 2025
Author

Hi John, I believe I gave the necessary sample code and error logs in my original post. Let me know if you think I'm missing something. Thanks

jpvajda Jan 23, 2025
Maintainer

@jrellisai I went ahead and wrapped your code and errors in markdown ticks ``` to make them clearer. Let me look at this now.

jpvajda Jan 23, 2025
Maintainer

The error trace you provided suggests several potential issues related to your app's workflow for processing YouTube videos and sending data to Deepgram for transcription.

Here are some things to consider:

1. Deepgram Error: Unexpected Token '<'

This error occurs because the Deepgram SDK received HTML content instead of the expected JSON payload. This is likely due to:
Invalid Request or URL: The audio provided to Deepgram might not have been properly processed or resolved. If the input is not a valid audio file, Deepgram might return an error page (HTML) instead of JSON.
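
To see what actually came back instead of JSON, it can help to check the content type and log the start of the body before parsing it. A minimal sketch, assuming you can get at the raw status, `Content-Type` header, and body text of the failing response (for example by reproducing the request outside the SDK); `RawResponse` and `parseJsonOrExplain` are hypothetical names, not part of the Deepgram SDK:

```typescript
// Hypothetical shape for a response captured outside the SDK.
interface RawResponse {
  status: number;
  contentType: string | null;
  body: string;
}

// Parses the body as JSON, but fails with a readable message
// (status, content type, first bytes) when the payload is not JSON.
function parseJsonOrExplain(res: RawResponse): unknown {
  const type = (res.contentType ?? "").toLowerCase();
  if (!type.includes("application/json")) {
    // Collapse whitespace so an HTML error page fits on one log line.
    const snippet = res.body.slice(0, 80).replace(/\s+/g, " ");
    throw new Error(
      `Expected JSON but got ${res.contentType ?? "no content type"} ` +
        `(HTTP ${res.status}): ${snippet}`
    );
  }
  return JSON.parse(res.body);
}
```

An HTML body with a 4xx or 5xx status may mean the request was rejected before it reached the transcription endpoint, so the status code and the first line of the page are worth including in any follow-up.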

2. yt-dlp Errors

From the yt-dlp logs, we can see that it successfully downloads the YouTube video, but several intermediate steps are logged as errors, such as:

Issues Extracting Player JSON: These are probably not critical since the final file downloads, but they could hint at throttling or scraping detection by YouTube. Ensure you are using the latest version of yt-dlp so that it can handle changes in YouTube's API.
I've heard there have been issues using yt-dlp lately, so check their issue backlog to see whether anything you are experiencing is related: https://github.com/yt-dlp/yt-dlp/issues
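
Note that yt-dlp appears to write its progress messages to stderr when the audio is piped to stdout, so logging every stderr line as an error can be misleading. A minimal sketch of separating real failures from progress output, assuming yt-dlp's usual `ERROR:` and `WARNING:` line prefixes; `classifyYtDlpLine` is a hypothetical helper, and the sample lines are copied from the log in the original post:

```typescript
type YtDlpLevel = "error" | "warning" | "info";

// Hypothetical helper: decides how a single stderr line should be logged.
function classifyYtDlpLine(line: string): YtDlpLevel {
  const trimmed = line.trim();
  if (trimmed.startsWith("ERROR:")) return "error";
  if (trimmed.startsWith("WARNING:")) return "warning";
  // Everything else (e.g. "[youtube] ..." or "[download] ...") is progress output.
  return "info";
}

const sampleLines = [
  "[youtube] GOXiqUKiGQc: Downloading ios player API JSON",
  "[download] Download completed",
];
for (const line of sampleLines) {
  console.log(`${classifyYtDlpLine(line)}: ${line}`);
}
```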

3. Deepgram Processing Workflow

Here's a checklist to debug and resolve the issue:

a. Confirm audio extraction: Ensure that the audio from yt-dlp is properly downloaded and passed to Deepgram. If you are piping audio directly (- for stdout in yt-dlp), ensure it's being handled correctly in your app.
b. Validate Deepgram input: Deepgram expects audio in a supported format (MP3, WAV, etc.) or a public URL pointing to the audio file. If the input is invalid or inaccessible, you can encounter this error.
c. Check yt-dlp output: If you're using stdout to pipe audio from yt-dlp directly, confirm that the data is in a valid audio format. Alternatively, download the audio to a temporary file and then send it to Deepgram.
d. Confirm that yt-dlp isn't having issues that might be causing your problem.
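
For the input and output checks in this list, one cheap test is to look at the first bytes that yt-dlp writes and compare them with well-known container signatures before calling Deepgram. A minimal sketch; `sniffPayloadKind` is a hypothetical helper, and the signature list is illustrative rather than a statement of which formats Deepgram accepts. The log in the original post shows format 251 being selected, which is typically Opus audio in a WebM container, so an MP3 signature should only be expected if the conversion step actually ran.

```typescript
type PayloadKind = "mp3" | "wav" | "ogg" | "flac" | "webm" | "markup" | null;

// True when `data` contains exactly `bytes` starting at `offset`.
function startsWithBytes(data: Uint8Array, bytes: number[], offset = 0): boolean {
  if (data.length < offset + bytes.length) return false;
  return bytes.every((b, i) => data[offset + i] === b);
}

// Converts an ASCII string such as "RIFF" to its byte values.
function ascii(text: string): number[] {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: guesses what a payload is from its leading bytes.
function sniffPayloadKind(data: Uint8Array): PayloadKind {
  if (startsWithBytes(data, ascii("ID3"))) return "mp3"; // MP3 with an ID3v2 tag
  if (data.length >= 2 && data[0] === 0xff && (data[1] & 0xe0) === 0xe0) {
    return "mp3"; // bare MPEG audio frame sync
  }
  if (startsWithBytes(data, ascii("RIFF")) && startsWithBytes(data, ascii("WAVE"), 8)) {
    return "wav";
  }
  if (startsWithBytes(data, ascii("OggS"))) return "ogg";
  if (startsWithBytes(data, ascii("fLaC"))) return "flac";
  if (startsWithBytes(data, [0x1a, 0x45, 0xdf, 0xa3])) return "webm"; // EBML header
  if (startsWithBytes(data, ascii("<"))) return "markup"; // HTML/XML, not audio
  return null; // empty or unrecognized
}
```

If this returns `null` or `"markup"` for the buffered yt-dlp output, the problem is upstream of Deepgram; writing the audio to a temporary file and inspecting it by hand is then the next step.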

Answer selected by deepgram-community
