API

The API is a set of functions you can use to integrate this package into your apps. While reading these API docs, you can toggle the Outline menu (top right) on GitHub to navigate more easily.

This package is written in TypeScript, so you don't have to read all the docs here: the package now supports VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a small tutorial) explaining what the function does, with examples, parameters, etc.

Show Video
intellisense.mp4

See API_VANILLA.md for the vanilla JS version.


There are many functions, LLM engines, and constants you can import from this package; here are just a few of them. Once you have bought the package, you can open the index.ts file and see every function and constant. The package has a lot of features, so of course it has a lot of APIs.

Show How to import something from the package
// v5.3.6 API
import {
  // Main
  markTheWords,
  useTextToSpeech,

  // Utilities function for precision and add more capabilities
  pronunciationCorrection,
  getLangForThisText,
  getTheVoices,
  noAbbreviation,
  speak,
  convertTextIntoClearTranscriptText,

  // Package Data and Cache Integration
  // Your app can read the data used by this package, like:
  PKG,
  PREFERRED_VOICE, // Set global config for the preferred voice
  PKG_STATUS_OPT, // Package status option
  PKG_DEFAULT_LANG, // Package default lang
  LANG_CACHE_KEY, // Package lang sessionStorage key
  OPENAI_CHAT_COMPLETION_API_ENDPOINT,
  getVoiceBasedOnVoiceURI,
  getCachedVoiceInfo,
  getCachedVoiceURI,
  setCachedVoiceInfo,
  getCachedVoiceName,
} from "react-speech-highlight";

// Type data for typescript
import type {
  ControlHLType,
  StatusHLType,
  PrepareHLType,
  SpokenHLType,
  UseTextToSpeechReturnType,
  ActivateGestureProps,
  GetVoicesProps,
  VoiceInfo,
  markTheWordsFuncType,
  ConfigTTS,
  getAudioType,
  getAudioReturnType,
  VisemeMap,
  SentenceInfo,
} from "react-speech-highlight";

Main

1. TTS Marker markTheWords()

The markTheWords() function processes the text string and adds a marker to every word and sentence that the system will read.

Show Code

Important: this example uses React's useMemo() to avoid unnecessary re-renders. It will only re-execute when the text changes, similar to how a useEffect() dependency array works.


function abbreviationFunction(str) {
  // Write your custom abbreviation expansion here, e.g.
  // input (string):  "LMK"
  // output (string): "Let me know"
  const abbreviations = { LMK: "Let me know" };

  return abbreviations[str] ?? str;
}

const textHL = useMemo(() => markTheWords(text, abbreviationFunction), [text]);
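The returned textHL is an HTML string; you render it the same way the pronunciationCorrection() example below does, via dangerouslySetInnerHTML:

// Render the marked-up HTML produced by markTheWords()
return (
  <div ref={textEl}>
    <p dangerouslySetInnerHTML={{ __html: textHL }}></p>
  </div>
);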

2. TTS React Hook useTextToSpeech()

2.A. CONFIG

There are two config placements: initialConfig and actionConfig.

Show Code
const initialConfig = {
  autoHL: true,
  disableSentenceHL: false,
  disableWordHL: false,
  classSentences: "highlight-sentence",
  classWord: "highlight-spoken",

  lang: "id-ID",
  pitch: 1,
  rate: 0.9,
  volume: 1,
  autoScroll: false,
  clear: true,

  // For viseme mapping,
  visemeMap: {},

  // Prefer or fallback to audio file
  preferAudio: null,
  fallbackAudio: null,

  batchSize: 200,

  timestampDetectionMode: "auto",
};

const { controlHL, statusHL, prepareHL, spokenHL } =
  useTextToSpeech(initialConfig);
const actionConfig = {
  autoHL: true,
  disableSentenceHL: false,
  disableWordHL: false,
  classSentences: "highlight-sentence",
  classWord: "highlight-spoken",

  lang: "id-ID",
  pitch: 1,
  rate: 0.9,
  volume: 1,
  autoScroll: false,
  clear: true,

  // For viseme mapping,
  visemeMap: {},

  // Prefer or fallback to audio file
  preferAudio: "example.com/some_file.mp3",
  fallbackAudio: "example.com/some_file.mp3",

  batchSize: null, // or 200

  timestampDetectionMode: "auto", // or rule, ml
};

void controlHL.play({
  textEl: textEl.current,
  onEnded: () => {
    console.log("Callback when tts done");
  },
  actionConfig,
});
Show config details
  • autoHL

    If the voice does not support the onboundary event, this package prefers to disable word highlighting instead of trying to mimic the onboundary event.

  • disableSentenceHL

    Disable sentence highlight

  • disableWordHL

    Disable word highlight

  • classSentences

    The CSS class name applied to the highlighted sentence, so you can style it.

  • classWord

    The CSS class name applied to the highlighted word, so you can style it.

  • lang

    The value used for SpeechSynthesisUtterance.lang.

  • pitch

    The value used for SpeechSynthesisUtterance.pitch.

  • volume

    The value used for SpeechSynthesisUtterance.volume.

  • autoScroll

    Smooth auto scroll, so the user can always see the highlighted sentence.

  • clear

    If true, a newly played TTS overrides the one currently playing. If false, the new TTS is queued behind the TTS that is still playing.

  • visemeMap

    The data for this parameter is provided in the demo website source code.

  • preferAudio

    Pass a string or an async function that returns an audio URL (like example.com/some_file.mp3) as the preferred audio.

    The package will use this audio instead of the built-in web speech synthesis.

  • fallbackAudio

    Pass a string or an async function that returns an audio URL (like example.com/some_file.mp3) as the fallback audio.

    When the built-in web speech synthesis errors or the user doesn't have any voice, the fallback audio file will be used.

    async function getAudioForThisText(text) {
      const res = await getAudioFromTTSAPI("https://yourbackend.com/api/elevenlabs....", text);
      // convert to an audio file, then convert again to an audio URL

      return res;
    }

    const config = {
      preferAudio: getAudioForThisText, // only called if needed (when the user wants to play), so you can save cost
      fallbackAudio: getAudioForThisText, // only called if needed (when web speech synthesis fails), so you can save cost
    };

    const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config);
  • batchSize

    The batch size for the audio file.

    When batchSize is null, the whole text is sent in one request. When set to 200, the package chunks the text into 200-character batches.

    Example: with 200, the package sends 200 characters per request to the TTS API.

    Read more about the batch system in this package.

  • timestampDetectionMode

    Detection mode for the timestamp engine. See the private docs.

2.B. INTERFACE

controlHL

controlHL.play();
controlHL.pause();
controlHL.resume();
controlHL.stop();
controlHL.seekSentenceBackward();
controlHL.seekSentenceForward();
controlHL.seekParagraphBackward();
controlHL.seekParagraphForward();
controlHL.changeConfig();
controlHL.activateGesture();
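A minimal sketch of wiring these controls to buttons. textEl and actionConfig follow the play() example in section 2.A; the button wiring is illustrative, not part of the package, and activateGesture()'s exact props (ActivateGestureProps) are best checked via IntelliSense:

// Illustrative wiring only; the play() options follow the example in 2.A
<>
  <button onClick={() => void controlHL.play({ textEl: textEl.current, actionConfig })}>
    Play
  </button>
  <button onClick={() => controlHL.pause()}>Pause</button>
  <button onClick={() => controlHL.resume()}>Resume</button>
  <button onClick={() => controlHL.stop()}>Stop</button>
  <button onClick={() => controlHL.seekSentenceForward()}>Next sentence</button>
</>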

statusHL

A react state that reports the status of the program. The value can be idle|play|calibration|pause|loading. You can compare against fixed values by accessing the PKG_STATUS_OPT constant.

| Name | Description |
| --- | --- |
| idle | The initial state |
| calibration | The system is still processing the text, so that TTS playback performs accurately |
| play | The system is playing TTS |
| pause | TTS is paused |
| loading | The system is still working out the best available voices. The status changes to this value when we call prepareHL.getVoices() |
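For instance, a minimal sketch that derives UI state from statusHL; the status strings are the documented values above, and the button itself is illustrative:

// Branch on the documented status strings
const isBusy = statusHL === "loading" || statusHL === "calibration";
const isPlaying = statusHL === "play";

return <button disabled={isBusy}>{isPlaying ? "Pause" : "Play"}</button>;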

prepareHL

Contains state and functions for preparing the TTS. From all the voices available via SpeechSynthesis.getVoices(), this package tests each voice and returns only the 5 best voices for the specified language.

| Name | Description |
| --- | --- |
| prepareHL.getVoices() | Function to tell this package to find the best voices |
| prepareHL.voices | React state storing the result of prepareHL.getVoices() |
| prepareHL.loadingProgress | React state for tracking the voice-testing progress |
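A sketch of preparing voices before playback. The exact argument shape of getVoices() is an assumption here (a GetVoicesProps type is exported, so hover it in VS Code to confirm); the lang value mirrors the config above:

// Assumed call shape (verify via GetVoicesProps / IntelliSense)
useEffect(() => {
  void prepareHL.getVoices({ lang: "id-ID" });
}, []);

// While testing runs, prepareHL.loadingProgress updates;
// when done, prepareHL.voices holds the best voices found.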

spokenHL

Contains react state for reporting while TTS is playing.

| Name | Description |
| --- | --- |
| spokenHL.sentence | React state: the sentence currently being read |
| spokenHL.word | React state: the word currently being read |
| spokenHL.viseme | React state: the current viseme |
| spokenHL.precentageWord | Reading percentage between 0-100, based on words |
| spokenHL.precentageSentence | Reading percentage between 0-100, based on sentences |
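For example, a minimal progress readout driven by these states, inside a component that already has spokenHL from useTextToSpeech() (property names as documented above):

// Show the word being spoken and the overall sentence progress
return (
  <div>
    <p>Now reading: {spokenHL.word}</p>
    <progress max={100} value={spokenHL.precentageSentence} />
  </div>
);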

Utilities

Utility functions for precision and extra capabilities.

1. pronunciationCorrection()

The common problem is that the text displayed to the user differs from its spoken form: math symbols, equations, terms, etc. Read more about the pronunciation problem.

How to build this package with open ai api integration

Show Code
const inputText = `
<ul>
  <li>1000</li>
  <li>4090</li>
  <li>1.000.000</li>
  <li>1,2</li>
  <li>9.001</li>
  <li>30,1</li>
</ul>
`;

const textEl = useRef();

const pronounciation = async (): Promise<void> => {
  if (textEl.current) {
    await pronunciationCorrection(textEl.current, (progress) => {
      console.log(progress);
    });
  }
};

useEffect(() => {
  if (textEl.current) {
    console.log("pronounciation");
    void pronounciation();
  }
  // eslint-disable-next-line
}, []);

const textHL = useMemo(() => markTheWords(inputText), [inputText]);

return (
  <div ref={textEl}>
    <p
      dangerouslySetInnerHTML={{
        __html: textHL,
      }}
    ></p>
  </div>
);

2. getLangForThisText()

For example you want to implement this package into blog website with multi language, it's hard to know the exact language for each post / article.

Then i use chat gpt api to detect what language from some text. see How to build this package with open ai api integration

Show Code
var timeout = null;

const inputText = `
Hallo, das ist ein deutscher Beispieltext
`;

async function getLang() {
  // For the German sample text above, this will return "de"
  var predictedLang = await getLangForThisText(textEl.current);

  if (predictedLang) {
    setLang(predictedLang);
  }
}

useEffect(() => {
  if (textEl.current) {
    if (inputText != "") {
      // The timeout is for use case: text change frequently.
      // if the text doesn't change just call getLang();
      if (timeout) {
        clearTimeout(timeout);
      }

      timeout = setTimeout(() => {
        getLang();
      }, 2000);
    }
  }
}, [inputText]);

3. convertTextIntoClearTranscriptText()

Function to convert your input string (plain text or an HTML string) into a clear Speech Synthesis Markup Language (SSML) format that this package can understand when building transcript timestamps.

You must use this function when making the audio file.

var convertInto = "ssml"; // or "plain_text"
var clear_transcript = convertTextIntoClearTranscriptText(
  "your string here",
  convertInto
);
// with clear_transcript you can make an audio file with the help of other speech synthesis platforms, like ElevenLabs, etc.

Package Data and Cache Integration

The data or cache (storage) that this package uses can be accessed from outside the package. This is the same mechanism used by React GPT Web Guide.

Show
import {
  // ...other API

  // Your app can read the data / cache used by this package, like:
  PREFERRED_VOICE, // Set global config for the preferred voice
  PKG_STATUS_OPT, // Package status option
  PKG_DEFAULT_LANG, // Package default lang
  LANG_CACHE_KEY, // Package lang sessionStorage key
  OPENAI_CHAT_COMPLETION_API_ENDPOINT, // Key to set open ai chat completion api
  getVoiceBasedOnVoiceURI,
  getCachedVoiceInfo,
  getCachedVoiceURI,
  setCachedVoiceInfo,
  getCachedVoiceName,
} from "react-speech-highlight";

Usage example:

Set custom constant values for this package

import { useEffect } from "react";
import { setupKey, storage } from "@/app/react-speech-highlight";

// set global preferred voice
useEffect(() => {
  const yourDefinedPreferredVoice = {
    // important! Define language code (en-us) with lowercase letter
    "de-de": ["Helena", "Anna"],
  };

  storage.setItem(
    "global",
    setupKey.PREFERRED_VOICE,
    yourDefinedPreferredVoice
  );

  // Set open ai chat completion api
  // example in demo website (next js using environment variable) src/Components/ClientProvider.tsx
  if (process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT) {
    storage.setItem(
      "global",
      setupKey.OPENAI_CHAT_COMPLETION_API_ENDPOINT,
      process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT
    );
  }

  // or
  storage.setItem(
    "global",
    OPENAI_CHAT_COMPLETION_API_ENDPOINT,
    "http://localhost:8000/api/v1/public/chat"
  );

  // You can set the headers for the fetch API request with this key in sessionStorage
  const headers = {
    Authorization: `Bearer xxx_YOUR_PLATFORM_AUTH_TOKEN_HERE_xxx`,
  };

  // Tips: Hover your mouse over the REQUEST_HEADERS variable to see the example and docs
  storage.setItem("global", setupKey.REQUEST_HEADERS, headers);

  // Speech to Text API endpoint
  if (process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT) {
    storage.setItem(
      "global",
      setupKey.OPENAI_SPEECH_TO_TEXT_API_ENDPOINT,
      process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT
    );
  }
}, []);
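The voice-cache getters above are easiest to learn through IntelliSense. As a rough sketch (the getter signatures below are assumptions; only LANG_CACHE_KEY as a sessionStorage key is documented above), reading the cached values might look like this:

// LANG_CACHE_KEY is the package's lang sessionStorage key (documented above)
const cachedLang = sessionStorage.getItem(LANG_CACHE_KEY);

// Assumed: the getters are keyed by a lowercase language code; confirm via IntelliSense
const voiceInfo = getCachedVoiceInfo("de-de");
const voiceName = getCachedVoiceName("de-de");
const voiceURI = getCachedVoiceURI("de-de");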