The API is the set of functions you use to integrate this package into your apps. When reading these API docs, you can toggle the Outline menu
(see top right) in GitHub so you can navigate more easily.
This package is written in TypeScript, so you don't have to read all of the docs here: the package now supports VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a small tutorial) explaining what the function is about, with examples, params, etc.
Show Video
intellisense.mp4
See API_VANILLA.md for the vanilla JS version.
There are actually a lot of functions, LLM engines, and constants that you can import from this package; here are just a few of them. Once you have bought the package, you can open the index.ts
file and see every function and constant. The package has a lot of features, so of course it has a lot of APIs.
Show How to import something from the package
// v5.3.6 API
import {
// Main
markTheWords,
useTextToSpeech,
// Utility functions for precision and extra capabilities
pronunciationCorrection,
getLangForThisText,
getTheVoices,
noAbbreviation,
speak,
convertTextIntoClearTranscriptText,
// Package Data and Cache Integration
// Your app can read the data used by this package, like:
PKG,
PREFERRED_VOICE, // Set global config for the preferred voice
PKG_STATUS_OPT, // Package status option
PKG_DEFAULT_LANG, // Package default lang
LANG_CACHE_KEY, // Package lang sessionStorage key
OPENAI_CHAT_COMPLETION_API_ENDPOINT,
getVoiceBasedOnVoiceURI,
getCachedVoiceInfo,
getCachedVoiceURI,
setCachedVoiceInfo,
getCachedVoiceName,
} from "react-speech-highlight";
// Type definitions for TypeScript
import type {
ControlHLType,
StatusHLType,
PrepareHLType,
SpokenHLType,
UseTextToSpeechReturnType,
ActivateGestureProps,
GetVoicesProps,
VoiceInfo,
markTheWordsFuncType,
ConfigTTS,
getAudioType,
getAudioReturnType,
VisemeMap,
SentenceInfo,
} from "react-speech-highlight";
The markTheWords()
function processes the input text and adds markers to every word and sentence that the system will read.
Show Code
Important: this example uses React's useMemo()
to avoid unnecessary re-renders; markTheWords() will only execute again when the text
changes. It is similar to useEffect().
function abbreviationFunction(str) {
// You can write your custom abbreviation function here
// example:
// Input(string) : LMK
// Output(string) : Let me know
return str;
}
const textHL = useMemo(() => markTheWords(text, abbreviationFunction), [text]);
There are two config placements: initialConfig and actionConfig.
Show Code
const initialConfig = {
autoHL: true,
disableSentenceHL: false,
disableWordHL: false,
classSentences: "highlight-sentence",
classWord: "highlight-spoken",
lang: "id-ID",
pitch: 1,
rate: 0.9,
volume: 1,
autoScroll: false,
clear: true,
// For viseme mapping,
visemeMap: {},
// Prefer or fallback to audio file
preferAudio: null,
fallbackAudio: null,
batchSize: 200,
timestampDetectionMode: "auto",
};
const { controlHL, statusHL, prepareHL, spokenHL } =
useTextToSpeech(initialConfig);
const actionConfig = {
autoHL: true,
disableSentenceHL: false,
disableWordHL: false,
classSentences: "highlight-sentence",
classWord: "highlight-spoken",
lang: "id-ID",
pitch: 1,
rate: 0.9,
volume: 1,
autoScroll: false,
clear: true,
// For viseme mapping,
visemeMap: {},
// Prefer or fallback to audio file
preferAudio: "example.com/some_file.mp3",
fallbackAudio: "example.com/some_file.mp3",
batchSize: null, // or 200
timestampDetectionMode: "auto", // or rule, ml
};
void controlHL.play({
textEl: textEl.current,
onEnded: () => {
console.log("Callback when tts done");
},
actionConfig,
});
Show config details
- autoHL
  If the voice does not support the onboundary event, this package prefers to disable the word highlight instead of trying to mimic the onboundary event.
- disableSentenceHL
  Disable the sentence highlight.
- disableWordHL
  Disable the word highlight.
- classSentences
  Class name you can target with CSS to style the highlighted sentence.
- classWord
  Class name you can target with CSS to style the highlighted word.
- lang
  The value used for SpeechSynthesisUtterance.lang. see
- pitch
  The value used for SpeechSynthesisUtterance.pitch.
- volume
  The value used for SpeechSynthesisUtterance.volume.
- autoScroll
  Beautiful auto scroll, so the user can always see the highlighted sentence.
- clear
  If true, override the previously played TTS with the new TTS the user wants to play. If false and there is still a TTS playing, the new TTS just enters the queue behind it.
- visemeMap
  The data for this parameter is provided in the demo website source code.
- preferAudio
  An API to pass a string or an async function that returns an audio URL (like example.com/some_file.mp3) as the preferred audio. The package will use this audio instead of the built-in web speech synthesis.
- fallbackAudio
  An API to pass a string or an async function that returns an audio URL (like example.com/some_file.mp3) as the fallback audio. When the built-in web speech synthesis errors or the user doesn't have any voice, the fallback audio file will be used. For example:
async function getAudioForThisText(text) {
  // Convert the text to an audio file via your TTS API, then return the audio URL
  var res = await getAudioFromTTSAPI(
    "https://yourbackend.com/api/elevenlabs....",
    text
  );
  return res;
}

const config = {
  // Will only be called if needed (when the user wants to play), so you can save cost
  preferAudio: getAudioForThisText,
  // Will only be called if needed (when web speech synthesis fails), so you can save cost
  fallbackAudio: getAudioForThisText,
};

const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config);
- batchSize
  The batch size for the audio file requests. When set to null, the package sends the whole text in one request. When set to 200, the package chunks the text into 200-character pieces, sending 200 characters per request to the TTS API.
- timestampDetectionMode
  Detection mode for the timestamp engine. see private docs
controlHL.play();
controlHL.pause();
controlHL.resume();
controlHL.stop();
controlHL.seekSentenceBackward();
controlHL.seekSentenceForward();
controlHL.seekParagraphBackward();
controlHL.seekParagraphForward();
controlHL.changeConfig();
controlHL.activateGesture();
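For example, here is a minimal sketch of wiring these controls to simple buttons. It assumes the textEl ref and actionConfig object from the earlier examples; treat it as an illustration, not the only way to call the controls.
// Minimal sketch: playback buttons driven by controlHL
// (textEl and actionConfig are the same objects used in the examples above)
function PlaybackControls({ controlHL, textEl, actionConfig }) {
  return (
    <div>
      <button
        onClick={() =>
          void controlHL.play({ textEl: textEl.current, actionConfig })
        }
      >
        Play
      </button>
      <button onClick={() => controlHL.pause()}>Pause</button>
      <button onClick={() => controlHL.resume()}>Resume</button>
      <button onClick={() => controlHL.stop()}>Stop</button>
      <button onClick={() => controlHL.seekSentenceForward()}>
        Next sentence
      </button>
    </div>
  );
}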
React state that gives the status of the program. The value can be idle|play|calibration|pause|loading
. You can reference these fixed values via the PKG_STATUS_OPT
constant.
Name | Description |
---|---|
idle | The initial state |
calibration | The system is still processing the text, so that the TTS performs more accurately when it plays |
play | The system is still playing the TTS |
pause | The TTS is paused |
loading | The system is still processing to get the best voices available. The status changes to this value when we call prepareHL.getVoices() see |
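For example, a minimal sketch of reacting to the status (assuming statusHL is the plain string state returned by useTextToSpeech, holding one of the values above):
const { controlHL, statusHL } = useTextToSpeech(initialConfig);

// statusHL is one of "idle" | "play" | "calibration" | "pause" | "loading"
const isBusy = statusHL === "loading" || statusHL === "calibration";
const isPlaying = statusHL === "play";

// e.g. disable the play button while busy and flip its label while playing
<button disabled={isBusy}>{isPlaying ? "Pause" : "Play"}</button>;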
Contains state and functions for preparing the TTS. From all the available voices that we can get from SpeechSynthesis.getVoices(), this package will test the voices and return only the 5 best voices for the language specified before.
Name | Description |
---|---|
prepareHL.getVoices() | Function to tell this package to get the best voices. see |
prepareHL.voices | React state that stores the result of prepareHL.getVoices() |
prepareHL.loadingProgress | React state for tracking the voice-testing progress |
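A minimal sketch of preparing the voices before playback, calling getVoices() with no arguments as listed above (check IntelliSense and the GetVoicesProps type for the full options):
const { prepareHL } = useTextToSpeech(initialConfig);

// Tell the package to test the available voices and keep the best ones
useEffect(() => {
  void prepareHL.getVoices();
}, []);

// prepareHL.loadingProgress reports the testing progress,
// prepareHL.voices stores the resulting best voices
console.log(prepareHL.loadingProgress, prepareHL.voices);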
Contains React state for reporting while the TTS is playing.
Name | Description |
---|---|
spokenHL.sentence | React state: the sentence currently being read |
spokenHL.word | React state: the word currently being read |
spokenHL.viseme | React state: the current viseme |
spokenHL.precentageWord | Reading percentage between 0-100, based on words |
spokenHL.precentageSentence | Reading percentage between 0-100, based on sentences |
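For example, a minimal sketch of a reading-progress indicator built from these states:
const { spokenHL } = useTextToSpeech(initialConfig);

// Show the sentence currently being read and how far the reading has progressed
<div>
  <p>Now reading: {spokenHL.sentence}</p>
  <progress max={100} value={spokenHL.precentageSentence} />
</div>;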
Utility functions for precision and added capabilities
A common problem is that the text displayed to the user differs from its spoken form: math symbols, equations, terms, etc. Read more about the pronunciation problem
How to build this package with OpenAI API integration
Show Code
const inputText = `
<ul>
<li>1000</li>
<li>4090</li>
<li>1.000.000</li>
<li>1,2</li>
<li>9.001</li>
<li>30,1</li>
</ul>
`;
const textEl = useRef();
const pronounciation = async (): Promise<void> => {
if (textEl.current) {
await pronunciationCorrection(textEl.current, (progress) => {
console.log(progress);
});
}
};
useEffect(() => {
if (textEl.current) {
console.log("pronounciation");
void pronounciation();
}
// eslint-disable-next-line
}, []);
const textHL = useMemo(() => markTheWords(inputText), [inputText]);
return (
<div ref={textEl}>
<p
dangerouslySetInnerHTML={{
__html: textHL,
}}
></p>
</div>
);
For example, say you want to use this package on a blog website with multiple languages; it is hard to know the exact language of each post / article.
So I use the ChatGPT API to detect the language of the text. See How to build this package with OpenAI API integration
Show Code
var timeout = null;
const inputText = `
Hallo, das ist ein deutscher Beispieltext
`;
async function getLang() {
var predictedLang = await getLangForThisText(textEl.current);
// will return `de`
if (predictedLang) {
setLang(predictedLang);
}
}
useEffect(() => {
if (textEl.current) {
if (inputText != "") {
// The timeout is for the use case where the text changes frequently.
// If the text rarely changes, just call getLang() directly.
if (timeout) {
clearTimeout(timeout);
}
timeout = setTimeout(() => {
getLang();
}, 2000);
}
}
}, [inputText]);
Function to convert your input string (plain text or an HTML string) into a clean Speech Synthesis Markup Language (SSML) format that this package can understand when making the transcript timestamps.
You must use this function when making the audio file.
var convertInto = "ssml"; // or "plain_text"
var clear_transcript = convertTextIntoClearTranscriptText(
"your string here",
convertInto
);
// With the clear_transcript you can make an audio file with the help of other speech synthesis platforms like ElevenLabs, etc.
The data or cache (storage) that this package uses can be accessed from outside. It is also the one used by React GPT Web Guide.
Show
import {
// ...other API
// Your app can read the data / cache used by this package, like:
PREFERRED_VOICE, // Set global config for the preferred voice
PKG_STATUS_OPT, // Package status option
PKG_DEFAULT_LANG, // Package default lang
LANG_CACHE_KEY, // Package lang sessionStorage key
OPENAI_CHAT_COMPLETION_API_ENDPOINT, // Key to set open ai chat completion api
getVoiceBasedOnVoiceURI,
getCachedVoiceInfo,
getCachedVoiceURI,
setCachedVoiceInfo,
getCachedVoiceName,
} from "react-speech-highlight";
Usage example:
import { setupKey, storage } from "@/app/react-speech-highlight";
// set global preferred voice
useEffect(() => {
const yourDefinedPreferredVoice = {
// Important! Define the language code (e.g. en-us) in lowercase letters
"de-de": ["Helena", "Anna"],
};
storage.setItem(
"global",
setupKey.PREFERRED_VOICE,
yourDefinedPreferredVoice
);
// Set open ai chat completion api
// Example in the demo website (Next.js, using an environment variable): src/Components/ClientProvider.tsx
if (process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT) {
storage.setItem(
"global",
setupKey.OPENAI_CHAT_COMPLETION_API_ENDPOINT,
process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT
);
}
// or
storage.setItem(
"global",
OPENAI_CHAT_COMPLETION_API_ENDPOINT,
"http://localhost:8000/api/v1/public/chat"
);
// You can set the headers for the fetch API request with this key in sessionStorage
const headers = {
Authorization: `Bearer xxx_YOUR_PLATFORM_AUTH_TOKEN_HERE_xxx`,
};
// Tips: Hover your mouse over the REQUEST_HEADERS variable to see the example and docs
storage.setItem("global", setupKey.REQUEST_HEADERS, headers);
// Speech to Text API endpoint
if (process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT) {
storage.setItem(
"global",
setupKey.OPENAI_SPEECH_TO_TEXT_API_ENDPOINT,
process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT
);
}
}, []);