In general, do not agonize over decisions. If the event is not covered here, please ask on Project Flipside's Slack workspace.
There are two tag types: <> and (()).
Tags that contain identifiers or names use <>. Use a colon to separate the tag name from the spoken words.
Tags enclosed by (()) mark speech that is hard to annotate or whose label is uncertain.
When enclosing tags with <> or (()), please ensure that there are no spaces between the opening and closing symbols. Use _ wherever a space is needed.
If you cannot understand what is said, replace the word with (()).
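The no-spaces rule can be checked mechanically. The following is a minimal sketch, not part of any official tooling; the function name `find_tags_with_spaces` and the regular expression are assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical pattern: matches either tag type, <...> or ((...)).
TAG_PATTERN = re.compile(r"<[^<>]*>|\(\(.*?\)\)")


def find_tags_with_spaces(transcript: str) -> List[str]:
    """Return every tag whose body contains whitespace, which the rules forbid."""
    return [tag for tag in TAG_PATTERN.findall(transcript) if re.search(r"\s", tag)]


print(find_tags_with_spaces("<filler:ah> kumusta ((magandang umaga)) ((magandang_umaga))"))
# ['((magandang umaga))']
```

Any tag reported by this check should be rewritten with _ in place of the space.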
Fillers are sounds or words that, if omitted from the transcript, would not change what the speaker is trying to convey. You can use <fill:>, <filler:>, or <hes:> (interchangeable). However, only do this if (1) there is only one token and (2) the token immediately precedes the correct word intended. Try re-reading your transcript without the enclosed tag to test your decision.
- Thinking words
e.g. <filler:ah>, <filler:mm>
- Non-lexical filler
e.g. <filler:eh>
- Fragments/repeats due to false starts. For fragments, hyphenate the cut-off point.
e.g. <filler:ma-> magandang <filler:uma-> tanghali
e.g. <filler:one> one million pesos
- Case where you DON'T use filler tags.
e.g. magandang <miss:uma-:umaga> magandang hapon (there is correction, but it's not a filler)
e.g. one million one million pesos (the repetition can be omitted, but this consists of more than one token so we do not use filler tags)
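The "re-read without the tag" test can be done with a short script. This is only a sketch under the assumption that filler tags contain a single token with no spaces; `strip_fillers` is a hypothetical helper name, not an existing tool.

```python
import re

# The three interchangeable filler tag names, followed by one token without spaces.
FILLER_TAG = re.compile(r"<(?:fill|filler|hes):[^<>\s]*>\s*")


def strip_fillers(transcript: str) -> str:
    """Remove filler tags so the transcript can be re-read without them."""
    return FILLER_TAG.sub("", transcript).strip()


print(strip_fillers("<filler:ma-> magandang <filler:uma-> tanghali"))
# magandang tanghali
print(strip_fillers("<filler:one> one million pesos"))
# one million pesos
```

If the stripped line still conveys what the speaker meant, the filler tags were a reasonable choice.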
Use these tags when the speaker mispronounces a word, or when a word is cut off NOT because of the speaker. Fragments should be hyphenated at cut-off points. There are two cases:
- If the correct word can be supplied, use <miss:wrong:right>
- If the correct word cannot be supplied, use ((what_was_heard))
Examples:
- The speaker says "tinignan"
e.g. <miss:tinignan:tiningnan> mo ba?
- Example where the incomplete word is obvious
e.g. Pula, puti, <miss:as-:asul>
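Because a <miss:wrong:right> tag carries both what was heard and what was intended, either reading can be recovered from the transcript. The sketch below illustrates this; the helper names `as_heard` and `as_intended` are hypothetical, and the pattern assumes neither part contains spaces or colons.

```python
import re

# <miss:wrong:right>, where neither part contains spaces, colons, or angle brackets.
MISS_TAG = re.compile(r"<miss:([^<>:\s]+):([^<>:\s]+)>")


def as_heard(transcript: str) -> str:
    """Replace each miss tag with the word as it was actually heard."""
    return MISS_TAG.sub(r"\1", transcript)


def as_intended(transcript: str) -> str:
    """Replace each miss tag with the corrected word."""
    return MISS_TAG.sub(r"\2", transcript)


line = "Pula, puti, <miss:as-:asul>"
print(as_heard(line))     # Pula, puti, as-
print(as_intended(line))  # Pula, puti, asul
```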
The <foreign> tag is used when the speaker says a word or string of words from another language that would not be widely accepted or understood as part of the native language. This utterance is NOT transcribed and the <foreign> tag is inserted instead.
Individual loan words that are spoken and commonly used as part of the native language are transcribed with the accepted loan word spelling. For example, words such as “kimono”, “croissant”, or “falafel” would be considered commonly accepted loan words in the language. Such words are written using the same character set as the native language.
Overlaps don't exist in FSC. When two or more people talk over the same region, place the words inside an <overlap:transcript> tag to complete the thought of the utterance you are writing. The word content can be left out if it is only noise.
e.g. (Two people are talking, someone says "guwapo" at the same time as "masipag"; you don't transcribe guwapo)
Matalino siya, saka <overlap:masipag>
Use the tags below only if the event is clearly distinguishable. If the event occurs over a span of one or more words, the tag should indicate where it starts, just before the first word it affects.
Categories:
- <breath> for loud breathing
- <chuckle>
- <burp>
- <cough>
- <laugh>
- <music>
- <pause> for pause in talking
- <ring> for phone rings
- <sneeze>
- <sniff>
- <swoosh> for transitional sound effects
- <throat> for clearing throat, like ehem
- <whisper>
Feel free to add a tag if an event is missing, but make sure to inform members on our Slack workspace.
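Since new event tags should be announced before use, it can help to flag tags that are not yet on the list. The following is a rough sketch: the set below copies the categories above, <foreign> is allowed because it is also written without a colon, and `unknown_bare_tags` is a hypothetical name.

```python
import re

# Event tags copied from the category list.
EVENT_TAGS = {
    "breath", "chuckle", "burp", "cough", "laugh", "music", "pause",
    "ring", "sneeze", "sniff", "swoosh", "throat", "whisper",
}
# <foreign> is also a tag without a colon, so it is accepted here as well.
BARE_TAGS = EVENT_TAGS | {"foreign"}

# A bare tag has no colon, e.g. <laugh>; tags such as <filler:ah> are skipped.
BARE_TAG = re.compile(r"<([^<>:]+)>")


def unknown_bare_tags(transcript):
    """Return bare tag names that are not on the known list."""
    return [name for name in BARE_TAG.findall(transcript) if name not in BARE_TAGS]


print(unknown_bare_tags("<laugh> ang galing <clap> <filler:ah> talaga"))
# ['clap']
```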
Proper nouns should retain their original spelling.
Titles are transcribed as words: "Dr." must be written as "Doctor".
The exception is when the abbreviated form is actually pronounced that way. If someone says /ink/ for Incorporated, write Inc instead of Incorporated. Another example: philo for philosophy.
Do not use punctuation unless it is essential to the word. Write contractions out in full whenever possible.
Acronyms are transcribed as words if spoken as words, and as letters if spoken as letters. When transcribing sequences of letters, insert an underscore between each letter. For example: PAG-ASA; A_B_S-C_B_N
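The acronym rule can be sketched as a small helper. The function name `transcribe_acronym` and the `spoken_as_word` flag are assumptions for illustration; the annotator still decides by ear how the acronym was spoken.

```python
def transcribe_acronym(acronym: str, spoken_as_word: bool) -> str:
    """Format an acronym according to how the speaker pronounced it."""
    if spoken_as_word:
        # Spoken as a word: keep it as written.
        return acronym
    # Spoken letter by letter: underscore between letters, hyphens preserved.
    return "-".join("_".join(part) for part in acronym.split("-"))


print(transcribe_acronym("PAG-ASA", spoken_as_word=True))   # PAG-ASA
print(transcribe_acronym("ABS-CBN", spoken_as_word=False))  # A_B_S-C_B_N
```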
Never write numerals. Always transcribe in full words.
16 -> sixteen
1,024 -> one thousand twenty four
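As a rough aid, the two conversions above can be reproduced with a small function. This sketch assumes the speaker read the number in English and follows the style of the examples (no hyphens, no "and"); `number_to_words` is a hypothetical helper that covers 0 to 999,999 only. Always transcribe what was actually said if it differs.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 in English words, without hyphens or 'and'."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 to 999,999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


print(number_to_words(16))    # sixteen
print(number_to_words(1024))  # one thousand twenty four
```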
Use the / / tag when a letter is pronounced as its sound (lipped) rather than spoken as its letter name.
- Choose between Balarila system and original spelling depending on the way the word fits into the utterance, usually dependent on the pronunciation.
e.g. Hinuli ang suspek (vs suspect) ng pulis (vs police)
Inisnab (vs ini-snob) ang gimik (vs gimmick) ng mga istambay (vs standby)
- When a Filipino prefix is attached to a borrowed word, separate the prefix with a hyphen:
e.g. nag-shopping, naka-long-sleeves, pa-try
- When a borrowed word undergoes partial reduplication, retain the spelling of the duplicated part from the root word, separated with a hyphen:
e.g. mag-sho-shopping, age-agenda, i-fo-forward or ipo-forward (depending on pronunciation)
- Infixations with "in" and "um". In certain cases, spellings of the root word will change (from c to k in the example below).
e.g. finorward, gumraduate, kinompute
pino-forward or fino-forward (depending on pronunciation), guma-graduate, kino-compute