go-clean is a flexible, stand-alone, lightweight library for detecting and censoring profanities in Go.
go get -u github.com/martinhrvn/go-clean
By default
package main

import (
	goclean "github.com/martinhrvn/go-clean"
)

func main() {
	goclean.IsProfane("fuck this shit")
	// returns true

	goclean.List("fuck this shit")
	// returns DetectedConcern{Word: "fuck", MatchedWord: "fuck", StartIndex: 0, EndIndex: 3}

	goclean.Redact("fuck this shit")
	// returns "**** this shit"
}
Calling goclean.IsProfane(s), goclean.List(s) or goclean.Redact(s) uses the default profanity detector, which is configured in the config.json file.
If you'd like to disable leet speak, numerical character or special character sanitization, you have to create a ProfanityDetector instead:
profanityDetector := goclean.NewProfanitySanitizer(goclean.Config{
	// will not sanitize leet speak (a$$, b1tch, etc.)
	DetectLeetSpeak: false,
	// will not detect obfuscated words (f_u_c_k, etc.)
	DetectObfuscated: false,
	// replacement character for redacted words
	ReplacementCharacter: '*',
	// length for obfuscated characters (e.g. if set to 1, f_u_c_k will be detected but f___u___c___k won't)
	ObfuscationLength: 1,
	Profanities: []goclean.WordMatcher{
		{Word: "fuck", Regex: "f[u]+ck"},
	},
})
- DetectLeetSpeak: sanitize leet speak (a$$, b1tch, etc.)
  - default: true
- DetectObfuscated: detect obfuscated words (f_u_c_k, etc.)
  - default: true
- ObfuscationLength: length for obfuscated characters (e.g. if set to 1, f_u_c_k will be detected but f___u___c___k won't)
  - default: 3
- ReplacementCharacter: replacement character for redacted words
  - default: *
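To give a feel for what leet speak sanitization involves, here is a minimal sketch that maps a few common leet-speak characters back to letters before any matching is done. The normalizeLeet function and its substitution table are assumptions made for illustration; they are not taken from go-clean's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// leetReplacer maps a few common leet-speak characters to letters.
// The table is a hypothetical example, not go-clean's actual mapping.
var leetReplacer = strings.NewReplacer(
	"$", "s",
	"1", "i",
	"3", "e",
	"0", "o",
	"@", "a",
	"4", "a",
)

// normalizeLeet lowercases s and replaces leet-speak characters with letters.
func normalizeLeet(s string) string {
	return leetReplacer.Replace(strings.ToLower(s))
}

func main() {
	fmt.Println(normalizeLeet("a$$"))   // prints: ass
	fmt.Println(normalizeLeet("b1tch")) // prints: bitch
}
```

A fixed table like this can turn ordinary digits into letters, which may be one reason to make the behaviour switchable.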
WordMatcher is used to configure profanities and false negatives. It has the following fields:
- Regex:
  - if set, it is used to match the word instead of Word
- Word:
  - word to detect
  - if DetectObfuscated is true, it will also match words with up to ObfuscationLength characters in between letters
- Level:
  - optional profanity level that is returned from the List method
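One way to picture the interaction between Word and ObfuscationLength is to expand the word into a regular expression that tolerates a bounded number of separator characters between letters. The sketch below does that; the obfuscatedPattern helper and its choice of separator class (anything that is not a letter or digit) are assumptions for illustration, not go-clean's actual matching logic.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// obfuscatedPattern builds a case-insensitive regular expression that matches
// word with up to maxGap separator characters (anything that is not a letter
// or a digit) between consecutive letters. Hypothetical helper.
func obfuscatedPattern(word string, maxGap int) *regexp.Regexp {
	letters := strings.Split(word, "")
	for i, l := range letters {
		letters[i] = regexp.QuoteMeta(l)
	}
	gap := fmt.Sprintf(`[^\p{L}\p{N}]{0,%d}`, maxGap)
	return regexp.MustCompile(`(?i)` + strings.Join(letters, gap))
}

func main() {
	re := obfuscatedPattern("fuck", 1)
	fmt.Println(re.MatchString("f_u_c_k"))       // prints: true
	fmt.Println(re.MatchString("f___u___c___k")) // prints: false
}
```

With a gap of 3 the same helper would also match f___u___c___k, which mirrors the trade-off described for ObfuscationLength: a larger value catches more obfuscation but can produce more accidental matches.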
False positives are words that contain a profanity but are not profane themselves.
For example, the word bass contains ass but is not profane.
False negatives are words that may be incorrectly filtered out as false positives, and words that should always be treated as profane regardless of false positives. They are matched before false positives are removed.
For example, dumbass is a false negative: because bass is a false positive, dumbass has to be added to the false negatives in order to be matched.
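The order of these checks can be sketched as follows: false negatives first, then removal of false positives, then the regular profanity check. The isProfaneSketch function and its plain substring matching are assumptions made to keep the example short; the library's own matching is configurable and more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// isProfaneSketch applies the checks in this order:
//  1. false negatives are matched first,
//  2. false positives are removed from the text,
//  3. the remaining text is checked against the profanities.
//
// Hypothetical helper using plain substring matching.
func isProfaneSketch(s string, profanities, falsePositives, falseNegatives []string) bool {
	s = strings.ToLower(s)
	for _, w := range falseNegatives {
		if strings.Contains(s, w) {
			return true
		}
	}
	for _, w := range falsePositives {
		s = strings.ReplaceAll(s, w, "")
	}
	for _, w := range profanities {
		if strings.Contains(s, w) {
			return true
		}
	}
	return false
}

func main() {
	profanities := []string{"ass"}
	falsePositives := []string{"bass"}
	falseNegatives := []string{"dumbass"}

	fmt.Println(isProfaneSketch("play the bass", profanities, falsePositives, falseNegatives)) // prints: false
	fmt.Println(isProfaneSketch("you dumbass", profanities, falsePositives, falseNegatives))   // prints: true
	// Without the false negative, removing "bass" hides the profanity.
	fmt.Println(isProfaneSketch("you dumbass", profanities, falsePositives, nil)) // prints: false
}
```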
List returns a list of DetectedConcern values for the profanities found in the given string.
Each DetectedConcern contains:
- Word: base word found, e.g. for fuuuck it will be fuck (if only a regex is provided, an empty string is returned)
- MatchedWord: actual word found in the string (e.g. for fuuuck it will be fuuuck)
- StartIndex: start index of the word in the string
- EndIndex: end index of the word in the string
- Level: profanity level (if provided, else it will be 0)
If the configuration is:
WordMatcher{
	Word:  "fuck",
	Regex: "f[u]+ck",
	Level: 1,
}
and the input string is fuuuck, it will return:
DetectedConcern{
	Word:        "fuck",
	MatchedWord: "fuuuck",
	StartIndex:  0,
	EndIndex:    6,
}
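The relationship between a matcher and the reported fields can be sketched with the standard regexp package. The matcher and concern types and the list function below are simplified stand-ins written for this example, not go-clean's own types, and the exclusive EndIndex is simply what regexp.FindAllStringIndex returns.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher and concern are simplified, hypothetical stand-ins for the
// WordMatcher and DetectedConcern types described in this README.
type matcher struct {
	Word  string
	Regex string
	Level int
}

type concern struct {
	Word        string
	MatchedWord string
	StartIndex  int
	EndIndex    int
	Level       int
}

// list reports every match of m.Regex in s. EndIndex is exclusive here,
// because that is what regexp.FindAllStringIndex returns.
func list(s string, m matcher) []concern {
	re := regexp.MustCompile(m.Regex)
	var found []concern
	for _, loc := range re.FindAllStringIndex(s, -1) {
		found = append(found, concern{
			Word:        m.Word,
			MatchedWord: s[loc[0]:loc[1]],
			StartIndex:  loc[0],
			EndIndex:    loc[1],
			Level:       m.Level,
		})
	}
	return found
}

func main() {
	m := matcher{Word: "fuck", Regex: "f[u]+ck", Level: 1}
	for _, c := range list("fuuuck", m) {
		fmt.Printf("%+v\n", c)
	}
	// prints: {Word:fuck MatchedWord:fuuuck StartIndex:0 EndIndex:6 Level:1}
}
```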
Redact returns the string with each character of every detected profanity replaced with ReplacementCharacter.
For example, the input string "shit hit the fan" is returned as "**** hit the fan".
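Character-for-character replacement can be sketched in a few lines. The redact helper below and its plain word list are assumptions for illustration; it only shows how a mask of the same length as the matched word keeps the rest of the string aligned.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// redact replaces every occurrence of each word in words with a mask made of
// the replacement character, one per character of the word.
// Hypothetical helper, not go-clean's implementation.
func redact(s string, words []string, replacement rune) string {
	for _, w := range words {
		mask := strings.Repeat(string(replacement), utf8.RuneCountInString(w))
		s = strings.ReplaceAll(s, w, mask)
	}
	return s
}

func main() {
	fmt.Println(redact("shit hit the fan", []string{"shit"}, '*'))
	// prints: **** hit the fan
}
```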
IsProfane returns true if the given string contains profanities.
For example, the input string "shit hit the fan" returns true.