Hello,
I have been using xml2map for a while now, and it has been very robust and stable!
From the very beginning I have prepared the string containing the XML with the function strings.ToValidUTF8:
Function ToValidUTF8
// ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences
// replaced by the replacement string, which may be empty.
func ToValidUTF8(s, replacement string) string {
	var b Builder

	for i, c := range s {
		if c != utf8.RuneError {
			continue
		}

		_, wid := utf8.DecodeRuneInString(s[i:])
		if wid == 1 {
			b.Grow(len(s) + len(replacement))
			b.WriteString(s[:i])
			s = s[i:]
			break
		}
	}

	// Fast path for unchanged input
	if b.Cap() == 0 { // didn't call b.Grow above
		return s
	}

	invalid := false // previous byte was from an invalid UTF-8 sequence
	for i := 0; i < len(s); {
		c := s[i]
		if c < utf8.RuneSelf {
			i++
			invalid = false
			b.WriteByte(c)
			continue
		}
		_, wid := utf8.DecodeRuneInString(s[i:])
		if wid == 1 {
			i++
			if !invalid {
				invalid = true
				b.WriteString(replacement)
			}
			continue
		}
		invalid = false
		b.WriteString(s[i : i+wid])
		i += wid
	}

	return b.String()
}
I am executing xml2map like this:
// Prepare the bytes as a string
str := string(*b)
// Strip bad UTF-8
str = strings.ToValidUTF8(str, "")
decoder := xml2map.NewDecoder(strings.NewReader(str))
result, err := decoder.Decode()
if err != nil {
	zap.L().Error("Could not unmarshal XML", zap.Error(err), zap.String("XML", str))
	return err
}
For the past few days there seems to be an uncaught case that I am having a hard time tracking down. It fails with the error: illegal character code U+000B
Do you have a robust way to strip out every character xml2map has problems with?
Thanks a lot in advance
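One possible approach to the question above, as a minimal sketch. It assumes that xml2map relies on Go's encoding/xml and that the decoder rejects any code point outside the XML 1.0 `Char` production, which is what the "illegal character code" message suggests. `isXMLChar` and `stripInvalidXMLChars` are hypothetical helper names, not part of xml2map. The sketch only handles literal characters in the input; it does not touch encoded character references.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar reports whether r is allowed by the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalidXMLChars is a hypothetical helper that drops every rune
// XML 1.0 forbids, such as U+000B (vertical tab) or U+0000.
func stripInvalidXMLChars(s string) string {
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	// Made-up input containing a literal vertical tab and a NUL byte.
	in := "<Value>a\vb\x00c</Value>"
	fmt.Printf("%q\n", stripInvalidXMLChars(in))
}
```

It would run after strings.ToValidUTF8, so that invalid byte sequences are already gone before the per-rune filter is applied.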
I seem to have problems with the following lines in my data (some content replaced):
<Value dataType="string">-0200:
kSi for max. Blabla text obfuscated.MoreText

-0300:
kSi > 5/6
</Value>
So it seems that these HTML-encoded (?) values are causing the problems?
OK, as you may have guessed already, this can be fixed by replacing the HTML-encoded characters before attempting to decode the XML:
// Remove HTML-encoded characters (numeric character references such as &#xB;)
// These cause xml2map to fail when decoding
re_html := regexp.MustCompile(`&#x[A-Fa-f0-9]{0,2};`)
str = re_html.ReplaceAllString(str, "_")
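The replacement above also rewrites references that are legal in XML, for example an encoded line feed. A narrower variant could look like the sketch below. It assumes the decoder only rejects references whose code point falls outside the XML 1.0 `Char` production. `charRef`, `allowedInXML` and `stripIllegalCharRefs` are hypothetical names, and the sample input in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// charRef matches hexadecimal (&#xB;) and decimal (&#11;) character references.
var charRef = regexp.MustCompile(`&#(x[0-9A-Fa-f]+|[0-9]+);`)

// allowedInXML reports whether code point n is permitted by the
// XML 1.0 "Char" production.
func allowedInXML(n int64) bool {
	return n == 0x09 || n == 0x0A || n == 0x0D ||
		(n >= 0x20 && n <= 0xD7FF) ||
		(n >= 0xE000 && n <= 0xFFFD) ||
		(n >= 0x10000 && n <= 0x10FFFF)
}

// stripIllegalCharRefs is a hypothetical helper that replaces only those
// character references that point at a forbidden code point and leaves
// legal ones (such as &#xA;) untouched.
func stripIllegalCharRefs(s, replacement string) string {
	return charRef.ReplaceAllStringFunc(s, func(ref string) string {
		body := ref[2 : len(ref)-1] // text between "&#" and ";"
		base := 10
		if strings.HasPrefix(body, "x") {
			body, base = body[1:], 16
		}
		n, err := strconv.ParseInt(body, base, 32)
		if err != nil || !allowedInXML(n) {
			return replacement
		}
		return ref
	})
}

func main() {
	in := "<Value>-0200:&#xB;kSi&#xA;-0300:&#11;x</Value>"
	fmt.Println(stripIllegalCharRefs(in, "_"))
}
```

Keeping the legal references means line breaks encoded in the data survive, while references like the one for U+000B are replaced before the decoder sees them.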
I am not sure whether this is a bug or expected behavior, so I will leave it to you to close this issue.