I added some json repairs that helped me with malformed messages (#341)
* I added some JSON repairs that helped me with malformed messages

  There are two of them. The first removes hard line feeds that appear in the message part because the model emitted them instead of escaped line feeds. This happens a lot in my experiments, and the repair actually fixes those replies. The second is less tested and should handle the case where the model answers with multiple blocks of quoted strings or even uses unescaped quotes. It should grab everything between the message: " and the closing curly braces, escape it, and turn it into proper JSON that way.

  Disclaimer: Both functions were written with the help of ChatGPT-4 (I can't write much Python). I think the first one is quite solid, but I doubt that the second one fully works. Maybe somebody with more Python skills than me (or with more time) has a better idea for this type of malformed reply.

* Moved the repair output behind the debug flag and removed the "clean" one

* Added even more fixes (from what I just encountered while testing)

  It seems that cut-off JSON can be corrected, and sometimes the model is too lazy to add not just one closing curly brace but two. I think it does not "cost" a lot to try them all out. But the exceptions get massive that way :)

* black

* for the final hail mary with extract_first_json, might as well add a double end bracket instead of single

---------

Co-authored-by: cpacker <[email protected]>
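The first repair described in the commit message (replacing raw line feeds inside string values with escaped ones) could look roughly like the sketch below. This is an illustration only, not the code added by this commit: the function name `escape_raw_newlines` is hypothetical, and the sketch assumes the only defect is literal newline or carriage-return characters inside double-quoted strings.

```python
import json


def escape_raw_newlines(text: str) -> str:
    """Escape literal line breaks that occur inside JSON string values.

    Hypothetical helper for illustration. Line breaks between tokens
    (outside of strings) are legal JSON whitespace and are left alone.
    """
    out = []
    in_string = False  # currently inside a "..." literal?
    escaped = False    # was the previous character an unconsumed backslash?
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # this character is consumed by the backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # unescaped quote closes the string
            elif ch == "\n":
                out.append("\\n")    # raw line feed -> escaped line feed
                continue
            elif ch == "\r":
                out.append("\\r")
                continue
        elif ch == '"':
            in_string = True
        out.append(ch)
    return "".join(out)


broken = '{\n  "message": "Let\'s create a list:\n- First, we can do X\n- Then, we can do Y!"\n}'
repaired = json.loads(escape_raw_newlines(broken))
print(repaired["message"])
```

Tracking quote state one character at a time keeps the structural newlines between keys untouched, which a blanket `text.replace("\n", "\\n")` would not. It does not help with unescaped quotes inside the message, which is the harder case the second repair is aimed at.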
Showing 2 changed files with 165 additions and 5 deletions.
@@ -0,0 +1,64 @@
import json

import memgpt.local_llm.json_parser as json_parser


EXAMPLE_MISSING_CLOSING_BRACE = """{
  "function": "send_message",
  "params": {
    "inner_thoughts": "Oops, I got their name wrong! I should apologize and correct myself.",
    "message": "Sorry about that! I assumed you were Chad. Welcome, Brad! "
  }
"""

EXAMPLE_BAD_TOKEN_END = """{
  "function": "send_message",
  "params": {
    "inner_thoughts": "Oops, I got their name wrong! I should apologize and correct myself.",
    "message": "Sorry about that! I assumed you were Chad. Welcome, Brad! "
  }
}<|>"""

EXAMPLE_DOUBLE_JSON = """{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Brad, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}
"""

EXAMPLE_HARD_LINE_FEEDS = """{
  "function": "send_message",
  "params": {
    "message": "Let's create a list:
- First, we can do X
- Then, we can do Y!
- Lastly, we can do Z :)"
  }
}
"""


def test_json_parsers():
    """Try various broken JSON and check that the parsers can fix it"""

    test_strings = [EXAMPLE_MISSING_CLOSING_BRACE, EXAMPLE_BAD_TOKEN_END, EXAMPLE_DOUBLE_JSON, EXAMPLE_HARD_LINE_FEEDS]

    for string in test_strings:
        # Sanity check: the standard parser must reject every example.
        # The assertion lives in the else branch so that a bare except
        # cannot swallow the AssertionError itself.
        try:
            json.loads(string)
        except json.JSONDecodeError:
            print("String failed (expectedly)")
        else:
            assert False, f"Test JSON string should have failed basic JSON parsing:\n{string}"

        # The repairing parser should accept the same string.
        try:
            json_parser.clean_json(string)
        except Exception:
            # Report which input could not be repaired, then re-raise.
            print(f"Failed to repair test JSON string:\n{string}")
            raise
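
The kinds of breakage in the test strings above (a missing closing brace, a stray token after the object, two objects back to back) can be handled by trying a few cheap repairs in turn, as the commit message suggests. The following is a minimal sketch of that idea and makes no claim about how `json_parser.clean_json` is implemented. The names `first_json_object` and `repair_json` are hypothetical; the only library call relied on is the standard `json.JSONDecoder.raw_decode`.

```python
import json


def first_json_object(text: str):
    """Parse the first JSON object in `text`, ignoring anything after it."""
    start = text.index("{")  # raises ValueError if there is no object at all
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


def repair_json(text: str):
    """Try progressively more aggressive repairs until one parses."""
    # The input as-is, then with one or two closing braces appended.
    candidates = [text, text + "}", text + "}}"]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    # Last resort: keep only the first object, dropping trailing junk
    # such as a stray end token or a second object.
    for candidate in candidates:
        try:
            return first_json_object(candidate)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    raise ValueError("could not repair JSON")


missing_brace = '{"function": "send_message", "params": {"message": "hi"}'
trailing_token = '{"function": "send_message", "params": {"message": "hi"}}<|>'
double_json = '{"function": "a", "params": {}}\n{"function": "b", "params": {}}'

for s in (missing_brace, trailing_token, double_json):
    print(repair_json(s)["function"])
```

Note the trade-off in the last-resort step: for the double-object case it silently keeps the first function call and discards the second, which may or may not be the desired behavior.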