Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix memory leak in parseJson() on invalid json-string #24686

Closed
wants to merge 4 commits into from

Conversation

ThomasTJdev
Copy link
Contributor

@ThomasTJdev ThomasTJdev commented Feb 12, 2025

⚠️ Original PR was only for closing the stream in parseJson, but during testing the solution I encountered memory leaks in the parsing when the input string included non-compliant JSON.

PR Summary

This PR ensures that the stream created within the parseJson*() result variable is properly closed and that the object created during parsing JSON is freed in case of a bad input string.

  1. Previously, newStringStream() was used inline within the result = assignment with an implicite .close(), but this change refactors it into a try-finally block to guarantee cleanup.
  2. The raiseParseErr within the parser did not remove the newObject created.

Reproduction

The second part of the of the PR - the parsing memory leak can be replicated with the code and commands in the post below.

Changes

  1. Streaming change is within line 1065-1072
  2. Parsing change is within line 906-943

Stream bug

This here is the valgrind output from a larger code base where the parsing bug is fixed, but the streaming part is not fixed.

$ nim c -d:dev --m:orc -d:useMalloc --debugger:native main
$ valgrind --leak-check=full --log-file=valgrind_details4.txt --track-origins=yes --num-callers=20 ./main

Output:

==787== 10,240 bytes in 80 blocks are definitely lost in loss record 485 of 494
==787==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==787==    by 0x2564C9: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==787==    by 0x2564D7: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==787==    by 0x257FD8: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==787==    by 0x257B70: nimNewObj (arc.nim:96)
==787==    by 0x29F761: streams::newStringStream(sink<string>) (streams.nim:1304)
==787==    by 0x2A7F6F: json::parseJson(string, bool, bool) (json.nim:1050)
==787==    by 0x5A9218: api_tasks::apiTasksGet(string, string, string, string, seq<string>, string, string, bool, string, seq<string>, bool, bool, bool, string, string) (system.nim:940)
==787==    by 0x7DF381: routes_api::colonanonymous__resourcesZroutesZapi95generalZroutes95api_u18865_(ptr<mummy::RequestObj>) (routes_api_tasks.nim:39)
==787==    by 0x337FAC: toHandler::colonanonymous_(ptr<mummy::RequestObj>) (routers.nim:271)

Parsing bug

(see second post below)

@ThomasTJdev
Copy link
Contributor Author

Well, not be as simple as I hoped... And it might not even the correct PR..

The little snippet below will generate definitely lost in Valgrind when compiled with both --mm:orc and --mm:arc, but it goes free with --mm:refc. It's only feeding bad - when the input is fine, no problem!

$ nim -v
Nim Compiler Version 2.2.2 [Linux: amd64]
Compiled at 2025-02-06
Copyright (c) 2006-2025 by Andreas Rumpf

git hash: 6c34f62785263ad412f662f3e4e4bf8d8751d113
active boot switches: -d:release

Code with error in the json string.

# v.nim
import std/[json]

var test: seq[string]
var testData: JsonNode
try:
  ## Fails
  testData = parseJson("""[{"id": 1"}, {"id": "2"}]""")

  ## Works
  # testdata = parseJson("""[{"id": "1"}, {"id": "2"}]""")
  
  ## Fails
  # let stream = newStringStream("""[{"id": 1"}, {"id": "2"}]""")
  # testData = parseJson(stream, "input", false, false)
  # stream.close()
  
except:
  testData = %* []
for t in testData:
  test.add(t["id"].getStr())
echo $test

Valgrind

$ nim c -d:dev -d:useMalloc --debugger:native v
$ valgrind --leak-check=full --log-file=valgrind_details4.txt --track-origins=yes --num-callers=20 ./v
[]
$ grep "definitely lost" valgrind_details4.txt -A25
==1578== 64 bytes in 1 blocks are definitely lost in loss record 2 of 6
==1578==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1578==    by 0x10BB13: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==1578==    by 0x10BB21: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==1578==    by 0x10D628: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==1578==    by 0x10D1C0: nimNewObj (arc.nim:96)
==1578==    by 0x11F984: json::newJArray (json.nim:249)
==1578==    by 0x120280: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:922)
==1578==    by 0x1205DA: json::parseJson(ref<streams::StreamObj>, string, bool, bool) (json.nim:965)
==1578==    by 0x120763: json::parseJson(string, bool, bool) (json.nim:1050)
==1578==    by 0x120D99: NimMainModule (v.nim:22)
==1578==    by 0x12137F: NimMainInner (system.nim:417)
==1578==    by 0x121392: NimMain (system.nim:428)
==1578==    by 0x1213B4: main (system.nim:436)
==1578== 
==1578== 456 (64 direct, 392 indirect) bytes in 1 blocks are definitely lost in loss record 5 of 6
==1578==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1578==    by 0x10BB13: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==1578==    by 0x10BB21: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==1578==    by 0x10D628: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==1578==    by 0x10D1C0: nimNewObj (arc.nim:96)
==1578==    by 0x11F6F6: json::newJObject (json.nim:245)
==1578==    by 0x11FEF4: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:906)
==1578==    by 0x12039C: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:925)
==1578==    by 0x1205DA: json::parseJson(ref<streams::StreamObj>, string, bool, bool) (json.nim:965)
==1578==    by 0x120763: json::parseJson(string, bool, bool) (json.nim:1050)
==1578==    by 0x120D99: NimMainModule (v.nim:22)
==1578==    by 0x12137F: NimMainInner (system.nim:417)
==1578==    by 0x121392: NimMain (system.nim:428)
==1578==    by 0x1213B4: main (system.nim:436)
==1578== 
==1578== LEAK SUMMARY:
==1578==    definitely lost: 128 bytes in 2 blocks
==1578==    indirectly lost: 392 bytes in 2 blocks
==1578==      possibly lost: 0 bytes in 0 blocks
==1578==    still reachable: 16,512 bytes in 2 blocks
==1578==         suppressed: 0 bytes in 0 blocks
==1578== Reachable blocks (those to which a pointer was found) are not shown.
==1578== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1578== 
==1578== For lists of detected and suppressed errors, rerun with: -s
==1578== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

…mplains

When we just directly call `stream.close()` valgrind will complain with:

```sh
==1842== 1,280 bytes in 10 blocks are definitely lost in loss record 1,729 of 1,841
==1842==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1842==    by 0x3DFD29: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==1842==    by 0x3DFD37: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==1842==    by 0x3E1838: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==1842==    by 0x3E13D0: nimNewObj (arc.nim:96)
==1842==    by 0x4502B7: streams::newStringStream(sink<string>) (streams.nim:1304)
==1842==    by 0x459A04: json::parseJson(string, bool, bool) (json.nim:1088)
```
Without this explicit catch of `JsonParsingError` and the freeing valgrind will complain:

```sh
==1578== 64 bytes in 1 blocks are definitely lost in loss record 2 of 6
==1578==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1578==    by 0x10BB13: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==1578==    by 0x10BB21: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==1578==    by 0x10D628: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==1578==    by 0x10D1C0: nimNewObj (arc.nim:96)
==1578==    by 0x11F984: json::newJArray (json.nim:249)
==1578==    by 0x120280: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:922)
==1578==    by 0x1205DA: json::parseJson(ref<streams::StreamObj>, string, bool, bool) (json.nim:965)
==1578==    by 0x120763: json::parseJson(string, bool, bool) (json.nim:1050)
==1578==    by 0x120D99: NimMainModule (v.nim:22)
==1578==    by 0x12137F: NimMainInner (system.nim:417)
==1578==    by 0x121392: NimMain (system.nim:428)
==1578==    by 0x1213B4: main (system.nim:436)
==1578== 
==1578== 456 (64 direct, 392 indirect) bytes in 1 blocks are definitely lost in loss record 5 of 6
==1578==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1578==    by 0x10BB13: system::alloc0Impl(range09223372036854775807) (malloc.nim:11)
==1578==    by 0x10BB21: system::allocShared0Impl(range09223372036854775807) (malloc.nim:37)
==1578==    by 0x10D628: system::alignedAlloc0(range09223372036854775807, range09223372036854775807) (memalloc.nim:351)
==1578==    by 0x10D1C0: nimNewObj (arc.nim:96)
==1578==    by 0x11F6F6: json::newJObject (json.nim:245)
==1578==    by 0x11FEF4: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:906)
==1578==    by 0x12039C: json::parseJson(var<parsejson::JsonParser>, bool, bool, int) (json.nim:925)
==1578==    by 0x1205DA: json::parseJson(ref<streams::StreamObj>, string, bool, bool) (json.nim:965)
==1578==    by 0x120763: json::parseJson(string, bool, bool) (json.nim:1050)
==1578==    by 0x120D99: NimMainModule (v.nim:22)
==1578==    by 0x12137F: NimMainInner (system.nim:417)
==1578==    by 0x121392: NimMain (system.nim:428)
==1578==    by 0x1213B4: main (system.nim:436)
```
@ThomasTJdev ThomasTJdev changed the title Explicit close stream after parseJson bugfix memory leak in parseJson() on invalid json-string Feb 12, 2025
obj = nil
except JsonParsingError:
if obj != nil:
obj = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is required there is an arc bug. One should not have to play GC to get non leaking code.

@Araq
Copy link
Member

Araq commented Feb 14, 2025

I think the bug is just result not getting freed when an exception is thrown. And since it's internally still "returned" anyway the calling code must be made resilient to a failure. Ping @ringabout

@ringabout
Copy link
Member

ringabout commented Feb 17, 2025

It's really annoying since it doesn't make sense to me. Perhaps it is something else wrong with exceptions

import std/streams


proc foo() =
  var name = newStringStream("2r2")
  raise newException(ValueError, "sh")

try:
  foo()
except:
 discard

echo 123
==95679== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==95679==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==95679==    by 0x1113E4: alloc0Impl__system_u1749 (in /home/blue/Nim/test9)
==95679==    by 0x11140A: allocShared0Impl__system_u1762 (in /home/blue/Nim/test9)
==95679==    by 0x111452: alignedAlloc0__system_u1958 (in /home/blue/Nim/test9)
==95679==    by 0x111797: nimNewObj (in /home/blue/Nim/test9)
==95679==    by 0x114167: newStringStream__pureZstreams_u1043 (in /home/blue/Nim/test9)
==95679==    by 0x114400: foo__test9_u2 (in /home/blue/Nim/test9)

it doesn't leak with setjmp

@ringabout
Copy link
Member

ringabout commented Feb 17, 2025

Or nimErr_ isn't cleared when disposing pointers so that it isn't dealloc'ed. I'm not so sure

Araq added a commit that referenced this pull request Feb 18, 2025
ref #24686

With this PR

```nim
import std/streams


proc foo() =
  var name = newStringStream("2r2")
  raise newException(ValueError, "sh")

try:
  foo()
except:
 discard

echo 123
```
this example no longer leaks

---------

Co-authored-by: Andreas Rumpf <[email protected]>
narimiran pushed a commit that referenced this pull request Feb 18, 2025
ref #24686

With this PR

```nim
import std/streams

proc foo() =
  var name = newStringStream("2r2")
  raise newException(ValueError, "sh")

try:
  foo()
except:
 discard

echo 123
```
this example no longer leaks

---------

Co-authored-by: Andreas Rumpf <[email protected]>
(cherry picked from commit 510ac84)
@Araq Araq closed this in #24701 Feb 19, 2025
@Araq Araq closed this in f0b5bf3 Feb 19, 2025
narimiran pushed a commit that referenced this pull request Feb 20, 2025
closes #24686
closes #24693

```nim
# v.nim
import std/[json]

var test: seq[string]
var testData: JsonNode
try:
  ## Fails
  testData = parseJson("""[{"id": 1"}, {"id": "2"}]""")

  ## Works
  # testdata = parseJson("""[{"id": "1"}, {"id": "2"}]""")

  ## Fails
  # let stream = newStringStream("""[{"id": 1"}, {"id": "2"}]""")
  # testData = parseJson(stream, "input", false, false)
  # stream.close()

except:
  testData = %* []
for t in testData:
  test.add(t["id"].getStr())
echo $test
```

With this PR:

```
==66425== LEAK SUMMARY:
==66425==    definitely lost: 0 bytes in 0 blocks
==66425==    indirectly lost: 0 bytes in 0 blocks
==66425==      possibly lost: 0 bytes in 0 blocks
==66425==    still reachable: 16,512 bytes in 2 blocks
==66425==         suppressed: 0 bytes in 0 blocks
==66425== Reachable blocks (those to which a pointer was found) are not shown.
==66425== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==66425==
==66425== For lists of detected and suppressed errors, rerun with: -s
==66425== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```

(cherry picked from commit f0b5bf3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants