Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httpclient -d:ssl and db_postgres incompatible and cause SIGSEGV! (MacOSX) #9419

Open
treeform opened this issue Oct 17, 2018 · 26 comments · Fixed by #10230
Open

httpclient -d:ssl and db_postgres incompatible and cause SIGSEGV! (MacOSX) #9419

treeform opened this issue Oct 17, 2018 · 26 comments · Fixed by #10230

Comments

@treeform
Copy link
Contributor

import db_postgres
import httpclient

var client = newHttpClient()

discard client.getContent("https://google.com") # crashes here

var db = open("", "", "", "") # never gets to DB open (which should fail)

nim c -r -d:ssl server/crazyssl2

Traceback (most recent call last)
httpclient.nim(819)      crazyssl2
httpclient.nim(369)      getDefaultSSL
net.nim(538)             newContext
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

nim -v

Nim Compiler Version 0.19.0 [MacOSX: amd64]
Compiled at 2018-10-02
Copyright (c) 2006-2018 by Andreas Rumpf

git hash: f673fbd91fa8b75e71feff0df232f6c598653722
active boot switches: -d:release
@dom96
Copy link
Contributor

dom96 commented Oct 17, 2018

I added a:

    if newCTX.isNil:
      raiseSSLError()

Just above net.nim if newCTX.SSLCTXSetCipherList(cipherList) != 1: and got a "error:140A90F1:lib(20):func(169):reason(241)" error. Preliminary search implies that this occurs when openssl is initialised multiple times, perhaps db_postgres is causing that to happen somehow?

@treeform
Copy link
Contributor Author

Is there a way to check if openSSL was already initialized? I think it happens inside libpq.dylib code not mine.

@dom96
Copy link
Contributor

dom96 commented Oct 17, 2018 via email

@treeform
Copy link
Contributor Author

I already tried that commenting "SSL_library_init" does nothing. Same error.

@LemonBoy
Copy link
Contributor

You may be interested in this.

@treeform
Copy link
Contributor Author

proc pqinitOpenSSL*(do_ssl: int32, do_crypto: int32) {.cdecl, dynlib: dllName, importc: "PQinitOpenSSL".}
echo "calling pqinitOpenSSL"
pqinitOpenSSL(0, 0)

Added this in, compiles and runs but still fails in the same way...

I could try compiling libpg without SSL.

@treeform
Copy link
Contributor Author

I figured it out! It looks like -d:ssl and libpq grab different libssl.dylibs.

I used otool to figure out what dylib libpq wants:

otool -L /usr/local/Cellar/libpq/10.5/lib/libpq.5.10.dylib
/usr/local/Cellar/libpq/10.5/lib/libpq.5.10.dylib:
        /usr/local/opt/libpq/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.10.0)
        /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)

Its strange but otool does not work on nim programs (does nim use some non-standard system for dylib includes?)

I did this to openssl.nim and it works:

# when useWinVersion:
#   when not defined(nimOldDlls) and defined(cpu64):
#     const
#       DLLSSLName* = "(libssl-1_1-x64|ssleay64|libssl64).dll"
#       DLLUtilName* = "(libcrypto-1_1-x64|libeay64).dll"
#   else:
#     const
#       DLLSSLName* = "(libssl-1_1|ssleay32|libssl32).dll"
#       DLLUtilName* =  "(libcrypto-1_1|libeay32).dll"

#   from winlean import SocketHandle
# else:
#   const versions = "(.1.1|.38|.39|.41|.43|.44|.45|.10|.1.0.2|.1.0.1|.1.0.0|.0.9.9|.0.9.8|)"

#   when defined(macosx):
#     const
#       DLLSSLName* = "libssl" & versions & ".dylib"
#       DLLUtilName* = "libcrypto" & versions & ".dylib"
#   elif defined(genode):
#     const
#       DLLSSLName* = "libssl.lib.so"
#       DLLUtilName* = "libcrypto.lib.so"
#   else:
#     const
#       DLLSSLName* = "libssl.so" & versions
#       DLLUtilName* = "libcrypto.so" & versions
#   from posix import SocketHandle

const
  DLLSSLName* = "/usr/local/opt/openssl/lib/libssl.1.0.0.dylib"
  DLLUtilName* = "/usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib"
from posix import SocketHandle

I think the problem here is that const versions = "(.1.1|.38|.39|.41|.43|.44|.45|.10|.1.0.2|.1.0.1|.1.0.0|.0.9.9|.0.9.8|)" matches some other version of openSSL which is not what libpq selects.

I am not sure how to fix it in the source code though? Should libssl and libcrypot be a -d:ssl_version to be passed?

@dom96
Copy link
Contributor

dom96 commented Oct 19, 2018

huh damn. That's really tricky.

One thing I can think of is:

  • Before loading any DLL check whether the current process has a DLL of the same name already loaded.
    • If so, grab a handle to that DLL.
    • If not, load whatever version of the DLL you can yourself.

Problem: is it possible to query the DLLs that are loaded by a process? Can a handle be grabbed to such a process?

@dom96 dom96 added the Severe label Oct 19, 2018
@treeform
Copy link
Contributor Author

treeform commented Jan 8, 2019

This 1 line patch fixes it: #10230
Turns out just searching them in order of newest to oldest fixes this issue.

@timotheecour
Copy link
Member

reopening see #10281 (comment) for context; we need to find a proper fix that doens't introduce a regression

@timotheecour
Copy link
Member

somewhat relevant: https://github.com/nim-lang/nimble#troubleshooting regarding SSL issues on OSX; but we need proper fix

@hiteshjasani
Copy link
Contributor

Using choosenim with devel#head I'm getting a similar crash.

$ choosenim show
   Channel: devel
   Version: #head
[ ... ]
httpclient.nim(578)      getClient
httpclient.nim(365)      getDefaultSSL
net.nim(536)             newContext
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Something I learned, I have two versions of OpenSSL installed on MacOSX: 1.x and 1.1.x.

  • If I point DYLD_LIBRARY_PATH to the 1.x version I get the crash.
  • If I point it to the 1.1.x version I don't get a crash but instead get an error: Exception message: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match.
    httpclient.nim(578) getClient
    httpclient.nim(365) getDefaultSSL
    net.nim(545) newContext
    net.nim(471) raiseSSLError
  • Lastly, choosenim show crashes for me if DYLD_LIBRARY_PATH is pointing to OpenSSL 1.1.x.

@hiteshjasani
Copy link
Contributor

Okay, I have postgres and httpclient working with nimble build -d:ssl. I'll share what I did to help others.

when defined(windows):
  const
    dllName = "libpq.dll"
elif defined(macosx):
  const
    dllName = "libpq.dylib"
else:
  const
    dllName = "libpq.so(.5|)"

proc pqinitOpenSSL*(do_ssl: int32, do_crypto: int32) {.cdecl, dynlib: dllName, importc: "PQinitOpenSSL".}

proc main() =
  echo "Initting OpenSSL: " & $SSL_library_init()
  echo "Setting up db connection ..."
  pqinitOpenSSL(0, 0)           # Tell postgres not to init OpenSSL
  # ... opening database code
  # ... using httpclient to make http requests

I had both openssl and [email protected] installed via Homebrew on the system. I didn't have either DYLD_LIBRARY_PATH or PKG_CONFIG_PATH defined. Using this method, everything is using openssl as it worked whether [email protected] was installed or not - I tried it both ways.

My env:

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.14.4
BuildVersion:	18E226
$ choosenim show
   Channel: devel
   Version: #head
      Path: /Users/hitesh/.choosenim/toolchains/nim-#head

@treeform
Copy link
Contributor Author

I found that pqinitOpenSSL(0, 0) did not work for me. The only thing that worked is making sure both httpclient (nim stdlib) and libpq (installed via brew) used the same openssl dylib. For me I think it was the openssl 1.0.0 for you it might be some thing else. It all depends on what libpq was hard coded with when it was compiled. If they don't match you get this error.

@hiteshjasani
Copy link
Contributor

@treeform that's strange. Because I was getting the same crash as you in my code before I managed this workaround. And my libpq has similar deps as your version.

otool -L /usr/local/Cellar/libpq/11.2/lib/libpq.5.11.dylib 
/usr/local/Cellar/libpq/11.2/lib/libpq.5.11.dylib:
	/usr/local/opt/libpq/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.11.0)
	/usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
	/usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.200.5)

Okay, I tried your code and got the same crash as you. I modified to be this and it worked fine for me.

from openssl import SSL_library_init
import db_postgres, httpclient

when defined(windows):
  const
    dllName = "libpq.dll"
elif defined(macosx):
  const
    dllName = "libpq.dylib"
else:
  const
    dllName = "libpq.so(.5|)"

proc pqinitOpenSSL*(do_ssl: int32, do_crypto: int32) {.cdecl, dynlib: dllName, importc: "PQinitOpenSSL".}

proc main() =
  echo "Initting OpenSSL: " & $SSL_library_init()
  echo "Setting up db connection ..."
  pqinitOpenSSL(0, 0)           # Tell postgres not to init OpenSSL

  var client = newHttpClient()

  discard client.getContent("https://google.com")

  var db = open("", "", "", "") # fails because it can't connect, but no crash

when isMainModule:
  main()

@Araq
Copy link
Member

Araq commented May 14, 2019

How about we enforce --dynlibOverride for our SSL wrapper and always link "statically" to the lib*.so or whatever the correct terminology is for this clusterfuck.

@hiteshjasani
Copy link
Contributor

Mixed feelings. The security community seems to oscillate between static and dynamically linking OpenSSL. But with how often vulnerabilities are being found and the frequency that OpenSSL is being updated, I'd lean towards keeping dynamic linking and making it more foolproof.

@treeform
Copy link
Contributor Author

I think the real problem is that Nim just does not support all of the openSSL it claims it supports on all operating systems. So you have to find ssl that both pg and you will support.

This mostly fixes: #10230 - it tells nim to use the newest openSSL you can find.

But that fix runs into this issue: #10281 (comment) - here nim finds some newest openSSL that breaks it. Reverting my fix forces it to start looking for an old openSSL that works.

I think the reason for the revert Could not download: Please upgrade your OpenSSL library, it does not support the necessary protocols. OpenSSL error is: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure was because the testing machine had an openSSL version nim does not support (but says it does).

@treeform
Copy link
Contributor Author

I created a spread sheet to track support and versions of different libssl. Turns it its too confusing!

https://docs.google.com/spreadsheets/d/1JgbQS5e2I5bEzFaVYbYrt714CoWGHj-gRIovd7MUfMw/edit?usp=sharing

I think we should drop support for openSSL 0.9.9 as it never shipped.

LibreSSL which is also compatible with openSSL goes by the same name but their version are really confusing. Its really hard to find mapping between libssl.##.so to LibreSSL version.

Its really unclear when any of these shipped, but they shipped before 4/11/2017:

libssl.38.so
libssl.39.so
libssl.10.so
libssl.41.so
libssl.43.so
libssl.44.so

We almost need a test case that can test all versions of openSSL/LibreSSL. Or drop older versions and prevent nim from compiling with them.

@genotrance
Copy link
Contributor

This was already reported earlier in this tread and is the same root cause for issue 122 on the choosenim issue tracker.

choosenim loads libcurl and libssl/libcrypto on OSX using Nim's dynlib loading. libcurl itself is dynamically linked to libssl/libcrypto. If versions don't match, it crashes.

> otool -L /usr/lib/libcurl.dylib
        ...
        /usr/lib/libcrypto.42.dylib (compatibility version 43.0.0, current version 43.0.0)
        /usr/lib/libssl.44.dylib (compatibility version 45.0.0, current version 45.1.0)
        ...

@genotrance
Copy link
Contributor

@genotrance
Copy link
Contributor

Consensus of the conversation was to link ssl/crypto at compile time instead of loading it at runtime. I tried this for choosenim though and it didn't work:

when defined(macosx):
  --define:curl
  --dynlibOverride:ssl
  --dynlibOverride:crypto
  switch("passL", "-lssl -lcrypto")

--define:ssl

Compiling with this results in a link time error - ld cannot find -lssl nor -lcrypto even though they are in /usr/lib.

We got lucky in that choosenim doesn't really need --define:ssl when libcurl is used to download on osx so we can simply change it to:

when defined(macosx):
  --define:curl
else:
  --define:ssl

This fixes the crash for choosenim but does not really provide a path forward for the general issue. Need help from others to figure out how to link against openssl in a traditional fashion on osx.

cc @alaviss @federico3 @Araq

@treeform
Copy link
Contributor Author

My thoughts:

  • if you are deploying on a server just use -d:sslVersion=x.x.x to make sure you are loading exact same openSSL so/dylib/dll your other libraries are loading.

  • if you are shipping your software to user computer where you can't control openSSL you need to figure the openSSL loading order Change the search order of libSSL libs from newest to oldest. #10230

I fixed the order but it was reverted because it caused issues for @timotheecour at #10282. Because it was reverted that made me feel unwanted so I did not continue down this path. But I still believe the real solution lives there.

If you apply my patch again does it fix the issues you are having?

I made a sheet detailing every version: https://docs.google.com/spreadsheets/d/1JgbQS5e2I5bEzFaVYbYrt714CoWGHj-gRIovd7MUfMw/edit#gid=0

You can see on the sheet Nim claims to load version that never existed. Nim missed some. Nim loads openSSL out of order. It's not surprising we are having these OpenSSL issues because of this. Some other library is going i'll load the most recent openSSL, while nim goes I'll load a random openSSL user might have laying around.

@timotheecour
Copy link
Member

timotheecour commented Feb 13, 2020

Because it was reverted that made me feel unwanted so I did not continue down this path. But I still believe the real solution lives there.

very sorry to hear the revert made you feel unwanted; I greatly appreciate all your contributions; as I wrote here #10281 (comment) :

@treeform regression fixes always take precedence over bug fixes; #10230 might've fixed a bug but it introduced a regression [breaking nimble and every package], so it made sense to revert until we find a proper fix, and I only reverted the part that was problematic, OSX; I agree the existing code needs to be fixed properly, but priority was to fix the regression first.

@derekdai
Copy link
Contributor

derekdai commented Apr 6, 2021

On Ubuntu 20.04, I have the same issue with nake

# nakefile.nim
import nake
import httpclient

task "default", "":
  newHttpClient().downloadFile("https://www.google.com", "index.html")
...
Nim/lib/pure/httpclient.nim(556) :anonymous
Nim/lib/pure/httpclient.nim(325) getDefaultSSL
Nim/lib/pure/net.nim(609) newContext
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Without nake, httpclient works fine

import httpclient

newHttpClient().downloadFile("https://www.google.com", "index.html")

@treeform
Copy link
Contributor Author

I have written a HTTP library to solve this problem: https://github.com/treeform/puppy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants