Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault when initializing with null language #4028

Open
ccouzens opened this issue Mar 5, 2023 · 17 comments
Open

Segmentation fault when initializing with null language #4028

ccouzens opened this issue Mar 5, 2023 · 17 comments

Comments

@ccouzens
Copy link

ccouzens commented Mar 5, 2023

Basic Information

tesseract 5.2.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.3) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.3.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1

Operating System

No response

Other Operating System

Fedora Linux 37

But this was originally reported to me from a user on a Mac M1 (presumably macOS 13 Ventura).

uname -a

Linux fedora-desktop 6.1.14-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Feb 26 00:13:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Compiler

gcc version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)

Virtualization / Containers

No response

CPU

13th Gen Intel® Core™ i7-13700K

Current Behavior

When using TessBaseAPIInit3(cube, NULL, NULL) the language isn't set to a sensible default, thus later causing a segmentation fault when TessBaseAPIRecognize is called.

Expected Behavior

Given that the documentation says:

The language is (usually) an ISO 639-3 string or nullptr will default to eng.

I would expect a NULL to work the same way as "eng" (not segmentation fault at the Recognize step).

Suggested Fix

Null pointer defaults to "eng".

Other Information

Test case program

#include <tesseract/capi.h>
#include <leptonica/allheaders.h>

int main(int argc, char *argv[]) {
    TessBaseAPI *cube = TessBaseAPICreate();
    TessBaseAPIInit3(cube, NULL, NULL); // change this 2nd `NULL` to "eng" for success

    PIX *image = pixRead("img.png");
    TessBaseAPISetImage2(cube, image);
    TessBaseAPIRecognize(cube, NULL);
    char *text = TessBaseAPIGetUTF8Text(cube);
    printf("%s\n", text);
    TessDeleteText(text);
    pixFreeData(image);
    TessBaseAPIDelete(cube);
}

run using gcc $(pkg-config --cflags --libs tesseract) $(pkg-config --cflags --libs lept) test.c && ./a.out.

This was originally reported against a Rust wrapper: antimatter15/tesseract-rs#34

@stweil
Copy link
Member

stweil commented Mar 5, 2023

The API was changed by my commit f5d22d0 ("Don't set a default language in TessBaseAPI::Init"). The reason for that commit was that Tesseract required (and loaded) eng.traineddata even for tasks which did not require a model file.

So the documentation should be updated, and of course the code should not crash.

@stweil
Copy link
Member

stweil commented Mar 5, 2023

#include <leptonica/allheaders.h>

The right form is #include <allheaders.h>.

@zdenop
Copy link
Contributor

zdenop commented Mar 6, 2023

The right form is #include <allheaders.h>.

Why?

@stweil
Copy link
Member

stweil commented Mar 6, 2023

Including leptonica/allheaders.h simply does not work in many cases, for example on MacOS with Homebrew.

Leptonica tells the compiler to look for header files in the leptonica directory which provides allheaders.h, but not leptonica/allheaders.h:

% pkg-config --cflags lept
-I/opt/homebrew/Cellar/leptonica/1.82.0_1/include/leptonica

Including leptonica/allheaders.h works on typical Linux installations , because the compiler searchs /usr/include and /usr/local/include, too. It won't work on Linux if Leptonica was installed in a non standard location like $HOME.

@amitdo
Copy link
Collaborator

amitdo commented Mar 6, 2023

The PR has more details: #3610

@amitdo amitdo added the API label Mar 6, 2023
@zdenop
Copy link
Contributor

zdenop commented Mar 22, 2023

@stweil : Who leptonica is build on MacOS with Homebrew? With autotool or cmake? On Windows cmake build with custom instalation I got this:

>SET PKG_CONFIG_PATH=f:\win64\lib\pkgconfig
>F:\vcpkg\downloads\tools\msys2\9a1ec3f33446b195\mingw32\bin\pkg-config.exe --cflags lept
-If:/win64/include -If:/win64/include/leptonica

IMO this is right approach, so developer could use both variants. Personally I prefer leptonica/allheaders.h, because allheaders.h is too general and unclear to which library it belongs.

@stweil
Copy link
Member

stweil commented Mar 22, 2023

Homebrew uses autotools for all platforms in its tesseract formula.

I am surprised that pkgconfig gives two include paths for the Leptonica headers. That's strange and unusual.

@stweil
Copy link
Member

stweil commented Mar 23, 2023

@DanBloomberg, there seem to exist two different conventions how people include the header file for Leptonica:

  1. #include <allheaders.h> /* works with compiler flags from pkg-config generated by autotools of cmake */
  2. #include <leptonica/allheaders.h> /* works with compiler flags from pkg-config generated by cmake */

Which of the two variants would you prefer? We currently have the problem that it can depend on the build type (autotools or cmake) whether both variants work or only the first one. We could enforce that only one of both variants is supported, or we could support both variants.

@amitdo
Copy link
Collaborator

amitdo commented Mar 23, 2023

What do you means by pkg-config generated by autotools of cmake?

@amitdo
Copy link
Collaborator

amitdo commented Mar 23, 2023

CMake and Autotools should behave similarly, otherwise you make supporting the software more difficult.

@amitdo
Copy link
Collaborator

amitdo commented Mar 23, 2023

Due to the differences between the builds, for Tesseract at least I would prefer that we support just one build system for Linux, macOS,, and the BSDs.

@amitdo
Copy link
Collaborator

amitdo commented Mar 23, 2023

#4026 is one more example for a different behavior of the Autotools build vs. the CMake build.

@DanBloomberg
Copy link

I have always used variant 1. Both in the library and for the 300 or so programs in the prog/ directory.

Never considered variant 2, which wouldn't work with any of my code because I'm using specific local builds (not installed software) when developing and testing.

@stweil
Copy link
Member

stweil commented Mar 23, 2023

CMake and Autotools should behave similarly, otherwise you make supporting the software more difficult.

Currently they use different templates for lept.pc which results in different compiler flags for the include path. See template for CMake and template for Autotools.

The lept.pc template for CMake should be fixed to fit the template for Autotools.

@stweil
Copy link
Member

stweil commented Mar 23, 2023

What do you means by pkg-config generated by autotools of cmake?

I meant lept.pc which is used for pkg-config and which is generated by autotools or cmake.

@zdenop
Copy link
Contributor

zdenop commented Mar 24, 2023

IMO leptonica has problem with naming conventions: "lept.pc" is one of examples (why not leptonica.pc?)
allheaders.h, PIX, BOX etc are others problems when you try to read the code and you are not familiar. There are too general.
That's why I prefer <tesseract/baseapi.h> and <leptonica/allheaders.h>.

If I understand it correctly: <leptonica/allheaders.h> is problem on MacOS with autotools build and pkgconfig only . On rest of platforms developers has freedom to choose if they will use <leptonica/allheaders.h> or just <allheaders.h>. IMO we should give this freedom also for autotools users...

@stweil
Copy link
Member

stweil commented Mar 24, 2023

<leptonica/allheaders.h> is a problem on any host if the build used lept.pc.in and did not use a common installation prefix like /usr or /usr/local. And therefore we should not propagate it, but tell people to use <allheaders.h>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants