sf::st_layers hanging R / bombing rstudio #1995
Works without problems here, also in RStudio (Ubuntu 20.04, RStudio 2022.07.1+554), and there are no issues when run under valgrind either. What is your |
With Windows 10, R 4.2.1 RGui.exe and R.exe:
With a freshly installed RStudio-2022.07.1-554.exe, no problems. There were recent reports of odd errors in packages installed from the RStudio mirror; try reinstalling sf outside RStudio from another mirror? |
I've been double-checking all my installs. This is straight from an Rterm.exe under Cmder. Now st_layers() just hangs. This is after a
|
Does released sf have the same problem? |
Yes. The output of
|
Long shot, re-install Rcpp from a CRAN mirror? |
Nope. I guess now's as good a time as any to reinstall all the things. |
I've now completely uninstalled R and all my packages and upgraded RStudio. I'm going to step through the code that creates this and try to find what the hang-up is. |
Alright -- I've narrowed this down to its source. The issue is an attribute "uniqueID" that causes the gpkg reader to hang. If I uncomment the last line, the reprex hangs; if I run it in RStudio, it bombs RStudio. While it is hung, the R for Windows process hammers a CPU and eats between 200 and 4000 MB of memory in a sawtooth pattern. Happy to do more debugging or dump some other logs if someone knows where to look for them. I guess I'll stop using this "uniqueID" attribute for now. 😝
borked <- sf::read_sf('
{
"type": "FeatureCollection",
"name": "borked",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:EPSG::5070"
}
},
"features": [
{
"type": "Feature",
"properties": {
"COMID": 19736669,
"REACHCODE": "06010202003357",
"REACH_meas": 22.451321717399999,
"uniqueID": "03507000"
},
"geometry": {
"type": "Point",
"coordinates": [
1117044.368864378193393,
1445765.030222098575905
]
}
},
{
"type": "Feature",
"properties": {
"COMID": 19677981,
"REACHCODE": "06020002000118",
"REACH_meas": 61.4738,
"uniqueID": "03554000"
},
"geometry": {
"type": "Point",
"coordinates": [
1072594.611272452631965,
1396990.093436731025577
]
}
}
]
}
')
sf::write_sf(borked, "borked.gpkg")
sf::write_sf(dplyr::rename(borked, ID = uniqueID), "works.gpkg")
sf::read_sf("works.gpkg")
#> Simple feature collection with 2 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1072595 ymin: 1396990 xmax: 1117044 ymax: 1445765
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 2 × 5
#> COMID REACHCODE REACH_meas ID geom
#> <int> <chr> <dbl> <chr> <POINT [m]>
#> 1 19736669 06010202003357 22.5 03507000 (1117044 1445765)
#> 2 19677981 06020002000118 61.5 03554000 (1072595 1396990)
# sf::read_sf("borked.gpkg")
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 compiler_4.2.1 pillar_1.8.1 highr_0.9
#> [5] class_7.3-20 tools_4.2.1 digest_0.6.29 evaluate_0.16
#> [9] lifecycle_1.0.2 tibble_3.1.8 pkgconfig_2.0.3 rlang_1.0.5
#> [13] reprex_2.0.2 DBI_1.1.3 cli_3.4.0 rstudioapi_0.14
#> [17] yaml_2.3.5 xfun_0.32 fastmap_1.1.0 e1071_1.7-11
#> [21] withr_2.5.0 stringr_1.4.1 dplyr_1.0.10 knitr_1.40
#> [25] generics_0.1.3 fs_1.5.2 vctrs_0.4.1 tidyselect_1.1.2
#> [29] classInt_0.4-7 grid_4.2.1 glue_1.6.2 sf_1.0-8
#> [33] R6_2.5.1 fansi_1.0.3 rmarkdown_2.16 purrr_0.3.4
#> [37] magrittr_2.0.3 ellipsis_0.3.2 htmltools_0.5.3 units_0.8-0
#> [41] KernSmooth_2.23-20 utf8_1.2.2 stringi_1.7.8 proxy_0.4-27
sf::sf_extSoftVersion()
#> GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
#> "3.9.1" "3.4.3" "7.2.1" "true" "true"
#> PROJ
#>   "7.2.1"
Created on 2022-09-09 with reprex v2.0.2 |
After some further testing, it's actually any attribute with "unique" in the name. For example, this causes it:
|
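Until a fixed GDAL is available, the workaround used in the reprex (renaming the offending field before writing) can be generalised to every column whose name contains "unique". The helper below is a hypothetical sketch, not part of sf. It assumes, as reported above, that only the substring "unique" triggers the hang; the replacement "uniq" is an arbitrary choice.

```r
# Hypothetical workaround helper, not part of sf: rename any column whose
# name contains "unique" (any case) before writing to GeoPackage.
# The replacement "uniq" is an arbitrary choice.
avoid_unique_names <- function(nms, replacement = "uniq") {
  hit <- grepl("unique", nms, ignore.case = TRUE)
  nms[hit] <- gsub("unique", replacement, nms[hit], ignore.case = TRUE)
  make.unique(nms)  # guard against collisions introduced by the rename
}

avoid_unique_names(c("COMID", "REACHCODE", "REACH_meas", "uniqueID"))
# returns c("COMID", "REACHCODE", "REACH_meas", "uniqID")
```

On the reprex this would be names(borked) <- avoid_unique_names(names(borked)) before sf::write_sf(), which assumes the geometry column name does not itself contain "unique".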
Very odd (Windows 10)
|
Maybe also try terra:
which uses the same GDAL binary library. |
And:
|
|
|
No, nothing with a freshly installed RStudio and the original reprex. My Windows 10 version is the same too. Is any package not a CRAN binary? |
No... I reinstalled R, RStudio, and my entire library with devtools yesterday. I'll do a little more snooping here. |
Do you know which mirror was used? |
The default cloud one. For the record, I have installed |
By native build, do you mean the released Rtools42? Please simplify by avoiding any RStudio packages in any source builds; use only the very simplest route. My installs, for which no errors occur, are just standard CRAN binary installs, though I have Rtools42 and even (not now) test versions of MXE-built libgdal.a for testing updated drivers. I have a feeling that somewhere you have a component built with a non-standard build chain. |
@edzer: borked.gpkg and nc_test.gpkg are OK also in RStudio. |
OK -- I think this might be something? This was generated by looking at the crash dump in WinDbg. Note the lines like the following in the middle of the stack. From sf:
From terra:
|
@rsbivand I mean that I installed |
I ran this code from @dblodgett-usgs and can also confirm that on Windows 10 RStudio is crashing and R Terminal is hanging (for
|
Great; I'm looking at @dblodgett-usgs stack trace for sf, and am completely puzzled why |
@edzer -- for the record, the
nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
nc <- dplyr::rename(nc, "cnty_unique_id" = CNTY_ID)
sf::write_sf(nc, "nc_test.gpkg")
sf::read_sf("nc_test.gpkg")
The full report is:
|
Progress: after re-installing the sf binary from the CRAN cloud mirror, I'm also seeing the
Edit: not stable progress; after switching to |
On the Windows PC on which
that is:
passes, as does |
@rhijmans, maybe you could take a look at what is going on here, since you are familiar with Windows debugging? |
@rouault yes, the underlying problem seems to be an MSVCRT libsystre in Msys2, which only fails in some settings (my guess). Windows binaries for R are an early adopter of UCRT among FOSS: https://blog.r-project.org/, particularly https://blog.r-project.org/2021/12/07/upcoming-changes-in-r-4.2-on-windows/index.html and https://blog.r-project.org/2022/06/16/upcoming-changes-in-r-4.2.1-on-windows/index.html. It appears that OSGeo4W builds are MSVCRT, possibly because they move at the speed of the slowest included library or application. The patch @kalibera adds is only in the MXE UCRT build train, for immediate protection, because a working GPKG driver is vital now. Fixing (or checking) the upstream Msys2 |
No, I am not building packages for R 4.2 with unreleased versions of Rtools42, only with R-devel; that would require too many computational resources. You can test with R-devel (using my binaries or building the packages from source) and using R 4.2 (building the packages from source). |
For build 5286-5107 of Rtools42 (and subsequent), could those testing please note that sf |
There are known issues with std::regex in multi-byte locales (R runs in UTF-8). See below. The tesseract thread includes a repro that, as far as I understand, was done using a different UCRT toolchain (not Rtools42). The GCC bug report says that std::regex may get deprecated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723 |
Yes, the UCRT3 builds (using R-devel and an unreleased version of Rtools42) automatically apply patches, which also adjust the linking, but if one tests using R 4.2 and the unreleased Rtools42, one has to apply the patch manually. The patches used in the last build are available here: https://www.r-project.org/nosvn/winutf8/ucrt3/patches/CRAN/ The patches change the linking unconditionally. There is no prescribed way of conditioning as far as I know, and it is a rare problem (4 packages currently have patches changing linking). Perhaps you could check for the presence of files (headers, libraries), a bit like configure does, e.g. check for kea. A new major release of R might require bigger changes, but then one could in principle conditionalize on the R version. An installation of Rtools42 (from the installer and toolchain bundle) has metadata in files (.version, also a list of MXE packages), but I would rather choose checks for the presence of files, as they are more general and portable. If anyone still prefers the metadata files, I think the right way would be to first discuss it with the CRAN repository maintainers. |
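A minimal sketch of the file-presence check suggested above, written in R as it might run from a package's configure-like script on Windows. The function name, the include/ and lib/ layout under a toolchain root, and the KEA file names are all assumptions for illustration, not the actual Rtools42 layout.

```r
# Hypothetical helper: an optional library counts as available only if both
# its header and its static library exist under the given toolchain root.
# The include/ and lib/ subdirectory layout is an assumption.
has_optional_lib <- function(root, header, lib) {
  file.exists(file.path(root, "include", header)) &&
    file.exists(file.path(root, "lib", lib))
}

# Hypothetical use when generating linker flags (file names are placeholders):
# kea_flags <- if (has_optional_lib(root, "libkea/KEAImageIO.h", "libkea.a")) "-lkea" else ""
```

Checking files rather than version metadata keeps the decision tied to what is actually installed, which is the portability argument made above.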
Ok, @kalibera , hot patching makes good sense, thanks. |
Here are binary builds of terra and sf with the 5336 libgdal, one 7z archive for each of R-devel and R-4.2.1: https://drive.google.com/drive/folders/1aG5XsjkPAO35aUWLSImlB8_eWg47Gxno?usp=sharing. Please do not trust these too much: they pass R CMD check but will not be identical to those from the actual build system. The package source code was manually patched for the reinstated KEA driver. The Gdrive folder was updated with copies of the source tarballs and fresh builds of the binaries from these tarballs, about Sept. 15, 10:30 CEST. |
Thanks, I am now working on a bigger update of Rtools42, which will include more than this work-around, so the packages built on my system will get overwritten with even newer ones. So your builds may be good for reference if anyone needed to test only this change and nothing more. |
OSGeo/gdal#6359 removes the use of std::regex in the OGR SQLite&GPKG drivers |
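One plausible reading of why only names containing "unique" triggered the hang (an assumption; the thread does not spell out the old GDAL code) is that the driver searched each column definition for the UNIQUE keyword, so a plain substring or pattern match also fires on a column named uniqueID. The sketch below contrasts such a naive check with a token-based check that splits with fixed = TRUE instead of a pattern. Both functions are hypothetical illustrations, not GDAL code, and the token version assumes column names without embedded spaces.

```r
# Hypothetical illustration, not GDAL code.
# Naive check: any case-insensitive occurrence of "unique" in the column
# definition, which also fires when the column *name* contains it.
naive_is_unique <- function(coldef) {
  grepl("unique", coldef, ignore.case = TRUE)
}

# Token-based check: split on single spaces (fixed = TRUE, no pattern),
# drop empty tokens and the first token (the column name), then look for
# the bare keyword. Assumes the column name contains no spaces.
token_is_unique <- function(coldef) {
  tokens <- strsplit(coldef, " ", fixed = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens)]
  "UNIQUE" %in% toupper(tokens[-1])
}

naive_is_unique('"uniqueID" TEXT')   # TRUE: false positive on the name
token_is_unique('"uniqueID" TEXT')   # FALSE
token_is_unique('"ID" TEXT UNIQUE')  # TRUE
```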
@rouault: thanks! As I read the commit, was the previous code looking in the field string rather than the table definition for the |
@rsbivand -- sorry to be dense, how should I be installing these binary builds? I'm getting errors with |
The previous version looked at fieldStr, which contained each column definition, extracted from tableDefinition and split on commas. Anyway, the new version should hopefully be at least as good as, and probably better than, the previous one, as I assume the previous one might have had issues if a comma were found in a column name or in DEFAULT 'some default value, with comma' clauses. |
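To illustrate the comma problem just described: splitting a column list on every comma breaks on DEFAULT literals, quoted identifiers, and type arguments such as NUMERIC(10, 2). Below is a minimal sketch of a splitter that only breaks on top-level commas; split_column_defs() is a hypothetical name, and this is not what GDAL does.

```r
# Hypothetical sketch: split a column-definition list on top-level commas
# only, i.e. not inside '...' literals, "..." identifiers, or parentheses.
# SQL's doubled-quote escape ('it''s') toggles the quote state twice, so it
# is handled implicitly.
split_column_defs <- function(defs) {
  chars <- strsplit(defs, "", fixed = TRUE)[[1]]
  out <- character(0)
  buf <- character(0)
  in_single <- FALSE
  in_double <- FALSE
  depth <- 0L
  for (ch in chars) {
    if (ch == "'" && !in_double) {
      in_single <- !in_single
    } else if (ch == '"' && !in_single) {
      in_double <- !in_double
    } else if (!in_single && !in_double) {
      if (ch == "(") {
        depth <- depth + 1L
      } else if (ch == ")") {
        depth <- depth - 1L
      } else if (ch == "," && depth == 0L) {
        out <- c(out, trimws(paste(buf, collapse = "")))
        buf <- character(0)
        next  # drop the separating comma itself
      }
    }
    buf <- c(buf, ch)
  }
  c(out, trimws(paste(buf, collapse = "")))
}

defs <- "fid INTEGER PRIMARY KEY, name TEXT DEFAULT 'a, b', \"odd, name\" REAL, n NUMERIC(10, 2)"
length(strsplit(defs, ",", fixed = TRUE)[[1]])  # 7 pieces from a naive split
split_column_defs(defs)                          # the 4 actual column definitions
```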
@dblodgett-usgs install.packages("sf_1.0-9.zip") |
🎉 You guys are 🧙 level.
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.5.0, PROJ 7.2.1; sf_use_s2() is TRUE
nc <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
names(nc)[4] <- "cnty_unique_id"
write_sf(nc, "nc_test.gpkg")
read_sf("nc_test.gpkg")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ cnty_u…¹ NAME FIPS FIPSNO CRESS…² BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # … with 90 more rows, 4 more variables: BIR79 <dbl>, SID79 <dbl>,
#> # NWBIR79 <dbl>, geom <MULTIPOLYGON [°]>, and abbreviated variable names
#> # ¹cnty_unique_id, ²CRESS_ID
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_1.0-9
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9.1 compiler_4.2.1 pillar_1.8.1 highr_0.9
#> [5] class_7.3-20 tools_4.2.1 digest_0.6.29 evaluate_0.16
#> [9] lifecycle_1.0.2 tibble_3.1.8 pkgconfig_2.0.3 rlang_1.0.5
#> [13] reprex_2.0.2 DBI_1.1.3 cli_3.4.0 rstudioapi_0.14
#> [17] yaml_2.3.5 xfun_0.32 fastmap_1.1.0 e1071_1.7-11
#> [21] withr_2.5.0 stringr_1.4.1 dplyr_1.0.10 knitr_1.40
#> [25] generics_0.1.3 fs_1.5.2 vctrs_0.4.1 tidyselect_1.1.2
#> [29] classInt_0.4-7 grid_4.2.1 glue_1.6.2 R6_2.5.1
#> [33] fansi_1.0.3 rmarkdown_2.16 purrr_0.3.4 magrittr_2.0.3
#> [37] htmltools_0.5.3 units_0.8-0 KernSmooth_2.23-20 utf8_1.2.2
#> [41] stringi_1.7.8 proxy_0.4-27
sf_extSoftVersion()
#> GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
#> "3.9.1" "3.5.0" "7.2.1" "true" "true"
#> PROJ
#>   "7.2.1"
Created on 2022-09-14 with reprex v2.0.2 |
Thanks! I'll update the 7z's tomorrow with sources (done 15/9 10:30 CEST). Actually the issue has helped improve GDAL too, so a "unique" contribution. |
To catch things like this sooner on Windows, one would have to run tests in UTF-8, for which one would have to use UCRT and a UCRT toolchain and set UTF-8 to be the native encoding via the fusion manifest (at build time via the manifest file, as R does it). It could be done using R/R packages and Rtools42 (MXE with some updates). There are a number of patches that are now being applied downstream to make gdal work (https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/gdal-1-fixes.patch) and, as always, it would be nice if the upstream code could already be built and used with UCRT without such downstream patching. So it would be nice if upstream MXE had up-to-date versions of gdal etc. (not having to patch downstream in Rtools42) and if upstream gdal etc. worked with UCRT/UTF-8 (not having to patch downstream in Rtools42). That would increase the chances that problems are found in time and solved properly. |
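On the R side, a test script can at least report whether it is running with UTF-8 as the native encoding, so that encoding-sensitive failures are not silently missed. A minimal sketch using base R's l10n_info(); the helper name is hypothetical, and the expectation that R 4.2 on recent Windows 10 reports UTF-8 here follows the .utf8 locales in the sessionInfo() output above and the R blog posts linked in this thread.

```r
# Hypothetical helper: TRUE when the session's native encoding is UTF-8.
# l10n_info() is base R; its "UTF-8" element is a logical flag.
utf8_session <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

if (!utf8_session()) {
  warning("Native encoding is not UTF-8; encoding-sensitive bugs ",
          "(such as std::regex problems in multi-byte locales) may not reproduce.")
}
```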
@kalibera This recalls https://blog.r-project.org/2019/03/28/use-of-c-in-packages/index.html, but here the problem has been rushing forward without awareness that saying |
My blog was primarily against using C++ to interface with R, not against C++ in principle. If a library independent of R internally uses C++, without ever calling back into R (or allocating from the R heap, or throwing exceptions/jumping across R stack frames), that's OK. Yes, checking C++ in R packages for potential use of std::regex would be useful. |
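As a rough illustration of such a check, the sketch below scans a package's C/C++ sources for std::regex or an #include of <regex>. It is a plain-text heuristic, not a parser: matches in comments count, and uses inside external libraries built elsewhere (such as a GDAL linked from the toolchain) are not seen. find_std_regex() is a hypothetical name, not an existing CRAN tool.

```r
# Hypothetical helper: list C/C++ source files under pkg_dir that include
# <regex> or mention std::regex (which also catches std::regex_search etc.).
find_std_regex <- function(pkg_dir) {
  files <- list.files(pkg_dir, pattern = "\\.(c|cc|cpp|cxx|h|hh|hpp)$",
                      recursive = TRUE, full.names = TRUE, ignore.case = TRUE)
  uses_regex <- vapply(files, function(f) {
    src <- readLines(f, warn = FALSE)
    # useBytes = TRUE avoids errors on sources that are not valid UTF-8
    any(grepl("std::regex", src, fixed = TRUE, useBytes = TRUE)) ||
      any(grepl("#[[:space:]]*include[[:space:]]*<regex>", src, useBytes = TRUE))
  }, logical(1))
  files[uses_regex]
}

# Example: find_std_regex("path/to/unpacked/package")
```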
So let |
So CRAN and Bioc teams would be the ones to check. |
Here is another GDAL issue that appears to be related to the R toolchain on Windows. In short, this HDF5 file can be read on Linux but not on Windows. This problem happens with "terra" and also with "stars". LINUX
WINDOWS
The second set of warnings should really be an Error because
|
@rhijmans which versions of the library used by the driver are being used? The Windows version will be that of MXE in the ucrt3 builds, presumably the released Rtools42, not my test build with the unreleased version of libgdal. Which version is in your Linux, and how was that GDAL built and installed? |
Please note that std::regex is unreliable also on other platforms (seen on macOS at least with R packages as I've learned now). It is not specific to Windows (nor to Rtools42). |
@rsbivand: This fails on Windows with the old (in R 3.6, GDAL 3.2.1) and new (UCRT) Rtools using GDAL 3.4.3. It works for me on Ubuntu or Debian with GDAL 2.2.3, 2.4.0, and 3.4.0, and you showed that it works on Fedora with GDAL 3.5.2 (and you refer to the HDF lib versions as well, which may be helpful). @kalibera: point well taken that the std::regex problems are not an Rtools issue. But this seems to be caused by how the HDF5 lib is built for R on Windows. @rouault solved a (perhaps) similar issue with HDF and OSGeo4W
@rhijmans From reading the OSGeo4W issue, it may be that there is something odd happening when HDF5 is static rather than dynamic too. It would be interesting to see whether the error seen in R on Windows binary packages can be reproduced without R and the packages using MXE-built GDAL apps. These are not distributed, so someone will have to run a custom cross-compilation. Can you establish whether there is a useful HDF5 test suite on Windows before the HDF5 libraries are used when GDAL is built? Are there issues raised about Windows builds for HDF5 itself? |
This thread may be hard to follow now that it is no longer about std::regex. Still, I think the problem with HDF5 is that szip is not supported (it is not part of MXE), but the file uses it. The output with the current (not yet released) UCRT3 Rtools42 build is:
and szip is indeed disabled in the HDF5 build. |
@kalibera Thanks! I followed up on rspatial/terra#686 as the initial terra issue. |
When I try sf::st_layers() on the attached gpkg, RStudio bombs. When I list from a terminal, I see:
Reproduce with:
Paste the output of your sessionInfo() and sf::sf_extSoftVersion()
nav_06.zip