Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json sidecar invalid UTF-8: "ReconstructionMethod": "ÿÿÿ", #476

Closed
dmd opened this issue Jan 24, 2021 · 31 comments
Closed

json sidecar invalid UTF-8: "ReconstructionMethod": "ÿÿÿ", #476

dmd opened this issue Jan 24, 2021 · 31 comments

Comments

@dmd
Copy link

dmd commented Jan 24, 2021

Describe the bug
In v1.0.20201224, but not in v1.0.20201102, the JSON sidecar contains a field containing invalid UTF-8:

"ReconstructionMethod": "ÿÿÿ�",

To Reproduce
Steps to reproduce the behavior:

  1. Run the command dcm2niix -o out anon.dcm using the attached file
    anon.dcm.zip

Expected behavior
A valid, BIDS-compatible JSON sidecar encoded as valid UTF-8.

Output Log

If applicable, output generated converting data.

** Version (please complete the following information):**

Chris Rorden's dcm2niiX version v1.0.20201224  GCC9.3.0 x86-64 (64-bit Linux)
v1.0.20201224
  • Is this the latest stable release? NO, it is v1.0.20201224 which is what conda install -y -c conda-forge dcm2niix provides

  • If the latest stable version fails, and you are using macOS or Linux. Does the latest commit on the development branch resolve your issue? YES ... the field is gone

@neurolabusc
Copy link
Collaborator

@yarikoptic I really do not think conda should install from the development branch. Commits to this branch are only subjected to a minimal automatic QA and are not vetted by other users. I introduce experimental features to this branch.

@dmd this specific issue should be fixed in the development branch.

dmd added a commit to dmd/mictools that referenced this issue Jan 24, 2021
@yarikoptic
Copy link
Contributor

FWIW, indeed better then not to use development branch in conda releases. But how even that version came about? I don't see it now among tags here - the location where
conda forge recipe seems to look for releases. Attnn @kastman @ningfei and @neurolabusc (;-)) who maintain dcm2niix in conda forge (I do not, yet at least)

@kastman
Copy link

kastman commented Jan 25, 2021

@neurolabusc Is it possible this was a tag that was uploaded and subsequently deleted? Looks like the url for tag and release archives is the same format, so I suspect it's hard for the conda-forge build bot to know the difference.

I feel like I added that one (or maybe the auto-merge did?) so for the time being I guess we can manually check the repo to see if a new build is an official release. But I'd be curious to see if there's any other automated way to tell the two apart. A quick search doesn't come up with much (shirt of something complicated like firing off a GitHub action specifically on release), but I'll look again.

@neurolabusc
Copy link
Collaborator

OK, so this is another unintended consequence of generating a tag for the development branch noted in issues 470 and 467. It is unfortunate we are unable to restrict condo to the master branch, but going forward we will only generate tags for new stable releases.

Given things have gotten out of sync, maybe we should issue the next stable release a few months early. @yarikoptic since you have large datasets and use dcm2niix in different ways from me, can you test out this current stable release. Likewise, @captainnova and @baxpr I would be grateful if you could provide confidence regarding the output. A lot of the changes in this release were driven by Philips enhanced DICOMs and BEP009. Unfortunately, I have very few exemplars of those formats.

@yarikoptic
Copy link
Contributor

... we we are unable to restrict condo to the master branch, but going forward we will only generate tags for new stable releases.

it is up to maintainer to decide what to upload. But IMHO just do not tag with release like tags in develop. Could have some .dev suffix to signal that it is not a release if a tag desired. Would make it trivial to tell that apart from a proper release.

... can you test out this current stable release

do you mean 1.0.20201102 -- I have tried it on a few datasets. my testing == your testing (ran while building a neurodebian package) + heudiconv tests and a few sample manual cases "in the field", so not that extensive. But so far so good AFAIK (besides what was reported/discussed or addressed (e.g. #395 #464)

@neurolabusc
Copy link
Collaborator

@yarikoptic Please test the latest commit to the dcm2niix development branch (currently v1.0.20210124). Consider this a release candidate for the next stable release.

  • For Windows: You can get a pre-compiled version from AppVeyor (click on the Artifacts button).
  • For UNIX:
git clone --branch development https://github.com/rordenlab/dcm2niix.git
cd dcm2niix/console
g++ -O1 -g -fsanitize=address -fno-omit-frame-pointer  -I.  main_console.cpp nii_foreign.cpp nii_dicom.cpp jpg_0XC3.cpp ujpeg.cpp nifti1_io_core.cpp nii_ortho.cpp nii_dicom_batch.cpp  -o dcm2niix -DmyDisableOpenJPEG
./dcm2niix ....

@yarikoptic
Copy link
Contributor

@yarikoptic Please test the latest commit to the dcm2niix development branch (currently v1.0.20210124).

there is no such tag, correct? so I should just take this one?

$> git describe --tags upstream/development
v1.0.20200427-100-ge32fb1b

(doesn't sound right since you do not release from development branch typically IIRC)?

@yarikoptic
Copy link
Contributor

ie, what should be the "version" for this thing if it doesn't have master merged into it (#467, closed without change)?

@neurolabusc
Copy link
Collaborator

The version is the value reported when you run dcm2niix -v. Issue 467 was the source of this whole debacle. development commits will not have a unique GitHub tag associated with them, as discussed with the unintended consequences from that issue.

dcm2niix -v
Chris Rorden's dcm2niiX version v1.0.20201207  Clang12.0.0 ARM (64-bit MacOS)
v1.0.20201207

@yarikoptic
Copy link
Contributor

from #467 (comment) I can say that you agreed that merging master into development is a good idea ;)

FWIW:

dcm2niix -v
Chris Rorden's dcm2niiX version v1.0.20201207 Clang12.0.0 ARM (64-bit MacOS)
v1.0.20201207

Such a version does not uniquely identify the state, since it would be the same today and tomorrow (when e.g. you push new commits and ask me to retest again) thus making it impossible (without coming up with my own adhoc version) to upload some "perspective" packages anywhere while making them sortable etc. That is why I like git describe mechanism which adds sortable (number of commits after release tag) + uniq (hexsha) identifier into a version. In DataLad, to please version comparators in Python somewhat we do not include that nice hexsha but at least include number of commits after (whenever testing/build from git repo only):

$> datalad --version             
datalad 0.13.7.dev752

here, if master gets merged into development I would get

$> git describe --tags                
v1.0.20201102-31-gd8189f5

(local to me since I was doing merge locally, so nobody else would have that exact hexsha) and thus could mint a version for "development" neurodebian package such as 1.0.20201102+git31-gd8189f5-1 which would allow me to do the same tomorrow for a new (higher number and different hexsha) state without causing any confusion.

@baxpr
Copy link

baxpr commented Jan 25, 2021

@duettwe can you take a look here?

@neurolabusc
Copy link
Collaborator

@yarikoptic the development branch is not for mass distribution. I just wanted you to test it with your own data and routines, as you use the software in ways that I did not expect when developing this software. Your large dataset and routines can be used to validate the development branch prior to generating a new stable release. While each commit is tested against my own internal datasets, these do not explore edge cases.

@duettwe
Copy link

duettwe commented Jan 27, 2021

We tested some DICOM conversions on the most recent development branch (e32fb1b). Verifying with fsleyes, we could see that the NIFTIs appeared correct.

@baxpr
Copy link

baxpr commented Jan 27, 2021

Thanks @duettwe. To expand a bit, we checked a few specific things that we had trouble with in the past:

  • Nifti header geometry (all series were aligned in world space)
  • Philips derived ADC separated into its own .nii
  • fmri volumes stored into single 4d .nii
  • Philips fieldmap split into two .nii, _e1 (magnitude) and _e2 (field in Hz)

@captainnova
Copy link
Collaborator

Hi, I am not sure if I should be commenting here or in issue #473, but when I tested the development branch with my local test script it broke all the GE filenames. The script uses dcm2niix -f 'M_%i_%t_%s_%d', i.e. it expects the output filenames to have (0008, 103e) SeriesDescription (or a filesystem safe version of it) in the %s slot. Instead the development branch is putting in (0018, 1030) ProtocolName.

Which is exactly what #473 said it would do, but I don't understand why that's considered good. I'm not seeing GE swapping SeriesDescription and ProtocolName around, so I don't need dcm2niix to swap them itself. The series descriptions and protocol names look reasonable in the DICOM, i.e. the series descriptions describe the individual series, and (0018, 1030) ProtocolName describe the scan session.

Examples:
An axial T2*w MRI:
(0008, 103e) Series Description LO: u'Ax GRE'
(0018, 1020) Software Version(s) LO: ['23', 'LX', 'MR Software release:DV22.0_V02_1122.a'] # pydicom converted // to ,
(0018, 1030) Protocol Name LO: u'Seizure Brain'

An axial dMRI, with GE's new interleave-b-values-within-volumes scheme:
(0008, 103e) Series Description LO: u'Ax MB Adv dMRI with DC'
(0018, 1020) Software Version(s) LO: ['28', 'LX', 'MR Software release:RX28.0_R02_1946.a']
(0018, 1030) Protocol Name LO: u'ADNI3 Basic Human Protoc'

Since we would only use (0018, 1030) Protocol Name to name a directory of series (if at all), I greatly prefer dcm2niix's old behavior.

Please let me know if I can help sort this out, or if we should continue this in #473.

@baxpr
Copy link

baxpr commented Jan 28, 2021

Interesting. To contribute some info about Philips on that - I believe Philips puts the series description (such as Ax GRE) in the "Series Description" field as well. But the "Protocol Name" field is used for the same data, with an added tag like WIP Ax GRE to indicate when a scan used a research mode. Not sure how consistent that behavior is because I only spot checked.

@captainnova
Copy link
Collaborator

I believe Philips puts the series description (such as Ax GRE) in the "Series Description" field as well. But the "Protocol Name" field is used for the same data, with an added tag like WIP Ax GRE to indicate when a scan used a research mode. Not sure how consistent that behavior is because I only spot checked.

@baxpr , I can confirm that Philips puts the series description in both the usual (0008, 103e) spot and (0018, 1030), although I'd have to go hunting to find one with a WIP. To be honest our lab tends to ignore (0018, 1030), so creatively reinterpreting it doesn't bother us much.

@neurolabusc
Copy link
Collaborator

@andersonwinkler and @captainnova thanks for your feedback. Thanks to help from GE engineers including @mr-jaemin dcm2niix contains a lot more meta-data for users. I had decided to use the series description for protocol name due to samples like this one. However, swapping the series protocol name and series description does seem like a mistake. I think the latest commit should resolve this.

Consider a conversion where the file name should be composed of the series number (%s), series description (%d) and the protocol name (%d):

dcm2niix -f %2s-%d-%p /path/to/DICOMs

Given a DICOM file with the tags:

(0020,0011) IS [5]                                              #   2, 1 SeriesNumber
(0008,103e) LO [3D Ax T1 MP-RAGE_R22]      #  20, 1 SeriesDescription
(0018,1030) LO [Brain Advanced Sequences]  #  24, 1 ProtocolName

Here is the filenames generated by versions of dcm2niix:

Version Filename
v1.0.20201102 05-3D_Ax_T1_MP-RAGE_R22-3D_Ax_T1_MP-RAGE_R22
v1.0.20210124 05-Brain_Advanced_Sequences-3D_Ax_T1_MP-RAGE_R22
v1.0.20210129 05-3D_Ax_T1_MP-RAGE_R22-Brain_Advanced_Sequences

Coming from Siemens, I always use the meaningful protocol name (%p). But often for GE the description (%d) is more meaningful. So GE users like @captainnova have simply used the relevant name for their vendor. The latest commit does allow you to extract both series name and protocol name. I think this is the most parsimonious solution, but it will change file names for GE users who have previously used %p.

@andersonwinkler
Copy link

Many thanks @neurolabusc! We have a wrapper around dcm2niix and in it I was swapping the two fields in the JSON, a step that will now be no longer needed. Many thanks (also to everyone who contributed).
Cheers,
Anderson

@captainnova
Copy link
Collaborator

Thanks @neurolabusc! It passes our local test suite now.

@neurolabusc
Copy link
Collaborator

Thanks all for testing. Expect a new stable release shortly.

@neurorepro
Copy link

Hello, i got the same error when installing with conda the latest version of heudiconv (which in turned installed dcm2niix from conda)

ERROR: /<root_dir>/bids/sub-219/func/sub-219_task-rest_run-01_bold.json is not a valid json file
Traceback (most recent call last):
  File "/opt/conda/shared/envs/ni38/bin/heudiconv", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/cli/run.py", line 24, in main
    workflow(**kwargs)
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/main.py", line 337, in workflow
    prep_conversion(sid,
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/convert.py", line 207, in prep_conversion
    convert(cinfo,
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/convert.py", line 488, in convert
    tuneup_bids_json_files(bids_outfiles)
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/bids.py", line 200, in tuneup_bids_json_files
    json_ = load_json(jsonfile)
  File "/opt/conda/shared/envs/ni38/lib/python3.8/site-packages/heudiconv/utils.py", line 178, in load_json
    data = json.load(fp)
  File "/opt/conda/shared/envs/ni38/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/conda/shared/envs/ni38/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/shared/envs/ni38/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/shared/envs/ni38/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 27 column 27 (char 894)

And when looking at the culprit json file:

"ReconstructionMethod": "�19H½º",
$ heudiconv --version
0.9.0
$ dcm2niix --version
Chris Rorden's dcm2niiX version v1.0.20201224  GCC9.3.0 x86-64 (64-bit Linux)
v1.0.20201224

@neurorepro
Copy link

My solution for now to remove latest dcm2niix without removing heudiconv, and installing a version of dcm2niix not generating the error above:

conda remove --force dcm2niix
sudo apt install dcm2niix

@neurolabusc
Copy link
Collaborator

v1.0.20201224 was never meant for public consumption. There will be a new stable release shortly.

@neurorepro
Copy link

Sure, just letting it out there for people installing heudiconv or dcm2niix via conda and getting the same kind of errors as i did (it got me some time to figure out i didn't get the right dcm2niix version)

@neurorepro
Copy link

neurorepro commented Mar 12, 2021

Hello apologies for the new message as I know you are working on it, but i just want to mention that two new persons contacted me since the last message about the bug they are facing when installing heudiconv with conda, which in turn install this version of dcm2niix. So i believe quite a few users may have this bug without knowing what's wrong with their workflow (I could point out to the users the solution I indicated above).

@yarikoptic I don't know if it is possible meanwhile to temporarily indicate a different version of dcm2niix in the conda recipe for installing heudiconv ? All new installs of heudiconv may face this issue.

@neurolabusc
Copy link
Collaborator

@neurorepro there will be a new stable release of dcm2niix when issue 493 is resolved. I am just waiting for @drmclem to confirm that we will now default to using the Philips precise scaling instead of the real world values when both exist. So I expect a new release in a couple of days.

In addition to issue 491, the new stable release will make substantial changes. I would encourage the heudiconv team to test the latest version of the development branch, as there a number of changes:

@neurorepro
Copy link

@neurolabusc Great to know it's coming along very soon, thank you for the update !

@dmd
Copy link
Author

dmd commented Mar 30, 2021

@neurolabusc Will dcm2niix=1.0.20210317 be available in conda-forge soon?

@neurolabusc
Copy link
Collaborator

Maybe @kastman or @ningfei can help. I did not set up the conda-forge hooks.

@neurolabusc
Copy link
Collaborator

@dmd thanks to @kastman it is done. You can now get v1.0.20210317 from conda-forge

yarikoptic added a commit to neurodebian/dcm2niix that referenced this issue Apr 6, 2021
* tag 'v1.0.20210317': (23 commits)
  CI: Travis updates submodules always from remote.
  Update dcm_qa_nih submodule.
  Remove diagnostic message
  help should describe accession number (rordenlab#496)
  Describe and provide kludge for mangled Canon DICOMs (rordenlab#495)
  Update Philips notes (rordenlab#493)
  Update dcm_qa submodule.
  Support Canon Enhanced DICOM (rordenlab#491)
  Use first recognized manufacturer tag (rordenlab#487)
  If mosaics where ANY volume is non-planar, save ALL volumes as 3D(rordenlab#481)
  Add notes
  Kludge to separate vNav files as 3D (rordenlab#481)
  Tell user to override merging (-m o) to separate coils, specific with fmrib usage that disrupts Siemens DCE processing (rordenlab#187)
  Forbid semicolon in filenames as Linux uses this for command concatenation and Windows function GetOpenFileName will truncate (rordenlab#425)
  report totalReadoutTime once
  EstimatedEffectiveEchoSpacing -> EffectiveEchoSpacing (rordenlab#480)
  GE Round factor for Total Readout Time
  Update explicit naming of DICOMs (rordenlab#252), GE file naming changes (rordenlab#476)
  Update issue templates
  GE TotalReadouTime, BEP009 fixes (rordenlab#476)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants