filtfilt! converts all data to NaN #82

dttrugman · 2021-03-09T15:01:05Z

I am new to SeisIO, so apologies if I am doing something stupid.

I want to implement a simple workflow where I (1) download data and (2) do some simple preprocessing, including a bandpass filter. When I try to filter the data, however, the entire time series appears to be corrupted. For example:

using SeisIO
tstart = "2020-03-26T15:00:00"
tend = "2020-03-26T15:30:00"
CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend,)
println(S)
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))
ungap!(S) # remove gaps
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))
detrend!(S) # remove mean and trend
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))
taper!(S) # taper ends
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))
filtfilt!(S,fl=1.0,fh=10.0)
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x)))

Gives the following response:

SeisData with 1 channels (1 shown)
ID: TX.PECS.00.HHZ
NAME: Pecos
LOC: 31.3704 N, -103.867 E, 914.0 m
FS: 200.0
GAIN: 4.77413e8
RESP: a0 1.0, f0 1.0, 6z, 11p
UNITS: m/s
SRC: http://service.iris.edu/fdsnws/da…
MISC: 4 entries
NOTES: 4 entries
T: 2020-03-26T15:00:00 (0 gaps)
X: -1.010e+03
-1.009e+03
...
-1.410e+03
(nx = 360001)
C: 0 open, 0 total
min:-550105.0 nan: 0
min:-550105.0 nan: 0
min:-549159.7 nan: 0
min:-549159.7 nan: 0
min:NaN nan: 360001

Any idea what I am doing wrong? I am running julia version 1.5.2 if that helps.

dttrugman · 2021-03-10T16:18:19Z

As a follow up, I was able to filter without issuing using DSP.jl directly, e.g.:

xx0 = copy(S[1].x)
responsetype = Bandpass(1.0,10.0,fs=200.0)
designmethod = Butterworth(4)
myfilter = digitalfilter(responsetype,designmethod)
xx1 = filtfilt(myfilter,xx0)

So it appears to be an issue with SeisIO and not that library.

dttrugman · 2021-03-19T12:37:20Z

Any update on this? (Not to be pushy, but it's kind of a serious bug that makes SeisIO more or less unusable). In experiments, I've noticed that it consistently happens for higher sample rate data, though I'm not sure why.

jpjones76 · 2021-03-20T01:25:36Z

I've looked into your report but I see no bug. This can be fixed by converting your data to Float64 precision.

using SeisIO
tstart = "2020-03-26T15:00:00"
tend = "2020-03-26T15:30:00"
CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend,)
S.x[1] = Float64.(S.x[1]); # added by Josh 2021-03-19
ungap!(S) # remove gaps
detrend!(S) # remove mean and trend
taper!(S) # taper ends
filtfilt!(S,fl=1.0,fh=10.0)
println("min:", minimum(S[1].x)," nan: ",sum(isnan.(S[1].x))) # no NaNs

Detail

SeisIO.filtfilt! calls DSP.jl functions in all steps of the filtering process. There are, in fact, exactly two differences from DSP.filtfilt. The irrelevant difference is an optimization for memory conservation on multi-channel objects. The relevant difference is that DSP.filtfilt converts the input data to Float64 precision.

At Float32 precision, your filter choice creates coefficient vectors b, a whose ratios yield denominator values below eps(Float32) in the calls to filt!. NaNs are introduced in that step. The same filt! call in DSP.filtfilt works because a, b, z, X, and Y are Float64.

Incidentally, in Jones et al. (2020), we implicitly demonstrate that our filtering works correctly in Fig. 5b and the corresponding text. So I really don't understand your decision to denounce a published work as "unusable".

Resolution

I'm treating this as an implicit feature request for an automated way to prevent NaNs when calling filtfilt! on Float32 data. There may be a straightforward workaround. Unfortunately, for the scientific questions that SeisIO users are investigating, DSP.jl's forced Float64 conversion won't work: that causes more issues than it prevents.

dttrugman · 2021-03-20T01:49:12Z

Hi, thanks for your response. I just didn't know the data had to be Float64 before using filtfilt!

While it would be great for this to work on Float32, my "feature request" would really just be to provide some guidance and perhaps a worked example of how to do this properly in the Documentation and the Tutorial. I looked through both rather carefully and had no clue.... (I also noticed a similar problem in the Jupyter Notebooks when I ran them). One might also think about throwing an error if filtfilt! is applied to Float32?

In any case, I didn't mean at all to be disparaging about the software package. I am very excited to use it in my everyday workflow; I just need to be able to do simple things like filtering before making the conversion and having the confidence to make the switch. I asked around before posting this, and I am not the only user that has run into this issue and was puzzled by it.

Sorry to offend,

Daniel

jpjones76 · 2021-03-20T04:07:43Z

As I explained in my last comment, the user's choice of filter parameters is responsible for outputting NaNs. It's not a mistake for SeisIO to use Float32 data. Using Float32 data, rather than Float64, has exactly one effect: it narrows the range of viable filtering parameters.

To demonstrate this, suppose I modify your above example as follows:

using SeisIO
fl = 1.0e-7; fh = 1.0; tstart = "2020-03-26T15:00:00"; tend = "2020-03-26T15:30:00"; CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN", CHA,src="IRIS", s=tstart, t=tend)
S.x[1] = Float64.(S.x[1])
ungap!(S)
detrend!(S)
taper!(S)
X = deepcopy(S.x[1])
filtfilt!(S, fl=fl, fh=fh)
responsetype = Bandpass(fl, fh, fs=S.fs[1])
designmethod = Butterworth(4)
myfilter = digitalfilter(responsetype, designmethod)
X1 = filtfilt(myfilter, X)

Here, X1 contains all 64-bit NaNs. So does S.x[1].

A better solution for your data is to downsample. You're band passing with an upper corner frequency of 10.0 Hz. If you check the PSD estimate of the filter frequency response function, you'll find almost no signal energy above 12 Hz. Suppose we resample to 40.0 Hz, then:

using SeisIO
fl = 1.0e-3; fh = 10.0; tstart = "2020-03-26T15:00:00"; tend = "2020-03-26T15:30:00"; CHA = "TX.PECS.00.HHZ"
S = get_data("FDSN", CHA,src="IRIS", s=tstart, t=tend)
S.x[1] = Float64.(S.x[1])
ungap!(S)
detrend!(S)
taper!(S)
resample!(S, fs=40.0)
filtfilt!(S, fl=fl, fh=fh)
findall(isnan.(S.x[1]))

No NaNs.

dttrugman · 2021-03-20T12:48:50Z

Got it, thanks. I agree that is the better solution in this case, though I initially came across the issue when removing the instrument response for high-sample rate data where downsampling beforehand would not have been an option.

In any case, I do think it would be useful to clarify this nuance in the documentation - I am not the first person to make this mistake, nor will I be the last. It is not an issue in other codes commonly used to process seismic data like SAC, ObsPy, etc., so is not obvious to people making the switch. (And correct me if I am wrong, but I saw the same issue in the filtering example notebooks?).

But it's your code, so it's really your call. Just giving my two cents.

kura-okubo · 2021-03-21T05:07:35Z

Hello,

Thank you for the great discussion here. It is very helpful for me to implement filtering with SeisIO.
I have checked the script on the Fig. 5b in Jones et al. (2020) for the validation of instrument response removal.
We used SeisIO.filtfilt!() before translating the instrument response (line 41 of the test script). We obtained the results without NaNs using the second order Butterworth filter in the article.

To further check the issue, I have conducted the bandpass with forth order Butterworth, and found the results with NaNs when datatype is Float32 as discussed in this issue.

using SeisIO

# get data
tstart = "2018-03-01T00:00:00.000"
tend = "2018-03-02T00:00:00.000"
CHA = "TA.121A.--.HHZ"

S = get_data("FDSN",CHA,src="IRIS",s=tstart,t=tend, xf="TA.121A.xml")
println(S)
println(typeof(S[1].x))

freqmin = 0.05
freqmax = 19.9

# 2nd order Butterworth used in Jones et al. (2020) Figure 5b
S1 = deepcopy(S)
np = 2
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("2nd order with Float32 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 2nd order Butterworth with Float64
S1 = deepcopy(S)
np = 2
S1.x[1] = Float64.(S1.x[1])
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("2nd order with Float64 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 4th order Butterworth with Float32
S1 = deepcopy(S)
np = 4
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("4th order with Float32 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

# 4th order Butterworth with Float64
S1 = deepcopy(S)
np = 4
S1.x[1] = Float64.(S1.x[1])
filtfilt!(S1,fl=freqmin,fh=freqmax,np=np,rt="Bandpass")
println("4th order with Float64 min:", minimum(S1[1].x)," nan: ",sum(isnan.(S1[1].x)))

Output:

SeisData with 1 channels (1 shown)
    ID: TA.121A..HHZ
  NAME: Cookes Peak, Deming, NM, USA
   LOC: 32.5324 N, -107.785 E, 1652.0 m
    FS: 100.0
  GAIN: 6.27604e8
  RESP: a0 3.48233f17, f0 0.4, 6z, 11p
 UNITS: m/s
   SRC: http://service.iris.edu/fdsnws/da…
  MISC: 4 entries
 NOTES: 4 entries
     T: 2018-03-01T00:00:00 (0 gaps)
     X: +8.070e+02
        +8.580e+02
            ...
        +9.880e+02
        (nx = 8640001)
     C: 0 open, 0 total
Array{Float32,1}
2nd order with Float32 min:-3507.7124 nan: 0
2nd order with Float64 min:-3507.8948733714124 nan: 0
4th order with Float32 min:NaN nan: 8640001
4th order with Float64 min:-3740.9291867467587 nan: 0

Thank you for the commit (Issue #82) on the documentation and the filtfilt!() function, which helps me understand better on the data type when applying the filter.

Best,

jpjones76 · 2021-03-22T07:43:51Z

Got it, thanks. I agree that is the better solution in this case, though I initially came across the issue when removing the instrument response for high-sample rate data where downsampling beforehand would not have been an option.

Please provide a minimum working example. I am sincerely sorry that you're finding SeisIO frustrating, but merely claiming that you've encountered an error in another place is not enough information to act on.

It is not an issue in other codes commonly used to process seismic data like SAC, ObsPy, etc.

obspy/obspy#2529

so is not obvious to people making the switch. (And correct me if I am wrong, but I saw the same issue in the filtering example notebooks?).

You can verify what I wrote in my last reply by checking the closed Issues: no one has ever reported NaNs in the tutorial notebooks before. In fact, the only tutorial-specific problem at all was Issue #76, due to a minor typo in the documentation.

But it's your code, so it's really your call. Just giving my two cents.

Thank you for your feedback. I added two commits yesterday, one of which resolves the example in your initial comment. I plan to do a release with those commits today (22 March).

dttrugman · 2021-03-22T17:42:01Z

Perfect, thanks for the commits. I think that will help everyone in the long run. The example I posted above was my attempt at a "minimum working example" because the place where I initially spotted the error was in the middle of a long complex workflow with data that is not publicly available.

In terms of the tutorial, you may be on it already with your commit, but the place I am referring to is in the notebook called Part4-Processing, Cell #9: Sf = remove_resp(S, chans=4:S.n)

If we print Sf.x, we get:
4-element Array{Union{AbstractArray{Float32,1}, AbstractArray{Float64,1}},1}:
Float32[2.7664585f6, 2.308987f6, 1.9008121f6, 1.7666095f6, 1.8152758f6, 1.8575735f6, 1.7005322f6, 1.3385941f6, 846242.1, 320249.0 … -215298.5, -502047.0, -700261.25, -781168.4, -807831.6, -849426.06, -916592.8, -449656.7, 19702.234, 30941.07]
Float32[29794.031, -14405.785, -47402.758, -35956.305, 2092.416, 22985.586, 1762.7256, -27433.664, -11251.711, 61675.17 … 18562.438, 80717.66, 90282.016, 69408.92, 54823.605, 54827.85, 56621.652, 30729.158, 6788.757, 6151.325]
Float32[NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN … NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]
Float32[NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN … NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]

I know this hasn't been brought up in the forums, as I did a search before opening a new issue. But I also reached out to two colleagues who have experience with SeisIO. Both had run across the NaN issue before, neither knew why it was happening, and one of the two had stopped using the code because of it. All three of us have a non-trivial amount of experience in observational seismology, though clearly don't have as much expertise in the underlying algorithms as you do.

As you point out, it is not a "bug" but a misunderstanding of the effect that using Float32 data types can have in certain circumstances. So as long as new users are aware, I don't think this is a major issue. (As an aside, the obspy forum you mentioned appears unrelated to filtering / instrument response, though perhaps it is related in some deeper way that escapes me).

Thanks again for all your hard work on the software package. I really look forward to using it more.

Daniel

jpjones76 · 2021-03-22T22:02:21Z

With apologies, I'm not seeing NaNs there at all.

It's possible that IRIS had no data for some channels, or other data issues, when you executed those test commands. In creating that part of the tutorial, we tried to choose test stations that we knew had excellent uptimes, but no station is 100%.

It's also possible that my commits from March 20th fixed it.

I can try to add some code to the tutorial that checks for station and data issues.

dttrugman · 2021-03-22T22:56:47Z

Hi,

I looked into this again. I think you are right, it is a data availability issue. I re-downloaded and re-installed. Now when running the tutorial, I get the following:

Then I sat five minutes and ran again:

So the NaNs go in and out depending on the time of the day. Likely unrelated to the filtering issue, but useful to know about.

Daniel

jpjones76 · 2021-03-22T23:20:17Z

Perfect. I think it should be relatively straightforward to check for that in the tutorial.

jpjones76 · 2021-03-23T06:51:19Z

Patch-level release (v1.2.1) completed. Includes the filtfilt! changes, various tutorial updates (some related to this), and an unrelated SAC problem. All automated checks have passed, so it should be in the Julia package registry tomorrow.

dttrugman · 2021-03-24T00:28:44Z

Great, download and installed, and all tests passed of course.

jpjones76 added external bug bug in SeisIO dependency feature request not a bug and removed external bug bug in SeisIO dependency labels Mar 19, 2021

jpjones76 added a commit that referenced this issue Mar 20, 2021

Issue #82

c4c3cce

jpjones76 closed this as completed Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

filtfilt! converts all data to NaN #82

filtfilt! converts all data to NaN #82

dttrugman commented Mar 9, 2021

dttrugman commented Mar 10, 2021

dttrugman commented Mar 19, 2021

jpjones76 commented Mar 20, 2021

dttrugman commented Mar 20, 2021

jpjones76 commented Mar 20, 2021

dttrugman commented Mar 20, 2021

kura-okubo commented Mar 21, 2021 •

edited

Loading

jpjones76 commented Mar 22, 2021 •

edited

Loading

dttrugman commented Mar 22, 2021

jpjones76 commented Mar 22, 2021

dttrugman commented Mar 22, 2021

jpjones76 commented Mar 22, 2021

jpjones76 commented Mar 23, 2021

dttrugman commented Mar 24, 2021

filtfilt! converts all data to NaN #82

filtfilt! converts all data to NaN #82

Comments

dttrugman commented Mar 9, 2021

dttrugman commented Mar 10, 2021

dttrugman commented Mar 19, 2021

jpjones76 commented Mar 20, 2021

Detail

Resolution

dttrugman commented Mar 20, 2021

jpjones76 commented Mar 20, 2021

dttrugman commented Mar 20, 2021

kura-okubo commented Mar 21, 2021 • edited Loading

jpjones76 commented Mar 22, 2021 • edited Loading

dttrugman commented Mar 22, 2021

jpjones76 commented Mar 22, 2021

dttrugman commented Mar 22, 2021

jpjones76 commented Mar 22, 2021

jpjones76 commented Mar 23, 2021

dttrugman commented Mar 24, 2021

kura-okubo commented Mar 21, 2021 •

edited

Loading

jpjones76 commented Mar 22, 2021 •

edited

Loading