-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure empty stream segments are initialised #129
base: main
Are you sure you want to change the base?
Conversation
* Initialise empty segment list in StreamData * Initialise empty stream segments when dejitter_timestamps=True * Update load data tests and add test for a file with an empty streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thank you! Just a small question/suggestion, you are now checking if a test file exists, and if not you're skipping the test, which is really useful. However, a @pytest.mark.skipif
marker would probably be even more visible.
Also, I saw that you made several PRs (great!), can we go one by one to make reviewing and merging easier? I suggest that we start with this one, and then you tell me which one to do next. |
The best I could come up with is to use a fixture.
So the fixture seems like a better option to me? |
For sure. I've marked the others as draft for now. They are basically in sequential order - each PR builds on the previous. I've tried to break them up into separate issues to give you a heads up on where I was trying to get to, and yeah was thinking the same, take one at a time and rebase as we go! |
Yes, I think it's a good option. However, to avoid having to duplicate file paths, you could define a path for each test file globally (since we seem to have exactly one test per file instead of a test using several different files). Something like path_minimal = Path("example-files/minimal.xdf")
@pytest.mark.skipif(not path_minimal.exists(), reason="File not found.")
def test_minimal_file():
streams, header = load_xdf(path_minimal)
... Actually, we already have a list of available testing files as files = {
key: path / value
for key, value in {
"minimal": "minimal.xdf",
"clock_resets": "data_with_clock_resets.xdf",
"empty_streams": "data_with_empty_streams.xdf",
}.items()
if (path / value).exists()
}
@pytest.mark.skipif("minimal" not in files, reason="File not found.")
def test_minimal_file():
streams, header = load_xdf(files["minimal"])
... I think I prefer the last option, WDYT? While we're at it, I think the testing files should not start with |
cffacae
to
0610c35
Compare
* parameterize synchronize_clocks - True: pass - False: pass * parameterize dejitter_timestamps - True: pass - False: error calculating effective sample rate
0610c35
to
d4946d8
Compare
I've added the dictionary of files idea it seems pretty clear, although I still don't like having to specify the key twice :) I've dropped the I also found an off-by-one error in the calculation of effective sample-rate when skipping dejittering.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a couple of comments. Regarding a test for clock_resets.xdf
, let's add it later (not in this PR). I think it's OK having to write the key twice, it's very readable IMO.
@@ -375,13 +378,12 @@ def load_xdf( | |||
for stream in temp.values(): | |||
if len(stream.time_stamps) > 1: | |||
duration = stream.time_stamps[-1] - stream.time_stamps[0] | |||
stream.effective_srate = len(stream.time_stamps) / duration | |||
stream.effective_srate = (len(stream.time_stamps) - 1) / duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it wasn't correct before? Below, the - 1
is needed because it stores the start and stop indices of segments, but here we're calculating the duration. If you only have 1 sample, you will now always get an effective_srate
of 0
(previously it was 1 / duration
, which I think is the correct value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you only have 1 sample, the duration is 0, so wouldn't you end up with 1/0
in that case?
In my view you can only calculate a duration with two or more samples, and if len(stream.time_stamps) > 1
assures this.
Without -1
the effective sample rate in minimal
is calculated as 11.25 Hz but it should be 10 Hz, so I think the -1 is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK cool, I just push a change for the calculation of effective_srate
when dejitter_timestamps=True
as well.
This one was doing len(time_stamps) / (duration + tdiff)
, which is fine when time-stamps are perfect, but off otherwise.
Perfect clock
nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0] + nominal_srate) / len(t)
=> 1.0
# Correct
nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0]) / (len(t) - 1)
=> 1.0
# Correct
Fast clock
nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)
=> 1.08
# Incorrect
nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)
=> 1.1
# Correct
Slow clock
nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)
=> 0.919
# Incorrect
nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)
=> 0.9
# Correct
streams, header = load_xdf( | ||
path, | ||
synchronize_clocks=synchronize_clocks, | ||
dejitter_timestamps=dejitter_timestamps, | ||
jitter_break_threshold_seconds=0.001, | ||
jitter_break_threshold_samples=1, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to load the file a second time? Can you adapt the test so that we only need to load it once at the very top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just how it was implemented in the original test, but yes it probably should be its own separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we keep it in this test, because we already have a parameterization of dejitter_timestamps
. However, we might have to adapt some tests, because this specific part seems to rely on the other two arguments set to these exact values.
If it's too complicated, then let's move it to a new test.
@pytest.mark.parametrize("dejitter_timestamps", [False, True]) | ||
@pytest.mark.parametrize("synchronize_clocks", [False, True]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you do not use these two parameters in your test. Depending on if this was intentional, I suggest removing these parameterizations or adding tests for these settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parameters are used, they are passed to load_xdf
.
But I see what you mean I'm not explicitly testing that they change anything, but I think that was intentional for this data file, they shouldn't really have an effect but let me double check that.
Thanks for adding the missing skipif
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that in test_empty_streams
, you are not testing for the different values of dejitter_timestamps
and synchronize_clocks
. All tests are completely independent of whether these are set to True
or False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests are completely independent of whether these are set to True or False.
The test is now conditional on synchronize_clocks.
I've kept the parametrization of dejitter even though it has minimal effect on idealised time-stamps - that feels like a reassuring thing to validate to me, but also happy to remove it if you prefer.
Also, you are comparing float values using |
ca1a798
to
1f359c2
Compare
* This is now the same calculation as when dejitter_timestamps=False
1f359c2
to
0141684
Compare
* new empty_streams.xdf with correct sample_count * add float comparison tolerance levels * explicitly test effect of synchronize_clocks * dejittering has no significant effect because ground-truth time-stamps have zero jitter, but keep as a test parameter to show that dejittering does not cause problem in idealised cases
Initialise empty segment list in StreamData
Initialise empty stream segments when dejitter_timestamps=True
Update load data tests and add test for a file with an empty streams