
Ensure empty stream segments are initialised #129

Open · wants to merge 11 commits into main

Conversation

jamieforth
Contributor

  • Initialise empty segment list in StreamData

  • Initialise empty stream segments when dejitter_timestamps=True

  • Update load data tests and add a test for a file with empty streams
Contributor

@cbrnr cbrnr left a comment


Perfect, thank you! Just a small question/suggestion: you are now checking whether a test file exists and skipping the test if it doesn't, which is really useful. However, a @pytest.mark.skipif marker would probably be even more visible.

@cbrnr
Contributor

cbrnr commented Jan 27, 2025

Also, I saw that you made several PRs (great!), can we go one by one to make reviewing and merging easier? I suggest that we start with this one, and then you tell me which one to do next.

@jamieforth
Contributor Author

> a @pytest.mark.skipif marker would probably be even more visible

The best I could come up with is to use a fixture.

@pytest.mark.skipif evaluates the condition at import time, which seems to end up requiring multiple hard-coded paths.

@pytest.mark.skipif(
    not Path("example-files/minimal.xdf").exists(),
    reason="File not available.",
)
def test_minimal_file():
    path = Path("example-files/minimal.xdf")
    ...

So the fixture seems like a better option to me?
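A minimal sketch of the fixture approach (require_file is a hypothetical helper name, not code from the PR):

```python
import pytest
from pathlib import Path


def require_file(path_str):
    """Return the path to an example file, or skip the calling test
    if the file has not been downloaded."""
    path = Path(path_str)
    if not path.exists():
        pytest.skip(f"{path} not available.")
    return path


@pytest.fixture
def minimal_file():
    # The condition is evaluated at test run time, not import time,
    # and the path is written down only once.
    return require_file("example-files/minimal.xdf")


def test_minimal_file(minimal_file):
    ...  # e.g. streams, header = load_xdf(minimal_file)
```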

@jamieforth
Contributor Author

> Also, I saw that you made several PRs (great!), can we go one by one to make reviewing and merging easier? I suggest that we start with this one, and then you tell me which one to do next.

For sure. I've marked the others as draft for now. They are basically in sequential order: each PR builds on the previous. I've tried to break them up into separate issues to give you a heads-up on where I was trying to get to. And yeah, I was thinking the same: take one at a time and rebase as we go!

@cbrnr
Contributor

cbrnr commented Jan 29, 2025

> The best I could come up with is to use a fixture.

> @pytest.mark.skipif evaluates the condition at import time, which seems to end up requiring multiple hard-coded paths.

> So the fixture seems like a better option to me?

Yes, I think it's a good option. However, to avoid having to duplicate file paths, you could define a path for each test file globally (since we seem to have exactly one test per file instead of a test using several different files). Something like

path_minimal = Path("example-files/minimal.xdf")

@pytest.mark.skipif(not path_minimal.exists(), reason="File not found.")
def test_minimal_file():
    streams, header = load_xdf(path_minimal)
    ...

Actually, we already have a list of available testing files in the files variable, so maybe we could use this list? Maybe we could explicitly add individual testing files (instead of wildcarding) and use a dictionary, which would make it more reusable?

files = {
    key: path / value
    for key, value in {
        "minimal": "minimal.xdf",
        "clock_resets": "data_with_clock_resets.xdf",
        "empty_streams": "data_with_empty_streams.xdf",
    }.items()
    if (path / value).exists()
}

@pytest.mark.skipif("minimal" not in files, reason="File not found.")
def test_minimal_file():
    streams, header = load_xdf(files["minimal"])
    ...

I think I prefer the last option, WDYT?

While we're at it, I think the testing files should not start with data_with_ and just be named clock_resets.xdf and empty_streams.xdf.

* parameterize synchronize_clocks
  - True: pass
  - False: pass

* parameterize dejitter_timestamps
  - True: pass
  - False: error calculating effective sample rate
@jamieforth
Contributor Author

I've added the dictionary-of-files idea; it seems pretty clear, although I still don't like having to specify the key twice :)

I've dropped the data_with_ file names.

I also found an off-by-one error in the calculation of effective sample-rate when skipping dejittering.

clock_resets.xdf still needs a test. Do you want to add one, or we could wait until we get to #131?

Contributor

@cbrnr cbrnr left a comment


I still have a couple of comments. Regarding a test for clock_resets.xdf, let's add it later (not in this PR). I think it's OK having to write the key twice, it's very readable IMO.

@@ -375,13 +378,12 @@ def load_xdf(
     for stream in temp.values():
         if len(stream.time_stamps) > 1:
             duration = stream.time_stamps[-1] - stream.time_stamps[0]
-            stream.effective_srate = len(stream.time_stamps) / duration
+            stream.effective_srate = (len(stream.time_stamps) - 1) / duration
Contributor


Are you sure it wasn't correct before? Below, the - 1 is needed because it stores the start and stop indices of segments, but here we're calculating the duration. If you only have 1 sample, you will now always get an effective_srate of 0 (previously it was 1 / duration, which I think is the correct value).

Contributor Author


If you only have 1 sample, the duration is 0, so wouldn't you end up with 1/0 in that case?

In my view you can only calculate a duration with two or more samples, and if len(stream.time_stamps) > 1 assures this.

Without the -1, the effective sample rate in minimal is calculated as 11.25 Hz, but it should be 10 Hz, so I think the -1 is correct.
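Those figures are consistent with a stream of 9 time-stamps spanning 0.8 s; a minimal sketch (hypothetical sample count, the actual contents of minimal.xdf may differ):

```python
import numpy as np

# 9 timestamps at a true 10 Hz: 8 intervals of 0.1 s, spanning 0.8 s.
t = np.arange(9) / 10.0
duration = t[-1] - t[0]

biased = len(t) / duration           # ≈ 11.25 -- counts one interval too many
correct = (len(t) - 1) / duration    # ≈ 10.0
```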

Contributor


Yes, you are right!

Contributor Author


OK cool, I just pushed a change for the calculation of effective_srate when dejitter_timestamps=True as well.

This one was doing len(time_stamps) / (duration + tdiff), which is fine when time-stamps are perfect, but off otherwise.

Perfect clock

import numpy as np

nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 1.0 # Correct

nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 1.0 # Correct

Fast clock

nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 1.08 # Incorrect

nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 1.1 # Correct

Slow clock

nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 0.919 # Incorrect

nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 0.9 # Correct

src/pyxdf/pyxdf.py (outdated; resolved)
Comment on lines +142 to +148
streams, header = load_xdf(
path,
synchronize_clocks=synchronize_clocks,
dejitter_timestamps=dejitter_timestamps,
jitter_break_threshold_seconds=0.001,
jitter_break_threshold_samples=1,
)
Contributor


Is it necessary to load the file a second time? Can you adapt the test so that we only need to load it once at the very top?

Contributor Author

@jamieforth jamieforth Jan 30, 2025


That was just how it was implemented in the original test, but yes it probably should be its own separate test.

Contributor


Or we keep it in this test, because we already have a parameterization of dejitter_timestamps. However, we might have to adapt some tests, because this specific part seems to rely on the other two arguments being set to these exact values.

If it's too complicated, then let's move it to a new test.

Comment on lines +155 to +156
@pytest.mark.parametrize("dejitter_timestamps", [False, True])
@pytest.mark.parametrize("synchronize_clocks", [False, True])
Contributor


It seems like you do not use these two parameters in your test. Depending on whether this was intentional, I suggest removing these parameterizations or adding tests for these settings.

Contributor Author

@jamieforth jamieforth Jan 30, 2025


These parameters are used; they are passed to load_xdf.

But I see what you mean: I'm not explicitly testing that they change anything. I think that was intentional for this data file; they shouldn't really have an effect, but let me double-check that.

Thanks for adding the missing skipif!

Contributor


What I meant is that in test_empty_streams, you are not testing for the different values of dejitter_timestamps and synchronize_clocks. All tests are completely independent of whether these are set to True or False.

Contributor Author


> All tests are completely independent of whether these are set to True or False.

The test is now conditional on synchronize_clocks.

I've kept the parametrization of dejitter even though it has minimal effect on idealised time-stamps; that feels like a reassuring thing to validate, but I'm also happy to remove it if you prefer.

@cbrnr
Contributor

cbrnr commented Jan 30, 2025

Also, you are comparing float values using np.testing.assert_equal(). We'll see how this goes once the test file is available, but my assumption is that it might fail due to floating-point precision. Therefore, it might be necessary to change these comparisons to np.testing.assert_approx_equal() or np.testing.assert_almost_equal().
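A minimal sketch of such a tolerance-based comparison (illustrative values only; the real tests load the XDF files):

```python
import numpy as np

# A float computation that lands close to, but not necessarily
# bitwise equal to, the expected value.
t = np.arange(10) / 100.0                   # timestamps at a nominal 100 Hz
effective = (len(t) - 1) / (t[-1] - t[0])   # effective sample rate

# np.testing.assert_equal(effective, 100.0) may fail on such values;
# a tolerance-based comparison is robust to rounding error:
np.testing.assert_almost_equal(effective, 100.0, decimal=9)
```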

* This is now the same calculation as when dejitter_timestamps=False
* new empty_streams.xdf with correct sample_count

* add float comparison tolerance levels

* explicitly test effect of synchronize_clocks

* dejittering has no significant effect because ground-truth
  time-stamps have zero jitter, but keep it as a test parameter to show
  that dejittering does not cause problems in idealised cases