
Ensure empty stream segments are initialised #129

Open · wants to merge 11 commits into main

Conversation

jamieforth
Contributor

  • Initialise empty segment list in StreamData

  • Initialise empty stream segments when dejitter_timestamps=True

  • Update load data tests and add a test for a file with empty streams
Contributor

@cbrnr cbrnr left a comment


Perfect, thank you! Just a small question/suggestion: you are now checking whether a test file exists and skipping the test if it doesn't, which is really useful. However, a @pytest.mark.skipif marker would probably be even more visible.

@cbrnr
Contributor

cbrnr commented Jan 27, 2025

Also, I saw that you made several PRs (great!), can we go one by one to make reviewing and merging easier? I suggest that we start with this one, and then you tell me which one to do next.

@jamieforth
Contributor Author

> a @pytest.mark.skipif marker would probably be even more visible

The best I could come up with is to use a fixture.

@pytest.mark.skipif evaluates the condition at import time, which seems to end up requiring multiple hard-coded paths.

@pytest.mark.skipif(
    not Path("example-files/minimal.xdf").exists(),
    reason="File not available.",
)
def test_minimal_file():
    path = Path("example-files/minimal.xdf")
    ...

So the fixture seems like a better option to me?
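A minimal sketch of the fixture approach (require_file is a hypothetical helper name, not code from the PR):

```python
import pytest
from pathlib import Path


def require_file(path_str):
    """Return the path to an example file, or skip the calling test
    if the file has not been downloaded."""
    path = Path(path_str)
    if not path.exists():
        pytest.skip(f"{path} not available.")
    return path


@pytest.fixture
def minimal_file():
    # The condition is evaluated at test run time, not import time,
    # and the path is written down only once.
    return require_file("example-files/minimal.xdf")


def test_minimal_file(minimal_file):
    ...  # e.g. streams, header = load_xdf(minimal_file)
```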

@jamieforth
Contributor Author

> Also, I saw that you made several PRs (great!), can we go one by one to make reviewing and merging easier? I suggest that we start with this one, and then you tell me which one to do next.

For sure. I've marked the others as draft for now. They are basically in sequential order: each PR builds on the previous. I've tried to break them up into separate issues to give you a heads-up on where I was trying to get to. And yeah, I was thinking the same: take one at a time and rebase as we go!

@cbrnr
Contributor

cbrnr commented Jan 29, 2025

> The best I could come up with is to use a fixture.

> @pytest.mark.skipif evaluates the condition at import time, which seems to end up requiring multiple hard-coded paths.

> So the fixture seems like a better option to me?

Yes, I think it's a good option. However, to avoid having to duplicate file paths, you could define a path for each test file globally (since we seem to have exactly one test per file instead of a test using several different files). Something like

path_minimal = Path("example-files/minimal.xdf")

@pytest.mark.skipif(not path_minimal.exists(), reason="File not found.")
def test_minimal_file():
    streams, header = load_xdf(path_minimal)
    ...

Actually, we already have a list of available testing files in the files variable, so maybe we could use this list? Maybe we could explicitly add individual testing files (instead of wildcarding) and use a dictionary, which would make it more reusable?

files = {
    key: path / value
    for key, value in {
        "minimal": "minimal.xdf",
        "clock_resets": "data_with_clock_resets.xdf",
        "empty_streams": "data_with_empty_streams.xdf",
    }.items()
    if (path / value).exists()
}

@pytest.mark.skipif("minimal" not in files, reason="File not found.")
def test_minimal_file():
    streams, header = load_xdf(files["minimal"])
    ...

I think I prefer the last option, WDYT?

While we're at it, I think the testing files should not start with data_with_ and just be named clock_resets.xdf and empty_streams.xdf.

* parameterize synchronize_clocks
  - True: pass
  - False: pass

* parameterize dejitter_timestamps
  - True: pass
  - False: error calculating effective sample rate
@jamieforth
Contributor Author

I've added the dictionary-of-files idea; it seems pretty clear, although I still don't like having to specify the key twice :)

I've dropped the data_with_ file names.

I also found an off-by-one error in the calculation of effective sample-rate when skipping dejittering.

clock_resets.xdf still needs a test. Do you want to add one, or we could wait until we get to #131?

Contributor

@cbrnr cbrnr left a comment


I still have a couple of comments. Regarding a test for clock_resets.xdf, let's add it later (not in this PR). I think it's OK having to write the key twice, it's very readable IMO.

@@ -375,13 +378,12 @@ def load_xdf(
     for stream in temp.values():
         if len(stream.time_stamps) > 1:
             duration = stream.time_stamps[-1] - stream.time_stamps[0]
-            stream.effective_srate = len(stream.time_stamps) / duration
+            stream.effective_srate = (len(stream.time_stamps) - 1) / duration
Contributor


Are you sure it wasn't correct before? Below, the - 1 is needed because it stores the start and stop indices of segments, but here we're calculating the duration. If you only have 1 sample, you will now always get an effective_srate of 0 (previously it was 1 / duration, which I think is the correct value).

Contributor Author


If you only have 1 sample, the duration is 0, so wouldn't you end up with 1/0 in that case?

In my view you can only calculate a duration with two or more samples, and if len(stream.time_stamps) > 1 assures this.

Without the -1, the effective sample rate in minimal is calculated as 11.25 Hz, but it should be 10 Hz, so I think the -1 is correct.
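Those figures are consistent with a stream of 9 time-stamps spanning 0.8 s; a minimal sketch (hypothetical sample count, the actual contents of minimal.xdf may differ):

```python
import numpy as np

# 9 timestamps at a true 10 Hz: 8 intervals of 0.1 s, spanning 0.8 s.
t = np.arange(9) / 10.0
duration = t[-1] - t[0]

biased = len(t) / duration           # ≈ 11.25 -- counts one interval too many
correct = (len(t) - 1) / duration    # ≈ 10.0
```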

Contributor


Yes, you are right!

Contributor Author


OK cool, I just pushed a change for the calculation of effective_srate when dejitter_timestamps=True as well.

This one was doing len(time_stamps) / (duration + tdiff), which is fine when time-stamps are perfect, but off otherwise.

Perfect clock

import numpy as np

nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 1.0 # Correct

nominal_srate = 1
t = np.arange(0, 5, nominal_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 1.0 # Correct

Fast clock

nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 1.08 # Incorrect

nominal_srate = 1
effective_srate = 1.1
t = np.arange(0, 5, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 1.1 # Correct

Slow clock

nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0] + nominal_srate) / len(t)

=> 0.919 # Incorrect

nominal_srate = 1
effective_srate = 0.9
t = np.arange(0, 4, effective_srate)
(t[-1] - t[0]) / (len(t) - 1)

=> 0.9 # Correct

src/pyxdf/pyxdf.py (outdated; resolved)
Comment on lines +142 to +148
streams, header = load_xdf(
path,
synchronize_clocks=synchronize_clocks,
dejitter_timestamps=dejitter_timestamps,
jitter_break_threshold_seconds=0.001,
jitter_break_threshold_samples=1,
)
Contributor


Is it necessary to load the file a second time? Can you adapt the test so that we only need to load it once at the very top?

Contributor Author

@jamieforth jamieforth Jan 30, 2025


That was just how it was implemented in the original test, but yes it probably should be its own separate test.

Contributor


Or we keep it in this test, because we already have a parameterization of dejitter_timestamps. However, we might have to adapt some tests, because this specific part seems to rely on the other two arguments being set to these exact values.

If it's too complicated, then let's move it to a new test.

Comment on lines +155 to +156
@pytest.mark.parametrize("dejitter_timestamps", [False, True])
@pytest.mark.parametrize("synchronize_clocks", [False, True])
Contributor


It seems like you do not use these two parameters in your test. Depending on whether this was intentional, I suggest removing these parameterizations or adding tests for these settings.

Contributor Author

@jamieforth jamieforth Jan 30, 2025


These parameters are used; they are passed to load_xdf.

But I see what you mean: I'm not explicitly testing that they change anything. I think that was intentional for this data file; they shouldn't really have an effect, but let me double-check that.

Thanks for adding the missing skipif!

Contributor


What I meant is that in test_empty_streams, you are not testing for the different values of dejitter_timestamps and synchronize_clocks. All tests are completely independent of whether these are set to True or False.

Contributor Author


> All tests are completely independent of whether these are set to True or False.

The test is now conditional on synchronize_clocks.

I've kept the parametrization of dejitter even though it has minimal effect on idealised time-stamps; that feels like a reassuring thing to validate, but I'm also happy to remove it if you prefer.

@cbrnr
Contributor

cbrnr commented Jan 30, 2025

Also, you are comparing float values using np.testing.assert_equal(). We'll see how this goes once the test file is available, but my assumption is that it might fail due to floating-point precision. Therefore, it might be necessary to change these comparisons to np.testing.assert_approx_equal() or np.testing.assert_almost_equal().
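A minimal sketch of such a tolerance-based comparison (illustrative values only; the real tests load the XDF files):

```python
import numpy as np

# A float computation that lands close to, but not necessarily
# bitwise equal to, the expected value.
t = np.arange(10) / 100.0                   # timestamps at a nominal 100 Hz
effective = (len(t) - 1) / (t[-1] - t[0])   # effective sample rate

# np.testing.assert_equal(effective, 100.0) may fail on such values;
# a tolerance-based comparison is robust to rounding error:
np.testing.assert_almost_equal(effective, 100.0, decimal=9)
```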

* This is now the same calculation as when dejitter_timestamps=False
* new empty_streams.xdf with correct sample_count

* add float comparison tolerance levels

* explicitly test effect of synchronize_clocks

* dejittering has no significant effect because ground-truth
  time-stamps have zero jitter, but keep it as a test parameter to show
  that dejittering does not cause problems in idealised cases