Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] added LITE version (deactivate TimescaleDB and/or Postgis usage) #29

Merged
merged 8 commits into from
Mar 30, 2022

Conversation

BenediktAllendorf
Copy link
Contributor

Closes #20

Documentation is still missing because there might architectural changes (like renaming config parameters), but I'll add this after approval. ;)

There are four main things about this PR:

  1. The model has been adjusted so that the location column (Postgis) can be dynamically activated (and is ignored otherwise).
  2. Two config parameters have been added: setup_timescaledb and setup_postgis (default: True).
  3. Depending on these parameters (and the database's abilities) _setup_connection() creates the table accordingly: either with the location column (Postgis) or without and either as hyptertable or not.
  4. Some basic test cases have been added for this (requires docker for spinning up the different databases)

Both parameters are only considered when running the first time (i.e., when the table is created). Afterwards, they are ignored (and could be removed from the config file). Each time when connecting, it is independently determined whether the table is a hyptertable and whether the location column is available. E.g., this means, that one could create LTSS without the location column and later on add Postgis to the database (for another use-case) without any problems.

@freol35241
Copy link
Owner

@BenediktAllendorf Thank you for this 👍 I will have a look at this as soon as I can.

Copy link
Owner

@freol35241 freol35241 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @BenediktAllendorf, thank you very much for this PR! 👍 And sorry for being a bit slow to get to the review...

I think we need to discuss one fundamental thing here to agree on the best way ahead:

Do we need the configuration options (i.e. setup_postgis and setup_timescaledb) or could we just configure the table automagically according to the available extensions?

I.e., use a hypertable if timescaledb is available and use a location column if postgis is available?

What is you opinion on this? For me, personally, this approach would be fine and I believe it would simplify the setup-logic a bit. But there might be cases where the user want more fine-grained control? Maybe? Or should we keep the configuration options but only as "opt-outs": use_timescaldedb_if_available and use_postgis_if_available (both defaulting to True). Your opinion here would be valuable!

Apart from that:

  • Very nice with the tests! 🥇 I would like to see those being run as part of a CI pipeline (i.e. Github Actions). Do you want to take a stab at this? Otherwise I will set it up as part of another PR when this is merged.
  • The logic for setting up the db/table is quite complicated and I have a hard time understanding if there might be any edge cases in combination with the check_and_migrate-function. Have you given this any thought?

Apart from the above, I only have a single in-code question/comment.

@BenediktAllendorf
Copy link
Contributor Author

I like your suggestion and adjusted the code correspondingly. The complexity is nicely reduced by discarding the options and just relying on what the database offers. I refactored the methods a bit to make it easier to understand what is happening.

I thought about the opt-outs and I don't think it's worth it. I couldn't come up with specific use cases and even if they exist, I don't think they need to be officially supported. If one has such special cases, that needs to be handled manually. (For example, one could create the table with PostGIS, then delete that column and everything still would work fine).

I'm not sure if there should be any log information (a warning?) when setting up LTSS without the extensions. Do you think that is necessary/useful? Or should it "just work"?

About Github automation, I can have a look into that but I would do it independently from this PR.

I can't see any conflicts with check_and_migrate(). The setup only changes for new installations and for old ones, the migration does not touch the location column nor the hyptertable.

Copy link
Owner

@freol35241 freol35241 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated PR. I like the simplified logic but I dont fully agree with your refactoring of the methods. I would like to see that we stick with a _setup_connection method and a _create_table method.

Agree, lets skip the opt-outs.

Yes, logging something on the INFO level would be nice!

I agree regarding Github Actions integration. Lets do that separately.

@BenediktAllendorf
Copy link
Contributor Author

I've incorporated all your feedback, it's more similar to the first version now.

Not sure about the timescaledb_information.hypertables-query - I think to leverage SqlAlchemy here, we would need to create a model/mapper for that View and that seems a bit overkill (?). If you say what is happening is understandable, I guess it's fine this way.

Copy link
Owner

@freol35241 freol35241 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BenediktAllendorf .

I finally found the time to try this out (sorry for the delay...) and found one thing that needs to be handled... See the comment below regarding the query to find out whether the table is a hypertable or not.

@freol35241
Copy link
Owner

freol35241 commented Mar 28, 2022

@BenediktAllendorf This is starting to look good I think, but I would like a small change to your latest fix for more timescaledb versions (I know, getting a bit detailed here maybe). But I very much prefer a more pythonic approach using try-except clauses (which uses the stable(?) api of TimescaleDB) rather than the if-else chain with different queries depending on versions. See attached patch-file (apparently it is not possible to fork a fork of a repo you own...?) for my suggestion, I also added a few logging statements throughout the code.

If you apply this patch to your branch to update the code and push it again and include some more documentation in the readme reflecting your changes I will happily accept this PR 👍

0001-Moving-to-a-more-pythonic-approach-with-a-try-except.zip

@BenediktAllendorf
Copy link
Contributor Author

I've applied the patch and added some documentation. Might need some fine adjustments though.

Right now, I tried to keep the documentation as similar to before as possible, nudging the user to a full-version (with both extensions). The information, that one can use LTSS without one (or both) is there, but somewhat hidden. Of course, this could be done the other way around (like "you only need Postgres but if you require Geo-data you should also use PostGIS"). What do you think?

Additionally, but not directly related/limited to this PR: I think in the part about the required superuser privileges it can be added, that the extensions could be enabled beforehand and thus, no superuser is required for LTSS (not even the initial setup).

Copy link
Owner

@freol35241 freol35241 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, the docs works for me! Thanks for your work with this PR 🥇

I will merge this and bump the version number accordingly (i.e. 1.0.0 to 1.1.0)

fyi, I also added #30 if you have some time to spare, otherwise I plan to get to it quite soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: "LITE" version setting - without Timescale and GIS, just pure PostgreSQL
2 participants