Datasette serve should accept paths/URLs to CSVs and other file formats #123
Comments
I'm going to keep this separate in csvs-to-sqlite.
I'm reopening this one as part of #417. Further experience with Python's CSV standard library module has convinced me that pandas is not a required dependency for this. My sqlite-utils package can do most of the work here with very few dependencies.
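A minimal sketch of that stdlib-csv-plus-sqlite-utils approach, assuming illustrative file and table names (`sqlite_utils.Database` and `insert_all` are real sqlite-utils APIs, but this is not Datasette's actual code):

```python
import csv
import sqlite_utils

db = sqlite_utils.Database("data.db")
with open("example.csv", newline="") as f:
    # csv.DictReader yields one dict per row, keyed by the header row,
    # which insert_all can consume directly
    db["example"].insert_all(csv.DictReader(f))
```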
How would Datasette accepting URLs work? I want to support not just SQLite files and CSVs but other formats (GeoJSON, Atom, shapefiles etc.) as well, in an extensible way. If it's a URL, we can use the first 200 downloaded bytes to decide which type of file it is - this is likely more reliable than hoping the web server provided the correct content-type. Also: let's have a threshold for downloading to disk. We start downloading to a temp file (location controlled by an environment variable) if either the content-length header is above that threshold OR we have already cached that much data in memory and don't know how much more is still to come. There needs to be a command-line option for saying "grab from this URL but force-treat it as CSV" - same thing for files on disk.
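A hedged, stdlib-only sketch of that spill-to-disk logic; the threshold value and the `DATASETTE_TMPDIR` environment variable are illustrative assumptions, not anything Datasette actually defines:

```python
import os
import tempfile
import urllib.request

THRESHOLD = 100 * 1024 * 1024  # 100 MB, an illustrative cutoff

def fetch(url, threshold=THRESHOLD):
    """Download url; return (first_bytes, data) where data is either
    an in-memory bytes object or the path to a temp file on disk."""
    with urllib.request.urlopen(url) as response:
        length = response.headers.get("Content-Length")
        first = response.read(200)  # enough bytes to sniff the format
        buffer = first
        spill = length is not None and int(length) > threshold
        if not spill:
            # Unknown or small size: read into memory until we finish
            # or exceed the threshold without knowing the total size
            while chunk := response.read(64 * 1024):
                buffer += chunk
                if len(buffer) > threshold:
                    spill = True
                    break
        if spill:
            tmp_dir = os.environ.get("DATASETTE_TMPDIR")  # hypothetical
            with tempfile.NamedTemporaryFile(dir=tmp_dir, delete=False) as tmp:
                tmp.write(buffer)
                while chunk := response.read(64 * 1024):
                    tmp.write(chunk)
                return first, tmp.name
        return first, buffer
```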
Auto-detection could be tricky, so we should probably do this with a plugin hook. https://github.com/h2non/filetype.py is interesting, but it deals with images, video etc., so it's not right for this purpose. I think we need our own simple content-sniffing code, exposed via a plugin hook. What if two plugins' hooks can both potentially handle a sniffed file? The CLI can quit and return an error saying the content is ambiguous and you need to specify a format explicitly.
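A speculative sketch of such a hook using pluggy (which Datasette's plugin system is built on); the `sniff_format` hook name and the CSV heuristic are hypothetical, not a real Datasette hook:

```python
import pluggy

hookspec = pluggy.HookspecMarker("datasette")
hookimpl = pluggy.HookimplMarker("datasette")

class FormatSpec:
    @hookspec
    def sniff_format(self, first_bytes):
        """Return a format name if these bytes look like your format."""

class CSVSniffer:
    @hookimpl
    def sniff_format(self, first_bytes):
        # Crude heuristic: decodable text with a comma in the first line
        try:
            line = first_bytes.decode("utf-8").splitlines()[0]
        except UnicodeDecodeError:
            return None
        return "csv" if "," in line else None

pm = pluggy.PluginManager("datasette")
pm.add_hookspecs(FormatSpec)
pm.register(CSVSniffer())

# pluggy collects every non-None result, so two matches means ambiguity
matches = [m for m in pm.hook.sniff_format(first_bytes=b"id,name\n1,foo") if m]
if len(matches) > 1:
    raise SystemExit("Content is ambiguous: specify a format explicitly")
```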
As a half-measure, I'd get value out of being able to upload a CSV and have Datasette run csvs-to-sqlite on it.
@obra there's a plugin for that! https://github.com/simonw/datasette-upload-csvs
Oh. Awesome.
datasette-connectors provides an API for making connectors for any file-based database. For example, datasette-pytables is a connector for HDF5 files, so it is now possible to use that type of file with Datasette. It'd be nice if Datasette could provide that API directly, for other file formats and for URLs too.
I also love the idea of this feature, and I wonder if it could work without having to download the whole database into memory at once when it's a rather large db. Obviously this could be slower, but it could serve many use cases. My comment is partially inspired by this post about streaming SQLite dbs from GitHub Pages and the like.
I've been thinking more about this one today too. An extension of this (touched on in #417, Datasette Library) would be to support pointing Datasette at a directory and having it automatically load any CSV files it finds anywhere in that folder or its descendants - either loading them fully, or providing a UI that allows users to select a file to open in Datasette. For larger files I think the right thing to do is to import them into an on-disk SQLite database, which is limited only by available disk space. For smaller files, loading them into an in-memory database should work fine.
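A hedged sketch of that directory-scanning behaviour; the 10 MB cutoff and the derived database filenames are assumptions for illustration:

```python
import csv
import pathlib
import sqlite_utils

IN_MEMORY_LIMIT = 10 * 1024 * 1024  # 10 MB: above this, import to disk

def load_directory(root):
    for path in pathlib.Path(root).rglob("*.csv"):
        if path.stat().st_size > IN_MEMORY_LIMIT:
            # Large file: import into an on-disk SQLite database,
            # limited only by available disk space
            db = sqlite_utils.Database(str(path.with_suffix(".db")))
        else:
            # Small file: an in-memory database is fine
            db = sqlite_utils.Database(memory=True)
        with path.open(newline="") as f:
            db[path.stem].insert_all(csv.DictReader(f))
        yield path, db
```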
This would remove the csvs-to-sqlite step, which I end up using for almost everything.
I'm hesitant to introduce pandas as a required dependency though, since it requires compiling numpy. We could build it so this option is only available if you have pandas installed.
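A minimal sketch of gating the feature on an optional pandas install; the `HAS_PANDAS` flag and `load_csv` helper are illustrative, not anything Datasette defines:

```python
# Detect the optional dependency at import time rather than requiring it
try:
    import pandas
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False

def load_csv(path):
    if not HAS_PANDAS:
        raise RuntimeError("CSV loading requires pandas: pip install pandas")
    return pandas.read_csv(path)
```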