
Datasette serve should accept paths/URLs to CSVs and other file formats #123

simonw opened this issue Nov 19, 2017 · 9 comments


simonw commented Nov 19, 2017

This would remove the csvs-to-sqlite step which I end up using for almost everything.

I'm hesitant to introduce pandas as a required dependency though, since it requires compiling numpy. We could build it so this option is only available if you have pandas installed.


simonw commented Dec 10, 2017

I'm going to keep this separate in csvs-to-sqlite.

@simonw simonw closed this as completed Dec 10, 2017
@simonw simonw added the wontfix label Dec 10, 2017

simonw commented Mar 15, 2019

I'm reopening this one as part of #417.

Further experience with Python's CSV standard library module has convinced me that pandas is not a required dependency for this. My sqlite-utils package can do most of the work here with very few dependencies.
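The stdlib-only approach can be sketched in a few lines using csv and sqlite3 (the function, table, and column names here are illustrative, not sqlite-utils' actual API):

```python
import csv
import io
import sqlite3

def load_csv_into_sqlite(conn, table, fp):
    """Create `table` from an open CSV file object using only the stdlib."""
    reader = csv.reader(fp)
    headers = next(reader)  # first row becomes the column names
    cols = ", ".join('"{}"'.format(h) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "people", io.StringIO("name,age\nCleo,7\nPancakes,5\n"))
rows = conn.execute("SELECT name, age FROM people").fetchall()
```

Everything stays as TEXT here; type detection for numeric columns is one of the things sqlite-utils layers on top.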


simonw commented Mar 15, 2019

How would Datasette accepting URLs work?

I want to support not just SQLite files and CSVs but other extensible formats (geojson, Atom, shapefiles etc) as well.

So datasette serve needs to be able to take filepaths or URLs to a variety of different content types.

If it's a URL, we can use the first 200 downloaded bytes to decide which type of file it is. This is likely more reliable than hoping the web server provided the correct content-type.
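That first-bytes sniffing could look something like this sketch (the magic-byte checks and format names are illustrative assumptions, not Datasette's actual detection code):

```python
import csv

def sniff_format(first_bytes):
    """Guess a format from the first few hundred bytes of a download."""
    # SQLite databases start with a fixed 16-byte magic header
    if first_bytes.startswith(b"SQLite format 3\x00"):
        return "db"
    stripped = first_bytes.lstrip()
    if stripped[:1] in (b"{", b"["):
        return "json"
    if stripped.startswith(b"<?xml") or stripped.startswith(b"<feed"):
        return "atom"
    try:
        # Fall back to the stdlib's CSV dialect sniffer
        csv.Sniffer().sniff(first_bytes.decode("utf-8"))
        return "csv"
    except (UnicodeDecodeError, csv.Error):
        return None
```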

Also: let's have a threshold for downloading to disk. We will start downloading to a temp file (location controlled by an environment variable) if either the content length header is above that threshold OR we hit that much data cached in memory already and don't know how much more is still to come.
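The spill-to-disk behavior described here maps closely onto the stdlib's tempfile.SpooledTemporaryFile, which buffers in memory until a size threshold and then rolls over to a temp file. A sketch, with an illustrative threshold and a made-up environment variable name:

```python
import os
import tempfile

# Threshold and environment variable names are illustrative, not Datasette's.
THRESHOLD = int(os.environ.get("DATASETTE_DOWNLOAD_THRESHOLD", 100 * 1024 * 1024))
TEMP_DIR = os.environ.get("DATASETTE_TEMP_DIR")  # None means the system default

def buffer_response(chunks, content_length=None):
    """Buffer downloaded chunks in memory, spilling to disk past THRESHOLD."""
    buf = tempfile.SpooledTemporaryFile(max_size=THRESHOLD, dir=TEMP_DIR)
    if content_length is not None and content_length > THRESHOLD:
        # Content-Length already exceeds the threshold: go straight to disk
        buf.rollover()
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)
    return buf
```

SpooledTemporaryFile handles the "don't know how much more is still to come" case automatically: writes past max_size trigger the rollover.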

There needs to be a command line option for saying "grab from this URL but force treat it as CSV" - same thing for files on disk.

datasette mydb.db --type=db http://blah/blah --type=csv

If you provide fewer --type options than URLs, the default behavior is used for all of the subsequent URLs.
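That pairing of --type options with paths/URLs could be sketched as (the function name is hypothetical):

```python
def pair_types(sources, types):
    """Pair each source with a --type option, in order.

    Sources without a corresponding --type get None, meaning
    fall back to the default behavior (auto-detection).
    """
    padded = list(types) + [None] * max(0, len(sources) - len(types))
    return list(zip(sources, padded))
```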

Auto-detection could be tricky. Probably do this with a plugin hook.

https://github.com/h2non/filetype.py is interesting, but it deals with images, video etc., so it's not right for this purpose.

I think we need our own simple content sniffing code via a plugin hook.

What if two plugin type hooks can both potentially handle a sniffed file? The CLI can quit with an error saying the content is ambiguous and that you need to specify a --type, picking from the following list.
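A sketch of that ambiguity check, assuming plugins register sniffer callables under a type name (all names here are hypothetical):

```python
class AmbiguousTypeError(Exception):
    """Raised when more than one registered sniffer claims the content."""

def detect_type(first_bytes, sniffers):
    """Run every registered sniffer over the sniffed bytes.

    Returns the single matching type name, None if nothing matched,
    and raises if the content is ambiguous.
    """
    matches = [name for name, sniff in sniffers.items() if sniff(first_bytes)]
    if not matches:
        return None
    if len(matches) > 1:
        raise AmbiguousTypeError(
            "Content is ambiguous, specify --type from: " + ", ".join(sorted(matches))
        )
    return matches[0]
```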

@simonw simonw changed the title Datasette should accept CSV file paths directly Datasette serve should accept paths/URLs to CSVs and other file formats Mar 15, 2019

obra commented Sep 24, 2020

As a half-measure, I'd get value out of being able to upload a CSV and have Datasette run csvs-to-sqlite on it.


simonw commented Sep 24, 2020

@obra there's a plugin for that! https://github.com/simonw/datasette-upload-csvs


obra commented Sep 24, 2020 via email

@jsancho-gpl

datasette-connectors provides an API for making connectors for any file-based database. For example, datasette-pytables is a connector for HDF5 files, so it's now possible to use this type of file with Datasette.

It'd be nice if Datasette could provide that API directly, for other file formats and for URLs too.


RayBB commented Jul 18, 2021

I also love the idea of this feature, and I wonder if it could work without having to download the whole database into memory at once when it's a rather large db. Obviously this could be slower, but it could support many use cases.

My comment is partially inspired by this post about streaming sqlite dbs from github pages or such
https://news.ycombinator.com/item?id=27016630


simonw commented Jul 19, 2021

I've been thinking more about this one today too. An extension of this (touched on in #417, Datasette Library) would be to support pointing Datasette at a directory and having it automatically load any CSV files it finds anywhere in that folder or its descendants - either loading them fully, or providing a UI that allows users to select a file to open it in Datasette.

For larger files I think the right thing to do is import them into an on-disk SQLite database, which is limited only by available disk space. For smaller files loading them into an in-memory database should work fine.
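The directory scan and the size-based choice of database could be sketched like this (the 10 MB cutoff and the function names are illustrative assumptions):

```python
import pathlib
import sqlite3

# Size cutoff for keeping a CSV's data purely in memory (illustrative).
IN_MEMORY_CUTOFF = 10 * 1024 * 1024

def find_csvs(root):
    """Recursively find CSV files under a directory and its descendants."""
    return sorted(pathlib.Path(root).rglob("*.csv"))

def connection_for(csv_path):
    """Small files get an in-memory database; large ones a .db file on disk,
    limited only by available disk space."""
    if csv_path.stat().st_size < IN_MEMORY_CUTOFF:
        return sqlite3.connect(":memory:")
    return sqlite3.connect(str(csv_path.with_suffix(".db")))
```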
