Map out the workflow from data collection to interactive and reproducible data publication #26
Comments
Thanks — I've added the lightning talks tag. Since this does go deeper and has potential for becoming the focus of a team at the event, I also added the idea tag. |
Sounds good! Some resources that might be useful for work on this:
-- raw dataset: Zenodo or Kaggle
More general info on the project these are all results from: 101 Innovations in Scholarly Communication
Can't really contribute today/tomorrow due to other commitments, but will try to check in at some point! |
Thanks Bianca - super helpful! |
I'm really interested in this! Been wanting to make some pathways for the openly-available climate data (https://www.esrl.noaa.gov/gmd/ccgg/trends/) to be turned into takeaway arguments/points about climate change, hopefully to remove the "us versus them" mentality of "scientists say this, so you should/shouldn't trust it blindly". One issue that comes up with things like Jupyter etc is differentiating the way computers need to be spoken to and what the audience needs for transparency. One possible connection is the ideas here: http://worrydream.com/#!/LearnableProgramming |
I am currently collecting data that I want to publish effectively, so I guess I'm a user of this project? Is that helpful? |
@edsaperia Yes, user insight will be crucial. Do you know of any tools already that you would use to illustrate your data collection and analysis steps, so that when you come to publish, readers can see what you've done and perhaps give it a go themselves? Alternatively, would you like to play with any of the tools listed at https://github.com/sparcopen/open-research-doathon/blob/master/reproducible_open_data_resources.md and see what you think of them? Would they work for you? Why (not)? |
@goodwingibbins This is totally on point. The trouble with lots of these tools is that they are geared towards the programmer-user. If there's a way to adapt them for people less comfortable with the lingo, that would be very useful. To start:
|
@edsaperia's resources:
For example, there are academics who produce/research these methodologies, e.g. LSE's department of methodology (for social sciences) http://www.lse.ac.uk/methodology/Home.aspx |
One issue that we have identified that distances the final data visualisation from the 'core' data set is the retrospective way data management may kick in at the publication stage, attempting to make sense of essentially unstructured data at the end of the process.
|
Thanks @npscience for pointing out Vega, didn't know that one! @Daniel-Mietchen: with the Wikimedia Graph extension, another one to add to the Wikimedia workflow? (referencing a separate discussion yesterday) Further responding to/building on @npscience comment above:
|
@bmkramer In terms of workflows, I would not add individual MediaWiki extensions, just mention that there are thousands of them that together cover all aspects of many research cycles. I had briefly mentioned (but not shown) one of them yesterday: https://www.mediawiki.org/wiki/Extension:Jmol . |
@gpa-smith Some thoughts on this, following your 3 questions:
|
@bmkramer - Thank you, I'll explore these new tools and standards. All - is it worth creating a map of this space? Or an ideal workflow to see where the gaps remain? |
@bmkramer - The ISA framework is an interesting one; the journal Scientific Data uses ISA-Tab to generate the structured side of metadata for its Data Descriptor articles, which are focused around datasets, as opposed to traditional articles that have data submitted as supporting material. The ability to create machine-readable metadata for other article types at earlier stages, or at least to feed into something like the ISA framework at an end point, would be beneficial. We have talked about a similar process for integration, with something simple like an Excel spreadsheet feeding into a JSON solution like Vega. The early collaborative working space is a useful area to look at developing, for example https://github.com/jupyter/colaboratory, Google Drive integration |
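To make the spreadsheet-to-JSON idea above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a tool anyone in this thread built: it assumes the spreadsheet has been exported as CSV, it targets Vega-Lite (the higher-level grammar built on Vega) rather than a full Vega spec, and the sample data, column names and the function name `csv_to_vega_lite` are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a spreadsheet exported as CSV.
SAMPLE_CSV = """category,count
A,3
B,5
C,2
"""


def csv_to_vega_lite(csv_text, x_field, y_field):
    """Build a Vega-Lite bar chart spec (as a dict) from CSV text."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings; convert the measure column to numbers.
        row[y_field] = float(row[y_field])
        rows.append(row)
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        # The data is embedded inline so the spec is a single shareable file.
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = csv_to_vega_lite(SAMPLE_CSV, "category", "count")
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be pasted into the online Vega editor to render the chart. Embedding the values inline keeps the data and the figure definition together, which fits the reproducibility aim discussed here, though for large datasets a link to the deposited file would probably be more practical.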
Ok. Tomorrow's tasks, for me at least (feel free to add):
|
Does NLTK (http://www.nltk.org/) fit in here? I agree with Bianca that easy tools for the non-coder are essential. Tableau Community is nice, but a suite of open source tools is ideal. I am just learning Python because it's popular in the humanities and social sciences (along with R), but it will be a long time before I'm able to do anything useful with it. :) |
One other aspect of this as a workflow is integration in the writing process. Overleaf and Authorea both (in varying aspects) integrate with Jupyter notebooks, for example, and Authorea works with git-based versioning. Integration with such a workflow would also allow publishers to stimulate/facilitate reproducible reporting, while not tying that aspect of manuscript preparation/submission to a locked-in, proprietary system*. With preprint services offering similar integrations, focus could be more on publications themselves than on publication venue. Back to workflows, I also like Kieran Healy's take on the difference between the 'office based' and the 'engineering model' http://plain-text.co *Elsevier at some point piloted executable papers (again, for computer science only), but then dropped the pilot: https://www.elsevier.com/physical-sciences/computer-science/executable-papers-improving-the-article-format-in-computer-science |
Check out the Data Stack at https://blog.liip.ch/archive/2017/02/13/data-stack.html Tools to consider:
|
Tasks: |
Hi,
You have inadvertently cc'd me (Brian Kramer [email protected]) on this thread.
…-b
Sunday, March 5, 2017, 06:19 -0500 from Bianca Kramer <[email protected]>:
|
What data analysis tools do life scientists use?
Source data: 101innovations
Aim:
Methods:
Results:
|
Side comments: Plot.ly is not easy to use. And slow. |
6 high level workflows for publishing data in more traditional journals: |
@HKLondon Great! Here's the url for the workflow diagram [WIP] https://www.draw.io/#G0B_a2JekZMrW8Z0xRc3hoR1Baczg |
Whilst most academic publishers can link to data, it seems that very few can (easily) publish interactive data sets like the OECD's (http://stats.oecd.org/index.aspx?DataSetCode=PDB_LV) or integrate interactive figures within the HTML versions of articles (for many more examples of interactive PDFs, see https://peerj.com/preprints/1594.pdf). Some examples: 3D visualization (Elsevier). It might be interesting to survey publishers to find out what the stumbling blocks are: publishers slow to change, few researchers wanting to publish interactive items, complexity of managing these items through the submission process, tagging issues in article XML/JATS files, problems with platform integrations, long-term archiving issues (including problems with submission of files to PubMed Central), etc. |
Remembered this a bit too late, but there was an RDA working group on publishing workflows, though it might be a bit too generic compared to what you're interested in. http://doi.org/10.5281/zenodo.20308 |
Outstanding:
--> this is happening in my repo at: npscience#2 Notes: really difficult for a novice to start using any of the above tools for visualisation.... |
@npscience your comments about plotly befuddle me! In my experience plot.ly was great for going quickly from data to interactive, configurable visualisations, especially for data layouts I wasn't familiar with, e.g. choropleth maps. Ultimately I didn't find it quite had the capability to do all the complicated fiddling necessary for "publication quality" figures - I had to dive back into R and do it 'the hard way'. But for quick, interactive, exploratory data analysis I still find plotly very easy to use - definitely here to stay in my playbook. |
@rossmounce noted, the more opinions the better, so thanks for chiming in. I think there's a huge gap in our literacy here; but I'm on the upward learning curve. |
@npscience having said all that I haven't tried Tableau so maybe Tableau or other such services are even better than plot.ly, but from a standpoint of a user with experience of spreadsheet software, R and plot.ly (admittedly limited experience of the wide breadth of available options!) I can definitely see plot.ly & web services like it have a niche / use-case. If R is one's base reference (as is the case for many biologists?) almost anything else is going to be "easier" & "quicker" ! |
I'd like to offer a lightning talk on an idea for how a journal could display open research data in a more engaging and useful manner for researchers, including a showcase of a few known tools already available to bring this idea to light. The aim is to inspire attendees to answer/address the following questions, perhaps collating resources in a wiki: