Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy up files #1233

Merged
merged 23 commits into from
Mar 8, 2022
Merged

Tidy up files #1233

merged 23 commits into from
Mar 8, 2022

Conversation

gcapes
Copy link
Contributor

@gcapes gcapes commented Sep 7, 2021

An initial stab at #1225.

@gcapes
Copy link
Contributor Author

gcapes commented Sep 7, 2021

This will want checking for errors, but also I'd like to discuss the best solution to some of the extra complexity the data and scripts directory has introduced.

Episode 5 has only a small modification to access the goostats.sh script, which is now ../scripts/goostats.sh.
Episode 6 starts to get complicated, which is probably why there hasn't been a separation of data and scripts yet.

If we were to be doing this IRL, we'd add the scripts to PATH so they could be easily accessed, given the script we write calls the other we already have (goostat.sh). PATH isn't covered so we can't do that. The problem then is that one of the scripts makes a copy of the original data files stats-$datafile which means it has to be run from the data directory to avoid polluting the scripts directory with back ups of data files. There is an error I just spotted where the do-stats.sh script calls bash goostats.sh but it should be bash ../scripts/goostats.sh. But anyway, this hard-coding of the script file path is jarring and undermines the flexibility that is taught by using $@ to run on any user input.

So, put data and scripts back into the same file? Yes it's an anti-pattern, but without covering PATH I think it might be the best solution. Having the scripts just one level up from the data directory doesn't solve the relative path issue. It's going to have to be a compromise somewhere, the question is where?

@deppen8
Copy link
Contributor

deppen8 commented Sep 8, 2021

@gcapes, I think I was the one who raised the "scripts next to data" as a bit "smelly", but I want to stress that I don't feel too strongly about that. If scripts next to data makes the whole thing run with minimal disruption, we should do that.

@gcapes gcapes force-pushed the tidy-up-files branch 3 times, most recently from 7309dad to b287be3 Compare September 9, 2021 11:06
@gcapes gcapes force-pushed the tidy-up-files branch 2 times, most recently from 0e8cbb7 to 063dfc0 Compare September 16, 2021 11:02
@deppen8
Copy link
Contributor

deppen8 commented Sep 22, 2021

@gcapes , is this ready for detailed review yet?

@gcapes
Copy link
Contributor Author

gcapes commented Sep 22, 2021

Yep, please go ahead!

@deppen8 deppen8 added the status:in review Being review by the community label Sep 24, 2021
Copy link
Contributor

@deppen8 deppen8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work you have put in here, @gcapes. To review, I did a fresh make serve of the lesson from the PR then went through the whole thing as an instructor/learner would: download the data, start from episode 1, and carry on through episode 7. There are a few rough edges to smooth out, but hopefully those are not too onerous.

I made individual comments/suggestions where I could on the diffs. There were some places where I couldn't put comments directly in the diffs because they were on parts that you didn't change (yet). Those are:

  • The path that leads off episode 7 needs to be Desktop/shell-lesson-data/exercise-data/writing
  • The path in the Little Women exercise at the end of episode 7 also needs to be updated: shell-lesson-data/writing/exercise-data/writing/LittleWomen.txt

@@ -433,8 +433,7 @@ $ ls -F
{: .language-bash}
~~~
amino-acids.txt elements/ pdb/ salmon.txt
animals.txt morse.txt planets.txt sunspot.txt
animal-counts/ backup/ creatures/ numbers.txt project/ proteins/ writing/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only these directories are available at this stage in the lesson.

Suggested change
animal-counts/ backup/ creatures/ numbers.txt project/ proteins/ writing/
animal-counts/ creatures/ numbers.txt proteins/ writing/

Comment on lines 752 to 760
$ ls -s Desktop/shell-lesson-data/exercise-data/
total 28
4 animal-counts 4 creatures 12 numbers.txt 4 proteins 4 writing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing:

Suggested change
$ ls -s Desktop/shell-lesson-data/exercise-data/
total 28
4 animal-counts 4 creatures 12 numbers.txt 4 proteins 4 writing
$ ls -s Desktop/shell-lesson-data/exercise-data/
total 8
0 animal-counts 0 creatures 8 numbers.txt 0 proteins 0 writing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be an OS thing - I've just double checked the values. Which OS do you use?

$ ls -S Desktop/shell-lesson-data/data
sunspot.txt animal-counts pdb amino-acids.txt salmon.txt
planets.txt elements morse.txt animals.txt
animal-counts creatures proteins writing numbers.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My sorting looks like this:

Suggested change
animal-counts creatures proteins writing numbers.txt
proteins creatures writing animal-counts numbers.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what is going on here. This is what my bash is reporting...

$ du . | sort -n
4	./writing/thesis
16	./animal-counts
40	./creatures
76	./proteins
1060	./writing
1208	.

> November ('11') would come before July ('7'). Similarly, putting the year first
> means that June 2012 will come before June 2013.
{: .callout}
she creates a directory called `data`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data/ dir no longer exists. Everything is at the top level of north-pacific-gyre

$ ls north-pacific-gyre/
NENE01729A.txt	NENE01751B.txt	NENE01971Z.txt	NENE02040A.txt	NENE02043B.txt
NENE01729B.txt	NENE01812A.txt	NENE01978A.txt	NENE02040B.txt	goodiff.sh
NENE01736A.txt	NENE01843A.txt	NENE01978B.txt	NENE02040Z.txt	goostats.sh
NENE01751A.txt	NENE01843B.txt	NENE02018B.txt	NENE02043A.txt

Comment on lines 799 to 812
She also creates a `scripts` directory to contain the processing script `goostats.sh`
written by her supervisor, and any new scripts she will write for this project.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scripts is also obsolete now.

> 2. `sort -t, -k2,2 animals.csv | uniq -c`
> 3. `cut -d, -f 2 animals.csv | uniq -c`
> 4. `cut -d, -f 2 animals.csv | sort | uniq -c`
> 5. `cut -d, -f 2 animals.csv | sort | uniq -c | wc -l`
>
> > ## Solution
> > Option 4. is the correct answer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should refer to shell-lesson-data/exercise-data not shell-lesson-data/data

@@ -603,11 +605,11 @@ so that you and other people can put those programs into pipes to multiply their
## Nelle's Pipeline: Checking Files

Nelle has run her samples through the assay machines
and created 17 files in the `north-pacific-gyre/2012-07-03` directory described earlier.
and created 17 files in the `north-pacific-gyre` directory described earlier.
As a quick check, starting from her home directory, Nelle types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
As a quick check, starting from her home directory, Nelle types:
As a quick check, starting from the `shell-lesson-data` directory, Nelle types:

Comment on lines 198 to 199
> The file
> [`shell-lesson-data/exercise-data/numbers.txt`](../shell-lesson-data/exericse-data/numbers.txt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this needs to be a link.

Comment on lines 164 to 203
> `ls` gives the following output:
>
> ~~~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have lengths.txt and sorted-lengths.txt in this dir, FWIW.

@@ -550,7 +558,7 @@ Of course, this introduces another tradeoff between flexibility and complexity.

> ## Script Reading Comprehension
>
> For this question, consider the `shell-lesson-data/molecules` directory once again.
> For this question, consider the `shell-lesson-data/exercise-data/proteins` directory once again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is OK, but there are still a number of references to molecules in this lesson. Probably just need to find/replace those.

@gcapes
Copy link
Contributor Author

gcapes commented Sep 28, 2021

Thanks for the review! It's a lot to work through. I think I've addressed all your comments, but there's also some stuff in episode 7 I've overlooked in that the filesystem diagram and subsequent output is now incorrect. I'll add a couple of to-do items below so I don't forget.

  • Update diagram to match current file structure
  • Update subsequent output

@gcapes
Copy link
Contributor Author

gcapes commented Oct 27, 2021

I taught this today and it went well. I've pushed a fix to an example that gave the wrong answer, but otherwise nothing broke.

@gcapes
Copy link
Contributor Author

gcapes commented Jan 6, 2022

The only thing which is (possibly) outstanding is an updated file tree diagram, but I think it would be too large given the new directory layout.
Is this ok to merge now if I fix the conflicts? @swcarpentry/shell-novice-maintainers

@deppen8
Copy link
Contributor

deppen8 commented Jan 6, 2022

I say merge. The only way we can know the rough edges is if more people start using it.

@gcapes
Copy link
Contributor Author

gcapes commented Feb 22, 2022

I've just replaced the ls output with the visually easier output from tree. I don't think it matters that we've not covered tree, and I've not shown the command - just the output. If no-one objects (?) I'll go ahead and merge this, and people can start teaching it.

@gcapes gcapes merged commit 636f637 into swcarpentry:gh-pages Mar 8, 2022
@deppen8
Copy link
Contributor

deppen8 commented Mar 16, 2022

Just want to add a comment to commemorate this PR 😄 getting merged. It was a huge lift by @gcapes and I am really excited to see how it improves the lesson. 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status:in review Being review by the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants