Merge pull request #50 from Lambda-School-Labs/docs/README
README completion
Showing 11 changed files with 214 additions and 14 deletions.
@@ -1,8 +1,9 @@
 # Story Squad Data
 
 - Children's story submissions cannot be made public due to [COPPA](https://www.ecfr.gov/cgi-bin/text-idx?SID=4939e77c77a1a1a08c1cbf905fc4b409&node=16%3A1.0.1.3.36&rgn=div5) guidelines, and as a precaution our team decided to extend that restriction to transcriptions as well.
-- The `squad_score_metrics` csv file in this folder contains the Squad Score v1.1 metrics for all 167 provided stories in our training dataset and was generated by the `squad_score_mvp` notebook.
+- The `squad_score_metrics` csv file in this folder contains the Squad Score v1.1 metrics for all 167 provided stories in our training dataset and was generated by the `squad_score_mvp` [notebook](../notebooks/squad_score_mvp.ipynb).
 - features: story_id, story_length, avg_word_len, quotes_num, unique_words_num, adj_num, squad_score
 - The `rankings` csv file contains the hand rankings of 25 stories in the dataset, which are the only labeled data provided by the stakeholder.
 - features: ranking, story_id
-- Anyone with access to the Story Squad data can download any of the notebooks in this repository to generate any additional csv files they need.
+- Anyone with access to the Story Squad data can download any of the notebooks in this repository to generate any additional csv files they need. The [README](../notebooks) in the notebooks folder lists the csv files each notebook creates.
+- Note: for anyone with access to the Story Squad data, the human transcriptions of the stories with the following Story IDs are missing pages. They are therefore inaccurate and should be excluded from any comparison of human vs. computer transcriptions: 3213, 3215, 3240, 5104, 5109, 5262
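The data files and the note on incomplete human transcriptions above suggest a small loading step before any analysis. The sketch below is illustrative, not code from this repository: it parses csv rows with Python's standard `csv` module, drops the six story IDs whose human transcriptions are missing pages, and joins metric rows with the 25 hand rankings on `story_id`. The helper names (`read_csv`, `drop_incomplete`, `attach_rankings`) are hypothetical, and it assumes `story_id` values are stored as plain integers such as `3213`.

```python
import csv
from typing import Dict, Iterable, List

# Story IDs whose human transcriptions are missing pages (from the data README).
# Assumption: story_id values are stored as plain integers such as "3213".
INCOMPLETE_HUMAN_TRANSCRIPTIONS = {3213, 3215, 3240, 5104, 5109, 5262}


def read_csv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse csv text (an open file or any iterable of lines) into dicts keyed by header."""
    return list(csv.DictReader(lines))


def drop_incomplete(rows: List[Dict[str, str]], id_column: str = "story_id") -> List[Dict[str, str]]:
    """Drop rows whose story ID has an incomplete human transcription."""
    return [row for row in rows if int(row[id_column]) not in INCOMPLETE_HUMAN_TRANSCRIPTIONS]


def attach_rankings(metrics: List[Dict[str, str]], rankings: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Inner-join metric rows with the hand rankings on story_id.

    Only stories that appear in the rankings file survive the join.
    """
    rank_by_id = {int(r["story_id"]): r["ranking"] for r in rankings}
    joined = []
    for row in metrics:
        story_id = int(row["story_id"])
        if story_id in rank_by_id:
            # Copy the metric row and add the hand ranking as an extra column.
            joined.append({**row, "ranking": rank_by_id[story_id]})
    return joined
```

In practice the files would be opened with `newline=""`, as the `csv` module documentation recommends, and passed straight to `read_csv`. The `drop_incomplete` filter only matters for human-vs-computer comparisons, so it is kept separate from the join.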
@@ -1,8 +1,8 @@
 #### Overview of `notebooks` content:
-- `clustering`: This notebook explores three different clustering methods to create groupings of users for the gamification portion of Story Squad. The currently implemented version creates groups based on the ranking of the squad scores. The other methods explored were `KMeans Clustering` and `Nearest Neighbors`. These have not been implemented in our application due to time constraints.
+- `clustering`: This notebook explores three different clustering methods to create groupings of users for the gamification portion of Story Squad. The currently implemented version ([clustering_mvp.py](../project/app/utils/clustering/clustering_mvp.py)) creates groups based on the ranking of the squad scores. The other methods explored were `KMeans Clustering` and `Nearest Neighbors`. These have not been implemented in our application due to time constraints.
 - `count_spelling_errors`: This notebook evaluates various spell check libraries to determine whether spell checking could correct transcription errors, act as a metric for student writing, and/or increase the reliability of other metrics. For the time being, we did not see enough improvement or consistency to implement this feature.
-- `score_visual`: This notebook explores different visualizations to display on the parent's dashboard. Several versions were mocked up and presented to the stakeholder. `histogram.py` and `line_graph.py` are the resulting final visuals, based on the feedback provided by the stakeholders. Both `.py` files are used in our application at the visualization endpoint.
-- `squad_score_mvp`: This notebook contains data exploration of the training data, creation of the MinMaxScaler, and composition of the Squad Score formula for the complexity metric. It also produces `squad_score_metrics.csv`, which contains a row for each training data transcription. Features include `story_id`, all features used in the most recent Squad Score formula, and `squad_score`.
-- `submission_endpoint_interactions`: This notebook demonstrates the functionality of the `submission.py` endpoints and outlines the file structure required for the endpoints' `UploadFile` type.
-- `transcribed_stories`: This notebook connects to the Google Cloud Vision API and transcribes the given 167 stories. It produces `transcribed_stories.csv`, which includes the Submission ID and the Transcribed Text. The `transcribe` method is used to create `transcription.py`, which is used in the app.
-- `transcription_confidence`: This notebook explores the Google Cloud Vision API's method for returning confidence levels for its transcriptions. It produces `error_confidence.csv`, which includes story_id, error (calculated between the API transcription and the provided human transcription), and confidence for each submission. The `image_confidence` method is modified to create `confidence_flag.py`, which is used in the app.
+- `score_visual`: This notebook explores different visualizations to display on the parent's dashboard. Several versions were mocked up and presented to the stakeholder. [`histogram.py`](../project/app/utils/visualizations/histogram.py) and [`line_graph.py`](../project/app/utils/visualizations/line_graph.py) are the resulting final visuals, based on the feedback provided by the stakeholders. Both `.py` files are used in our application at the visualization endpoint.
+- `squad_score_mvp`: This notebook contains data exploration of the training data, creation of the MinMaxScaler, and composition of the Squad Score formula for the complexity metric. It also produces [`squad_score_metrics.csv`](../data/squad_score_metrics.csv), which contains a row for each training data transcription. Features include `story_id`, all features used in the most recent Squad Score formula, and `squad_score` (see [`squad_score.py`](../project/app/utils/complexity/squad_score.py)).
+- `submission_endpoint_interactions`: This notebook demonstrates the functionality of the [`submission.py`](../project/app/api/submission.py) endpoints and outlines the file structure required for the endpoints' `UploadFile` type.
+- `transcribed_stories`: This notebook connects to the Google Cloud Vision API and transcribes the given 167 stories. It produces [`transcribed_stories.csv`](../data), which includes the Submission ID and the Transcribed Text. The `transcribe` method is used to create [`transcription.py`](../project/app/utils/img_processing/transcription.py), which is used in the application.
+- `transcription_confidence`: This notebook explores the Google Cloud Vision API's method for returning confidence levels for its transcriptions. It produces [`error_confidence_metrics.csv`](../data), which includes story_id, error (calculated between the API transcription and the provided human transcription), and confidence for each submission. The `image_confidence` method is modified to create [`confidence_flag.py`](../project/app/utils/img_processing/confidence_flag.py), which is used in the application.
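The `squad_score_mvp` notebook is described as fitting a MinMaxScaler and composing a Squad Score formula from the features listed for `squad_score_metrics.csv`. The sketch below shows that general shape in plain Python: learn each feature's training min and max, rescale every value to (x - min) / (max - min), and combine the scaled features as a weighted sum. It is not the project's formula. The equal weights in `HYPOTHETICAL_WEIGHTS` are placeholders (the real v1.1 weights are not stated here), and the function names are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

# Feature columns listed for squad_score_metrics.csv (excluding story_id and squad_score).
FEATURES = ["story_length", "avg_word_len", "quotes_num", "unique_words_num", "adj_num"]

# HYPOTHETICAL equal weights for illustration only; the actual Squad Score v1.1
# weights are defined in the squad_score_mvp notebook and are not reproduced here.
HYPOTHETICAL_WEIGHTS = {name: 1.0 for name in FEATURES}


def fit_min_max(rows: List[Mapping[str, float]], features: List[str] = FEATURES) -> Dict[str, Tuple[float, float]]:
    """Record each feature's (min, max) over the training rows, like MinMaxScaler.fit."""
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def min_max_scale(row: Mapping[str, float], bounds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each feature to (x - min) / (max - min), i.e. [0, 1] on the training range.

    A feature that was constant in training (max == min) is mapped to 0.0 here
    to avoid dividing by zero; values outside the training range are not clipped.
    """
    scaled = {}
    for f, (lo, hi) in bounds.items():
        span = hi - lo
        scaled[f] = (row[f] - lo) / span if span else 0.0
    return scaled


def squad_score(row: Mapping[str, float], bounds: Dict[str, Tuple[float, float]],
                weights: Mapping[str, float] = HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted sum of the min-max scaled features (hypothetical formula)."""
    scaled = min_max_scale(row, bounds)
    return sum(weights[f] * scaled[f] for f in bounds)
```

Fitting the bounds once on the training stories and reusing them for new submissions keeps scores comparable across submissions, which is the usual reason to persist a fitted scaler rather than refit it per request.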
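The `clustering` notebook entry says the implemented version groups users by the ranking of their squad scores. A minimal sketch of ranking-based grouping, assuming groups are consecutive slices of the score ranking: sort submissions from highest to lowest Squad Score and cut the ranking into chunks. The function name and the `group_size` default of 4 are hypothetical; the README does not state the group size or how leftover users are handled, so here the last group is simply allowed to be smaller.

```python
from typing import Dict, List


def group_by_rank(scores: Dict[str, float], group_size: int = 4) -> List[List[str]]:
    """Group submission IDs by Squad Score rank (hypothetical helper).

    Submissions are sorted by score, highest first, and the ranking is cut into
    consecutive groups of `group_size`; the final group may be smaller.
    Ties keep the dict's insertion order because Python's sort is stable.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    ranked = sorted(scores, key=lambda sid: scores[sid], reverse=True)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
```

Unlike KMeans or nearest-neighbor methods, this approach always yields groups of a predictable size, which suits a gamification setting where each group needs a fixed number of players.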
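The `transcription_confidence` entry mentions an error calculated between the API transcription and the human transcription, plus a confidence flag, without naming the error measure. The sketch below assumes word error rate (word-level Levenshtein distance divided by the number of words in the human reference) as one common choice, and a simple below-threshold check as a stand-in for flagging. Both `word_error_rate` and `low_confidence` are hypothetical helpers, and `HYPOTHETICAL_MIN_CONFIDENCE = 0.85` is an arbitrary placeholder, not the project's cutoff.

```python
from typing import List

# HYPOTHETICAL cutoff for illustration; the project's actual threshold is not stated.
HYPOTHETICAL_MIN_CONFIDENCE = 0.85


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


def low_confidence(confidence: float, threshold: float = HYPOTHETICAL_MIN_CONFIDENCE) -> bool:
    """Flag a transcription whose reported confidence falls below the threshold."""
    return confidence < threshold
```

Pairing each story's error with its reported confidence is what makes it possible to check whether low confidence actually predicts a poor transcription before relying on a flag in the application.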