Commit

Merge pull request #148 from pkeilbach/winter-term-24-25-preparation-…
…part-2

Winter term 2024/25 preparation part 2
pkeilbach authored Oct 19, 2024
2 parents 599291c + a7b2a4c commit 2079abd
Showing 4 changed files with 43 additions and 27 deletions.
22 changes: 5 additions & 17 deletions docs/assignments.md
@@ -68,7 +68,7 @@ See the [Getting Started](./getting_started.md) guide for instructions on how to

To submit an assignment, you will need to provide screenshots of a successful test run.

<!-- TODO issue-121 provide example screenshot -->
![Example of a successful test run](./img/assignment-test-run.png)

You can submit the screenshots via GitHub issue using the issue template for [Assignment Submission](https://github.com/pkeilbach/htwg-practical-nlp/issues/new/choose).

@@ -134,22 +134,10 @@ If you don't want to deal with git, you could also work purely locally without c
When pulling updates, you probably need to [stash](https://git-scm.com/docs/git-stash) your changes.
But be careful: if not done properly, you may lose your progress! 😱

## Pulling Updates
## Fetching Updates

As described in the getting started guide, there will be [updates](./getting_started.md#fetching-updates) from time to time.
It can happen that these updates affect the assignments (just in case you are wondering why your tests suddenly fail 😅).
As described in the [getting started](./getting_started.md) guide, there will be updates from time to time.

Given that you work on a separate branch on your assignments, you can merge the latest version of the `main` branch into your assignment branch as follows:
It can happen that these updates affect the assignments - just in case you are wondering why your tests suddenly fail 😅.

```sh
# Fetch the latest changes from the remote main branch
git fetch origin main

# Merge the main branch into your current feature branch, e.g. my-assignments
git merge origin/main
```

!!! info

    In case you work on a fork (which is awesome 🙌), the process is similar, but you need to fetch from the `upstream` remote repository.
    This is described in the [contributing guide](https://github.com/pkeilbach/htwg-practical-nlp/blob/main/CONTRIBUTING.md#syncing-you-fork) in more detail.
So make sure to always [fetch the latest updates](./getting_started.md#fetching-updates) before working on your assignments.
16 changes: 14 additions & 2 deletions docs/course_profile.md
@@ -92,8 +92,20 @@ My goal is to keep the entry barrier as low as possible!

If you are new to **GitHub** (a popular Git hosting service), you might want to check out [this module](https://learn.microsoft.com/en-us/training/modules/introduction-to-github/).



!!! tip

    In general, [Microsoft Learn](https://docs.microsoft.com/en-us/learn/) offers some great tutorials for all kinds of technologies.

## Literature

Here is a list of recommended literature for this course:

- Bird, Steven, Ewan Klein, and Edward Loper. *Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit*. Sebastopol, CA: O'Reilly Media, 2009. <https://www.nltk.org/book_1ed/>.

- Bishop, Christopher M. *Pattern Recognition and Machine Learning*. New York: Springer, 2006. <https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf>.

- Jurafsky, Daniel, and James H. Martin. *Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition*. 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2008. <https://web.stanford.edu/~jurafsky/slp3/>.

- Raschka, Sebastian. *Build a Large Language Model (From Scratch)*. Shelter Island, NY: Manning, 2024. <https://www.manning.com/books/build-a-large-language-model-from-scratch>.

- Vajjala, Sowmya, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana. *Practical Natural Language Processing: A Comprehensive Guide to Building Real-world NLP Systems*. Sebastopol, CA: O'Reilly Media, 2020. <https://www.practicalnlp.ai/>.
Binary file added docs/img/assignment-test-run.png
32 changes: 24 additions & 8 deletions tests/htwgnlp/test_embeddings.py
@@ -61,19 +61,35 @@ def test_get_embeddings(loaded_embeddings):
def test_euclidean_distance(loaded_embeddings, test_vector):
    assert isinstance(loaded_embeddings.euclidean_distance(test_vector), np.ndarray)
    assert loaded_embeddings.euclidean_distance(test_vector).shape == (243,)
    assert loaded_embeddings.euclidean_distance(test_vector)[0] == 17.507894003796004
    assert loaded_embeddings.euclidean_distance(test_vector)[1] == 17.76195946823725
    assert loaded_embeddings.euclidean_distance(test_vector)[42] == 17.787844721963356
    assert loaded_embeddings.euclidean_distance(test_vector)[242] == 17.745477284490963
    np.testing.assert_allclose(
        loaded_embeddings.euclidean_distance(test_vector)[0], 17.507894003796004
    )
    np.testing.assert_allclose(
        loaded_embeddings.euclidean_distance(test_vector)[1], 17.76195946823725
    )
    np.testing.assert_allclose(
        loaded_embeddings.euclidean_distance(test_vector)[42], 17.787844721963356
    )
    np.testing.assert_allclose(
        loaded_embeddings.euclidean_distance(test_vector)[242], 17.745477284490963
    )


def test_cosine_similarity(loaded_embeddings, test_vector):
    assert isinstance(loaded_embeddings.cosine_similarity(test_vector), np.ndarray)
    assert loaded_embeddings.cosine_similarity(test_vector).shape == (243,)
    assert loaded_embeddings.cosine_similarity(test_vector)[0] == -0.037310105006509546
    assert loaded_embeddings.cosine_similarity(test_vector)[1] == -0.12679458247346523
    assert loaded_embeddings.cosine_similarity(test_vector)[42] == -0.026496807469057613
    assert loaded_embeddings.cosine_similarity(test_vector)[242] == -0.0657470030012723
    np.testing.assert_allclose(
        loaded_embeddings.cosine_similarity(test_vector)[0], -0.037310105006509546
    )
    np.testing.assert_allclose(
        loaded_embeddings.cosine_similarity(test_vector)[1], -0.12679458247346523
    )
    np.testing.assert_allclose(
        loaded_embeddings.cosine_similarity(test_vector)[42], -0.026496807469057613
    )
    np.testing.assert_allclose(
        loaded_embeddings.cosine_similarity(test_vector)[242], -0.0657470030012723
    )


def test_find_closest_word(loaded_embeddings, test_vector):
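
The hunk above pairs exact `==` comparisons on float results with `np.testing.assert_allclose` calls on the same values. The commit does not state its motivation, but a likely reason is that exact float equality can fail when the order of arithmetic operations or the underlying numeric library differs between machines. The following is a minimal sketch of that difference. The numbers and the `is_close` helper are illustrative assumptions and are not part of the repository.

```python
import numpy as np

# Classic illustration: both sides are mathematically equal, but their
# binary floating-point representations differ in the last bits.
computed = 0.1 + 0.2
expected = 0.3

# Exact comparison fails because of that rounding difference.
exact_match = computed == expected

# assert_allclose passes when |actual - desired| <= atol + rtol * |desired|
# (defaults: rtol=1e-7, atol=0), so last-bit rounding noise is tolerated.
np.testing.assert_allclose(computed, expected)


def is_close(actual, desired):
    """Hypothetical helper: True if assert_allclose accepts the pair."""
    try:
        np.testing.assert_allclose(actual, desired)
    except AssertionError:
        return False
    return True


# A genuinely different value is still rejected by the tolerance check.
real_mismatch_detected = not is_close(17.5, 17.6)
```

With the default relative tolerance, the check still catches real regressions in the expected distances and similarities while ignoring differences far below the precision the tests care about.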