Article huggingface #61

Merged: 9 commits, Jan 15, 2024
74 changes: 74 additions & 0 deletions docs/presentations/articles/hugging_face.md
@@ -0,0 +1,74 @@
# Hugging Face

_Author: [Luis Nothvogel](mailto:[email protected])_

## TL;DR

Hugging Face has emerged as a pivotal player in the AI and machine learning arena, specializing in natural language processing (NLP). This article covers its core offerings, including model hosting, Spaces, datasets, tokenizers, inference, and the Transformers API. Hugging Face is not only a repository for cutting-edge models but also a platform for collaboration and innovation in AI.

### Model Hosting on Hugging Face

Hugging Face has made a name for itself in model hosting. It offers a vast repository of pre-trained models, primarily focused on NLP tasks such as text classification, question answering, and language generation. By its own count, it hosts over 350,000 models. Users can easily download and deploy these models, and can also upload and share their own with the community.

```python
from transformers import pipeline, set_seed

# Example of using a pre-trained model
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generated_texts = generator("The student worked on", max_length=30, num_return_sequences=2)
print(generated_texts)

# Example output (with seed 42):
# [{'generated_text': 'The student worked on his paper, which you can read about here. You can get an ebook with that part, or an audiobook with some of'},
#  {'generated_text': 'The student worked on this particular task by making the same basic task in his head again and again, without the help of some external helper, even when'}]
```

### Spaces: A Collaborative Environment

Spaces are an innovative feature of Hugging Face, offering a collaborative environment where developers can build, showcase, and share machine learning applications. Users can deploy models as web applications, creating interactive demos that are accessible to a broader audience. Spaces support frameworks such as Streamlit and Gradio. By its own count, Hugging Face hosts over 150,000 Spaces.

### Diverse Datasets at Your Disposal

The Hugging Face ecosystem includes a wide range of datasets catering to different NLP tasks. The Datasets library simplifies loading and processing data, ensuring efficiency and consistency in model training. By its own count, Hugging Face hosts over 75,000 datasets.

[Wikipedia dataset reference](https://huggingface.co/datasets/wikimedia/wikipedia)
```python
from datasets import load_dataset

# Example of loading a dataset
ds = load_dataset("wikimedia/wikipedia", "20231101.en")
```


### Transformers API: Transform Text Effortlessly

The Transformers API is a testament to Hugging Face's innovation. This API simplifies the process of text transformation, making it accessible even to those with limited programming skills. It supports a variety of NLP tasks and can be integrated into various applications.

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
```
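Once loaded, the tokenizer and model can be combined for text generation; a minimal sketch (greedy decoding, with the length limit chosen for illustration):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenize a prompt and generate a short continuation (greedy decoding)
inputs = tokenizer("The student worked on", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```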

### Tokenizers

Hugging Face tokenizers are an essential component of modern natural language processing. They are fast and flexible, and support a variety of tokenization methods such as BPE (byte-pair encoding), WordPiece, and SentencePiece. A key feature of the Hugging Face tokenizers is their compatibility with different language models such as BERT, GPT, and RoBERTa, which makes them a universal tool in the NLP community. Through ongoing innovation and updates, the Hugging Face tokenizers remain at the forefront of NLP technology, continuously improving the efficiency and accuracy of language processing.
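To illustrate, the same `AutoTokenizer` interface works across these schemes; a sketch using GPT-2's byte-level BPE tokenizer (swapping in `bert-base-uncased` would load a WordPiece tokenizer behind the same interface):

```python
from transformers import AutoTokenizer

# GPT-2 uses byte-level BPE
tok = AutoTokenizer.from_pretrained("gpt2")

sentence = "Tokenizers split text into subword units."
tokens = tok.tokenize(sentence)   # subword strings
ids = tok.encode(sentence)        # integer vocabulary ids
print(tokens)
print(ids)
print(tok.decode(ids))            # round-trips back to the original text
```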

### Inference

Hugging Face inference plays a decisive role in turning trained language models into production applications. The platform provides an intuitive and powerful infrastructure for running model inference, meaning developers can easily access pre-trained models to generate real-time predictions for a wide range of NLP tasks. Thanks to its efficient implementation and support for hardware acceleration, Hugging Face inference enables the seamless integration of language models into applications, from chatbots to machine translation to sentiment analysis.
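For example, local inference via a pipeline takes only a few lines (the hosted Inference API exposes the same tasks over HTTP; relying on the task's default model here is an assumption, and it is downloaded on first use):

```python
from transformers import pipeline

# Run sentiment analysis locally with the task's default model
sentiment = pipeline("sentiment-analysis")
result = sentiment("Hugging Face makes deployment straightforward.")[0]
print(result["label"], round(result["score"], 3))
```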

## Key Takeaways

- Hugging Face is at the forefront of NLP, offering a wealth of models, datasets, and tools.
- Its **model hosting** platform is robust, user-friendly, and widely adopted in the AI community.
- **Spaces** foster **collaboration and accessibility**, allowing users to easily share and demonstrate their ML applications.
- The platform's commitment to providing diverse **datasets** accelerates research and development in NLP.
- The **Transformers API** is a notable tool for simplifying text transformations, enhancing the accessibility of NLP.

## References

- [Hugging Face: The AI community building the future.](https://HuggingFace.co/)
- [Hugging Face Transformers Documentation](https://HuggingFace.co/docs/transformers/index)
- [Hugging Face Datasets Library](https://HuggingFace.co/docs/datasets/index)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -66,4 +66,5 @@ nav:
- presentations/presentations.md
- Articles:
- Article Template: presentations/articles/template.md
- HuggingFace: presentations/articles/hugging_face.md
- FAQ: faq.md