
Add Moonshine #34784

Merged
merged 93 commits into main from add-moonshine
Jan 10, 2025
Changes from 18 commits
93 commits
35434da
config draft
eustlb Nov 18, 2024
7e18038
full encoder forward
eustlb Dec 2, 2024
6517251
full decoder forward
eustlb Dec 3, 2024
b0efed1
fix sdpa and FA2
eustlb Dec 12, 2024
b4d18f9
fix sdpa and FA2
eustlb Dec 12, 2024
b3777e0
moonshine model
eustlb Dec 13, 2024
e313ab5
moonshine model forward
eustlb Dec 13, 2024
7a6935a
fix attention with past_key_values
eustlb Dec 13, 2024
8fda426
add MoonshineForConditionalGeneration
eustlb Dec 13, 2024
d0ed917
fix cache handling and causality for cross attention
eustlb Dec 13, 2024
461f210
no causal attention mask for the encoder
eustlb Dec 15, 2024
22dbaae
model addition (imports etc)
eustlb Dec 15, 2024
72ba8c4
small nit
eustlb Dec 15, 2024
f548504
nits
eustlb Dec 16, 2024
3d52b1e
Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py
eustlb Dec 16, 2024
8f82a40
add rope_theta
eustlb Dec 16, 2024
fc73b37
nits
eustlb Dec 16, 2024
aedccf5
model doc
eustlb Dec 16, 2024
0954133
Update src/transformers/models/auto/configuration_auto.py
eustlb Dec 17, 2024
b1f0909
imports
eustlb Dec 17, 2024
f647a9f
add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES
eustlb Dec 17, 2024
2da255d
updates modular
eustlb Dec 17, 2024
407cd3e
make
eustlb Dec 17, 2024
b3da73d
make fix-copies
eustlb Dec 17, 2024
8cf195d
ruff check examples fix
eustlb Dec 17, 2024
f78db3d
fix check_modular_conversion
eustlb Dec 17, 2024
5927846
Merge branch 'main' into add-moonshine
eustlb Dec 17, 2024
b680f0d
nit
eustlb Dec 17, 2024
01a4ea1
nits
eustlb Dec 17, 2024
5124cbd
nits
eustlb Dec 17, 2024
a528bd3
copied from -> imports
eustlb Dec 18, 2024
5648c38
Merge branch 'main' into add-moonshine
eustlb Dec 20, 2024
338c7c0
imports fix
eustlb Dec 20, 2024
2ec366a
integrate attention refacto
eustlb Dec 20, 2024
5261886
modular edge case
eustlb Dec 20, 2024
467b8e4
remove encoder
eustlb Jan 2, 2025
445dfcb
convolutions params in config
eustlb Jan 2, 2025
a656e8c
run modular_model_converter
eustlb Jan 2, 2025
6547634
make
eustlb Jan 2, 2025
9eec03b
Update docs/source/en/model_doc/moonshine.md
eustlb Jan 2, 2025
53b9b9d
MoonshineModelTest
eustlb Jan 2, 2025
fb2258f
correct typo
eustlb Jan 7, 2025
259c6b3
make style
eustlb Jan 7, 2025
59dd57f
integration tests
eustlb Jan 7, 2025
8fe2cdc
Merge branch 'main' into add-moonshine
eustlb Jan 7, 2025
02c4343
make
eustlb Jan 7, 2025
6c7dc79
modular convert
eustlb Jan 7, 2025
89904f9
Merge branch 'main' into add-moonshine
eustlb Jan 8, 2025
829a2b8
name conversion update (up_proj -> fc1 etc)
eustlb Jan 9, 2025
02a1f1f
update config
eustlb Jan 9, 2025
80c39cd
update MLP
eustlb Jan 9, 2025
55ac7e2
update attention
eustlb Jan 9, 2025
4d540af
update encoder layer
eustlb Jan 9, 2025
4f52158
update decoder layer
eustlb Jan 9, 2025
6568576
update convolutions parameters
eustlb Jan 9, 2025
495bd73
update encoder
eustlb Jan 9, 2025
48492c2
remove INPUTS_DOCSTRING
eustlb Jan 9, 2025
21a743f
update decoder
eustlb Jan 9, 2025
3b7af33
update conditional generation
eustlb Jan 9, 2025
563a360
update pretrained model
eustlb Jan 9, 2025
0ebd9a5
imports
eustlb Jan 9, 2025
d0233a7
modular converted
eustlb Jan 9, 2025
f9a6cac
update doc
eustlb Jan 9, 2025
b8a6e3b
fix
eustlb Jan 9, 2025
bfea59a
typo
eustlb Jan 9, 2025
ea150d3
update doc
eustlb Jan 9, 2025
8170cc8
update license
eustlb Jan 9, 2025
9bc7b35
update init
eustlb Jan 9, 2025
2bbb8ee
split config in file
eustlb Jan 9, 2025
b42d7f6
two classes for MLP
eustlb Jan 9, 2025
08272b9
attention from GLM
eustlb Jan 9, 2025
61462db
from GlmRotaryEmbedding
eustlb Jan 9, 2025
c866e58
split MLP
eustlb Jan 9, 2025
b0183be
apply arthur's review suggestions
eustlb Jan 9, 2025
3aa4f8d
apply arthur's review suggestions
eustlb Jan 9, 2025
d64190c
apply arthur's review suggestions
eustlb Jan 9, 2025
0bfb6bc
auto feature extractor
eustlb Jan 9, 2025
1268e13
convert modular
eustlb Jan 9, 2025
141a70b
fix + make
eustlb Jan 9, 2025
70f16dc
Merge branch 'main' into add-moonshine
eustlb Jan 9, 2025
1f3cc63
convert modular
eustlb Jan 9, 2025
77ca19a
make
eustlb Jan 9, 2025
e6b19db
unsplit config
eustlb Jan 9, 2025
7155fbd
use correct checkpoint
eustlb Jan 9, 2025
15b2552
wrap generate
eustlb Jan 9, 2025
89ceca8
Merge branch 'main' into add-moonshine
eustlb Jan 9, 2025
fc4febc
update tests
eustlb Jan 10, 2025
3e2a84b
typos
eustlb Jan 10, 2025
9a8f91d
make
eustlb Jan 10, 2025
834c364
typo
eustlb Jan 10, 2025
b491d95
update doc
eustlb Jan 10, 2025
d357fff
Merge branch 'main' into add-moonshine
eustlb Jan 10, 2025
2a6c59e
Merge branch 'main' into add-moonshine
eustlb Jan 10, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -490,6 +490,8 @@
title: mLUKE
- local: model_doc/mobilebert
title: MobileBERT
- local: model_doc/moonshine
title: moonshine
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mpt
Expand Down
60 changes: 60 additions & 0 deletions docs/source/en/model_doc/moonshine.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# moonshine

## Overview

The Moonshine model was proposed in [Moonshine: Speech Recognition for Live Transcription and Voice Commands](https://arxiv.org/abs/2410.15608) by Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden.

The abstract from the paper is the following:

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and employs Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. The model is trained on speech segments of various lengths, but without using zero-padding, leading to greater efficiency for the encoder during inference time. When benchmarked against OpenAI's Whisper tiny-en, Moonshine Tiny demonstrates a 5x reduction in compute requirements for transcribing a 10-second speech segment while incurring no increase in word error rates across standard evaluation datasets. These results highlight Moonshine's potential for real-time and resource-constrained applications.

Tips:

- Moonshine improves upon Whisper's architecture:
1. It uses SwiGLU activation instead of GELU in the decoder layers
2. Most importantly, it replaces absolute position embeddings with Rotary Position Embeddings (RoPE). This allows Moonshine to handle audio inputs of any length, unlike Whisper which is restricted to fixed 30-second windows.
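The activation swap in point 1 can be sketched scalar-wise. These helper functions are illustrative only, not the library's implementation:

```python
import math

def gelu(x):
    # tanh approximation of GELU, the activation used in Whisper's MLP blocks
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def silu(x):
    # SiLU ("swish"): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(gate, up):
    # SwiGLU gates one linear projection with the SiLU of another, which is
    # why a SwiGLU decoder MLP carries two input projections instead of one
    return silu(gate) * up
```

This is also why the conversion script has to map two projection weights per decoder MLP rather than one.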

This model was contributed by [Eustache Le Bihan (eustlb)](https://huggingface.co/eustlb).
The original code can be found [here](https://github.com/usefulsensors/moonshine).
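The RoPE mechanism mentioned in the tips rotates each feature pair by a position-dependent angle; because rotations compose additively, attention ends up depending only on relative offsets, with no fixed maximum length. A minimal scalar sketch (function name is hypothetical, not the library's API):

```python
import math

def rope_pair(x0, x1, position, inv_freq):
    # rotate the feature pair (x0, x1) by position * inv_freq radians;
    # rotating by p then q is the same as rotating by p + q, which is what
    # makes the encoding relative rather than absolute
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c
```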


## MoonshineConfig

[[autodoc]] MoonshineConfig

<frameworkcontent>
<pt>

## MoonshineModel

[[autodoc]] MoonshineModel
- forward
- _mask_input_features

## MoonshineForConditionalGeneration

[[autodoc]] MoonshineForConditionalGeneration
- forward
- generate

</pt>
<tf>
22 changes: 22 additions & 0 deletions src/transformers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,7 @@
"models.gpt_bigcode": ["GPTBigCodeConfig"],
"models.gpt_neo": ["GPTNeoConfig"],
"models.gpt_neox": ["GPTNeoXConfig"],
"models.moonshine": ["MoonshineConfig"],
"models.gpt_neox_japanese": ["GPTNeoXJapaneseConfig"],
"models.gpt_sw3": [],
"models.gptj": ["GPTJConfig"],
Expand Down Expand Up @@ -1081,6 +1082,7 @@
_import_structure["models.gemma"].append("GemmaTokenizerFast")
_import_structure["models.gpt2"].append("GPT2TokenizerFast")
_import_structure["models.gpt_neox"].append("GPTNeoXTokenizerFast")
_import_structure["models.moonshine"].append("MoonshineTokenizer")
_import_structure["models.gpt_neox_japanese"].append("GPTNeoXJapaneseTokenizer")
_import_structure["models.herbert"].append("HerbertTokenizerFast")
_import_structure["models.layoutlm"].append("LayoutLMTokenizerFast")
Expand Down Expand Up @@ -2360,6 +2362,16 @@
"GPTNeoXPreTrainedModel",
]
)
_import_structure["models.moonshine"].extend(
[
"MoonshineForCausalLM",
"MoonshineForQuestionAnswering",
"MoonshineForSequenceClassification",
"MoonshineForTokenClassification",
"MoonshineModel",
"MoonshinePreTrainedModel",
]
)
_import_structure["models.gpt_neox_japanese"].extend(
[
"GPTNeoXJapaneseForCausalLM",
Expand Down Expand Up @@ -5486,6 +5498,7 @@
from .models.mobilevitv2 import (
MobileViTV2Config,
)
from .models.moonshine import MoonshineConfig
from .models.moshi import (
MoshiConfig,
MoshiDepthConfig,
Expand Down Expand Up @@ -6023,6 +6036,7 @@
from .models.mbart import MBartTokenizerFast
from .models.mbart50 import MBart50TokenizerFast
from .models.mobilebert import MobileBertTokenizerFast
from .models.moonshine import MoonshineTokenizer
from .models.mpnet import MPNetTokenizerFast
from .models.mt5 import MT5TokenizerFast
from .models.mvp import MvpTokenizerFast
Expand Down Expand Up @@ -7441,6 +7455,14 @@
MobileViTV2Model,
MobileViTV2PreTrainedModel,
)
from .models.moonshine import (
MoonshineForCausalLM,
MoonshineForQuestionAnswering,
MoonshineForSequenceClassification,
MoonshineForTokenClassification,
MoonshineModel,
MoonshinePreTrainedModel,
)
from .models.moshi import (
MoshiForCausalLM,
MoshiForConditionalGeneration,
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,7 @@
mobilenet_v2,
mobilevit,
mobilevitv2,
moonshine,
moshi,
mpnet,
mpt,
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,7 @@
("mobilenet_v2", "MobileNetV2Config"),
("mobilevit", "MobileViTConfig"),
("mobilevitv2", "MobileViTV2Config"),
("moonshine", "MoonshineConfig"),
("moshi", "MoshiConfig"),
("mpnet", "MPNetConfig"),
("mpt", "MptConfig"),
Expand Down Expand Up @@ -494,6 +495,7 @@
("mobilenet_v2", "MobileNetV2"),
("mobilevit", "MobileViT"),
("mobilevitv2", "MobileViTV2"),
("moonshine", "moonshine"),
("moshi", "Moshi"),
("mpnet", "MPNet"),
("mpt", "MPT"),
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@
("mobilenet_v2", "MobileNetV2Model"),
("mobilevit", "MobileViTModel"),
("mobilevitv2", "MobileViTV2Model"),
("moonshine", "MoonshineModel"),
("moshi", "MoshiModel"),
("mpnet", "MPNetModel"),
("mpt", "MptModel"),
Expand Down Expand Up @@ -424,6 +425,7 @@
("mega", "MegaForMaskedLM"),
("megatron-bert", "MegatronBertForCausalLM"),
("mobilebert", "MobileBertForMaskedLM"),
("moonshine", "MoonshineForConditionalGeneration"),
("mpnet", "MPNetForMaskedLM"),
("mpt", "MptForCausalLM"),
("mra", "MraForMaskedLM"),
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@
("mctct", "MCTCTProcessor"),
("mgp-str", "MgpstrProcessor"),
("mllama", "MllamaProcessor"),
("moonshine", "Wav2Vec2Processor"),
("oneformer", "OneFormerProcessor"),
("owlv2", "Owlv2Processor"),
("owlvit", "OwlViTProcessor"),
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,7 @@
("mllama", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("mluke", ("MLukeTokenizer" if is_sentencepiece_available() else None, None)),
("mobilebert", ("MobileBertTokenizer", "MobileBertTokenizerFast" if is_tokenizers_available() else None)),
("moonshine", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("moshi", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("mpnet", ("MPNetTokenizer", "MPNetTokenizerFast" if is_tokenizers_available() else None)),
("mpt", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
Expand Down
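The `(slow, fast)` tuples registered above are resolved against whether the `tokenizers` package is installed — Moonshine registers no slow class, only `PreTrainedTokenizerFast`. A simplified sketch of that selection (helper name is hypothetical; the real `AutoTokenizer` logic also consults the config):

```python
def resolve_tokenizer_class(mapping_entry, tokenizers_installed):
    # mimic how a (slow, fast) tuple is read: prefer the fast class when the
    # `tokenizers` package is available and a fast class is registered,
    # otherwise fall back to the slow class (which may itself be None)
    slow_class, fast_class = mapping_entry
    if tokenizers_installed and fast_class is not None:
        return fast_class
    return slow_class
```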
59 changes: 59 additions & 0 deletions src/transformers/models/moonshine/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
)


_import_structure = {
    "configuration_moonshine": ["MoonshineConfig"],
}

try:
    if not is_torch_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_moonshine"] = [
        "MoonshineForConditionalGeneration",
        "MoonshineModel",
        "MoonshinePreTrainedModel",
    ]


if TYPE_CHECKING:
    from .configuration_moonshine import MoonshineConfig

    try:
        if not is_torch_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_moonshine import (
            MoonshineForConditionalGeneration,
            MoonshineModel,
            MoonshinePreTrainedModel,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
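The `_LazyModule` wiring above defers heavy (torch-backed) imports until an attribute is first accessed. A minimal stdlib-only sketch of the same idea, in the PEP 562 style — names here are illustrative, not transformers' API:

```python
import importlib

# illustrative alias table: public attribute name -> real module imported on demand
_LAZY_MODULES = {"json_tools": "json"}

def lazy_getattr(name):
    # resolve an attribute to a real import only on first access, mirroring
    # how _LazyModule keeps `import transformers` cheap when torch is unused
    if name in _LAZY_MODULES:
        return importlib.import_module(_LAZY_MODULES[name])
    raise AttributeError(f"module has no attribute {name!r}")
```

In a real package this function would be assigned as the module-level `__getattr__`, so plain attribute access triggers the import.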