TCGGPT is a series of Python scripts that can be used to train and run inference with a ~1-million-parameter transformer-based model capable of generating the mechanically relevant portions of Magic: The Gathering cards.
TCGGPT is built on MLX and will only run on Macs with an Apple Silicon processor.
This repo does not contain the model or training data, just the scripts needed to build and run the model. A standard ('fast') training run takes 20 minutes and a full training run takes 60 minutes on my M4 Max MacBook Pro.
The character `|` represents a new line in the rules text.
<Type> artifact
<ManaCost> {4}
<Text> {1} , {t} , sacrifice ~ : target attacking creature get - 4 / - 4 until end of turn .
<Type> instant
<ManaCost> {1} {u}
<Text> replicate {2} {u} | counter target spell unless its controller pays {1} .
<Type> creature - - beast beast
<ManaCost> {4} {g}
<Stats> 3 / 3
<Text> {u} : ~ gains hexproof until end of turn .
It also frequently produces nonsense.
<Type> enchantment
<ManaCost> {2} {r}
<Text> whenever you cast a spell that targets a creature or that shares a creature type with at the end step , ~ deals 2 damage to any target .
Before beginning, you will need the raw data to train the network. You can download a json file in the appropriate format from Scryfall's Bulk Data page. The best set for training is 'Oracle Cards' because it is de-duplicated by card name.
For this section, I will assume you have downloaded the oracle json file and placed it at the path `./data/raw/oracle-cards.json`.
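If you would rather script the download, Scryfall also exposes its bulk files through a small API. The sketch below is illustrative only; it assumes the `requests` package and writes to the path used in the steps below.

```python
# Sketch: fetch the 'Oracle Cards' bulk file from Scryfall's bulk-data API.
# Endpoint and field names follow Scryfall's public API; the output path
# matches the one assumed in the setup steps below.
import os
import requests

os.makedirs("./data/raw", exist_ok=True)
index = requests.get("https://api.scryfall.com/bulk-data", timeout=30).json()
oracle = next(e for e in index["data"] if e["type"] == "oracle_cards")
with open("./data/raw/oracle-cards.json", "wb") as f:
    f.write(requests.get(oracle["download_uri"], timeout=120).content)
```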
- Create a new virtualenv and install the Python dependencies: `pip install -r ./requirements.txt`
- Run the preprocess script to extract the data from the oracle json file: `python ./0_preprocess.py ./data/raw/oracle-cards.json`. This will produce a few files in `./data/`.
- Run the training script to train the network with the default configuration: `python ./1_train_model.py`. This will produce a model file at `./model/model.safetensors` and a tokenizer definition at `./model/model.tokenizer`. It will also write some details about the training run to `./train_log/default/`.
- By default this will do a 'fast' training run. The alternative is to use the argument `--full` to do a full-length training run, which will take three times as long and result in a slightly higher quality model.
- Run the inference script to generate any number of cards: `python ./2_inference.py ./model/model.safetensors --count 5`. Cards in the output are delimited by the string `<NewCard>`.
- This will automatically locate the tokenizer definition by replacing `.safetensors` with `.tokenizer` in the model path (see the sketch after this list).
- Run `python ./2_inference.py --help` to see available inference options.
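As referenced above, the tokenizer lookup is just a string substitution on the model path; a tiny illustrative sketch (the variable names are mine, not the script's):

```python
# Illustrative: derive the tokenizer path from the model path.
model_path = "./model/model.safetensors"
tokenizer_path = model_path.replace(".safetensors", ".tokenizer")
# -> "./model/model.tokenizer"
```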
The architecture of this model is largely a copy of Andrej Karpathy's nanoGPT, which is itself based on GPT-2, with some minor modifications and re-tuned hyperparameters. It has also been extended to support some more modern features, including SwiGLU feed-forward blocks and RoPE.
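For readers unfamiliar with those two additions, here is a rough sketch of what a SwiGLU feed-forward block and RoPE look like in MLX. This is illustrative only; the layer names and sizes are not taken from this repo.

```python
# Minimal sketch of a SwiGLU feed-forward block and RoPE in MLX.
# Names and dimensions are illustrative, not this repo's actual code.
import mlx.core as mx
import mlx.nn as nn

class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def __call__(self, x: mx.array) -> mx.array:
        # SwiGLU: silu(gate(x)) multiplied elementwise by up(x), then projected back down.
        return self.down(nn.silu(self.gate(x)) * self.up(x))

# RoPE is applied to the query/key projections inside each attention head,
# e.g. via mlx.nn.RoPE, where dims is the per-head dimension.
rope = nn.RoPE(dims=64)
```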
Because this is a very small model at around 1 million parameters, it has no chance of being trained as a general-purpose language model capable of outputting coherent natural language. Instead, it is designed around the fact that Magic: The Gathering cards are written in a very strict subset of English, with many common words and phrases being repeated on thousands of different cards.
This model is broadly capable of outputting cards that make sense with respect to:
- Effects cards can have
- How multiple effects can be composed into one card
- Permanents having static/triggered/activated abilities and instants/sorceries having immediate effects
- Which effects correlate with which card types and colors (blue for card draw, red for direct damage, etc.)
It struggles to ensure that a card's 'strength' is proportional to its mana cost.
This model uses whole-word tokenization because we only want it to output words that are found in the training data set and it does not need to understand unseen words as input. Tokenization is case-insensitive.
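A minimal sketch of what whole-word, case-insensitive tokenization amounts to (the class and vocabulary here are illustrative, not the repo's actual tokenizer):

```python
# Sketch of a whole-word, case-insensitive tokenizer.
# The vocabulary here is illustrative only.
class WholeWordTokenizer:
    def __init__(self, vocab):
        # Case-insensitive: the vocabulary is stored lowercased.
        self.id_to_word = [w.lower() for w in vocab]
        self.word_to_id = {w: i for i, w in enumerate(self.id_to_word)}

    def encode(self, text):
        # Lowercase, then split on whitespace; every token must be in the vocabulary.
        return [self.word_to_id[w] for w in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)

tok = WholeWordTokenizer(["<Type>", "creature", "-", "beast", "~", "flying"])
print(tok.encode("<type> Creature - - Beast"))  # lookups ignore case
```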
To help the model learn all of the above, we want to make sure we feed it only data that is directly relevant to how a card functions in the game. For now, this means that we only care about:
- Type line
- Mana cost
- Rules text
- Stats (creatures only)
Things like card name and flavor text do not follow MTG's strict templating rules and use unique words and phrases that are rarely repeated in the training data, so the model will only be made less effective by their presence.
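In terms of the raw Scryfall JSON, keeping just those fields might look roughly like the sketch below. The input field names (`type_line`, `mana_cost`, `power`, `toughness`, `oracle_text`) follow Scryfall's card schema; the serialized output format is illustrative rather than the preprocess script's exact one.

```python
# Sketch: keep only the mechanically relevant fields from a Scryfall card object.
def extract_fields(card: dict) -> str:
    parts = [f"<Type> {card.get('type_line', '')}"]
    if card.get("mana_cost"):
        parts.append(f"<ManaCost> {card['mana_cost']}")
    if card.get("power") is not None and card.get("toughness") is not None:
        parts.append(f"<Stats> {card['power']} / {card['toughness']}")
    if card.get("oracle_text"):
        parts.append(f"<Text> {card['oracle_text']}")
    return "\n".join(parts)
```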
The raw data from Scryfall is cleaned to help the model learn meaningful associations between different tokens. This is accomplished by replacing rare tokens with more generic tokens, as described below, as well as by decomposing nouns and verbs (a rough sketch of a couple of these substitutions follows the list).
- Anywhere a card references itself in rules text, its name is replaced by `~`.
- Anywhere a card references another card by name, the name is replaced by `$named_card$`.
- Flavor ability words are replaced by `$flavor_ability_word$`.
- Any time a card creates a permanent, the permanent's name is replaced by `$named_permanent$`.
- When any counter is created by two or fewer cards, the counter's name is replaced by `$unique_counter$`.
- Any planeswalker type that appears on two or fewer cards is replaced with `$unique_planeswalker_type$`.
- Some plural words are decomposed into the singular version of the word followed by `*s` (ex: `creatures` -> `creature *s`).
- Some verbs are decomposed into a base verb followed by a suffix (ex: `tapped` -> ``tap `ed``).
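As an illustration only, two of these rules (self-reference replacement and plural decomposition) could be sketched like this; the patterns and word list are simplified and are not the preprocess script's actual ones:

```python
# Simplified sketch of two cleaning rules: self-reference replacement and
# plural decomposition. The word list and patterns are illustrative only.
import re

def clean_rules_text(card_name: str, text: str) -> str:
    # Replace self-references with "~".
    text = text.replace(card_name, "~")
    # Decompose a few plural nouns into "<singular> *s".
    for plural, singular in [("creatures", "creature"), ("artifacts", "artifact")]:
        text = re.sub(rf"\b{plural}\b", f"{singular} *s", text)
    return text.lower()

print(clean_rules_text("Wrath of God", "Destroy all creatures."))
# -> "destroy all creature *s."
```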
The following data was gathered from a full training run of commit `a2b2f876` with the Scryfall data from Jan 22, 2025.
- Training set: 24912 cards
- Test set: 2492 cards
- Training iterations: 90000
- Batch sizes: 16, 20, 24, 32, 40
- Parameter count: 928960
This graph shows both the train and test loss over time. Dotted vertical lines indicate points where the batch size was increased.