Stars
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Entropy Based Sampling and Parallel CoT Decoding
A formalized proof of Carleson's theorem in Lean
llama3 implementation one matrix multiplication at a time
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs (a usage sketch follows this list).
The user home repository for the Mathematics in Lean tutorial.
The best repository showing why transformers might not be the answer for time-series forecasting, and showcasing the best SOTA non-transformer models.
News and links to material related to GPU programming.
Website containing illustrations about Machine Learning theory!
A performance library for machine learning applications.
A book about compiling Racket and Python to x86-64 assembly
An experimental simple method overlay mechanism for Julia
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Automatically use the awesome walrus operator (a short illustration follows this list).
✨ Programming Language Research, Applied PLT & Compilers
Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!)
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
Model-parallel transformers in JAX and Haiku
Swarm training framework using Haiku + JAX + Ray for layer-parallel transformer language models on unreliable, heterogeneous nodes
You should use PySR to find scaling laws. Here's an example (a brief sketch also follows this list).
A Python extension module that uses C, C++, Fortran and Rust
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…
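For the Thunder entry above, a minimal sketch of what compiling a PyTorch module with thunder.jit could look like; the model, shapes, and data below are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn as nn
import thunder

# An arbitrary small model, purely for illustration.
model = nn.Sequential(nn.Linear(1024, 2048), nn.GELU(), nn.Linear(2048, 256))
x = torch.randn(64, 1024)

# thunder.jit compiles the module source-to-source; the resulting callable
# can dispatch its operations to the executors Thunder has available.
jitted_model = thunder.jit(model)
out = jitted_model(x)

print(out.shape)  # torch.Size([64, 256])
```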
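For the walrus-operator entry above, a tiny illustration of the kind of rewrite such a tool targets; the regex and input strings are made up for the example.

```python
import re

lines = ["ok: all good", "error: disk full"]

# Without the walrus operator: assign, then test, then use.
for line in lines:
    match = re.search(r"error: (.+)", line)
    if match:
        print(match.group(1))

# With the walrus operator: bind and test in a single expression,
# the kind of rewrite an automated tool can apply mechanically.
for line in lines:
    if (match := re.search(r"error: (.+)", line)) is not None:
        print(match.group(1))
```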
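For the PySR entry above, a minimal sketch of fitting a scaling-law-like curve with PySRRegressor; the synthetic data, constants, and operator choices are assumptions for illustration, not taken from the repository.

```python
import numpy as np
from pysr import PySRRegressor

# Synthetic "loss vs. parameter count" data shaped like L(N) = a * N**(-b) + c.
N = np.logspace(6, 9, 50)
loss = 2.1 * N ** -0.28 + 0.4

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["log", "exp"],
)
model.fit(np.log10(N).reshape(-1, 1), loss)  # symbolic regression over log10(N)
print(model)  # prints the Pareto front of discovered expressions
```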