
Upstream merge jan09 #151

Merged — 275 commits, Jan 9, 2024

Commits
204860b
[Fix] ChatModule incorrect temperature buffer shape (#1070)
MasterJH5574 Oct 15, 2023
d202077
[ParamManager] Added progress bar for get_item/set_item (#1063)
Lunderberg Oct 16, 2023
9872c48
[Python] Extract common device str parse function in ChatModule (#1074)
MasterJH5574 Oct 16, 2023
3aefd9f
[Bugfix] Compilation Error in q4f32_1 (#1078)
junrushao Oct 17, 2023
2625945
Establish `mlc_chat.compiler` (#1082)
junrushao Oct 19, 2023
56a8004
Update README.md for Multi-GPU (#1090)
junrushao Oct 19, 2023
b0373d1
Support lib_path override in C++. Improvements on docs and error mess…
rickzx Oct 19, 2023
830656f
StreamIterator (#1057)
varshith15 Oct 19, 2023
9bf5723
Update `benchmark.py` according to #1086 (#1091)
junrushao Oct 19, 2023
62d0c03
Disable Disco for q4f16_ft and q8f16_ft quantization (#1094)
LeshengJin Oct 20, 2023
cf39bf6
[Format] Apply isort and black for `python/` (#1097)
junrushao Oct 20, 2023
e9b85ce
More formatting (#1099)
junrushao Oct 21, 2023
03c641a
Enable Python Linter (#1098)
junrushao Oct 21, 2023
46d11e6
Add Basic Pylint and Mypy Tooling (#1100)
junrushao Oct 21, 2023
6159cc4
[CI] Add clang-format (#1103)
junrushao Oct 22, 2023
16dd2ae
[Slim-LM] Smart path finding for config and weight (#1088)
LeshengJin Oct 23, 2023
f57c9c9
[Transform] Provide IRModule transform for rewrite_attention (#1052)
Lunderberg Oct 23, 2023
e5927ce
[ParamManager] Use BundleModelParams for transform_dequantize (#1056)
Lunderberg Oct 23, 2023
7ae8c6d
[Slim-LM] Introduce HFLoad for loading Pytorch and SafeTensor weights…
LeshengJin Oct 23, 2023
5a7dcd8
[WINDOWS] reduce noise in windows build (#1115)
tqchen Oct 24, 2023
61179a0
Add CLI commands for compilation (#1109)
junrushao Oct 24, 2023
8ce7793
Auto updated submodule references
Oct 24, 2023
488017d
fix mismatched argument name (#1117)
Sing-Li Oct 24, 2023
206103b
[Docs] Add doc for max and mean gen len, shift factor; and buildArgs …
CharlieFRuan Oct 24, 2023
2aa6809
Revert "[ParamManager] Use BundleModelParams for transform_dequantize…
junrushao Oct 24, 2023
9cb8e8e
Remove inaccurate warning message (#1121)
junrushao Oct 24, 2023
9166edb
[REST] OpenAI compatible Rest API (#1107)
Kartik14 Oct 24, 2023
a4279e3
Add --opt flag parsing to CLI (#1123)
junrushao Oct 25, 2023
973f9fc
[ParamManager][Redo] Use BundleModelParams for transform_dequantize (…
Lunderberg Oct 25, 2023
24f795e
added details to windows installation (#1133)
goutham2688 Oct 27, 2023
2c492e5
Grammatical and Typographical improvements (#1139)
tmsagarofficial Oct 28, 2023
2ec0cc8
Minor enhancements to `ChatModule` (#1132)
YuchenJin Oct 28, 2023
27ac5ac
Updating tvm install docs (#1143)
David-Sharma Oct 29, 2023
2b6d832
Make the help info consistent with program name (#1137)
fennecJ Oct 29, 2023
878ae84
Support parameter packing (#1146)
junrushao Oct 29, 2023
c0c3a8d
[Slim-LM] Enable Group Quant (#1129)
zxybazh Oct 29, 2023
2193767
Enable Mypy and Pylint in mlc_chat Python Package (#1149)
junrushao Oct 29, 2023
0a25374
Migrate Compiler Passes (#1150)
junrushao Oct 30, 2023
1a79a53
Compile Model Preset without External `config.json` (#1151)
junrushao Oct 30, 2023
ba67835
Update attention layer (#1153)
junrushao Oct 30, 2023
fee2cb5
Add batched Llama model definition using vLLM paged attention (#1134)
masahi Oct 30, 2023
ece97b1
[Transform][Redo] Apply split_rotary optimization on prefill (#1125)
Lunderberg Oct 30, 2023
b190578
Apply rewrite for normal attention and MQA (#1138)
Lunderberg Oct 30, 2023
8ca0176
[Rest] Fix emoji handling in Rest API. (#1142)
YuchenJin Oct 30, 2023
3cf5605
[Utility] Check for isinstance(exc, Exception) before entering pdb (#…
Lunderberg Oct 30, 2023
0a9d6c7
[Utils] Remove conversion to numpy array in utils.save_params (#1083)
Lunderberg Oct 30, 2023
425a2cb
[Fix][REST] Use lowered-cased "app" (#1159)
junrushao Oct 30, 2023
9076d01
[Rest] Document emoji handling (#1160)
YuchenJin Oct 31, 2023
b5bfa5b
Enable group quant transform with nn.Module (#1154)
cyx-6 Oct 31, 2023
8438b27
Misc Cleanups of Compilation Pipeline (#1165)
junrushao Oct 31, 2023
02d1e57
Support CUDA Multi-Arch Compilation (#1166)
junrushao Oct 31, 2023
e0cd3f6
[Bugfix] Cannot find global function `mlc.llm_chat_create` (#1167)
junrushao Oct 31, 2023
f5b2e88
Fix RWKV Support (#1136)
BBuf Nov 1, 2023
200653a
Auto updated submodule references
Nov 1, 2023
9831135
Fix Android app Permission denied error on Android 10 (#1175)
anibohara2000 Nov 1, 2023
1757777
[SLM] Fix group quantization (#1172)
cyx-6 Nov 1, 2023
2ca7d15
[Fix] TIR block name of dequantization (#1177)
junrushao Nov 2, 2023
53060af
[SLM][AutoLLM] Enable Command Line Weight Conversion (#1170)
zxybazh Nov 2, 2023
2dc8183
[Fix][SLM] Update q4f16 quantization with the new mutator name rule (…
LeshengJin Nov 3, 2023
6ae02dd
[Model Support][SWA] Add support for sliding window attention for Mis…
CharlieFRuan Nov 3, 2023
4716704
Add Python API for Weight Conversion (#1182)
junrushao Nov 4, 2023
9d20575
Merge `llama_config.CONFIG` into `MODEL_PRESETS` (#1188)
junrushao Nov 4, 2023
5d1dc34
Merge llama_config.py into llama_model.py (#1189)
junrushao Nov 4, 2023
4832c2f
Add CodeLlama as part of model presets (#1190)
junrushao Nov 4, 2023
78424f0
[Docs] Clarify zstd installation on Windows (#1191)
junrushao Nov 4, 2023
5d63f7e
[Docs] Clarify zstd installation on Windows (#1196)
junrushao Nov 4, 2023
3417505
Support overriding `--max-sequence-length` in command line (#1197)
junrushao Nov 5, 2023
0e08845
[RestAPI] Added docs (#1193)
anibohara2000 Nov 5, 2023
145a984
[API] ```llm-vscode``` extension support (#1198)
davidpissarra Nov 5, 2023
3413d17
[Fix] Use `fabs` as floating point abs function in C++ (#1202)
junrushao Nov 5, 2023
7ccb51a
Integrating MLC runtime with the new compilation workflow (#1203)
junrushao Nov 6, 2023
65478c8
[Fix] Remove Redundant Warnings (#1204)
junrushao Nov 6, 2023
01d4339
Try fix macOS build with picojson (#1206)
junrushao Nov 6, 2023
51d6f9c
Try fix macOS build with picojson again (#1207)
junrushao Nov 6, 2023
a7f1183
Auto updated submodule references
Nov 6, 2023
e2c99a8
[Fix] Keep update-to-date with upstream API change (#1209)
junrushao Nov 6, 2023
e00220c
Detect `mtriple` via LLVM (#1211)
junrushao Nov 6, 2023
9869ca6
Fix Python3.8 compatibility breakage (#1210)
Lunderberg Nov 6, 2023
4042626
[Slim-LM] Enable loading from AWQ pre-quantized weight. (#1114)
LeshengJin Nov 6, 2023
be1c18b
[Bugfix] Fix Cannot import name '_LIB' from 'mlc_chat.base' (#1214)
CharlieFRuan Nov 7, 2023
1015aae
[SLM] Support `q3f16_1` and `q4f32_1` (#1215)
cyx-6 Nov 8, 2023
1a6fadd
Make the Compilation Working E2E (#1218)
junrushao Nov 8, 2023
616ca42
[Mistral][SWA] Add sliding window to metadata (#1217)
CharlieFRuan Nov 8, 2023
e52f449
Support for `chatml` format conversation (for TinyLlama-1.1B-Chat-v0.…
acalatrava Nov 8, 2023
fbe75e3
Add Rust Support for MLC-LLM (#1213)
YuchenJin Nov 8, 2023
beca2ab
[Bugfix] Remove dependency on openai_api in chat module (#1222)
CharlieFRuan Nov 8, 2023
9ee5705
Bake in RAM Usage in the Generated DSO (#1224)
junrushao Nov 8, 2023
069181c
[Fix] ChatModule python messages and offset types (#1220)
YuchenJin Nov 8, 2023
f1bc951
[Fix] Variable Upperbound Should be Injected before Build Pipeline (#…
junrushao Nov 8, 2023
834811f
[MultiGPU] Support pre-sharded model weights (#1096)
Lunderberg Nov 9, 2023
45bf1c5
[AWQ] e2e awq-quantized model (#1229)
LeshengJin Nov 10, 2023
d08b009
[SLM] Support `q0f16` and `q0f32` (#1228)
cyx-6 Nov 10, 2023
fab4486
[Core][Llama] Argument `max_vocab_size` and `max_batch_size` (#1076)
MasterJH5574 Nov 11, 2023
cd71665
[Llama] Support batched prefill (#1233)
MasterJH5574 Nov 11, 2023
a21c759
[Core] Skip PrimExpr index int32 downcasting for batching (#1234)
MasterJH5574 Nov 11, 2023
cb68e7b
Auto updated submodule references
Nov 12, 2023
1400cd9
Update index.rst (#1236)
a7k3 Nov 12, 2023
c2082d8
Update android.rst (#1237)
a7k3 Nov 12, 2023
26fd019
Correct typo in cuda device name for rust chat model (#1241)
malramsay64 Nov 13, 2023
ab2a05b
Generating mlc-chat-config.json (#1238)
junrushao Nov 13, 2023
d24379c
Rename `--config` to `--model` and Consolidate CLI Messages (#1244)
junrushao Nov 13, 2023
4021785
Specify argument "dest" in argparse (#1245)
junrushao Nov 13, 2023
5005772
Add more stats during quantization (#1246)
junrushao Nov 13, 2023
34c15f2
ensure that max_gen_len is set properly in mlc_chat_config (#1249)
denise-k Nov 13, 2023
7da81a4
[Fix] Memory usage statistics (#1252)
LeshengJin Nov 13, 2023
cd4a8ed
Introduce mlc_chat subcommands (#1251)
junrushao Nov 13, 2023
8305b22
Update mlc-chat-config.json (#1254)
junrushao Nov 14, 2023
5e02cac
[Rust] Support multiple prompts (#1253)
YuchenJin Nov 14, 2023
77a4b69
[UI] Correct "convert_weight_only" to "convert_weights_only" (#1227)
Lunderberg Nov 14, 2023
12efd45
Add a downloader from HuggingFace (#1258)
junrushao Nov 14, 2023
1dbfac5
[Fix] Add prefix_tokens to `ConvConfig` in Python to match C++ implem…
YuchenJin Nov 14, 2023
8d9effe
[nn.Module] Mistral implementation (#1230)
davidpissarra Nov 15, 2023
8304d4c
Add `mlc_chat.__main__` as command line entrypoint (#1263)
junrushao Nov 15, 2023
64e3410
[Rust] Improve ergonomics of `generate` function in `ChatModule` (#1…
YuchenJin Nov 15, 2023
2c00373
[Fix] mistral `max_gen_len` (#1264)
davidpissarra Nov 15, 2023
ceb27d5
Rename `max-sequence-length` to `context-window-size` (#1265)
junrushao Nov 15, 2023
17aa5bf
Auto updated submodule references
Nov 16, 2023
fde2e85
Fix group quantization shape infer (#1273)
cyx-6 Nov 16, 2023
4a137d3
Continuous Model Delivery (#1272)
junrushao Nov 16, 2023
2600b9a
Auto updated submodule references
Nov 17, 2023
31910dd
Enhance Model Delivery (#1283)
junrushao Nov 17, 2023
fb7a224
add python, rest api test (#1278)
Kartik14 Nov 18, 2023
d3b7aad
Enable Jenkins CI (#1292)
Hzfengsy Nov 19, 2023
5fac856
Update android.rst (#1289)
a7k3 Nov 19, 2023
49f75d2
Consolidate Logics for GPU Detection (#1297)
junrushao Nov 20, 2023
01daa64
[CI] Fix lint concurrent clone issue (#1299)
MasterJH5574 Nov 20, 2023
418b9a9
Auto updated submodule references
Nov 20, 2023
b4ba7ca
[Feature] Prefill chunking for non-SWA models (#1280)
davidpissarra Nov 20, 2023
488f65d
Compatible with chatglm (#979)
qc903113684 Nov 20, 2023
2fd1bf5
Add q4/q8_ft_group quantization mode (#1284)
vinx13 Nov 21, 2023
5d96740
[CI] Clean workspace before build (#1304)
MasterJH5574 Nov 21, 2023
9a04de6
[Python] Detect Driver/Device in a Separate Process (#1311)
junrushao Nov 22, 2023
9641676
add chatglm3 support (#1313)
Jasonsey Nov 22, 2023
95f9abe
[SLIM] Skip None param when loading rather than failing (#1308)
CharlieFRuan Nov 22, 2023
9e28540
Auto updated submodule references
Nov 22, 2023
53f2747
[nn.Module] Implement GPT-2 Model Support (#1314)
rickzx Nov 23, 2023
b561810
remove ndk referencce from mali build target (#1312)
shanbady Nov 23, 2023
13759fd
[Rust] A few enhancements (#1310)
YuchenJin Nov 23, 2023
48df439
[iOS] Mistral support (#1320)
davidpissarra Nov 23, 2023
da07940
Add terminator for streaming REST API (#1325)
Sing-Li Nov 23, 2023
992ed42
read CUDA_ARCH_LIST to set CUDA capability versions for nvcc (#1326)
technillogue Nov 24, 2023
3358029
Update emcc.rst
tqchen Nov 24, 2023
fa40ec1
[AUTO-DEVICE] In process early exit device detection (#1333)
tqchen Nov 27, 2023
e7d2ce6
[RestAPI] Update parameters for /v1/completions and add tests (#1335)
anibohara2000 Nov 28, 2023
5dc809e
fix broken REST examples due to recent compatibility change (#1345)
Sing-Li Nov 29, 2023
1f8c2d0
[Bugfix] Ignore exit code in device detection (#1350)
junrushao Nov 29, 2023
02a41e1
[OpenHermes] Add conversation template for OpenHermes Mistral (#1354)
CharlieFRuan Nov 30, 2023
5315d18
[Tokenizer] Prioritize huggingface tokenizer.json, generate one if no…
CharlieFRuan Nov 30, 2023
76c2807
[Rust] Prepare for publishing (#1342)
YuchenJin Nov 30, 2023
2ab39c9
Fix gen_mlc_chat_config for mistral (#1353)
jinhongyii Nov 30, 2023
a4a06d5
Fix ft quantization scale computation (#1321)
vinx13 Nov 30, 2023
dd245c1
[nn.Module] rename `--prefix-symbols` to `--system-lib-prefix` so it …
davidpissarra Dec 1, 2023
ee34b65
[SLIM] Add info to metadata in SLIM and make compatible with llm_chat…
CharlieFRuan Dec 2, 2023
65506f3
[SLM] Support fuse split rotary in SLM flow (#1365)
jinhongyii Dec 3, 2023
9200380
[SLM] UX `mlc_chat compile` improvement (#1371)
davidpissarra Dec 3, 2023
d854d53
[SLM] fix pep 604 (#1372)
davidpissarra Dec 4, 2023
fa604e0
Rename subcommand `gen_mlc_chat_config` to `gen_config` (#1375)
junrushao Dec 4, 2023
a00be77
docs(compile_models.rst) (#1377)
baiyutang Dec 4, 2023
fe4a0fa
Auto updated submodule references
Dec 4, 2023
4216da6
Update compile_models.rst (#1381)
baiyutang Dec 5, 2023
36e3e39
Add OpenHermes to list of conv templates (#1385)
davidpissarra Dec 5, 2023
9bdc8b6
[iOS] OpenHermes-2.5-Mistral-7B support (#1388)
davidpissarra Dec 6, 2023
4e2f4e1
Add scale_attn_by_inverse_layer_idx to GPT-2 model (#1389)
rickzx Dec 7, 2023
8aaedd7
Check config.json before applying system defaults in gen_config (#1391)
rickzx Dec 7, 2023
25cd1b5
Refactor android project into a library and application module (#1248)
tobrun Dec 8, 2023
6e820f5
[Model Support] Add Neural Hermes conversation template (#1402)
CharlieFRuan Dec 8, 2023
4285865
[REST] OpenAI Rest API compatible function calling (#1378)
Kartik14 Dec 9, 2023
1badc69
[Fix] conv templates (#1405)
davidpissarra Dec 9, 2023
f91b5d0
[SLM] Integration with Disco sharding. (#1212)
LeshengJin Dec 10, 2023
73c4762
[SLIM] Support out dtype for quantizeLinear (#1412)
CharlieFRuan Dec 11, 2023
b3199a1
[SLIM] Red-Pajama(GPT-Neox class) (#1408)
LeshengJin Dec 11, 2023
bc5cb4b
[SLIM][Fix] Quick fix on sharding config for non supported models (#1…
CharlieFRuan Dec 11, 2023
002dc89
[SLM] UX improvement: HF download & presets (#1374)
davidpissarra Dec 11, 2023
02cc9f6
Make the generate call optionally stateless (#1413)
plasticchris Dec 11, 2023
7f83ad6
Auto updated submodule references
Dec 11, 2023
53e159b
Establish CUDA CI (#1419)
junrushao Dec 11, 2023
799a47a
[SLM] gpt_bigcode migration (#1418)
davidpissarra Dec 12, 2023
74e795f
[SLM] rename `sliding_window` to `sliding_window_size` (#1425)
davidpissarra Dec 13, 2023
03219f9
[SLM] auto system lib prefix (#1427)
davidpissarra Dec 13, 2023
18c38c4
Update index.md
tqchen Dec 13, 2023
f66103a
[Doc][SLM] Update documentation to rollout SLM (#1421)
CharlieFRuan Dec 13, 2023
11daf9a
Use SizeVar instead of Var for model definition (#1432)
vinx13 Dec 14, 2023
02f1fdf
[Feature] Attention sinks kvcache on mistral (#1435)
davidpissarra Dec 15, 2023
1d49135
[3rdparty] Add back TVM in 3rdparty (#1438)
MasterJH5574 Dec 15, 2023
9d6ae50
[SLM][CLI] cli naming so it matches slm `sliding_window_size` (#1437)
davidpissarra Dec 15, 2023
51fb0f4
[iOS] Update mistral v2 support (#1439)
tqchen Dec 15, 2023
96a68ee
[SLM] Streamlined Compilation Pipeline (#1436)
junrushao Dec 15, 2023
d406c17
Integrate FlashInfer into llama model (#1415)
cyx-6 Dec 16, 2023
fabc092
[Benchmark] Support loading prompt from file (#1445)
junrushao Dec 17, 2023
9e2a79c
Fix MQA and RoPE Application (#1446)
junrushao Dec 17, 2023
b10a499
[Fix][MultiGPU] Compatibility with old flow (#1447)
LeshengJin Dec 17, 2023
a4fefa8
Simplify Multi-GPU Loader Logics (#1449)
junrushao Dec 17, 2023
cbe0702
Auto updated submodule references
Dec 17, 2023
aefefc4
[Delivery] Update delivery script (#1451)
CharlieFRuan Dec 17, 2023
bf2528b
Add Sanity Check in Multi-GPU Loader (#1453)
junrushao Dec 18, 2023
76a77bc
Define `__STDC_FORMAT_MACROS` by default (#1454)
junrushao Dec 18, 2023
26d348a
[SLIM][AWQ] AWQ GEMM support (#1362)
LeshengJin Dec 18, 2023
00f948d
[Smallfix] Fix wizard math template (#1463)
CharlieFRuan Dec 19, 2023
36c0d61
[Docs][SLM] Split compile models into convert weight and compile mode…
CharlieFRuan Dec 20, 2023
1075d68
[Sink] Add sink size to metadata (#1459)
CharlieFRuan Dec 22, 2023
f5ba93c
[Docs][SLM] Add end-to-end tutorial for webLLM in SLM (#1476)
CharlieFRuan Dec 22, 2023
55be546
Create privacy.md
tqchen Dec 22, 2023
f79189b
[Docs][SLM] Change folder struct in doc, update webllm (#1479)
CharlieFRuan Dec 23, 2023
95b17ee
[Docs][Fix] Change webllm model lib link (#1484)
CharlieFRuan Dec 24, 2023
66b0473
Update index.rst (#1482)
zanovis Dec 24, 2023
af4cb7a
[Docs][Prebuilt] Add new prebuilt page (#1490)
CharlieFRuan Dec 25, 2023
37e1de9
[Fix] Skip FuseSplitRotaryEmbedding via attrs (#1472)
LeshengJin Dec 25, 2023
88572b9
Auto updated submodule references
Dec 25, 2023
3e3ccf9
Prefer Extern Operators to `nn.SourceModule` (#1488)
junrushao Dec 26, 2023
7684c07
[Docs][SLM] Update runtime tutorials (#1491)
CharlieFRuan Dec 26, 2023
b1dd17a
[SLM] Store fp32 weight as fp16 on disk (#1496)
cyx-6 Dec 27, 2023
cd84bed
Introduce a Fused RoPE Kernel in TIR (#1493)
junrushao Dec 27, 2023
09ec207
Fix CMake for Android to avoid unlinked android log symbol (#1501)
xunkai55 Dec 27, 2023
779b1a5
Reorganize folder structure (#1502)
junrushao Dec 28, 2023
a9f1b72
[SLM] Enable Debug Dump (#1499)
Hzfengsy Dec 28, 2023
67e1482
[SLIM] Introduce microsoft/Phi-2 (#1469)
LeshengJin Dec 28, 2023
f9a56cd
[SLM] Android refactor for SLM (#1494)
Kartik14 Dec 28, 2023
7ab050e
Improvements to the formatting of Android deploy instructions under t…
Karn Dec 28, 2023
fc3a6ac
[Docs] Add phi1_5 and phi2 prebuilt doc, also gpt2 (#1506)
CharlieFRuan Dec 28, 2023
9b27fc9
[SLM] Red-pajama Disco support (#1507)
LeshengJin Dec 28, 2023
b596ca8
Auto-downloading and Just-in-Time Compilation in ChatModule (#1508)
junrushao Dec 28, 2023
1ec441f
[Fix][Delivery] Use sys executable in delivery (#1510)
CharlieFRuan Dec 29, 2023
7398c87
[Doc] Update define new models page (#1511)
CharlieFRuan Dec 29, 2023
c76b85c
[SLM] Add Prebuilt lib url for Mistral in Docs (#1514)
Kartik14 Dec 29, 2023
2ca6cb9
[Fix] tp bigcode (#1515)
davidpissarra Dec 29, 2023
cb7bd10
[SLM] cublas dispatch (#1380)
jinhongyii Dec 29, 2023
14e74c5
[SLM] Fused Rope Integration (#1518)
LeshengJin Dec 30, 2023
09f5e24
Rename subpackage operator => op (#1521)
junrushao Dec 31, 2023
aa6fdd6
[Doc] Minor Fix Android Doc (#1524)
Hzfengsy Dec 31, 2023
8a8113d
Introduce MLC_JIT_POLICY and Dissolve JITOptions into ChatConfig (#1522)
junrushao Dec 31, 2023
b820ab5
Add support for loading weights from a safetensor file (#1523)
felladrin Jan 1, 2024
fa924b0
Report upper-bound memory usage based on SLM model lib (#1525)
junrushao Jan 1, 2024
cc94447
Refactor tensor parallel (#1509)
jinhongyii Jan 2, 2024
2761e6e
Enable FlashInfer in O2 mode (#1527)
junrushao Jan 2, 2024
073e007
Auto updated submodule references
Jan 2, 2024
791b09a
[SLM] Enable FasterTransformer quantization (#1480)
cyx-6 Jan 2, 2024
1480d32
Support TinyLlama in SLM flow (#1531)
junrushao Jan 3, 2024
095858c
[SLM] Mistral Multi-GPU support (#1532)
LeshengJin Jan 3, 2024
7239a91
[SLM] Batched Llama (#1520)
MasterJH5574 Jan 4, 2024
5e23900
[Fix][Python] `__init__.py` under `nn` subpackage (#1548)
MasterJH5574 Jan 5, 2024
78f283c
[SLM] Fuse FasterTransformer dequantize matmul epilogue (#1544)
cyx-6 Jan 7, 2024
e32c6c9
Mixtral support (#1529)
jinhongyii Jan 8, 2024
0bfb6c0
Support Mixtral on macOS (#1558)
junrushao Jan 8, 2024
e9afc9c
Add Mistral and Phi to model table (#1553)
philippgille Jan 8, 2024
eddc5b1
[Pass] PruneRelaxFunc to remove Relax function based on target (#1555)
MasterJH5574 Jan 8, 2024
6fbbc64
hot fix
masahi Jan 9, 2024
7406684
Merge branch 'main' into upstream-merge-jan09
masahi Jan 9, 2024
89d319e
rm
masahi Jan 9, 2024
10a48be
more rm
masahi Jan 9, 2024
a0ace9d
remove dup
masahi Jan 9, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
tmp/
dist/
params/
debug/
*.bak
# Byte-compiled / optimized / DLL files
__pycache__/
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -83,6 +83,8 @@ set(
)

set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS} DMLC_USE_LOGGING_LIBRARY=<tvm/runtime/logging.h>)
set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS} __STDC_FORMAT_MACROS=1)
set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS} PICOJSON_USE_INT64)

target_include_directories(mlc_llm_objs PRIVATE ${MLC_LLM_INCLUDES})
target_compile_definitions(mlc_llm_objs PRIVATE ${MLC_LLM_COMPILE_DEFS})
@@ -125,6 +127,7 @@ target_compile_definitions(mlc_cli_objs PRIVATE ${MLC_LLM_COMPILE_DEFS})
if (CMAKE_SYSTEM_NAME STREQUAL "Android")
target_link_libraries(mlc_llm PRIVATE log)
target_link_libraries(mlc_chat_cli PRIVATE log)
target_link_libraries(tokenizers_cpp PRIVATE log)
endif()

if (MLC_LLM_INSTALL_STATIC_LIB)
8 changes: 8 additions & 0 deletions README.md
@@ -127,6 +127,14 @@ use off-the-shelf. Visit [Prebuilt Models](https://llm.mlc.ai/docs/prebuilt_mode
<td>StableLM</td>
<td></td>
</tr>
<tr>
<td>Mistral</td>
<td></td>
</tr>
<tr>
<td>Phi</td>
<td></td>
</tr>
</tbody>
</table>

22 changes: 19 additions & 3 deletions android/.gitignore
@@ -1,3 +1,19 @@
MLCChat/app/src/main/jni/*.h
MLCChat/app/src/main/jni/*.cc
MLCChat/app/src/main/obj
app/src/main/jni/*.h
app/src/main/jni/*.cc
app/src/main/obj

*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties
15 changes: 0 additions & 15 deletions android/MLCChat/.gitignore

This file was deleted.

22 changes: 0 additions & 22 deletions android/MLCChat/app/src/main/assets/app-config.json

This file was deleted.

File renamed without changes.
10 changes: 2 additions & 8 deletions android/MLCChat/app/build.gradle → android/app/build.gradle
@@ -5,7 +5,7 @@ plugins {

android {
namespace 'ai.mlc.mlcchat'
compileSdk 33
compileSdk 34

defaultConfig {
applicationId "ai.mlc.mlcchat"
@@ -26,12 +26,6 @@ android {
proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
}
}
sourceSets {
main {
jni.srcDirs = []
jniLibs.srcDirs = ['src/main/libs']
}
}
compileOptions {
sourceCompatibility JavaVersion.VERSION_1_8
targetCompatibility JavaVersion.VERSION_1_8
@@ -53,7 +47,7 @@ android {
}

dependencies {

implementation project(":library")
implementation 'androidx.core:core-ktx:1.10.1'
implementation 'androidx.lifecycle:lifecycle-runtime-ktx:2.6.1'
implementation 'androidx.activity:activity-compose:1.7.1'
File renamed without changes.
@@ -1,6 +1,7 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools">
xmlns:tools="http://schemas.android.com/tools"
package="ai.mlc.mlcchat">

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission
@@ -30,7 +30,6 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
private var alertMessage = mutableStateOf("")
private var appConfig = AppConfig(
emptyList(),
emptyList<ModelRecord>().toMutableList(),
emptyList<ModelRecord>().toMutableList()
)
private val application = getApplication<Application>()
@@ -49,10 +48,6 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
loadAppConfig()
}

fun supportedModelLibs(): List<String> {
return appConfig.modelLibs
}

fun isShowingAlert(): Boolean {
return showAlert.value
}
@@ -78,14 +73,6 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
alertMessage.value = error
}

fun requestAddModel(url: String, localId: String?) {
if (localId != null && localIdSet.contains(localId)) {
issueAlert("localId: $localId has been occupied")
} else {
downloadModelConfig(if (url.endsWith("/")) url else "$url/", localId, false)
}
}

fun requestDeleteModel(localId: String) {
deleteModel(localId)
issueAlert("Model: $localId has been deleted")
@@ -109,16 +96,17 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
if (modelConfigFile.exists()) {
val modelConfigString = modelConfigFile.readText()
val modelConfig = gson.fromJson(modelConfigString, ModelConfig::class.java)
modelConfig.localId = modelRecord.localId
modelConfig.modelLib = modelRecord.modelLib
addModelConfig(modelConfig, modelRecord.modelUrl, true)
} else {
downloadModelConfig(
if (modelRecord.modelUrl.endsWith("/")) modelRecord.modelUrl else "$modelRecord.modelUrl/",
modelRecord.localId,
if (modelRecord.modelUrl.endsWith("/")) modelRecord.modelUrl else "${modelRecord.modelUrl}/",
modelRecord,
true
)
}
}
modelSampleList += appConfig.modelSamples
}

private fun updateAppConfig(action: () -> Unit) {
Expand All @@ -140,7 +128,7 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
)
if (!isBuiltin) {
updateAppConfig {
appConfig.modelList.add(ModelRecord(modelUrl, modelConfig.localId))
appConfig.modelList.add(ModelRecord(modelUrl, modelConfig.localId, modelConfig.modelLib))
}
}
}
@@ -157,15 +145,15 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
}

private fun isModelConfigAllowed(modelConfig: ModelConfig): Boolean {
if (appConfig.modelLibs.contains(modelConfig.modelLib)) return true;
if (appConfig.modelLibs.contains(modelConfig.modelLib)) return true
viewModelScope.launch {
issueAlert("Model lib ${modelConfig.modelLib} is not supported.")
}
return false
}


private fun downloadModelConfig(modelUrl: String, localId: String?, isBuiltin: Boolean) {
private fun downloadModelConfig(modelUrl: String, modelRecord: ModelRecord, isBuiltin: Boolean) {
thread(start = true) {
try {
val url = URL("${modelUrl}${ModelUrlSuffix}${ModelConfigFilename}")
Expand All @@ -186,9 +174,8 @@ class AppViewModel(application: Application) : AndroidViewModel(application) {
try {
val modelConfigString = tempFile.readText()
val modelConfig = gson.fromJson(modelConfigString, ModelConfig::class.java)
if (localId != null) {
require(modelConfig.localId == localId)
}
modelConfig.localId = modelRecord.localId
modelConfig.modelLib = modelRecord.modelLib
if (localIdSet.contains(modelConfig.localId)) {
tempFile.delete()
issueAlert("${modelConfig.localId} has been used, please consider another local ID")
@@ -728,17 +715,17 @@ data class MessageData(val role: MessageRole, val text: String, val id: UUID = U
data class AppConfig(
@SerializedName("model_libs") val modelLibs: List<String>,
@SerializedName("model_list") val modelList: MutableList<ModelRecord>,
@SerializedName("add_model_samples") val modelSamples: MutableList<ModelRecord>
)

data class ModelRecord(
@SerializedName("model_url") val modelUrl: String,
@SerializedName("local_id") val localId: String
@SerializedName("local_id") val localId: String,
@SerializedName("model_lib") val modelLib: String
)

data class ModelConfig(
@SerializedName("model_lib") val modelLib: String,
@SerializedName("local_id") val localId: String,
@SerializedName("model_lib") var modelLib: String,
@SerializedName("local_id") var localId: String,
@SerializedName("tokenizer_files") val tokenizerFiles: List<String>
)

@@ -54,7 +54,6 @@ fun StartView(
appViewModel: AppViewModel
) {
val localFocusManager = LocalFocusManager.current
var isAddingModel by rememberSaveable { mutableStateOf(false) }
Scaffold(
topBar = {
TopAppBar(
@@ -86,72 +85,6 @@
appViewModel = appViewModel
)
}
if (!isAddingModel) {
item {
TextButton(onClick = { isAddingModel = true }) {
Text(text = "Add Model Variant")
}
}
}
}
if (isAddingModel) {
Text(text = "Supported Base Model Libs", modifier = Modifier.padding(top = 10.dp))
for (lib in appViewModel.supportedModelLibs()) {
Text(
text = lib,
style = MaterialTheme.typography.bodyMedium
)
}
Text(text = "Add Model Variant", modifier = Modifier.padding(top = 10.dp))
LazyColumn() {
items(
items = appViewModel.modelSampleList
) { modelRecord ->
TextButton(onClick = {
appViewModel.requestAddModel(
modelRecord.modelUrl,
modelRecord.localId
)
}) {
Text(text = modelRecord.localId)
}
}
}
Text(text = "Add Model by URL")
SelectionContainer {
Text(
text = "Sample URL: https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int4/",
style = MaterialTheme.typography.bodyMedium
)
}
var url by rememberSaveable { mutableStateOf("") }
OutlinedTextField(
value = url,
onValueChange = { url = it },
label = { Text(text = "Model URL") },
maxLines = 3,
modifier = Modifier.fillMaxWidth()
)
Row(
horizontalArrangement = Arrangement.End,
verticalAlignment = Alignment.CenterVertically,
modifier = Modifier
.fillMaxWidth()
.wrapContentHeight()
) {
TextButton(onClick = { url = "" }) {
Text(text = "Clear")
}
TextButton(onClick = { isAddingModel = false }) {
Text(text = "Cancel")
}
TextButton(onClick = {
appViewModel.requestAddModel(url, null)
url = ""
}) {
Text(text = "Add")
}
}
}
}
if (appViewModel.isShowingAlert()) {
4 changes: 2 additions & 2 deletions android/MLCChat/build.gradle → android/build.gradle
@@ -1,6 +1,6 @@
// Top-level build file where you can add configuration options common to all sub-projects/modules.
plugins {
id 'com.android.application' version '8.0.1' apply false
id 'com.android.library' version '8.0.1' apply false
id 'com.android.application' version '8.1.2' apply false
id 'com.android.library' version '8.1.2' apply false
id 'org.jetbrains.kotlin.android' version '1.8.10' apply false
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions android/library/.gitignore
@@ -0,0 +1 @@
/build
2 changes: 1 addition & 1 deletion android/CMakeLists.txt → android/library/CMakeLists.txt
@@ -5,7 +5,7 @@ project(mlc-chat C CXX)
set(ANDROID_DIR ${CMAKE_CURRENT_LIST_DIR})
set(ANDROID_BIN_DIR ${CMAKE_CURRENT_BINARY_DIR})

set(MLC_LLM_DIR ${ANDROID_DIR}/..)
set(MLC_LLM_DIR ${ANDROID_DIR}/../..)
set(MLC_LLM_BINARY_DIR mlc_llm)
set(MLC_LLM_COMPILE_DEFS TVM_LOG_CUSTOMIZE=1)
add_subdirectory(${MLC_LLM_DIR} ${MLC_LLM_BINARY_DIR} EXCLUDE_FROM_ALL)
38 changes: 38 additions & 0 deletions android/library/build.gradle
@@ -0,0 +1,38 @@
plugins {
id 'com.android.library'
id 'org.jetbrains.kotlin.android'
}

android {
namespace 'ai.mlc.mlcllm'
compileSdk 34

defaultConfig {
minSdk 22
}
compileOptions {
sourceCompatibility JavaVersion.VERSION_1_8
targetCompatibility JavaVersion.VERSION_1_8
}
kotlinOptions {
jvmTarget = '1.8'
}
sourceSets {
main {
jniLibs.srcDirs = ['build/output']
}
}
}

dependencies {
implementation fileTree(dir: 'build/output', include: ['*.jar'])
implementation 'androidx.core:core-ktx:1.9.0'
implementation 'androidx.appcompat:appcompat:1.6.1'
implementation 'com.google.android.material:material:1.10.0'
}





