nexveridian-web/content/blog/llm-inference-benchmarks.md
2025-08-13 21:43:52 -07:00

5 KiB

+++ title = "LLM Inference Benchmarks - Apple M4 Max 48GB 16 Core 16-inch using LM Studio" date = 2025-05-06

[taxonomies] tags = ["llm", "benchmarks", "llm-benchmarks", "lm-studio"] +++

Size (B) Speed (T/s) Model Type Quant Spec Dec (B) Spec Quant
1.5 282 qwen 2.5 MLX 4 - -
1.5 76 qwen 2.5 MLX 8 - -
7 70 qwen 2.5 GUFF Q4_K_M - -
7 101 qwen 2.5 MLX 4 - -
7 58 qwen 2.5 MLX 8 - -
12 35 wayfarer GUFF Q6_K - -
12 65 wayfarer MLX 4 - -
12 45 wayfarer MLX 6 - -
12 36 wayfarer MLX 8 - -
14 36 qwen 2.5 GUFF Q4_K_M - -
14 52 qwen 2.5 MLX 4 - -
14 55 qwen 2.5 MLX 4 1.5 4
14 30 qwen 2.5 MLX 8 - -
24 35 mistral small 3 MLX 4 - -
32 18 qwen 2.5 GUFF Q4_K_M - -
32 23 qwen 2.5 MLX 4 - -
32 30 qwen 2.5 MLX 4 1.5 4
32 30 qwen 2.5 MLX 4 1.5 4
32 34 qwen 2.5 MLX 4 1.5 8
32 26 qwen 2.5 r1 MLX 4 1.5 4
32 33 qwen 2.5 coder MLX 4 1.5 4
32 31 qwen 2.5 coder MLX 4 3 4
32 25 qwq MLX 3 - -
32 24 qwq MLX 4 - -
32 18 qwq MLX 4 1.5 4
32 22 qwq MLX 4 1.5 8
32 16 qwq MLX 4 7 4
32 16 qwq MLX 4 7 8
32 16 qwq MLX 6 - -
32 16 qwq MLX 6 1.5 4
32 16 qwq MLX 6 1.5 8
70 12 wayfarer large GUFF Q2_K_S - -
70 15 wayfarer large MLX 3 - -
30 - A3 93 qwen 3 MLX 4 - -
30 - A3 76 qwen 3 MLX 4 1.7 4
30 - A3 81 qwen 3 MLX 6 - -
30 - A3 70 qwen 3 MLX 6 1.7 4
30 - A3 70 qwen 3 MLX 8 - -
32 22 qwen 3 MLX 4 - -
32 26 qwen 3 MLX 4 1.7 4
24 18 Devstral Small 2507 MLX 8 - -

mlx convert and upload to huggingface

https://huggingface.co/docs/hub/en/mlx

https://huggingface.co/mlx-community

git clone git@github.com:NexVeridian/NexVeridian-web.git

just uv

just mlx_create "Qwen/QwQ-32B" "4 6 8" "/Users/elijahmcmorris/.cache/lm-studio/models" "mlx-community" fasle false
# or
uv venv
uv pip install huggingface_hub hf_transfer mlx_lm
uv run huggingface-cli login

uv run mlx_lm.convert --hf-path Qwen/QwQ-32B -q --q-bits 4 --upload-repo mlx-community/QwQ-32B-4bit --mlx-path /Users/elijahmcmorris/.cache/lm-studio/models/mlx-community/QwQ-32B-4bit

or use https://huggingface.co/spaces/mlx-community/mlx-my-repo

LLM Settings.md

Qwen 3

Temp Min P Top P Top K Repeat P
0.6 0.00 0.95 20 -

Qwen 3 /no_think

Temp Min P Top P Top K Repeat P
0.7 0.00 0.80 20 1.5