🏆 The GPU-Poor LLM Gladiator Arena 🏆 v25.03
Step right up to the arena where frugal meets fabulous in the world of AI! Watch as our compact contenders (maxing out at 14B parameters) duke it out in a battle of wits and words.
What started as a simple experiment has grown into a popular platform for evaluating compact language models. As the arena continues to expand with more models, features, and battles, it requires computational resources to maintain and improve. If you find this project valuable and would like to support its development, consider sponsoring:
- To start the battle, go to the 'Battle Arena' tab.
- Type your prompt into the text box. Alternatively, click the "🎲" button to receive a random prompt.
- Click the "Generate Responses" button to view the models' responses.
- Cast your vote for the model that provided the better response. In the event of a Tie, enter a new prompt before continuing the battle.
- Check out the Leaderboard to see how models rank against each other.
More info: README.md
Main Leaderboard
This leaderboard uses a scoring system that balances win rate and total battles. The score is calculated using the formula: Score = Win Rate * (1 - 1 / (Total Battles + 1))
This formula rewards models with higher win rates and more battles. As the number of battles increases, the score approaches the win rate.
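To make the scoring concrete, here is a minimal Python sketch of the formula; the function name and example numbers are illustrative, not taken from the arena's source:

```python
def arena_score(wins: int, total_battles: int) -> float:
    """Score = win_rate * (1 - 1 / (total_battles + 1))."""
    if total_battles == 0:
        return 0.0
    win_rate = wins / total_battles
    return win_rate * (1 - 1 / (total_battles + 1))

# A model with 7 wins in 10 battles scores 0.7 * (1 - 1/11)  ~= 0.636,
# while 70 wins in 100 battles scores   0.7 * (1 - 1/101) ~= 0.693,
# approaching the raw win rate as the number of battles grows.
print(arena_score(7, 10), arena_score(70, 100))
```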
Leaderboard
ELO Rating System
This leaderboard uses a modified ELO rating system that takes both performance and model size into account. Initial ratings are based on model size, with larger models starting at higher ratings. Ratings are then updated after each win or loss, with the size of the adjustment depending on the relative strength of the opponents. A rough sketch of such a size-aware update is shown below.
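The sketch below is illustrative only: the starting constants, log-scaled seeding, and K-factor are assumptions, and the arena's actual initialization and size adjustment may differ.

```python
import math

def initial_rating(parameters_b: float, base: float = 1000.0, scale: float = 100.0) -> float:
    """Seed larger models with higher starting ratings (assumed log scaling)."""
    return base + scale * math.log2(max(parameters_b, 0.1))

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard ELO expectation of player A beating player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one battle; the winner gains what the loser gives up."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: a 1B-parameter model upsets an 8B-parameter model and gains more
# points than it would against an equally rated opponent.
small, large = initial_rating(1.0), initial_rating(8.0)
print(update_ratings(small, large, a_won=True))
```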
ELO Leaderboard
Model Performance Statistics
This tab shows detailed performance metrics for each model, tested using a creative writing prompt. The tests were performed on an AMD Radeon RX 7600 XT 16GB GPU.
For detailed information about the testing methodology, parameters, and hardware setup, please refer to README_model_stats.md.
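As a rough illustration of how such metrics can be collected, the sketch below queries a local Ollama server through its /api/generate endpoint and derives tokens-per-second figures from the eval_count, eval_duration, and total_duration fields of the response. The model tag, prompt, and derived metric names are placeholders, not the arena's exact benchmarking code; see README_model_stats.md for the real methodology.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def benchmark(model: str, prompt: str) -> dict:
    """Run one non-streaming generation and derive throughput metrics."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    response.raise_for_status()
    data = response.json()
    eval_tokens = data["eval_count"]              # tokens generated
    eval_seconds = data["eval_duration"] / 1e9    # Ollama reports nanoseconds
    total_seconds = data["total_duration"] / 1e9  # includes model load and prompt eval
    return {
        "output_tokens": eval_tokens,
        "eval_rate_tps": eval_tokens / eval_seconds,
        "overall_rate_tps": eval_tokens / total_seconds,
        "eval_duration_s": eval_seconds,
    }

# Placeholder model and prompt, for illustration only.
print(benchmark("qwen2.5:0.5b-instruct-q8_0", "Write a short story about a gladiator robot."))
```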
Model Performance Statistics
Model | Size (GB) | Size (Bytes) | Parameters | Quantization | Eval Rate (tokens/s) | Overall Rate (tokens/s) | Output Tokens | Eval Duration (s) |
qwen2.5:0.5b-instruct-q8_0 | 1.34 | 1438029824 | 0.49B | Q8_0 | 137.44 | 122.27 | 589 | 4.29 |
llama3.2:1b-instruct-q8_0 | 2.53 | 2712725504 | 1.20B | Q8_0 | 91.13 | 81.68 | 627 | 6.88 |
tinyllama:1.1b-chat-v1-q8_0 | 2.22 | 2388813824 | 1.00B | Q8_0 | 85.55 | 75.64 | 561 | 6.56 |
falcon3:1b-instruct-q8_0 | 2.74 | 2944983040 | 1.70B | Q8_0 | 80.16 | 72.21 | 656 | 8.18 |
hf.co/Felladrin/gguf-1.5-Pints-16K-v0.1:Q8_0 | 2.72 | 2917140480 | 1.57B | Q8_0 | 77.44 | 69.24 | 614 | 7.93 |
hf.co/bartowski/Replete-LLM-V2.5-Qwen-1.5b-GGUF:Q8_0 | 2.54 | 2727139328 | 1.78B | Q8_0 | 75.91 | 65.36 | 468 | 6.17 |
qwen2.5:1.5b-instruct-q8_0 | 2.54 | 2727139328 | 1.50B | Q8_0 | 69.62 | 61.51 | 558 | 8.02 |
stablelm2:1.6b-chat-q8_0 | 2.5 | 2682466304 | 2.00B | Q8_0 | 69.11 | 58.42 | 420 | 6.08 |
hf.co/bartowski/google_gemma-3-1b-it-GGUF:Q4_K_M | 1.92 | 2063938560 | 1.00B | Q4_K_M | 64.1 | 58.51 | 746 | 11.64 |
smollm2:1.7b-instruct-q8_0 | 4.3 | 4617299968 | 1.70B | Q8_0 | 61.7 | 53.29 | 477 | 7.73 |
gemma2:2b-instruct-q4_0 | 3.34 | 3581469696 | 2.60B | Q4_0 | 59 | 52.45 | 585 | 9.91 |
qwen2.5:3b-instruct-q4_K_M | 2.87 | 3078209536 | 3.10B | Q4_K_M | 50.71 | 45.74 | 663 | 13.07 |
hf.co/bartowski/Llama-3.2-3B-Instruct-uncensored-GGUF:Q4_K_M | 3.7 | 3972362240 | 3.61B | Q4_K_M | 50.55 | 45.84 | 698 | 13.81 |
llama3.2:3b-instruct-q4_K_M | 3.7 | 3972362240 | 3.20B | Q4_K_M | 49.66 | 44.68 | 648 | 13.05 |
smallthinker:3b-preview-q4_K_M | 2.65 | 2845425664 | 3.40B | Q4_K_M | 48.55 | 46.08 | 1279 | 26.34 |
hf.co/bartowski/Nemotron-Mini-4B-Instruct-GGUF:Q4_K_M | 4.12 | 4426211328 | 4.19B | Q4_K_M | 47.74 | 35.42 | 252 | 5.28 |
falcon3:3b-instruct-q8_0 | 4.33 | 4644655104 | 3.20B | Q8_0 | 47.58 | 42.78 | 645 | 13.56 |
hf.co/bartowski/OLMoE-1B-7B-0924-Instruct-GGUF:Q4_K_M | 5.8 | 6224958122 | 6.92B | Q4_K_M | 46.98 | 42.37 | 662 | 14.09 |
exaone3.5:2.4b-instruct-q8_0 | 3.89 | 4180785152 | 2.70B | Q8_0 | 45.78 | 41.56 | 706 | 15.42 |
phi4-mini:3.8b-q4_K_M | 4.35 | 4672091136 | 3.80B | Q4_K_M | 44.09 | 35.16 | 321 | 7.28 |
granite3.2:2b-instruct-q4_K_M | 2.98 | 3196829274 | 2.50B | Q4_K_M | 38.18 | 34.59 | 692 | 18.13 |
granite3.1-moe:1b-instruct-q8_0 | 2.34 | 2510306496 | 1.30B | Q8_0 | 36.97 | 32.57 | 546 | 14.77 |
yi:6b-chat-v1.5-q4_0 | 4.7 | 5047781376 | 6.00B | Q4_0 | 36.61 | 32.48 | 576 | 15.74 |
granite3-dense:2b-instruct-q8_0 | 3.28 | 3520013354 | 2.60B | Q8_0 | 34.37 | 28.27 | 366 | 10.65 |
falcon3:7b-instruct-q4_K_M | 5.79 | 6218170368 | 7.50B | Q4_K_M | 32.94 | 29 | 543 | 16.49 |
granite3-moe:1b-instruct-q5_K_M | 1.85 | 1985343488 | 1.30B | Q5_K_M | 32.64 | 24.81 | 271 | 8.3 |
qwen2.5:7b-instruct-q4_K_M | 5.58 | 5986631680 | 7.60B | Q4_K_M | 32.43 | 28.89 | 596 | 18.38 |
hermes3:8b-llama3.1-q4_0 | 6.2 | 6654289920 | 8.00B | Q4_0 | 32.38 | 26.91 | 385 | 11.89 |
deepseek-r1:7b-qwen-distill-q4_K_M | 5.58 | 5986631680 | 7.60B | Q4_K_M | 32.24 | 30.94 | 1604 | 49.75 |
hf.co/bartowski/Replete-LLM-V2.5-Qwen-7b-GGUF:Q4_K_M | 5.58 | 5986631680 | 7.62B | Q4_K_M | 31.5 | 28.8 | 758 | 24.06 |
marco-o1:7b-q4_K_M | 5.58 | 5986631680 | 7.60B | Q4_K_M | 31.5 | 29.62 | 1092 | 34.67 |
granite3-moe:3b-instruct-q4_K_M | 3.15 | 3381620420 | 3.40B | Q4_K_M | 30.85 | 26 | 414 | 13.42 |
dolphin3:8b-llama3.1-q4_K_M | 6.45 | 6930039360 | 8.00B | Q4_K_M | 30.6 | 25.93 | 426 | 13.92 |
llama3.1:8b-instruct-q4_0 | 6.2 | 6654289920 | 8.00B | Q4_0 | 30.35 | 27.29 | 646 | 21.29 |
aya:8b-23-q4_0 | 6.6 | 7082971136 | 8.00B | F16 | 29.88 | 27.18 | 719 | 24.06 |
hf.co/bartowski/Llama-3.1-SuperNova-Lite-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 29.49 | 26.2 | 583 | 19.77 |
openthinker:7b-q4_K_M | 5.58 | 5986631680 | 7.60B | Q4_K_M | 29.37 | 28.47 | 2129 | 72.49 |
granite3.1-dense:2b-instruct-q8_0 | 4.07 | 4371457066 | 2.50B | Q8_0 | 28.95 | 25.22 | 505 | 17.44 |
hf.co/unsloth/gemma-3-4b-it-GGUF:Q4_K_M | 5.21 | 5590989504 | 3.88B | Q4_K_M | 28.6 | 25.84 | 674 | 23.57 |
exaone3.5:7.8b-instruct-q4_K_M | 6.49 | 6971779754 | 7.80B | Q4_K_M | 28.43 | 25.2 | 573 | 20.16 |
phi3.5:3.8b-mini-instruct-q4_0 | 6.07 | 6521716224 | 3.80B | Q4_0 | 28.29 | 27.67 | 2981 | 105.37 |
hf.co/bartowski/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 28.14 | 25.34 | 653 | 23.21 |
hf.co/bartowski/Humanish-LLama3-8B-Instruct-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 28.1 | 24.14 | 461 | 16.4 |
hf.co/akjindal53244/Llama-3.1-Storm-8B-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 27.83 | 24.85 | 608 | 21.85 |
mistral:7b-instruct-v0.3-q4_0 | 5.9 | 6333450240 | 7.20B | Q4_0 | 27.69 | 23.89 | 473 | 17.08 |
tulu3:8b-q4_K_M | 6.45 | 6930059520 | 8.00B | Q4_K_M | 27.67 | 24.44 | 556 | 20.09 |
granite3.1-moe:3b-instruct-q8_0 | 4.58 | 4919532320 | 3.30B | Q8_0 | 27.42 | 23.87 | 502 | 18.31 |
hf.co/bartowski/Llama-3.1-Hawkish-8B-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 27.37 | 24.57 | 637 | 23.28 |
hf.co/bartowski/Llama-3.1-SauerkrautLM-8b-Instruct-GGUF:Q4_K_M | 6.45 | 6930032640 | 8.03B | Q4_K_M | 27.35 | 24.46 | 616 | 22.52 |
hf.co/bartowski/OLMo-2-1124-7B-Instruct-GGUF:Q4_K_M | 5.71 | 6132550314 | 7.30B | Q4_K_M | 27.09 | 24.15 | 599 | 22.11 |
yi:9b-chat-v1.5-q4_0 | 6.4 | 6873876480 | 9.00B | Q4_0 | 27.05 | 24.24 | 626 | 23.15 |
deepseek-r1:8b-llama-distill-q4_K_M | 6.45 | 6930032640 | 8.00B | Q4_K_M | 27.03 | 26.06 | 1809 | 66.94 |
command-r7b:7b-12-2024-q4_K_M | 6.97 | 7487335082 | 8.00B | Q4_K_M | 26.57 | 23.41 | 546 | 20.55 |
hf.co/bartowski/aya-expanse-8b-GGUF:Q4_K_M | 6.85 | 7358713856 | 8.03B | Q4_K_M | 26.38 | 23.46 | 587 | 22.25 |
glm4:9b-chat-q4_K_M | 6.45 | 6924847104 | 9.40B | Q4_K_M | 25.13 | 22.78 | 695 | 27.66 |
internlm2:7b-chat-v2.5-q4_K_M | 6.46 | 6938663594 | 7.70B | Q4_K_M | 25.07 | 21.77 | 494 | 19.71 |
gemma2:9b-instruct-q4_0 | 8.8 | 9445128192 | 9.20B | Q4_0 | 24.15 | 20.3 | 408 | 16.9 |
falcon3:10b-instruct-q4_K_M | 7.77 | 8338612224 | 10.30B | Q4_K_M | 23.33 | 20.52 | 540 | 23.15 |
hf.co/bartowski/INTELLECT-1-Instruct-GGUF:Q4_K_M | 7.99 | 8573839360 | 10.20B | Q4_K_M | 22.31 | 20.08 | 650 | 29.14 |
granite3-dense:8b-instruct-q4_K_M | 5.69 | 6114639029 | 8.20B | Q4_K_M | 21.1 | 17.26 | 357 | 16.92 |
hf.co/bartowski/EuroLLM-9B-Instruct-GGUF:Q4_K_M | 6.07 | 6517899264 | 9.15B | Q4_K_M | 20.93 | 18.23 | 504 | 24.08 |
falcon2:11b-q4_K_M | 7.58 | 8143568896 | 11.00B | Q4_K_M | 20.63 | 18.61 | 664 | 32.18 |
stablelm2:12b-chat-q4_K_M | 9.35 | 10035342336 | 12.00B | Q4_K_M | 19.08 | 16.16 | 424 | 22.22 |
granite3.2:8b-instruct-q4_K_M | 7.28 | 7817526453 | 8.20B | Q4_K_M | 18.73 | 16.52 | 551 | 29.41 |
granite3.1-dense:8b-instruct-q4_K_M | 5.74 | 6163400789 | 8.20B | Q4_K_M | 18.71 | 16.87 | 661 | 35.33 |
solar:10.7b-instruct-v1-q4_K_M | 10.65 | 11434518528 | 11.00B | Q4_K_M | 17.86 | 15.46 | 484 | 27.1 |
hf.co/unsloth/phi-4-GGUF:Q4_K_M | 10.94 | 11743481856 | 14.70B | Q4_K_M | 17.36 | 15.51 | 610 | 35.15 |
mistral-nemo:12b-instruct-2407-q4_K_M | 9.05 | 9716029440 | 12.20B | Q4_K_M | 16.57 | 15.34 | 880 | 53.11 |
phi4:14b-q4_K_M | 9.47 | 10169584298 | 14.70B | Q4_K_M | 16.44 | 15.13 | 816 | 49.63 |
hf.co/arcee-ai/Virtuoso-Small-GGUF:Q4_K_M | 10.76 | 11550902272 | 14.80B | Q4_K_M | 16.12 | 14.11 | 521 | 32.32 |
olmo2:13b-1124-instruct-q4_K_M | 15.59 | 16741616298 | 13.70B | Q4_K_M | 15.53 | 13.93 | 629 | 40.5 |
deepseek-r1:14b-qwen-distill-q4_K_M | 10.76 | 11550902272 | 14.80B | Q4_K_M | 15.09 | 14.33 | 1280 | 84.8 |
hf.co/unsloth/gemma-3-12b-it-GGUF:Q4_K_M | 9.44 | 10137922880 | 11.80B | Q4_K_M | 10.58 | 9.91 | 1024 | 96.77 |