AI vs Thai Exams

This dashboard compares how different large language models (LLMs) perform on Thai standardized tests.

Overall Ranking

Model Cost Overall (correct/total) Accuracy
gemini-3-flash-preview[high] ฿98.97 322/375 85.87%
gemini-3-pro-preview ฿42.52 321/375 85.60%
claude-opus-4-6[high] ฿518.04 321/375 85.60%
claude-sonnet-4-6[high] ฿340.54 316/375 84.27%
gemini-2.5-pro ฿22.01 315/375 84.00%
claude-opus-4-5-20251101[thinking=16k] ฿495.30 313/375 83.47%
gemini-3-flash-preview ฿8.55 311/375 82.93%
o1-2024-12-17 ฿1138.36 310/375 82.67%
o3-2025-04-16[high] ฿690.56 310/374 82.67%
claude-opus-4-5-20251101 ฿183.53 309/375 82.40%
kimi-k2.5 ฿84.50 309/375 82.40%
gpt-5.2 ฿71.67 309/375 82.40%
claude-opus-4-6 ฿168.26 308/375 82.13%
o3-2025-04-16[medium] ฿505.95 307/375 81.87%
o3-2025-04-16[low] ฿291.69 307/375 81.87%
glm-5 ฿101.59 307/375 81.87%
qwen3.5-397b-a17b ฿36.07 306/375 81.60%
gemini-2.5-flash ฿14.70 305/375 81.33%
claude-sonnet-4-6 ฿110.11 305/375 81.33%
claude-sonnet-4-5-20250929[thinking=16k] ฿204.08 302/375 80.53%
grok-4 ฿485.99 302/375 80.53%
o4-mini-2025-04-16[medium] ฿46.65 298/375 79.47%
o4-mini-2025-04-16[high] ฿79.33 298/375 79.47%
gpt-5.2-chat ฿47.42 297/375 79.20%
deepseek-r1-0528 ฿65.87 295/375 78.67%
gpt-5-mini-2025-08-07 ฿23.55 293/375 78.13%
glm-4.7 ฿108.63 292/375 77.87%
qwen3-next-80b-a3b-thinking ฿18.40 291/375 77.60%
o4-mini-2025-04-16[low] ฿48.12 290/375 77.33%
qwen3.5-plus-02-15 ฿11.78 290/375 77.33%
deepseek-reasoner-v3.1 ฿20.98 289/375 77.07%
qwen3-max-thinking ฿43.65 288/375 76.80%
claude-sonnet-4-5-20250929 ฿89.53 287/375 76.53%
qwq-32b ฿30.91 282/375 75.20%
glm-4.5 ฿56.42 282/375 75.20%
qwen3-235b-a22b ฿10.92 281/375 74.93%
gemini-2.5-flash[no-thinking] ฿6.50 280/375 74.67%
qwen3-next-80b-a3b-instruct ฿5.22 280/375 74.67%
deepseek-chat-v3.1 ฿5.85 278/375 74.13%
claude-haiku-4-5-20251001[thinking=16k] ฿248.59 276/375 73.60%
glm-4.5-air ฿36.94 275/375 73.33%
gpt-5.1-2025-11-13 ฿27.25 273/375 72.80%
llama-4-maverick ฿3.03 273/375 72.80%
minimax-m2.5 ฿15.99 272/375 72.53%
qwen-max-2025-01-25 ฿36.14 271/375 72.27%
qwen3-32b ฿3.54 268/375 71.47%
gpt-5-nano-2025-08-07 ฿9.31 265/375 70.67%
gpt-oss-120b ฿3.75 264/375 70.40%
typhoon-v2-r1-70b-preview ฿10.33 264/375 70.40%
qwen3-30b-a3b ฿4.20 261/375 69.60%
claude-haiku-4-5-20251001 ฿109.19 260/375 69.33%
gemma-3-27b-it ฿0.92 256/375 68.27%
typhoon-v2-70b-instruct ฿7.71 249/375 66.40%
nova-pro-v1 ฿12.07 248/375 66.13%
gpt-oss-20b ฿2.25 246/375 65.60%
typhoon-v2.1-12b-instruct ฿1.68 232/375 61.87%
llama-4-scout ฿3.15 228/375 60.80%
glm-4.7-flash ฿29.32 222/375 59.20%
phi-4 ฿1.46 211/375 56.27%
nova-micro-v1 ฿1.06 208/375 55.47%
nova-lite-v1 ฿0.93 204/375 54.40%
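
Each row's accuracy appears to be the number of correct answers divided by the total number of questions, shown as a percentage with two decimals. The sketch below parses a row in that layout and recomputes the figure. The function names `parse_row` and `accuracy_pct` are hypothetical, and both the row layout and the rounding mode are assumptions read off the table, not the dashboard's actual code.

```python
from decimal import Decimal, ROUND_HALF_UP


def parse_row(line: str) -> dict:
    """Parse one ranking row: '<model> ฿<cost> <correct>/<total> <acc>%'."""
    # Split from the right so the model name stays in one piece.
    model, cost, score, acc = line.rsplit(maxsplit=3)
    correct, total = (int(part) for part in score.split("/"))
    return {
        "model": model,
        "cost_thb": Decimal(cost.lstrip("฿")),   # cost in Thai baht
        "correct": correct,
        "total": total,
        "listed_acc": Decimal(acc.rstrip("%")),  # accuracy as printed
    }


def accuracy_pct(correct: int, total: int) -> Decimal:
    """Accuracy as a percentage with two decimals (rounding mode assumed)."""
    pct = Decimal(correct) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Usage: check that a listed accuracy matches its correct/total score.
row = parse_row("gemini-3-flash-preview[high] ฿98.97 322/375 85.87%")
recomputed = accuracy_pct(row["correct"], row["total"])
print(row["model"], recomputed, recomputed == row["listed_acc"])
```

Using `Decimal` rather than floats keeps the comparison against the printed two-decimal value exact.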