AI vs Thai Exams

This dashboard compares how different large language models (LLMs) perform across a range of Thai standardized tests.


Overall Ranking

| Model | Cost (฿) | Score | Accuracy |
|---|---|---|---|
| gemini-2.5-pro-preview-03-25 | 176.93 | 312/375 | 83.20% |
| o1-2024-12-17 | 1138.36 | 310/375 | 82.67% |
| claude-3-7-sonnet-20250219[thinking=16k] | 316.02 | 308/375 | 82.13% |
| o3-2025-04-16[high] | 663.36 | 308/372 | 82.13% |
| o3-2025-04-16[low] | 291.69 | 307/375 | 81.87% |
| o3-2025-04-16[medium] | 455.48 | 305/372 | 81.33% |
| gemini-2.5-flash-preview-04-17 | 68.42 | 299/375 | 79.73% |
| gemini-2.5-flash-preview-04-17[no-thinking] | 2.91 | 299/374 | 79.73% |
| o4-mini-2025-04-16[medium] | 46.65 | 298/375 | 79.47% |
| o4-mini-2025-04-16[high] | 79.33 | 298/375 | 79.47% |
| deepseek-r1 | 105.33 | 291/375 | 77.60% |
| gpt-4.5-preview-2025-02-27 | 725.65 | 290/375 | 77.33% |
| o4-mini-2025-04-16[low] | 48.12 | 290/375 | 77.33% |
| grok-3-mini-beta[thinking] | 2.48 | 289/375 | 77.07% |
| claude-3-5-sonnet-20241022 | 77.15 | 287/375 | 76.53% |
| gemini-2.0-flash-thinking-exp-01-21 | | 287/375 | 76.53% |
| gpt-4.1-2025-04-14 | 26.93 | 287/375 | 76.53% |
| claude-3-7-sonnet-20250219 | 62.32 | 286/375 | 76.27% |
| o3-mini-2025-01-31[low] | 75.48 | 284/375 | 75.73% |
| qwq-32b | 30.91 | 282/375 | 75.20% |
| o3-mini-2025-01-31[high] | 158.30 | 280/375 | 74.67% |
| deepseek-chat-v3-0324 | 11.87 | 279/375 | 74.40% |
| o3-mini-2025-01-31[medium] | 74.98 | 274/375 | 73.07% |
| gemini-1.5-pro-002 | 20.36 | 273/375 | 72.80% |
| llama-4-maverick | 3.03 | 273/375 | 72.80% |
| qwen-max-2025-01-25 | 36.14 | 271/375 | 72.27% |
| gemini-2.0-flash-001 | 1.44 | 270/375 | 72.00% |
| grok-3-beta | 57.40 | 270/375 | 72.00% |
| gpt-4o-2024-08-06 | 44.69 | 265/375 | 70.67% |
| typhoon-v2-r1-70b-preview | 10.33 | 264/375 | 70.40% |
| gpt-4.1-mini-2025-04-14 | 5.54 | 261/375 | 69.60% |
| gemini-2.0-flash-lite-001 | 1.78 | 259/375 | 69.07% |
| gemma-3-27b-it | 0.92 | 256/375 | 68.27% |
| typhoon-v2-70b-instruct | 7.71 | 249/375 | 66.40% |
| llama-3.3-70b-instruct | 1.50 | 244/375 | 65.07% |
| mistral-large-2411 | 35.12 | 233/375 | 62.13% |
| command-a-03-2025 | 64.42 | 229/375 | 61.07% |
| llama-4-scout | 3.15 | 228/375 | 60.80% |
| gpt-4o-mini-2024-07-18 | 2.02 | 227/375 | 60.53% |
| phi-4 | 1.46 | 211/375 | 56.27% |
| gpt-4.1-nano-2025-04-14 | 1.56 | 206/375 | 54.93% |
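
For reference, the accuracy column is simply correct answers divided by questions attempted, and the cost column is the total Thai-baht spend for one full run. The sketch below shows how such a ranking can be reproduced from raw (correct, total, cost) counts; it is a minimal illustration using a few rows copied from the table above, and the `Result` class and its field names are illustrative, not this project's actual data schema.

```python
# Minimal sketch: derive accuracy and cost-per-correct-answer from raw counts,
# then sort by accuracy. Values are copied from the leaderboard above; the
# Result class and field names are illustrative, not the project's schema.
from dataclasses import dataclass

@dataclass
class Result:
    model: str
    cost_thb: float   # total spend in Thai baht (฿) for a full run
    correct: int      # questions answered correctly
    total: int        # questions attempted

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def baht_per_correct(self) -> float:
        return self.cost_thb / self.correct

results = [
    Result("gemini-2.5-pro-preview-03-25", 176.93, 312, 375),
    Result("o1-2024-12-17", 1138.36, 310, 375),
    Result("gemini-2.0-flash-001", 1.44, 270, 375),
]

for r in sorted(results, key=lambda r: r.accuracy, reverse=True):
    print(f"{r.model:35s} {r.accuracy:6.2%}  ฿{r.baht_per_correct:.2f} per correct answer")
```

Cost per correct answer is a useful secondary view of the same table: gemini-2.0-flash-001 reaches 72.00% for about ฿1.44 total, while o1-2024-12-17 spends roughly 790 times more (฿1138.36) for about 10.7 extra percentage points.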