feat: add price score and efficiency ranking

laurent
2026-02-22 16:50:58 +01:00
parent 4f9459e558
commit b47f2a17a2
2 changed files with 130 additions and 109 deletions

132
README.md

@@ -1,76 +1,80 @@
# Table des Modèles Mammouth.ai # Dashboard des Modèles Mammouth.ai
*Généré automatiquement à partir des benchmarks d'Artificial Analysis et des tarifs Mammouth.ai.* *Analyse comparative basée sur le prix (Mammouth) et la performance (Artificial Analysis).*
Dernière mise à jour : 2026-02-22 16:47:56 Dernière mise à jour : 2026-02-22 16:50:58
### Légende :
- **Note Prix** : 10 = Le moins cher, 0 = Le plus cher.
- **Efficience** : Ratio Performance / Prix. Un score élevé indique un excellent rapport qualité/prix.
## Coding ## Coding
| Modèle | Prix (In / Out / 1M) | Score (Intelligence) | Vitesse (TPS) | | Modèle | Prix (In/Out 1M) | Score | Vitesse | Note Prix | **Efficience** |
| :--- | :--- | :--- | :--- | | :--- | :--- | :--- | :--- | :--- | :--- |
| grok-code-fast-1 | $0.20 / $1.50 | **23.7** | 314.1 | | grok-code-fast-1 | $0.20/$1.50 | **23.7** | 314.1 | 9.8 | **23.3** |
| qwen3-coder | $0.22 / $0.95 | N/A | N/A | | qwen3-coder | $0.22/$0.95 | N/A | N/A | 9.9 | **19.7** |
| codestral-2508 | $0.30 / $0.90 | N/A | N/A | | codestral-2508 | $0.30/$0.90 | N/A | N/A | 9.9 | **19.7** |
| qwen3-coder-plus | $1.80 / $9.00 | N/A | N/A | | qwen3-coder-flash | $0.50/$2.00 | N/A | N/A | 9.7 | **19.4** |
| qwen3-coder-flash | $0.50 / $2.00 | N/A | N/A | | qwen3-coder-plus | $1.80/$9.00 | N/A | N/A | 8.8 | **17.6** |
## Agents ## Agents
| Modèle | Prix (In / Out / 1M) | Score (Intelligence) | Vitesse (TPS) | | Modèle | Prix (In/Out 1M) | Score | Vitesse | Note Prix | **Efficience** |
| :--- | :--- | :--- | :--- | | :--- | :--- | :--- | :--- | :--- | :--- |
| sonar-pro | $3.00 / $15.00 | **15.2** | 129.2 | | sonar-deep-research | $2.00/$8.00 | N/A | N/A | 8.8 | **17.7** |
| sonar-deep-research | $2.00 / $8.00 | N/A | N/A | | sonar-pro | $3.00/$15.00 | **15.2** | 129.2 | 8.0 | **12.2** |
## General ## General
| Modèle | Prix (In / Out / 1M) | Score (Intelligence) | Vitesse (TPS) | | Modèle | Prix (In/Out 1M) | Score | Vitesse | Note Prix | **Efficience** |
| :--- | :--- | :--- | :--- | | :--- | :--- | :--- | :--- | :--- | :--- |
| gemini-3-pro-preview | $2.00 / $12.00 | **48.4** | 138.8 | | kimi-k2.5 | $0.60/$3.00 | **46.7** | 44.8 | 9.6 | **44.9** |
| kimi-k2.5 | $0.60 / $3.00 | **46.7** | 44.8 | | gemini-3-pro-preview | $2.00/$12.00 | **48.4** | 138.8 | 8.5 | **41.2** |
| claude-opus-4-6 | $5.00 / $25.00 | **46.4** | 66.8 | | gpt-5-mini | $0.25/$2.00 | **41.0** | 75.3 | 9.8 | **40.1** |
| claude-opus-4-5 | $5.00 / $25.00 | **43.0** | 65.1 | | kimi-k2-thinking | $0.55/$2.50 | **40.7** | 86.9 | 9.7 | **39.3** |
| grok-4-0709 | $3.00 / $15.00 | **41.4** | 39.4 | | gemini-3-flash-preview | $0.50/$3.00 | **35.1** | 177.9 | 9.6 | **33.8** |
| gpt-5-mini | $0.25 / $2.00 | **41.0** | 75.3 | | grok-4-0709 | $3.00/$15.00 | **41.4** | 39.4 | 8.0 | **33.1** |
| kimi-k2-thinking | $0.55 / $2.50 | **40.7** | 86.9 | | deepseek-v3.2 | $0.27/$0.42 | **32.1** | 49.1 | 9.9 | **31.8** |
| gemini-3-flash-preview | $0.50 / $3.00 | **35.1** | 177.9 | | claude-opus-4-6 | $5.00/$25.00 | **46.4** | 66.8 | 6.7 | **30.9** |
| gemini-2.5-pro | $2.50 / $15.00 | **34.5** | 158.9 | | o4-mini | $1.10/$4.40 | **33.0** | 133.8 | 9.4 | **30.9** |
| o4-mini | $1.10 / $4.40 | **33.0** | 133.8 | | claude-opus-4-5 | $5.00/$25.00 | **43.0** | 65.1 | 6.7 | **28.7** |
| claude-4-sonnet-20250522 | $3.00 / $15.00 | **33.0** | 72.8 | | gemini-2.5-pro | $2.50/$15.00 | **34.5** | 158.9 | 8.1 | **28.0** |
| deepseek-v3.2 | $0.27 / $0.42 | **32.1** | 49.1 | | deepseek-v3.1-terminus | $0.27/$1.00 | **28.4** | N/A | 9.9 | **28.0** |
| claude-3-7-sonnet-20250219 | $3.00 / $15.00 | **30.8** | N/A | | deepseek-v3.1 | $0.27/$1.00 | **28.0** | N/A | 9.9 | **27.6** |
| deepseek-v3.1-terminus | $0.27 / $1.00 | **28.4** | N/A | | gpt-5-nano | $0.05/$0.40 | **26.7** | 130.9 | 10.0 | **26.6** |
| deepseek-v3.1 | $0.27 / $1.00 | **28.0** | N/A | | claude-4-sonnet-20250522 | $3.00/$15.00 | **33.0** | 72.8 | 8.0 | **26.4** |
| deepseek-r1-0528 | $0.50 / $2.18 | **27.0** | N/A | | deepseek-r1-0528 | $0.50/$2.18 | **27.0** | N/A | 9.7 | **26.2** |
| gpt-5-nano | $0.05 / $0.40 | **26.7** | 130.9 | | kimi-k2-instruct | $0.50/$2.50 | **26.2** | 40.8 | 9.7 | **25.3** |
| kimi-k2-instruct | $0.50 / $2.50 | **26.2** | 40.8 | | claude-3-7-sonnet-20250219 | $3.00/$15.00 | **30.8** | N/A | 8.0 | **24.7** |
| gpt-4.1 | $2.00 / $8.00 | **25.6** | 103.9 | | grok-4-1-fast | $0.20/$0.50 | **23.5** | 119.7 | 9.9 | **23.3** |
| grok-3 | $3.00 / $15.00 | **25.0** | 67.7 | | gpt-4.1 | $2.00/$8.00 | **25.6** | 103.9 | 8.8 | **22.6** |
| grok-4-1-fast | $0.20 / $0.50 | **23.5** | 119.7 | | mistral-large-3 | $0.50/$1.50 | **22.7** | 55.9 | 9.8 | **22.1** |
| mistral-large-3 | $0.50 / $1.50 | **22.7** | 55.9 | | gpt-4.1-mini | $0.40/$1.60 | **22.4** | 77.4 | 9.8 | **21.9** |
| gpt-4.1-mini | $0.40 / $1.60 | **22.4** | 77.4 | | mistral-medium-3.1 | $0.40/$2.00 | **21.1** | 86.8 | 9.7 | **20.5** |
| mistral-medium-3.1 | $0.40 / $2.00 | **21.1** | 86.8 | | grok-3 | $3.00/$15.00 | **25.0** | 67.7 | 8.0 | **20.0** |
| gemini-2.5-flash | $0.30 / $2.50 | **20.5** | 235.9 | | text-embedding-3-small | $0.02/$0.00 | N/A | N/A | 10.0 | **20.0** |
| mistral-medium-3 | $0.40 / $2.00 | **18.7** | 90.2 | | text-embedding-3-large | $0.13/$0.00 | N/A | N/A | 10.0 | **19.9** |
| claude-3-5-haiku-20241022 | $0.80 / $4.00 | **18.7** | 46.4 | | gemini-2.5-flash | $0.30/$2.50 | **20.5** | 235.9 | 9.7 | **19.9** |
| llama-4-maverick | $0.15 / $0.60 | **18.3** | 126.7 | | mistral-small-3.2-24b-instruct | $0.10/$0.30 | N/A | N/A | 10.0 | **19.9** |
| gpt-4o | $2.50 / $10.00 | **17.3** | 168.6 | | deepseek-v3.2-exp | $0.27/$0.41 | N/A | N/A | 9.9 | **19.8** |
| deepseek-v3-0324 | $0.25 / $1.00 | **16.4** | N/A | | grok-3-mini | $0.30/$0.50 | N/A | N/A | 9.9 | **19.8** |
| claude-3-5-sonnet-20241022 | $3.00 / $15.00 | **15.9** | N/A | | grok-4-fast-non-reasoning | $0.40/$1.00 | N/A | N/A | 9.8 | **19.6** |
| llama-4-scout | $0.08 / $0.50 | **13.5** | 158.6 | | gemini-2.5-flash-image | $0.30/$2.50 | N/A | N/A | 9.7 | **19.4** |
| gpt-4.1-nano | $0.10 / $0.40 | **12.9** | 141.9 | | claude-haiku-4-5 | $1.00/$5.00 | N/A | N/A | 9.3 | **18.7** |
| mistral-large-2411 | $2.00 / $6.00 | **9.9** | N/A | | mistral-medium-3 | $0.40/$2.00 | **18.7** | 90.2 | 9.7 | **18.2** |
| text-embedding-3-large | $0.13 / $0.00 | N/A | N/A | | llama-4-maverick | $0.15/$0.60 | **18.3** | 126.7 | 9.9 | **18.1** |
| gpt-5-chat | $1.25 / $10.00 | N/A | N/A | | gpt-5-chat | $1.25/$10.00 | N/A | N/A | 8.9 | **17.7** |
| grok-4-fast-non-reasoning | $0.40 / $1.00 | N/A | N/A | | gpt-5.1-chat | $1.25/$10.00 | N/A | N/A | 8.9 | **17.7** |
| claude-sonnet-4-5 | $3.00 / $15.00 | N/A | N/A | | claude-3-5-haiku-20241022 | $0.80/$4.00 | **18.7** | 46.4 | 9.5 | **17.7** |
| gpt-5.1-chat | $1.25 / $10.00 | N/A | N/A | | gemini-3-pro-image-preview | $2.00/$12.00 | N/A | N/A | 8.5 | **17.0** |
| claude-haiku-4-5 | $1.00 / $5.00 | N/A | N/A | | gpt-5.2-chat | $1.75/$14.00 | N/A | N/A | 8.4 | **16.8** |
| gemini-2.5-flash-image | $0.30 / $2.50 | N/A | N/A | | deepseek-v3-0324 | $0.25/$1.00 | **16.4** | N/A | 9.9 | **16.2** |
| claude-opus-4-1-20250805 | $15.00 / $75.00 | N/A | N/A | | claude-sonnet-4-5 | $3.00/$15.00 | N/A | N/A | 8.0 | **16.0** |
| deepseek-v3.2-exp | $0.27 / $0.41 | N/A | N/A | | gpt-4o | $2.50/$10.00 | **17.3** | 168.6 | 8.5 | **14.8** |
| gpt-5.2-chat | $1.75 / $14.00 | N/A | N/A | | llama-4-scout | $0.08/$0.50 | **13.5** | 158.6 | 9.9 | **13.4** |
| grok-3-mini | $0.30 / $0.50 | N/A | N/A | | gpt-4.1-nano | $0.10/$0.40 | **12.9** | 141.9 | 9.9 | **12.8** |
| mistral-small-3.2-24b-instruct | $0.10 / $0.30 | N/A | N/A | | claude-3-5-sonnet-20241022 | $3.00/$15.00 | **15.9** | N/A | 8.0 | **12.7** |
| gemini-3-pro-image-preview | $2.00 / $12.00 | N/A | N/A | | mistral-large-2411 | $2.00/$6.00 | **9.9** | N/A | 9.0 | **8.9** |
| text-embedding-3-small | $0.02 / $0.00 | N/A | N/A | | claude-opus-4-1-20250805 | $15.00/$75.00 | N/A | N/A | 0.0 | N/A |
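
The Note Prix (price score) and Efficience (efficiency) columns above can be reproduced from the listed prices. The sketch below restates the formulas from this commit's script as standalone functions: a 3:1 input/output blended price, linear min-max scaling to 0-10, then score × price score / 10, with a price-only fallback when no benchmark exists. The function names `blended_price`, `price_score`, and `efficiency` are illustrative and do not appear in the commit. It also assumes the cheapest and most expensive models are text-embedding-3-small and claude-opus-4-1-20250805, as the table suggests. As computed, Efficience is a product of the benchmark score and the price score rather than a literal performance/price ratio.

```python
def blended_price(price_in: float, price_out: float) -> float:
    """Blend input/output prices (USD per 1M tokens) with a 3:1 weighting."""
    return price_in * 0.75 + price_out * 0.25


def price_score(price: float, min_p: float, max_p: float) -> float:
    """Linear min-max score: 10 for the cheapest model, 0 for the most expensive."""
    if max_p <= min_p:
        return 10.0
    return 10 * (1 - (price - min_p) / (max_p - min_p))


def efficiency(score, p_score: float) -> float:
    """Benchmark score weighted by price score; price-only fallback without a benchmark."""
    if score:
        return score * p_score / 10
    return p_score * 2


# Assumed extremes of the price range, taken from the table above.
min_p = blended_price(0.02, 0.00)    # text-embedding-3-small
max_p = blended_price(15.00, 75.00)  # claude-opus-4-1-20250805

# kimi-k2.5: $0.60/$3.00, score 46.7
kimi = price_score(blended_price(0.60, 3.00), min_p, max_p)
print(f"{kimi:.1f} {efficiency(46.7, kimi):.1f}")  # prints "9.6 44.9"
```

Because the scale is linear and anchored by a single $15.00/$75.00 model, most models land between 8.0 and 10.0, so the price score separates cheap models only weakly.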


@@ -5,14 +5,12 @@ import time
import re import re
from dotenv import load_dotenv from dotenv import load_dotenv
# Load .env.global
load_dotenv("../.env.global") load_dotenv("../.env.global")
AIANALASYS_APIKEY = os.getenv("AIANALASYS_APIKEY") AIANALASYS_APIKEY = os.getenv("AIANALASYS_APIKEY")
def get_mammouth_models(): def get_mammouth_models():
url = "https://api.mammouth.ai/public/models" url = "https://api.mammouth.ai/public/models"
try: try:
# Suppress InsecureRequestWarning because verify=False is used
requests.packages.urllib3.disable_warnings() requests.packages.urllib3.disable_warnings()
response = requests.get(url, verify=False) response = requests.get(url, verify=False)
response.raise_for_status() response.raise_for_status()
@@ -33,7 +31,6 @@ def get_aa_data():
return [] return []
def clean_id(model_id): def clean_id(model_id):
# Aggressive cleanup to improve mapping matches
id_clean = re.sub(r'-\d{4,8}', '', model_id.lower()) id_clean = re.sub(r'-\d{4,8}', '', model_id.lower())
id_clean = id_clean.replace('-latest', '').replace('-preview', '').replace('-instruct', '') id_clean = id_clean.replace('-latest', '').replace('-preview', '').replace('-instruct', '')
return id_clean.strip() return id_clean.strip()
@@ -45,96 +42,116 @@ def generate_markdown(models_data):
if cat not in categories: categories[cat] = [] if cat not in categories: categories[cat] = []
categories[cat].append(m) categories[cat].append(m)
md = "# Table des Modèles Mammouth.ai\n\n" md = "# Dashboard des Modèles Mammouth.ai\n\n"
md += "*Généré automatiquement à partir des benchmarks d'Artificial Analysis et des tarifs Mammouth.ai.*\n\n" md += "*Analyse comparative basée sur le prix (Mammouth) et la performance (Artificial Analysis).*\n\n"
md += f"Dernière mise à jour : {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n" md += f"Dernière mise à jour : {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n"
md += "### Légende :\n"
md += "- **Note Prix** : 10 = Le moins cher, 0 = Le plus cher.\n"
md += "- **Efficience** : Ratio Performance / Prix. Un score élevé indique un excellent rapport qualité/prix.\n\n"
order = ['Coding', 'Agents', 'General'] order = ['Coding', 'Agents', 'General']
sorted_cats = sorted(categories.keys(), key=lambda x: order.index(x) if x in order else 99) sorted_cats = sorted(categories.keys(), key=lambda x: order.index(x) if x in order else 99)
for cat in sorted_cats: for cat in sorted_cats:
md += f"## {cat}\n\n" md += f"## {cat}\n\n"
md += "| Modèle | Prix (In / Out / 1M) | Score (Intelligence) | Vitesse (TPS) |\n" md += "| Modèle | Prix (In/Out 1M) | Score | Vitesse | Note Prix | **Efficience** |\n"
md += "| :--- | :--- | :--- | :--- |\n" md += "| :--- | :--- | :--- | :--- | :--- | :--- |\n"
models = categories[cat] models = categories[cat]
# Sort: score (desc), then name # Sort by descending efficiency
models.sort(key=lambda x: (x.get('score') or 0), reverse=True) models.sort(key=lambda x: x.get('efficiency_score', 0), reverse=True)
for m in models: for m in models:
score_str = f"**{m['score']:.1f}**" if m['score'] else "N/A" score_str = f"**{m['score']:.1f}**" if m['score'] else "N/A"
speed_str = f"{m['speed']:.1f}" if m['speed'] else "N/A" speed_str = f"{m['speed']:.1f}" if m['speed'] else "N/A"
md += f"| {m['name']} | ${m['price_in']:.2f} / ${m['price_out']:.2f} | {score_str} | {speed_str} |\n" p_score = f"{m['price_score']:.1f}" if m['price_score'] is not None else "N/A"
eff_score = f"**{m['efficiency_score']:.1f}**" if m['efficiency_score'] else "N/A"
md += f"| {m['name']} | ${m['price_in']:.2f}/${m['price_out']:.2f} | {score_str} | {speed_str} | {p_score} | {eff_score} |\n"
md += "\n" md += "\n"
return md return md
def main(): def main():
print("Fetching data from Mammouth and Artificial Analysis...") print("Calcul de l'efficience des modèles...")
m_models = get_mammouth_models() m_models = get_mammouth_models()
aa_data = get_aa_data() aa_data = get_aa_data()
# Mapping table (slug -> data) aa_map = {m.get('slug', '').lower(): m for m in aa_data}
aa_map = {}
for aa_m in aa_data: for aa_m in aa_data:
slug = aa_m.get('slug', '').lower() aa_map[aa_m.get('name', '').lower()] = aa_m
name = aa_m.get('name', '').lower()
if slug: aa_map[slug] = aa_m
if name: aa_map[name] = aa_m
enriched = [] enriched = []
# First compute the prices to determine the scoring scale
temp_list = []
for m in m_models: for m in m_models:
m_id = m.get('id', '') m_id = m.get('id', '')
info = m.get('model_info', {}) info = m.get('model_info', {})
if not m_id: continue if not m_id: continue
price_in = float(info.get('input_cost_per_token', 0)) * 1000000
price_out = float(info.get('output_cost_per_token', 0)) * 1000000
# Blended price (3:1 weighted average, as AA does)
blended_price = (price_in * 0.75) + (price_out * 0.25)
if blended_price > 0:
temp_list.append((m, blended_price, price_in, price_out))
if not temp_list: return
# Compute the price bounds for the score (linear min-max scaling)
min_p = min(x[1] for x in temp_list)
max_p = max(x[1] for x in temp_list)
for m_data, b_price, p_in, p_out in temp_list:
m_id = m_data.get('id', '')
m_id_clean = clean_id(m_id) m_id_clean = clean_id(m_id)
short_id = m_id_clean.split('/')[-1] short_id = m_id_clean.split('/')[-1]
# Match mapping
aa_info = aa_map.get(m_id_clean) or aa_map.get(short_id) aa_info = aa_map.get(m_id_clean) or aa_map.get(short_id)
# Fuzzy lookup (e.g. claude-3-5-sonnet -> claude-3.5-sonnet)
if not aa_info: if not aa_info:
normalized_m_id = m_id_clean.replace('-', '').replace('.', '') norm = m_id_clean.replace('-', '').replace('.', '')
for key, val in aa_map.items(): for k, v in aa_map.items():
if key.replace('-', '').replace('.', '') == normalized_m_id: if k.replace('-', '').replace('.', '') == norm:
aa_info = val aa_info = v
break break
price_in = float(info.get('input_cost_per_token', 0)) * 1000000 # Price score: 10 for the cheapest, 0 for the most expensive
price_out = float(info.get('output_cost_per_token', 0)) * 1000000 # Formula: 10 * (1 - (price - min) / (max - min))
price_score = 10 * (1 - (b_price - min_p) / (max_p - min_p)) if max_p > min_p else 10
category = "General"
if any(x in m_id_clean for x in ['coding', 'code', 'starcoder', 'codestral', 'coder']):
category = "Coding"
elif any(x in m_id_clean for x in ['agent', 'hermes', 'tool', 'function', 'sonar']):
category = "Agents"
score = None score = None
speed = None speed = None
category = "General"
if any(x in m_id_clean for x in ['coding', 'code', 'starcoder', 'codestral', 'coder']): category = "Coding"
elif any(x in m_id_clean for x in ['agent', 'hermes', 'tool', 'function', 'sonar']): category = "Agents"
if aa_info: if aa_info:
evals = aa_info.get('evaluations', {}) evals = aa_info.get('evaluations', {})
# Use the coding score if that is the category, otherwise the intelligence index score = evals.get('artificial_analysis_coding_index') if category == "Coding" else evals.get('artificial_analysis_intelligence_index')
score = evals.get('artificial_analysis_coding_index') if category == "Coding" else None if not score: score = evals.get('artificial_analysis_intelligence_index')
if not score:
score = evals.get('artificial_analysis_intelligence_index')
speed = aa_info.get('median_output_tokens_per_second') speed = aa_info.get('median_output_tokens_per_second')
# Efficiency: combine performance and price
# Without an AA score, base efficiency on price alone (with a baseline bonus)
efficiency_score = 0
if score:
# Normalized score (0-100) * price score (0-10) / 10
efficiency_score = (score * price_score) / 10
else:
# Model without a benchmark: give it an efficiency based on its price alone
efficiency_score = price_score * 2 # Lower priority than models with a score
enriched.append({ enriched.append({
'name': m_id, 'name': m_id,
'price_in': price_in, 'price_in': p_in,
'price_out': price_out, 'price_out': p_out,
'score': score, 'score': score,
'speed': speed, 'speed': speed,
'price_score': price_score,
'efficiency_score': efficiency_score,
'category': category 'category': category
}) })
# Keep only models with price > 0
final = [x for x in enriched if x['price_in'] > 0]
with open("README.md", "w", encoding="utf-8") as f: with open("README.md", "w", encoding="utf-8") as f:
f.write(generate_markdown(final)) f.write(generate_markdown(enriched))
print(f"Success! Dashboard updated with {len(enriched)} models.")
print(f"Success! {len(final)} models processed.")
if __name__ == "__main__": if __name__ == "__main__":
main() main()