Table of Contents

🤖 Models

Large language models

LLM

GPT

Transformers

Training

[Figure]

Source: Diagram of the stages of building an LLM

Pre-training

Post-training

To post-train models, we take a pre-trained base model, do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources.
During reinforcement learning, we present the language model with a prompt and ask it to write responses. We then rate its response according to the reward signals, and update the language model to make it more likely to produce higher-rated responses and less likely to produce lower-rated responses.

Source: OpenAI, Expanding on what we missed with sycophancy
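The quoted passage describes the reinforcement-learning loop at a high level: sample a response to a prompt, rate it with reward signals, then update the model so higher-rated responses become more likely. Below is a minimal REINFORCE-style sketch of that loop. Everything in it is a stand-in: `TinyPolicy` is a toy autoregressive model, `reward` is a hard-coded scoring rule, and the hyperparameters are arbitrary. It illustrates the idea, not OpenAI's actual training code.

```python
# Minimal sketch of the RL post-training loop described above.
# Toy policy + toy reward; REINFORCE-style policy gradient update.
import torch
import torch.nn as nn

VOCAB, RESP_LEN = 16, 8

class TinyPolicy(nn.Module):
    """A deliberately tiny autoregressive policy over a toy vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 32, batch_first=True)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, tokens, hidden=None):
        out, hidden = self.rnn(self.embed(tokens), hidden)
        return self.head(out), hidden

def sample_response(policy, prompt, length=RESP_LEN):
    """Autoregressively sample a response, keeping per-token log-probs."""
    tokens, log_probs = prompt, []
    logits, hidden = policy(prompt)
    for _ in range(length):
        dist = torch.distributions.Categorical(logits=logits[:, -1])
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        logits, hidden = policy(tok.unsqueeze(1), hidden)
        tokens = torch.cat([tokens, tok.unsqueeze(1)], dim=1)
    return tokens, torch.stack(log_probs, dim=1)

def reward(tokens):
    """Toy reward signal: prefer responses with many even tokens.
    A real system would use learned reward models and human ratings."""
    return (tokens % 2 == 0).float().mean(dim=1)

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    prompt = torch.randint(VOCAB, (32, 4))         # batch of random "prompts"
    response, log_probs = sample_response(policy, prompt)
    r = reward(response[:, prompt.size(1):])       # rate only the response part
    advantage = r - r.mean()                       # baseline reduces variance
    # Make higher-rated responses more likely, lower-rated less likely.
    loss = -(advantage.unsqueeze(1) * log_probs).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In production systems the reward typically comes from learned reward models trained on human preference data, and the update rule is usually a PPO-style clipped objective rather than plain REINFORCE, but the structure is the same: sample, rate, reweight.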

[Image: Waylon the Sycophant]

Evolution