{{backlinks>.}}

====== 🤖 Models ======

===== Large language models (LLM) =====

  * [[wp>Large language model]]
  * [[wpfr>Grand modèle de langage]]

===== GPT =====

  * [[wp>Generative pre-trained transformer]]

===== Transformers =====

  * [[wp>Transformer (deep learning architecture)]]
  * [[https://www.youtube.com/watch?v=wjZofJX0v4M|Transformers (how LLMs work) explained visually]] (YouTube)

===== Training =====

{{:ia:schema_llm_open_source_pmdia_whi3h.jpg}}

Source: [[https://parlezmoidia.fr/content/Z416fPxsJN7RopgIFzSO|Diagram of the steps in building an LLM]]

===== Pre-training =====

  * [[https://arxiv.org/pdf/2401.02038v2|Understanding LLMs: A Comprehensive Overview from Training to Inference]]

The training of LLMs can be broadly divided into three steps:
  - The first step involves **data collection** and processing.
  - The second step encompasses the **pre-training** process, which includes determining the model's architecture and pre-training tasks and utilizing suitable parallel training algorithms to complete the training.
  - The third step involves **fine-tuning** and alignment.

The paper then surveys the model training techniques: the relevant training datasets, data preparation and preprocessing, model architecture, specific training methodologies, model evaluation, and commonly used training frameworks for LLMs.

===== Post-training =====

  * [[wp>Reinforcement learning]]
  * [[wpfr>Apprentissage par renforcement]]
  * [[https://codeberg.org/rvba/osai/src/branch/main/grpo.md|GRPO]]

> To **post-train** models, we take a pre-trained base model, do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources.

> During **reinforcement learning**, we present the language model with a prompt and ask it to write responses. We then rate its response according to the reward signals, and update the language model to make it more likely to produce higher-rated responses and less likely to produce lower-rated responses.

Source: [[wp>OpenAI]], [[https://openai.com/index/expanding-on-sycophancy/|Expanding on what we missed with sycophancy]]

A toy sketch of this update loop is given at the bottom of this page.

{{https://upload.wikimedia.org/wikipedia/en/8/86/Waylon_Smithers_1.png?50}} [[wp>Waylon Smithers|Waylon]] the [[wp>Sycophancy|Sycophant]]

===== Evolution =====

  * [[https://arxiv.org/abs/2403.05812|Algorithmic progress in language models]]
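
===== Appendix: toy sketch of the post-training loop =====

A minimal illustrative sketch (not taken from the sources above; the responses, rewards, and learning rate are invented) of the reinforcement-learning loop quoted in the post-training section: sample responses, rate them with a reward signal, and update the policy so that higher-rated responses become more likely. The "policy" here is a tiny categorical distribution over canned responses rather than a real language model, and the group-relative advantage loosely mirrors the GRPO idea linked above.

<code python>
# Toy policy-gradient loop illustrating the post-training quote above.
# NOT from the cited sources: the "policy" is a categorical distribution
# over three canned responses, so the example runs with numpy alone.
import numpy as np

rng = np.random.default_rng(0)

responses = ["helpful answer", "evasive answer", "flattering answer"]
logits = np.zeros(len(responses))      # policy parameters
rewards = np.array([1.0, -0.5, -1.0])  # assumed reward signal per response

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

learning_rate = 0.5
group_size = 8  # sample a group of responses per prompt (GRPO-style)

for step in range(200):
    probs = softmax(logits)
    # "Present the language model with a prompt and ask it to write responses"
    samples = rng.choice(len(responses), size=group_size, p=probs)
    # "Rate its response according to the reward signals"
    r = rewards[samples]
    # Group-relative advantage: reward minus the group mean (simplified GRPO)
    adv = r - r.mean()
    # Policy-gradient update: raise the log-probability of higher-rated
    # responses and lower it for lower-rated ones.
    grad = np.zeros_like(logits)
    for s, a in zip(samples, adv):
        one_hot = np.eye(len(responses))[s]
        grad += a * (one_hot - probs)  # d log p(s) / d logits
    logits += learning_rate * grad / group_size

# After training, the highest-rated response dominates the distribution.
print({resp: round(float(p), 3) for resp, p in zip(responses, softmax(logits))})
</code>

A real post-training run applies the same idea to the token log-probabilities of an LLM, with reward signals coming from human preferences, reward models, or other checks, as described in the quoted sources.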