Title: Codex: Large Language Models Trained on Code
Duration: 21:06
Published: 07-12-2022
Source: YouTube
Codex is a GPT language model fine-tuned on large amounts of publicly available GitHub code; a further supervised fine-tuned variant, Codex-S, is trained on standalone, correctly implemented functions. Codex powers GitHub Copilot. For experienced programmers it reduces context switching and improves productivity, and it also lets non-programmers write natural-language specifications and have Codex draft the implementations. The HumanEval dataset and the pass@k metric are notable contributions of this work.
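The pass@k metric estimates the probability that at least one of k sampled completions for a problem passes all of its unit tests. Below is a minimal sketch of the unbiased, numerically stable estimator described in the paper, 1 - C(n-c, k)/C(n, k) per problem; the function name and the example numbers here are my own, chosen only for illustration.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled for the problem
    c: how many of those completions pass all unit tests
    k: number of completions the user is allowed to try
    """
    if n - c < k:
        # Every size-k subset must contain at least one correct completion.
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 10 of which pass the tests.
print(pass_at_k(200, 10, 1))    # 0.05  (equals c/n when k = 1)
print(pass_at_k(200, 10, 100))  # ~0.999
```

The reported benchmark score is this quantity averaged over all 164 HumanEval problems.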
In this video, I will give a brief overview of these models, Codex and Codex-S. We will also go over the details of code fine-tuning, the HumanEval dataset, and the pass@k metric; a small sketch of the HumanEval correctness check follows the agenda below.
Here is the agenda:
00:00:00 Github Copilot
00:01:23 What is Codex?
00:03:43 HumanEval: Hand-Written Evaluation Set, and Pass@k
00:06:07 Code Finetuning to train Codex
00:10:17 Codex vs GPT-Neo, GPT-J and TabNine
00:12:15 Supervised Fine-Tuning to train Codex-S
00:15:35 DocString Generation with Codex
00:16:48 Codex: Limitations and Hazards
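To make the HumanEval chapter more concrete: a completion counts as correct only if it passes the problem's unit tests. The snippet below is a rough, unsandboxed sketch of that check; the problem fields and the check_correctness helper are simplified stand-ins for the real dataset schema and execution harness, not the actual API.

```python
# A toy HumanEval-style problem: a prompt (signature + docstring) and unit tests.
problem = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "test": "assert add(2, 3) == 5\nassert add(-1, 1) == 0",
}

# A model-generated completion (hard-coded here for illustration).
completion = "    return a + b\n"

def check_correctness(problem: dict, completion: str) -> bool:
    """Return True if the completion passes all of the problem's unit tests."""
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace = {}
    try:
        # The real harness runs this in an isolated sandbox with a timeout.
        exec(program, namespace)
        return True
    except Exception:
        return False

print(check_correctness(problem, completion))  # True
```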
For more details, please look at
arxiv.org/pdf/2107.03374.pdf and
github.com/features/copilot
Chen, Mark, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards et al. "Evaluating large language models trained on code." arXiv preprint arXiv:2107.03374 (2021).