Archive - Vizuara’s Substack

September 2025

August 2025

RPT: Reinforcement Learning during Pretraining

Making RL (RLHF/RLVR) part of the alignment tuning step is one of the major reasons for recent progress in LLMs, but what if we do the same during…

Aug 26

Policy Gradient Methods in Reinforcement Learning

So far, our policy estimation has been defined based on the following rule: For every state, look at the action value function Q and ask the question…

Aug 22

Building AI agents to play video games

How did humans build agents which can play video games?

Aug 16

Why Your Transformer Might Not Need Normalization

A deep dive into different normalisation methods and how DyT from Meta offers stability and control while keeping your architecture lean and…

Aug 12

The three horsemen of Classical Reinforcement Learning

All about Dynamic Programming, Monte Carlo, and Temporal Difference methods

Aug 8

Hands-on RL Bootcamp Lecture 1

A practical and easy-to-follow program from Q-learning and DQNs to RLHF and GRPO!

Aug 1
