Vizuara’s Substack
A Primer on Re-Ranking for Retrieval Systems
In this article, we explore the what, why, and how of reranking, an advanced optimization over conventional RAG pipelines that aims to balance…
Oct 22
•
Siddhant Rai
10
1
Tiny Recursive Model (TRM)
How does TRM work?
Oct 14
•
Vizuara AI
11
DINOv3: The Vision Foundation Model
Foundation models are cornerstones of recent advancements. On one hand, we have FMs for text such as GPT-5 and Gemini, but for images there are still…
Oct 6
•
Siddhant Rai
10
2
September 2025
A Very Long Span of Attention
A primer on important optimizations built on top of the attention mechanism, including methods like MHA, MQA, GQA, SWA, sparse and low-rank attention…
Sep 23
•
Siddhant Rai
11
1
Understanding RLHF From Scratch
A beginner's guide to understanding RLHF from scratch
Sep 10
•
Vizuara AI
14
1
Hierarchical Reasoning Model : Thinking fast and Slow
What if the next leap in AI isn’t bigger models, but models that know when to think longer? HRMs promise the same utopia; they are one of the early step…
Sep 9
•
Siddhant Rai
5
August 2025
RPT : Reinforcement Learning during Pretraining
Making RL (RLHF/RLVR) part of the alignment-tuning step is one of the major reasons for recent progress in LLMs. But what if we do the same during…
Aug 26
•
Siddhant Rai
8
1
Policy Gradient Methods in Reinforcement Learning
So far, our policy estimation has been defined based on the following rule: For every state, look at the action value function Q and ask the question…
Aug 22
•
Vizuara AI
10
Building AI agents to play video games
How did humans build agents that can play video games?
Aug 16
•
Vizuara AI
2
Why Your Transformer Might Not Need Normalization
A deep dive into different normalisation methods and how DyT from Meta offers stability and control while keeping your architecture lean and…
Aug 12
•
Siddhant Rai
29
2
The three horsemen of Classical Reinforcement Learning
All about Dynamic Programming, Monte Carlo, and Temporal Difference Methods
Aug 8
•
Vizuara AI
10
1
Hands-on RL Bootcamp Lecture 1
A practical and easy-to-follow program from Q-learning and DQNs to RLHF and GRPO!
Aug 1
•
Vizuara AI
20