rss-bridge 2025-03-17T15:00:00+00:00

180: Reinforcement Learning

Intro topic: Grills

News/Links:

  • You can’t call yourself a senior until you’ve worked on a legacy project
    https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
  • Recraft might be the most powerful AI image platform I’ve ever used — here’s why
    https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
  • NASA has a list of 10 rules for software development
    https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
  • AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
    https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

  • Patrick: The Player of Games (Iain M. Banks) https://a.co/d/1ZpUhGl (non-affiliate)
  • Jason: Basic Roleplaying Universal Game Engine https://amzn.to/3ES4p5i

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

  • Patrick: Pokemon Sword and Shield
  • Jason: Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning

  • Three types of AI
      • Supervised Learning
      • Unsupervised Learning
      • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
      • Value optimization
          • SARSA
          • Q-Learning
      • Policy optimization
          • Policy Gradients
          • Actor-Critic
          • Proximal Policy Optimization
  • Value vs Policy Optimization
      • Value optimization is more intuitive (Value loss)
      • Policy optimization is less intuitive at first (policy gradients)
      • Converting values to policies in deep learning is difficult
  • Imitation Learning
      • Supervised policy learning
      • Often used to bootstrap reinforcement learning
  • Policy Evaluation
      • Propensity scoring versus model-based
  • Challenges to training RL models
      • Two optimization loops
          • Collecting feedback vs updating the model
      • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
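
As a rough illustration of the two value-optimization methods named in the outline, the sketch below puts the tabular SARSA and Q-Learning updates side by side. The function names, the dictionary-based Q-table, and the `alpha` (learning rate) and `gamma` (discount) defaults are assumptions made for this example, not something taken from the episode.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.0.
q_table = defaultdict(float)
q_table[("s1", "left")] = 1.0
q_table[("s1", "right")] = 2.0
q_learning_update(q_table, "s0", "left", 1.0, "s1", ["left", "right"])
```

The only difference between the two functions is the bootstrap term: Q-Learning uses the highest-valued next action regardless of what the agent does next, while SARSA uses the action the agent actually takes.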

★ Support this podcast on Patreon ★


Programming Throwdown

Patrick Wheeler and Jason Gauci


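
For the GRPO item in the topic outline, here is a minimal sketch of the group-relative advantage that gives the method its name: several responses to the same prompt are scored, and each reward is normalized against the group's mean and standard deviation instead of being compared with a learned value function. The function name, the `eps` term, and the choice of population standard deviation are assumptions for illustration; real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group average get a positive advantage and are reinforced; those below get a negative one. If every response receives the same reward, all advantages are zero and the group contributes no learning signal.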
