Course Resources

Table of contents

  1. Lecture Materials
    1. Lecture 1: Introduction
      1. Lecture 1 Papers
  2. Reference Papers by Topic
    1. 1/6/2026 Lagrangian methods
    2. 1/20/2026 Differentiable projections
    3. 1/27/2026 Reinforcement learning
    4. 2/3/2026 Reinforcement learning
    5. 2/10/2026 LLM alignment
    6. 2/17/2026 Post-training interventions
    7. 2/24/2026 Formal verification
    8. 3/3/2026 Declarative programming & tool use
    9. 3/10/2026 Synthesis and future directions

Lecture Materials

  • Zotero export contains selected papers I’ve found on constrained and safe AI as of the first day of the course.
  • Zotero bib contains only the bibliography entries for the same papers.

Lecture 1: Introduction

Lecture 1 Papers

Reference Papers by Topic

1/6/2026 Lagrangian methods

  • [1] Position: Adopt Constraints Over Penalties in Deep Learning — Juan Ramirez, Meraj Hashemizadeh, Simon Lacoste-Julien (2025). arXiv preprint. arXiv:2505.20628; DOI: 10.48550/arXiv.2505.20628. Argues that penalty methods are often unreliable for enforcing constraints in deep learning and advocates constraint-first training (for example via Lagrangian methods) to achieve trustworthy behavior. A minimal primal-dual sketch of this recipe follows the list.
  • [7] Lagrangian Duality for Constrained Deep Learning — Ferdinando Fioretto, Pascal Van Hentenryck, Terrence W K Mak, Cuong Tran, Federico Baldo, Michele Lombardi (2020). arXiv preprint. arXiv:2001.09394; DOI: 10.48550/arXiv.2001.09394. Applies Lagrangian duality to train deep networks under hard constraints across domains such as energy systems and fairness, enabling improved solutions with principled constraint handling.
  • [23] Chance-Constrained Optimization Learning — Pascal Van Hentenryck, Ian Stainwright, Shmuel P Rubin, Kevin Duffy (2025). arXiv preprint. arXiv:2501.03443; DOI: 10.48550/arXiv.2501.03443. Introduces an optimization-learning perspective on chance-constrained problems, learning solution mappings and risk measures for applications such as security-constrained optimal power flow.
  • [21] Chance-Constrained Optimization (Stanford EE364a lecture notes) — Stephen Boyd (n.d.). Course notes (Stanford EE364a). URL: https://web.stanford.edu/class/ee364a/lectures/chance_constr.pdf. Lecture-note style introduction to chance-constrained optimization that develops probabilistic constraint formulations and standard convex or dual approximations.
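
The entries above share one recipe: minimize a loss f(θ) subject to a constraint g(θ) ≤ 0 by alternating gradient descent on the parameters with projected gradient ascent on a multiplier λ ≥ 0. Below is a minimal sketch of that primal-dual loop on a toy problem; it illustrates the general technique, not the exact algorithm of any paper listed here.

```python
# Minimal primal-dual (Lagrangian) loop for min f(theta) s.t. g(theta) <= 0.
# Toy objective and constraint; illustrative only.
import torch

theta = torch.randn(2, requires_grad=True)  # model parameters
lam = torch.tensor(0.0)                     # Lagrange multiplier, kept >= 0

def f(theta):  # objective: pull theta toward (2, 2)
    return ((theta - 2.0) ** 2).sum()

def g(theta):  # constraint g(theta) <= 0: stay inside the unit ball
    return (theta ** 2).sum() - 1.0

opt = torch.optim.SGD([theta], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    lagrangian = f(theta) + lam * g(theta)
    lagrangian.backward()
    opt.step()                               # primal descent on theta
    with torch.no_grad():                    # dual ascent on lambda, projected to >= 0
        lam = torch.clamp(lam + 0.05 * g(theta), min=0.0)

print(theta.detach(), lam.item(), g(theta).item())  # g(theta) ends near 0 (active)
```

At convergence the multiplier is positive precisely because the constraint is active; if the unconstrained optimum were already feasible, λ would decay to zero.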

1/20/2026 Differentiable projections

  • [2] Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules — Gonzalo E. Constante-Flores, Hao Chen, Can Li (2025). arXiv preprint. arXiv:2505.13858; DOI: 10.48550/arXiv.2505.13858. Proposes a model-agnostic architecture that enforces input-dependent hard linear constraints via decision rules, providing formal constraint satisfaction while maintaining competitive accuracy and low latency.
  • [4] HardNet: Hard-Constrained Neural Networks with Universal Approximation Guarantees — Soobin Min, Navid Azizan (2025). arXiv preprint. arXiv:2410.10807; DOI: 10.48550/arXiv.2410.10807. Introduces HardNet, a hard-constrained neural network construction that exactly satisfies multiple input-dependent inequality constraints while retaining universal approximation guarantees. A toy projection-layer sketch follows the list.
  • [28] OptNet: Differentiable Optimization as a Layer in Neural Networks — Brandon Amos, J. Zico Kolter (2017). arXiv preprint. arXiv:1703.00443; DOI: 10.48550/arXiv.1703.00443. Introduces OptNet, a differentiable optimization layer that embeds quadratic programs in neural networks and enables end-to-end training via differentiable solvers.
  • [35] BarrierNet: Differentiable Control Barrier Functions for Learning of Safe Robot Control — Ames, Aaron D, Xu, Xiangru, Grizzle, Jessy W, Tabuada, Paulo, et al. (2019). IEEE Transactions on Robotics, 35 (5) pp. 1107-1123. DOI: 10.1109/TRO.2019.2944364. Introduces BarrierNet, which integrates differentiable control barrier functions into learning to produce controllers with formal safety guarantees.
  • [36] Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving — Yixiao Guo, Anirudha Majumdar (2022). arXiv preprint. arXiv:2203.10971; DOI: 10.48550/arXiv.2203.10971. Introduces differentiable control barrier functions for vision-based end-to-end autonomous driving, coupling perception with a safety-certified control layer.
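
A common building block in these papers is a differentiable projection that maps a raw network output to a nearby feasible point, so constraints hold by construction while gradients still flow. The sketch below uses the closed-form Euclidean projection onto a single halfspace {x : aᵀx ≤ b}; HardNet and OptNet handle richer constraint sets (OptNet by differentiating through a full quadratic program), so treat this as a toy instance of the idea rather than either paper's construction.

```python
# Differentiable projection of a network output onto {x : a^T x <= b}.
# Single fixed halfspace for illustration; not HardNet's or OptNet's method.
import torch

a = torch.tensor([1.0, 1.0])  # constraint a^T x <= b
b = torch.tensor(1.0)

def project_halfspace(y):
    # Euclidean projection: y - max(0, (a^T y - b) / ||a||^2) * a.
    # relu keeps the map piecewise differentiable, so it can sit inside a network.
    violation = torch.relu(y @ a - b) / (a @ a)
    return y - violation * a

y = torch.tensor([2.0, 3.0], requires_grad=True)  # raw (infeasible) output
x = project_halfspace(y)                          # feasible by construction
x.sum().backward()                                # gradients flow through the layer
print(x, (a @ x <= b + 1e-6).item(), y.grad)
```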

1/27/2026 Reinforcement learning

  • [29] Constrained Policy Optimization — Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel (2017). arXiv preprint. arXiv:1705.10528; DOI: 10.48550/arXiv.1705.10528. Introduces Constrained Policy Optimization, a trust-region reinforcement learning algorithm with theoretical motivation for monotonic improvement under constraints.
  • [26] Projection-Based Constrained Policy Optimization — Weiwei Cheng, Pengfei Ren, Tianyang Lu, Mingyuan Liu, Bo Dai, et al. (2021). arXiv preprint. arXiv:2010.03152; DOI: 10.48550/arXiv.2010.03152. Proposes Projection-Based Constrained Policy Optimization, enforcing constraints by projecting policy updates into the feasible set for improved stability in constrained reinforcement learning. A toy projected-update sketch follows the list.
  • [27] First Order Constrained Optimization in Policy Space — Weiwei Cheng, Bo Dai, Zhenjie Zhang, Jia Liu, Zhuoran Yang (2020). arXiv preprint. arXiv:2002.06506; DOI: 10.48550/arXiv.2002.06506. Introduces a scalable first-order method for constrained optimization in policy space that aims to satisfy constraints while avoiding costly second-order computations.
  • [41] Safe Exploration in Continuous Action Spaces — Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa (2018). arXiv preprint. arXiv:1801.08757; DOI: 10.48550/arXiv.1801.08757. Introduces a safe exploration method for continuous-action reinforcement learning that enforces state constraints during learning and avoids the violations common in reward-shaping baselines.
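
The projection-based methods above share a simple geometric picture: take the usual policy-gradient step, and if the linearized cost constraint would be violated, project the step back onto the constraint boundary. The sketch below shows that correction with made-up gradients; the actual algorithms work in a trust region with estimated advantages.

```python
# Toy projected policy update: step on the objective, then project the step
# onto the linearized cost constraint {delta : cost + grad_cost^T delta <= limit}.
import numpy as np

def projected_step(grad_obj, grad_cost, cost, limit, lr=0.1):
    delta = lr * grad_obj                       # unconstrained ascent direction
    slack = limit - (cost + grad_cost @ delta)  # linearized constraint slack
    if slack < 0:                               # predicted violation: project back
        delta += slack * grad_cost / (grad_cost @ grad_cost)
    return delta

g_obj = np.array([1.0, 0.0])   # gradient of expected return (made up)
g_cost = np.array([1.0, 1.0])  # gradient of expected cost (made up)
print(projected_step(g_obj, g_cost, cost=0.95, limit=1.0))  # step bent along the boundary
```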

2/3/2026 Reinforcement learning

  • [6] Distributionally Robust Constrained Reinforcement Learning under Strong Duality — Bo Zhang, Amy Zhang, Sangwon Kim, Matthew O. Jackson, Karan Singhal, et al. (2024). arXiv preprint. arXiv:2406.15788; DOI: 10.48550/arXiv.2406.15788. Develops a strong-duality-based framework for distributionally robust constrained reinforcement learning with convergence guarantees and highlights failure modes of existing iterative methods.
  • [37] End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks — Gautham K. Gopalakrishnan, Arnold L. Springer, Catherine J. Sun (2021). arXiv preprint. arXiv:2103.13464; DOI: 10.48550/arXiv.2103.13464. Proposes an end-to-end safe reinforcement learning approach that uses barrier functions to maintain safety during training in continuous-control tasks.
  • [32] A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems — Katherine E. Driggs-Campbell, Rachel Holladay, Rahil Shome, Mauricio Arcak (2024). arXiv preprint. arXiv:2406.05655; DOI: 10.48550/arXiv.2406.05655. Presents a general framework for safe learning-based control under uncertainty that combines learning with safety guarantees to ensure correct operation during and after training.
  • [33] One Filter to Deploy Them All: Robust Safety for Quadrupedal Navigation in Unknown Environments — Aman S. Abhishek, Nils Wagener, Matthew O. Jackson, Karan Singhal, et al. (2025). arXiv preprint. arXiv:2501.19503; DOI: 10.48550/arXiv.2501.19503. Proposes a robust safety filter for quadrupedal navigation that can be deployed across unknown environments to provide safe behavior while preserving performance. A minimal safety-filter sketch follows the list.
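
Several of these papers wrap a learned policy in a safety filter: a supervisor that passes the policy's action through unchanged unless a model predicts the next state would leave the safe set, in which case a safe fallback is substituted. Below is a deliberately tiny version of that pattern on a 1-D system; the papers above use reachability analysis or barrier functions rather than a one-step look-ahead.

```python
# Minimal safety filter: override the learned action when a one-step model
# predicts the next state leaves the safe set. Toy dynamics and safe set.
def step(x, u):
    return x + 0.1 * u        # 1-D single-integrator model

def is_safe(x):
    return abs(x) <= 1.0      # safe set: |x| <= 1

def filter_action(x, u_learned, u_fallback=0.0):
    return u_learned if is_safe(step(x, u_learned)) else u_fallback

x = 0.95
print(filter_action(x, u_learned=2.0))   # 2.0 would give x = 1.15: filtered to 0.0
print(filter_action(x, u_learned=-1.0))  # stays safe, so passed through unchanged
```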

2/10/2026 LLM alignment

  • [20] Advancing LLM Safe Alignment with Safety Representation Ranking — Tianqi Du, Zeming Wei, Quan Chen, Chenheng Zhang, Yisen Wang (2025). arXiv preprint (ICLR 2026 submission). arXiv:2505.15710; DOI: 10.48550/arXiv.2505.15710. Proposes Safety Representation Ranking, which ranks multiple candidate responses using internal LLM representations to select safer outputs at inference time without changing model weights.
  • [31] Learning Safety Constraints for Large Language Models — Jianwei Zhang, Qiming Zhu, Yifang Xu, Mingjie Zhao, Hengrui Zhao, et al. (2025). arXiv preprint. arXiv:2504.17446; DOI: 10.48550/arXiv.2504.17446. Proposes Safety Polytope, a geometric method that learns linear safety constraints in LLM representation space to improve safety generalization and robustness to jailbreak prompts.
  • [12] Persona Vectors: Monitoring and Controlling Character Traits in Language Models — Alex Schiebel, Wentao Li, Andrew Huang, Taylor Berg-Kirkpatrick, David Bau, et al. (2025). arXiv preprint. arXiv:2507.21509; DOI: 10.48550/arXiv.2507.21509. Identifies linear ‘persona vectors’ in large language model representations that enable monitoring and steering of character traits during generation. A toy steering sketch follows the list.
  • [3] On Surjectivity of Neural Networks: Can you elicit any behavior from your model? — Haozhe Jiang, Nika Haghtalab (2025). arXiv preprint. arXiv:2508.19445; DOI: 10.48550/arXiv.2508.19445. Shows that many common neural network architectures are (almost) surjective, implying that an adversary can in principle elicit essentially any specified output behavior from a trained model.
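
A recurring mechanism in this week's readings is a linear direction in representation space whose projection monitors, and whose addition steers, a behavioral trait. The sketch below is a cartoon of activation steering with stand-in values for the hidden state, the direction, and the scale; it is not the procedure of any paper above.

```python
# Cartoon of activation steering: add a scaled trait direction v to a hidden
# state h. All values are stand-ins, not extracted from a real model.
import numpy as np

h = np.array([0.2, -0.5, 1.0])          # hidden activation at some layer
v = np.array([1.0, 0.0, -1.0])          # learned trait/safety direction
v = v / np.linalg.norm(v)               # unit-normalize the direction

def steer(h, v, alpha):
    return h + alpha * v                # alpha > 0 amplifies, alpha < 0 suppresses

print(v @ h)                            # trait score before steering
print(v @ steer(h, v, alpha=-0.8))      # score drops by exactly alpha after steering
```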

2/17/2026 Post-training interventions

  • [13] AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models — Pei Ke, Fei Mi, Jingkun Liu, Huaiyu Zhu, et al. (2025). arXiv preprint. arXiv:2410.02355; DOI: 10.48550/arXiv.2410.02355. Proposes AlphaEdit, a knowledge-editing method that constrains weight updates to a null space so targeted edits minimally disrupt other model behaviors. A minimal null-space projection sketch follows the list.
  • [17] Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation — Luca Beurer-Kellner, Marc Fischer, Martin Vechev (2024). arXiv preprint. arXiv:2403.06988; DOI: 10.48550/arXiv.2403.06988. Presents DOMINO, a constrained decoding algorithm that aligns subword vocabularies with formal constraints and uses precomputation and speculative decoding to achieve near-zero overhead.
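
AlphaEdit's description suggests a concrete linear-algebra core worth seeing once: project the weight update onto the null space of the key vectors whose behavior must be preserved, so the edit provably leaves those keys' outputs unchanged. A minimal sketch with random stand-in data (the real method derives the keys and the update from the editing objective):

```python
# Null-space-constrained weight edit: make K @ dW_safe = 0 so preserved keys
# are unaffected by the edit. Random stand-in shapes and data.
import numpy as np

rng = np.random.default_rng(0)
K = rng.normal(size=(5, 16))    # rows: key vectors to preserve
dW = rng.normal(size=(16, 16))  # raw edit to a weight matrix W

# Projector onto null(K): P = I - K^T (K K^T)^{-1} K.
P = np.eye(16) - K.T @ np.linalg.solve(K @ K.T, K)
dW_safe = P @ dW                # constrained edit

print(np.abs(K @ dW_safe).max())  # ~1e-15: preserved keys see no change
print(np.abs(K @ dW).max())       # the unconstrained edit would perturb them
```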

2/24/2026 Formal verification

  • [11] Hilbert: Prompting Large Language Models to Formalize Theories using Recursive Proving — Anonymous (2025). arXiv preprint. arXiv:2509.22819; DOI: 10.48550/arXiv.2509.22819. Introduces Hilbert, a prompting framework that treats natural-language proofs as recursive programs to generate machine-checkable formal proofs with informal reasoning guidance.
  • [30] Backpropagation through Signal Temporal Logic Specifications: Infusing Logical Structure into Gradient-Based Methods — Aditya Prabhakar, Subhro Das, Siavash R. Lemay, Supratik Paul, et al. (2025). arXiv preprint. arXiv:2509.19062; DOI: 10.48550/arXiv.2509.19062. Develops differentiable surrogates for Signal Temporal Logic specifications to enable gradient-based learning that directly optimizes temporal-logic satisfaction. A toy smooth-robustness sketch follows the list.
  • [34] Scalable Learning of Safety Guarantees for Autonomous Systems using Hamilton-Jacobi Reachability — HJReachability Team (2022). arXiv preprint. arXiv:2210.16206; DOI: 10.48550/arXiv.2210.16206. Develops scalable approaches to learn Hamilton–Jacobi reachability-based safety certificates with neural approximations, enabling safety guarantees in higher-dimensional systems.
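
The Signal Temporal Logic entry relies on a standard trick: the robustness of "always (x_t > 0)" over a finite horizon is the minimum of the signal, and replacing that minimum with a log-sum-exp soft minimum gives a smooth surrogate that gradient methods can optimize. A minimal sketch under the usual robustness semantics (ours, not the paper's code):

```python
# Smooth robustness for the STL formula G(x_t > 0): soft minimum over time.
import torch

def soft_min(x, temp=0.1):
    # Smooth surrogate of min(x); recovers the exact min as temp -> 0.
    return -temp * torch.logsumexp(-x / temp, dim=0)

x = torch.tensor([0.5, -0.2, 0.8], requires_grad=True)  # signal over 3 steps
rho = soft_min(x)   # robustness: negative here, since x_1 violates x > 0
rho.backward()      # gradient concentrates on the violating time step
print(rho.item(), x.grad)
```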

3/3/2026 Declarative programming & tool use

  • [5] OptiMind: Teaching LLMs to Think Like Optimization Experts — Hao Chen, Zixin Zhu, Dhruba Ray, Neel Sahoo, Neha Nayyar, et al. (2025). Document (www.microsoft.com). URL: https://www.microsoft.com/en-us/research/publication/optimind-teaching-llms-to-think-like-optimization-experts/. Presents OptiMind, an LLM-assisted framework for formulating optimization problems from natural language that improves accuracy through data cleaning, task decomposition, and iterative refinement.
  • [8] OptiMUS-0.3 — Hamidreza AhmadiTeshnizi, Farhad Babaei, Vladimir Bergman, Maxim Bilenko, et al. (2025). arXiv preprint. arXiv:2407.19633; DOI: 10.48550/arXiv.2407.19633. Introduces OptiMUS, a modular LLM agent system and benchmark that converts natural-language optimization problems into executable linear or mixed-integer programs with automated debugging and evaluation. A toy example of such an executable program follows the list.
  • [9] OptiChat: A Conversational Agent for Optimization Modeling — Hao Chen, Dhruba Ray, Neel Sahoo, Lucas Leyton-Brown (2025). arXiv preprint. arXiv:2501.08406; DOI: 10.48550/arXiv.2501.08406. Introduces OptiChat, a conversational assistant that helps users build, interpret, and solve optimization models by combining LLM dialogue with solver-backed computation.
  • [10] Large Language Models for Supply Chain Decisions — Francesca Cianchi, Zongzhang Zhang, Tinglong Dai, Nilesh M. Patil (2025). arXiv preprint. arXiv:2506.03078; DOI: 10.48550/arXiv.2506.03078. Explores how large language models can be integrated with operations research workflows for supply chain decision-making, emphasizing both capabilities and practical limitations.
  • [14] SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, et al. (2023). arXiv preprint. arXiv:2310.06770; DOI: 10.48550/arXiv.2310.06770. Introduces SWE-bench, a benchmark of real GitHub issues with tests that measures whether language models can produce correct code changes in existing repositories.
  • [24] Decoding the Configuration of AI Coding Agents: Insights from Claude Code projects — Madeline Zucker, Annie Rauwerdink, Colton Aldridge, Kenneth Chan, et al. (2025). arXiv preprint. arXiv:2511.09268; DOI: 10.48550/arXiv.2511.09268. Empirically analyzes configuration practices in Claude Code projects and derives patterns for instructing and structuring agentic coding workflows.
  • [25] Claude Code Best Practices — Anthropic (2025). Engineering blog post. URL: https://www.anthropic.com/engineering/claude-code-best-practices. Provides practical guidance for using Claude Code effectively, including context-file conventions, tool allowlists, and workflow tips for safer agentic coding.
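
To make the "natural language to executable program" pipeline in these papers concrete, here is the kind of artifact such systems aim to emit for a toy word problem. The problem and code are our own illustration, not output from OptiMUS or OptiMind.

```python
# "Maximize 3x + 2y subject to x + y <= 4, x <= 2, and x, y >= 0."
# The LP below is the executable form an LLM formulation agent would produce.
from scipy.optimize import linprog

result = linprog(
    c=[-3.0, -2.0],                  # linprog minimizes, so negate to maximize
    A_ub=[[1.0, 1.0], [1.0, 0.0]],   # x + y <= 4 and x <= 2
    b_ub=[4.0, 2.0],
    bounds=[(0, None), (0, None)],   # x, y >= 0
)
print(result.x, -result.fun)         # optimal plan (2, 2) with objective 10
```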

3/10/2026 Synthesis and future directions

  • [15] GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks — Weijia Shi, Shuyue Hu, Houzheng Chen, Ziru Chen, et al. (2025). arXiv preprint. arXiv:2510.04374; DOI: 10.48550/arXiv.2510.04374. Proposes GDPval, an evaluation framework and dataset for measuring AI performance on real-world economically valuable tasks using outcome-linked metrics.
  • [16] Democratizing Optimization with Generative AI — David Simchi-Levi, Tinglong Dai, Karan Singhal, Michelle L. Zhang, Shellye Xiao Wu (2025). SSRN preprint. DOI: 10.2139/ssrn.5511218. Argues that generative AI can broaden access to optimization by translating natural-language problem descriptions into formal models and interactive decision-support workflows.

MS&E 318: Constrained and Safe AI | Stanford University | Winter 2026
