Structured Output from LLMs: Grammars, Regex, and State Machines
Efficient NLP
My name is Bai Li. I'm a machine learning engineer and PhD in natural language processing.
Reach me at:
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/libai/
Video Description
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io

Structured outputs are essential for applications that integrate LLMs to make decisions in downstream tasks. In this video, I explain how structured output generation works, a topic that is very relevant and also an area of active research. First, we look at the OpenAI API's ability to produce structured outputs from schemas defined with Pydantic or Zod. For open-source alternatives, I cover the Outlines library, which operates using state machines and regex under the hood. However, in many cases we need to generate outputs according to a context-free grammar (CFG), which introduces the need for pushdown automata. Finally, I explain how grammar terminals can fail to line up with LLM tokens, why this mismatch is a problem, and some creative solutions from recent research papers.

0:00 - Introduction
1:06 - OpenAI API example
3:02 - Outlines library example
4:07 - Pydantic to regex conversion
4:57 - Finite state machines and regex
5:58 - Regex matching with LLMs
8:41 - Context free grammars
9:40 - Incremental parsing of CFGs
11:22 - Pushdown automata
12:18 - Token-terminal mismatch problem
14:26 - Vocabulary-aligned subgrammars
15:12 - State machine composition
16:06 - Format restriction and LLM performance

OpenAI Structured Outputs API: https://platform.openai.com/docs/guides/structured-outputs
Outlines library: https://github.com/dottxt-ai/outlines

References

Willard, Brandon T., and Rémi Louf. "Efficient guided generation for large language models." arXiv preprint arXiv:2307.09702 (2023). https://arxiv.org/abs/2307.09702

Geng, Saibo, et al. "Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning." EMNLP 2023. https://arxiv.org/abs/2305.13971

Beurer-Kellner, Luca, Marc Fischer, and Martin Vechev. "Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation." ICML 2024. https://arxiv.org/abs/2403.06988

Koo, Terry, Frederick Liu, and Luheng He. "Automata-based constraints for language model decoding." COLM 2024. https://arxiv.org/abs/2407.08103

Tam, Zhi Rui, et al. "Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models." EMNLP 2024. https://arxiv.org/abs/2408.02442
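
Example sketch: regex matching with a state machine

To make the "state machines and regex" idea from the description concrete, here is a minimal sketch in plain Python. It is not the Outlines implementation or the code from the video. It assumes a hand-built DFA for the example pattern [0-9]+\.[0-9]+, a seven-token toy vocabulary, and a stand-in `logits_fn` in place of a real model. All names (`step`, `walk`, `build_index`, `constrained_greedy_decode`) are hypothetical. The idea it illustrates is precomputing, for each DFA state, which vocabulary tokens keep the output on a valid path, then restricting decoding to those tokens.

```python
# Hand-built DFA for the example regex [0-9]+\.[0-9]+ (an assumed pattern).
# States: 0 = start, 1 = integer digits, 2 = just saw ".", 3 = fraction digits.
NUM_STATES = 4
ACCEPTING = {3}
DIGITS = "0123456789"

def step(state, ch):
    """Return the next DFA state, or None if `ch` is not allowed here."""
    if ch in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None

def walk(state, token):
    """Feed a multi-character token through the DFA; None means rejected."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

# Toy vocabulary (assumption); real LLM vocabularies are far larger.
VOCAB = ["1", "23", ".", ".5", "4.", "abc", "<eos>"]

def build_index(vocab):
    """For every DFA state, map each allowed token id to its next state."""
    index = {}
    for state in range(NUM_STATES):
        allowed = {}
        for tok_id, tok in enumerate(vocab):
            if tok == "<eos>":
                # Ending is only allowed once the regex is fully matched.
                if state in ACCEPTING:
                    allowed[tok_id] = state
                continue
            nxt = walk(state, tok)
            if nxt is not None:
                allowed[tok_id] = nxt
        index[state] = allowed
    return index

def constrained_greedy_decode(logits_fn, vocab, max_steps=10):
    """Greedy decoding restricted to tokens the DFA allows at each step."""
    index = build_index(vocab)
    state, out = 0, []
    for _ in range(max_steps):
        logits = logits_fn(out)  # one score per vocabulary entry
        allowed = index[state]
        # Taking the argmax over allowed ids only is equivalent to setting
        # the logits of all disallowed tokens to -inf first.
        best = max(allowed, key=lambda i: logits[i])
        if vocab[best] == "<eos>":
            break
        out.append(vocab[best])
        state = allowed[best]
    return "".join(out)

# Stand-in "model" that always prefers the invalid token "abc".
def fake_logits(prefix):
    return [0.1, 0.2, 0.5, 0.9, 0.3, 2.0, 1.0]

result = constrained_greedy_decode(fake_logits, VOCAB)
print(result)
```

The stand-in model scores "abc" highest at every step, yet the mask never lets it through, so the output still matches the pattern. Note that if `max_steps` runs out before the DFA reaches an accepting state, the returned string may be an incomplete match. A real system would also need to handle dead states, which this particular DFA does not have.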