The Best Open-source OCR model | AI & ML Monthly

Daniel Bourke August 12, 2025

Video Description

Quick link: Title refers to the dots.ocr OCR model - https://huggingface.co/rednote-hilab/dots.ocr

Welcome to machine learning & AI monthly for July 2025. This is the video version of the newsletter I write every month, which covers the latest and greatest (but not always the latest) in the world of AI and ML.

Read the issues online:
- AI/ML Monthly July 2025 (this video) - https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-july-2025/
- AI/ML Monthly June 2025 - https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-june-2025/
- AI/ML Monthly May 2025 - https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-may-2025/

My links:
- Download Nutrify (my startup) - https://nutrify.app
- Download KeepTrack (my other startup) - https://keeptrack.app
- Personal website - https://www.mrdbourke.com
- My ML blog - https://learnml.io
- Read my novel Charlie Walks - https://www.charliewalks.com

Courses I teach:
- Learn Hugging Face - https://learnhuggingface.com
- Learn AI/ML (beginner-friendly course) - https://dbourke.link/ZTMMLcourse
- Learn TensorFlow - https://dbourke.link/ZTMTFcourse
- Learn PyTorch - https://dbourke.link/ZTMPyTorch

Timestamps:
0:00 - Intro
0:16 - My work: ZTM Object Detection with Hugging Face - https://www.learnhuggingface.com/notebooks/hugging_face_object_detection_tutorial
2:20 - From the Internet
2:21 - My favourite use case for AI is writing logs (Vicki Boykis) - https://newsletter.vickiboykis.com/archive/my-favorite-use-case-for-ai-is-writing-logs/
11:40 - Cloudflare helps creators block AI scrapers - https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/
15:03 - Google DeepMind embeds the entire Earth (AlphaEarth Foundations) - https://deepmind.google/discover/blog/alphaearth-foundations-helps-map-our-planet-in-unprecedented-detail/
22:05 - Apple: How they built FastVLM - https://machinelearning.apple.com/research/fast-vision-language-models
26:48 - Google’s SensorLM (60M hours of sensor data) - https://research.google/blog/sensorlm-learning-the-language-of-wearable-sensors/
32:29 - Daniel’s Open-Source AI of the Month
32:30 - Mistral releases Voxtral (ASR/translation/understanding) - https://mistral.ai/news/voxtral
40:35 - Allen AI’s FlexOlmo (collaborative MoE training) - https://allenai.org/blog/flexolmo
42:50 - Franca (open data/code/weights vision backbones) - https://github.com/valeoai/Franca
49:00 - Hugging Face SmolLM3 + full training recipe - https://huggingface.co/HuggingFaceTB/SmolLM3-3B
53:30 - MedGemma goes multimodal (text+image) - https://research.google/blog/medgemma-our-most-capable-open-models-for-health-ai-development/
1:00:03 - Roboflow upgrades RF-DETR (real-time detection) - https://github.com/roboflow/rf-detr
1:03:55 - MM-GroundingDINO on Hugging Face (zero-shot OD) - https://huggingface.co/openmmlab-community/mm_grounding_dino_large_all
1:06:40 - Meta Perception Encoders: new variants - https://github.com/facebookresearch/perception_models
1:08:50 - Apple: pixel-level fallback to expand LLM vocab - https://machinelearning.apple.com/research/overcoming-vocabulary-constraints
1:15:10 - Z.ai GLM-4.5 / GLM-4.1V releases - https://z.ai/blog/glm-4.5
1:18:50 - Qwen3 updates + 480B coding model - https://qwenlm.github.io/blog/qwen3-coder/
1:21:20 - dots.ocr (small, mighty OCR VLM) - https://huggingface.co/rednote-hilab/dots.ocr
1:26:30 - Releases
1:27:10 - Google’s genai-processors library - https://github.com/google-gemini/genai-processors
1:28:45 - ChatGPT Study Mode - https://openai.com/index/chatgpt-study-mode/
1:31:29 - Videos
1:31:30 - John Carmack talk (Keen Technologies research) - https://youtu.be/iz9lUMSQBfY
1:34:10 - François Chollet on getting to AGI (ARC v1/v2) - https://youtu.be/5QcCeSsNRks
1:36:00 - Outro