NuMind AI has officially released NuMarkdown-8B-Thinking, an open-source (MIT License) reasoning OCR Vision-Language Model (VLM) that redefines how complex documents are digitized and structured. Unlike traditional OCR systems, NuMarkdown-8B-Thinking doesnβt just extract textβit thinks about a documentβs layout, structure, and formatting before generating a precise, ready-to-use Markdown file.
This makes it the first reasoning VLM purpose-built for converting PDFs, scanned documents, and spreadsheets into clean, structured Markdownβideal for Retrieval-Augmented Generation (RAG) workflows, AI-powered knowledge bases, and large-scale document archiving.
How NuMarkdown-8B-Thinking Is Different?
The model introduces a reasoning-first approach to OCR. Instead of directly rendering extracted text, NuMarkdown-8B-Thinking generates βthinking tokensβ β internal reasoning steps that help it understand document layouts before producing the final output.
This capability allows it to handle formats and structures that stump most conventional and even AI-powered OCR systems, including:
- Multi-column layouts with complex reading orders
- Tables with merged, nested, or irregular cells
- Mixed visual elements (images, decorative headers, watermarks)
- Historical or degraded scans where layout inference is crucial
The number of reasoning tokens varies with complexityβanywhere from 20% to 500% of the final Markdown lengthβshowing how much the model βthinksβ before it βwrites.β
Training and Architecture
NuMarkdown-8B-Thinking is a fine-tuned version of Qwen 2.5-VL-7B from Alibabaβone of the strongest open-source multi-modal models available.
Its training pipeline involved two key phases:
- Raw document input
- Intermediate reasoning steps (layout parsing, structure inference)
- Final Markdown representation
This two-stage process gave NuMarkdown-8B-Thinking the ability to maintain high accuracy even on challenging layouts that typically require human-level judgment.
Benchmark Results: Outperforming OCR Heavyweights
In independent evaluations and user testing, NuMarkdown-8B-Thinking demonstrates state-of-the-art reasoning for OCR-to-Markdown tasks:
- Beats:
- Generalist models like GPT-4o
- Specialized OCR-focused models like OCRFlux
- Competitive with:
- Large closed-source reasoning models like Gemini 2.5
- Just behind elite models like Gemini Flash Reasoning in blind, multi-model user rankings

Users particularly highlight its ability to:
- Correctly infer reading order in non-linear layouts
- Preserve intricate table formatting
- Output clean, parsing-friendly Markdown for RAG ingestion without further post-processing


Example in Action
Imagine a scanned annual report page with:
- Multi-level headings
- Sidebars and multiple columns
- A financial table with merged cells and uneven row spacing
- A footer with legal disclaimers
NuMarkdown-8B-Thinking first produces reasoning tokens outlining the structure (βColumn 1: Intro paragraphβ¦ Column 2: Continue paragraphβ¦ Footer text at bottomβ¦ Table spans two columnsβ¦β), then outputs Markdown that accurately reflects both content and layout.
This transparent reasoning layer makes the modelβs decisions auditableβa major plus in enterprise, legal, and archival contexts.


Deployment Options
Whether youβre a researcher, developer, or enterprise AI engineer, NuMarkdown-8B-Thinking is ready to slot into your workflow:
- Hugging Face: Available for direct testing and integration.
- Local Execution: Model weights and quantized GGUF versions are published for CPU/GPU-friendly deployment.
- API-friendly: Compatible with OpenAI-style APIs and Hugging Face Transformers for rapid integration into pipelines.
Its MIT License ensures full freedom for commercial, academic, or personal projectsβno vendor lock-in or costly API gates.
Why This Matters
For industries that rely on accurate document digitizationβfinance, legal, healthcare, government archivesβlayout fidelity is as important as textual accuracy. Most OCR systems treat layout as an afterthought; NuMarkdown-8B-Thinking treats it as a reasoning problem.
By combining open-sourcing, layout reasoning, and RAG-optimized Markdown output, NuMarkdown-8B-Thinking offers a transparent, verifiable, and high-performance alternative to proprietary document AI solutions.
Check out the Model on Hugging Face and GitHub Page. Feel free to check out ourΒ GitHub Page for Tutorials, Codes and Notebooks.Β Also,Β feel free to follow us onΒ TwitterΒ and donβt forget to join ourΒ 100k+ ML SubRedditΒ and Subscribe toΒ our Newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.

