Improving CAD Design With LLMs

December 19, 2025

Modern products, from bike components to drones, start as digital 3D models, but creating Computer-Aided Design (CAD) models is still slow and expensive. Our new approach, CADmium, rethinks 3D design automation by training a Large Language Model (LLM) to translate natural language into 3D design code. This text-driven CAD design is easy to adopt, fast to refine, and produces highly accurate 3D shapes.

Rethinking Text-to-CAD as Text-to-Code

A CAD model is a sequence of steps turning flat sketches (lines, arcs, circles) into 3D features, just like a recipe. This structure aligns perfectly with language models, if (and only if) the language is specific enough to eliminate geometrical ambiguity. CADmium bridges that gap by combining a multimodal annotation pipeline with code-LLM fine-tuning.

Instead of inventing translation methods to represent 3D data like custom tokenizers or bespoke vector embeddings, we treat CAD as code. This allows us to leverage Qwen2.5-Coder, a proven instruction-tuned code model, keeping our architecture simple, efficient, and flexible.

High-Fidelity Training: 176k Human-Like CAD Models Descriptions

To construct a robust training set, we overhauled the industry-standard DeepCAD dataset.

We upgraded 176,017 CAD models by adding concise, human-like text descriptions. Generated by GPT-4.1 from multi-view renders and design histories, these prompts read naturally yet retain the geometrical specificity needed for unambiguous reconstruction (e.g., profiles, constraints, measures).

We then fine-tuned Qwen2.5-Coder to translate those descriptions into minimal-JSON CAD histories, the step-by-step code that builds a part. Because Qwen2.5-Coder is a code model, it excels at managing structured formats, brackets, and long-range dependencies inherent in 3D modeling.

Ensuring Geometric Precision and Functional Viability

Standard metrics like point-cloud distances miss critical structural flaws. We apply advanced topology-aware metrics to guarantee that every generated model is mathematically valid. We utilize Sphericity Discrepancy, Discrete Mean Curvature Difference, Exact Euler Characteristic Match, plus watertightness to assess compactness, curvature, and topological correctness.

Tested on human-written prompts, CADmium outperformed specialized baselines in both geometric precision (lines, circles, extrusions) and topology accuracy. Our benchmarks prove that code-LLMs are highly effective CAD generators, with larger models delivering more reliable and precise files.

Built for Easy Adaptation & Scale

We utilized standard training protocols (such as LoRA and FSDP on A100s) to enable any team to easily adapt CADmium for their specific domain.

To facilitate replication, benchmarking, and extension, we released our full stack: annotations, training code, fine-tuning configurations, and model checkpoints.

Links