Born ML — Pure Go Machine Learning Framework

Born ML is a modern Pure Go machine learning framework — GPU-accelerated training and inference on the first Pure Go GPU stack (gogpu/wgpu), zero CGO, zero Python dependencies. Tensors, autograd, optimizers, neural networks — all in Go.

Contribute

Become a financial contributor.

Financial Contributions

Backer (recurring contribution, starts at $5 USD / month)
Become a backer for $5.00 per month and support us.

Supporter (recurring contribution, $25 USD / month)
Become a supporter for $25.00 per month. Your name is listed in our README Supporters section.

Sponsor (recurring contribution, starts at $100 USD / month)
Become a sponsor for $100.00 per month and support us.

Gold Sponsor (recurring contribution, $500 USD / month)
Become a gold sponsor for $500.00 per month. Large logo on the README and website. Priority issue response.

Donation (custom contribution)
Make a custom one-time or recurring contribution.

Born ML — Pure Go Machine Learning Framework is all of us

Our contributors (2)

Thank you for supporting Born ML — Pure Go Machine Learning Framework.

Kolkov

Admin

Ancha

Admin

About

Born ML — Production-Ready Machine Learning for Go
Born is a modern ML framework that lets Go developers train and deploy models as single binaries — no Python runtime, no CUDA, no Docker complexity. Inspired by Burn (Rust), built on the GoGPU ecosystem.

The Problem
Deploying ML in production is painful. Python sidecars, dependency hell, 5GB Docker images, slow cold starts, integration friction with Go backends. Every Go team that needs ML faces the same choice: maintain a Python service alongside their Go stack, or don't use ML at all.

The Solution
Run go get github.com/born-ml/born and you have training and inference in the same binary, with the same toolchain and zero external dependencies. Born is to ML what database/sql is to databases: the standard Go-native interface.

What Born Does Today (v0.9.1)

- Train models from scratch on CPU and GPU (MNIST 97%+, recurrent models verified)
- Run LLM inference from GGUF files (LLaMA/TinyLlama 1.1B)
- Import ONNX models from PyTorch/TensorFlow (49 operators)
- GPU acceleration via WebGPU — Vulkan, Metal, DX12, Software backends
- Flash Attention 2 with O(N) memory in sequence length N
- Type-safe API with Go generics — errors at compile-time, not runtime
- AVX2 SIMD micro-kernels for CPU performance
- Software compute backend for CI — no GPU hardware required for testing
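
To illustrate the "errors at compile-time" point in the list above, here is a minimal sketch of how Go generics can tie an element type to a tensor. Every name in it (DType, Tensor, NewTensor, Add) is hypothetical and chosen for illustration; it is not Born's actual API. In this sketch only the element type is checked by the compiler, while length mismatches are still reported at runtime.

```go
package main

import "fmt"

// DType is a hypothetical constraint listing the element types a tensor may hold.
type DType interface {
	~float32 | ~float64 | ~int32
}

// Tensor is a minimal 1-D tensor parameterised by its element type.
type Tensor[T DType] struct {
	data []T
}

// NewTensor copies vals into a new tensor.
func NewTensor[T DType](vals ...T) Tensor[T] {
	return Tensor[T]{data: append([]T(nil), vals...)}
}

// Add returns the element-wise sum. Both operands must share T, so mixing
// a float32 tensor with an int32 tensor is rejected by the compiler.
func Add[T DType](a, b Tensor[T]) (Tensor[T], error) {
	if len(a.data) != len(b.data) {
		return Tensor[T]{}, fmt.Errorf("length mismatch: %d vs %d", len(a.data), len(b.data))
	}
	out := make([]T, len(a.data))
	for i := range out {
		out[i] = a.data[i] + b.data[i]
	}
	return Tensor[T]{data: out}, nil
}

func main() {
	x := NewTensor[float32](1, 2, 3)
	y := NewTensor[float32](10, 20, 30)

	sum, err := Add(x, y)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum.data)

	// The lines below would not compile, because the element types differ:
	// z := NewTensor[int32](1, 2, 3)
	// Add(x, z)
}
```

A fuller design could also encode rank or shape in the type, at the cost of a heavier API; the sketch deliberately stops at the element type.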

Technical Architecture

- Pure Go — zero CGO, trivial cross-compilation, FROM scratch Docker images
- Decorator pattern — composable backends: autodiff.New(webgpu.New())
- All backward ops via forward composition — tensors never leave the GPU during training
- TieredPool GPU memory management, with pools sized from device limits (Burn/CubeCL pattern)
- Explicit buffer lifecycle with Persist/Unpersist API for recurrent models
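
The decorator and forward-composition ideas in the list above can be sketched in a few lines of plain Go. This is an illustration under stated assumptions, not Born's implementation: the Backend interface, cpuBackend, Autodiff, NewCPU, NewAutodiff and BackwardLast are all hypothetical names, and slices of float64 stand in for device tensors. The wrapper forwards each op to the inner backend and records a gradient rule that is itself expressed with the inner backend's forward ops.

```go
package main

import "fmt"

// Backend is a hypothetical minimal compute interface (not Born's real one).
type Backend interface {
	Name() string
	// Mul multiplies element-wise; a and b are assumed to have equal length.
	Mul(a, b []float64) []float64
}

// cpuBackend stands in for a concrete device backend such as a GPU one.
type cpuBackend struct{}

func NewCPU() Backend { return cpuBackend{} }

func (cpuBackend) Name() string { return "cpu" }

func (cpuBackend) Mul(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] * b[i]
	}
	return out
}

// Autodiff decorates any Backend: it forwards ops to inner and records,
// for each op, a gradient rule built from inner's own forward ops.
type Autodiff struct {
	inner Backend
	tape  []func(grad []float64) (gx, gy []float64)
}

// Compile-time check that the decorator is itself a Backend.
var _ Backend = (*Autodiff)(nil)

func NewAutodiff(inner Backend) *Autodiff { return &Autodiff{inner: inner} }

func (a *Autodiff) Name() string { return "autodiff(" + a.inner.Name() + ")" }

func (a *Autodiff) Mul(x, y []float64) []float64 {
	// d(x*y)/dx = y and d(x*y)/dy = x, both computed with the forward Mul.
	a.tape = append(a.tape, func(grad []float64) ([]float64, []float64) {
		return a.inner.Mul(grad, y), a.inner.Mul(grad, x)
	})
	return a.inner.Mul(x, y)
}

// BackwardLast applies the gradient rule of the most recently recorded op.
func (a *Autodiff) BackwardLast(grad []float64) (gx, gy []float64) {
	if len(a.tape) == 0 {
		return nil, nil
	}
	return a.tape[len(a.tape)-1](grad)
}

func main() {
	b := NewAutodiff(NewCPU())
	out := b.Mul([]float64{2, 3}, []float64{4, 5})
	gx, gy := b.BackwardLast([]float64{1, 1})
	fmt.Println(b.Name(), out, gx, gy)
}
```

Because the gradient rule only calls the inner backend's forward Mul, swapping the inner backend for a GPU one would keep the backward pass on the same device without any change to the wrapper.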

Part of the GoGPU Ecosystem

Born uses gogpu/wgpu for GPU compute and gogpu/naga for shader compilation. The GoGPU ecosystem spans 15 repositories and 1.1M+ lines of pure Go code — graphics, GPU compute, UI, and ML all without C dependencies.

Roadmap to v1.0

- v0.10.0 — Resource budget enforcement, distributed multi-GPU training
- v0.11.0 — Quantization (INT8, GPTQ), model zoo
- v0.12.0 — Multi-node training, production serving
- v1.0.0 — API freeze, LTS stability guarantees

Why We Need Funding

Born ML is built and maintained by a solo developer. The framework has grown to 40K+ lines of code with 30+ releases in 7 months — but sustaining this pace and reaching v1.0 requires resources.

GPU CI Infrastructure — Today our CI runs GPU tests on a software backend. Real GPU bugs only surface on real hardware. We need dedicated CI runners with Intel, NVIDIA, and AMD GPUs across Windows, Linux, and macOS to catch driver-specific issues before they reach users.

ARM Performance — Born's SIMD optimizations currently cover x86 (AVX2) only. Apple Silicon (M1–M4) and ARM servers (AWS Graviton, Ampere Altra) have no optimized path — inference runs 10-50x slower than it should. We need ARM hardware and dedicated time for NEON micro-kernels.

Multi-GPU Testing — Distributed training requires multi-GPU setups we don't have. A dual-GPU workstation would enable development of Data Parallelism, AllReduce, and Tensor Parallelism.

Full-Time Development — The Go ecosystem has no production ML framework. Born is filling that gap, but competing with PyTorch (hundreds of engineers) as a solo project means every hour counts. Funding allows dedicated focus.

Community Growth — Born has 4 external contributors and growing interest. Funding enables proper PR reviews, contributor onboarding, documentation, and tutorials to grow the Go ML community.

Transparency

All expenses are documented on this Open Collective page. We follow the same transparency model as the GoGPU ecosystem.

Links

- GitHub: https://github.com/born-ml/born
- GoGPU Ecosystem: https://github.com/gogpu
- Discussions: https://github.com/born-ml/born/discussions
