Born ML — Pure Go Machine Learning Framework
Fiscal Host: Open Source Collective
Born ML is a modern Pure Go machine learning framework — GPU-accelerated training and inference on the first Pure Go GPU stack (gogpu/wgpu), zero CGO, zero Python dependencies. Tensors, autograd, optimizers, neural networks — all in Go.
About
Born ML — Production-Ready Machine Learning for Go
Born is a modern ML framework that lets Go developers train and deploy models as single binaries — no Python runtime, no CUDA, no Docker complexity. Inspired by Burn (Rust), built on the GoGPU ecosystem.
The Problem
Deploying ML in production is painful: Python sidecars, dependency hell, 5GB Docker images, slow cold starts, and integration friction with Go backends. Every Go team that needs ML faces the same choice: maintain a Python service alongside their Go stack, or give up on ML entirely.
The Solution
go get github.com/born-ml/born — and you have training + inference in the same binary, same toolchain, zero external dependencies. Born is what database/sql is for databases: the standard Go-native interface for ML.
What Born Does Today (v0.9.1)
- Train models from scratch on CPU and GPU (97%+ accuracy on MNIST; recurrent models verified)
- Run LLM inference from GGUF files (LLaMA/TinyLlama 1.1B)
- Import ONNX models from PyTorch/TensorFlow (49 operators)
- GPU acceleration via WebGPU — Vulkan, Metal, DX12, Software backends
- Flash Attention 2 with O(N) memory in sequence length
- Type-safe API with Go generics — type errors caught at compile time, not at runtime
- AVX2 SIMD micro-kernels for CPU performance
- Software compute backend for CI — no GPU hardware required for testing
Technical Architecture
- Pure Go — zero CGO, trivial cross-compilation, FROM scratch Docker images
- Decorator pattern — composable backends: autodiff.New(webgpu.New())
- All backward ops via forward composition — tensors never leave the GPU during training
- TieredPool GPU memory management, configured from device limits (Burn/CubeCL pattern)
- Explicit buffer lifecycle with Persist/Unpersist API for recurrent models
Part of the GoGPU Ecosystem
Born uses gogpu/wgpu for GPU compute and gogpu/naga for shader compilation. The GoGPU ecosystem spans 15 repositories and 1.1M+ lines of pure Go code — graphics, GPU compute, UI, and ML all without C dependencies.
Roadmap to v1.0
- v0.10.0 — Resource budget enforcement, distributed multi-GPU training
- v0.11.0 — Quantization (INT8, GPTQ), model zoo
- v0.12.0 — Multi-node training, production serving
- v1.0.0 — API freeze, LTS stability guarantees
Why We Need Funding
Born ML is built and maintained by a solo developer. The framework has grown to 40K+ lines of code with 30+ releases in 7 months — but sustaining this pace and reaching v1.0 requires resources.
GPU CI Infrastructure — Today our CI runs GPU tests on a software backend. Real GPU bugs only surface on real hardware. We need dedicated CI runners with Intel, NVIDIA, and AMD GPUs across Windows, Linux, and macOS to catch driver-specific issues before they reach users.
ARM Performance — Born's SIMD optimizations currently cover x86 (AVX2) only. Apple Silicon (M1–M4) and ARM servers (AWS Graviton, Ampere Altra) have no optimized path — inference runs 10-50x slower than it should. We need ARM hardware and dedicated time for NEON micro-kernels.
Multi-GPU Testing — Distributed training requires multi-GPU setups we don't have. A dual-GPU workstation would enable development of Data Parallelism, AllReduce, and Tensor Parallelism.
Full-Time Development — The Go ecosystem has no production ML framework. Born is filling that gap, but competing with PyTorch (hundreds of engineers) as a solo project means every hour counts. Funding allows dedicated focus.
Community Growth — Born has 4 external contributors and growing interest. Funding enables proper PR reviews, contributor onboarding, documentation, and tutorials to grow the Go ML community.
Transparency
All expenses are documented on this Open Collective page. We follow the same transparency model as the GoGPU ecosystem.
Links
- GitHub: https://github.com/born-ml/born
- GoGPU Ecosystem: https://github.com/gogpu
- Discussions: https://github.com/born-ml/born/discussions