This post was co-authored with Justine Tunney of the Mozilla Internet Ecosystem program (MIECO).
Today we're announcing the first release of llamafile and inviting the open source community to participate in this new project.
llamafile lets you turn large language model (LLM) weights into executables.
Say you have a set of LLM weights in the form of a 4GB GGUF file: with llamafile, you can transform those weights into a single binary that runs on six different operating systems with no installation required.
This significantly simplifies the distribution and execution of LLMs. Furthermore, as models and their weight formats evolve, llamafile provides a method to ensure that a given set of weights remains usable and performs consistently and reproducibly indefinitely.
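To make that concrete, here is a minimal sketch of talking to a llamafile from Python once it is running. It assumes you have already made a downloaded llamafile executable (for example with chmod +x on Unix-like systems) and started it, and that it is serving llama.cpp's built-in HTTP server on the default port 8080 with its /completion endpoint; the port, endpoint, and response shape below come from llama.cpp's server and should be checked against the llamafile README rather than taken as definitive.

```python
# Minimal sketch: query a locally running llamafile over HTTP.
# Assumptions (verify against the README): the llamafile is running in
# server mode, listening on localhost:8080, and exposes llama.cpp's
# /completion endpoint, which returns JSON with a "content" field.
import json
import urllib.request


def complete(prompt: str, n_predict: int = 64) -> str:
    """Send a prompt to the local llamafile server and return its completion."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]


if __name__ == "__main__":
    print(complete("Why is the sky blue?"))
```

Because the server bundled in a llamafile is llama.cpp's own, any existing client code written against that API should work without changes.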
We achieved this by combining two projects: llama.cpp (a leading open source LLM chatbot framework) and Cosmopolitan Libc (an open source project that enables C programs to be compiled and run on a large number of platforms and architectures). Along the way we solved several interesting challenges, such as adding GPU and dlopen() support to Cosmopolitan; more details are available in the project's README.
This first release of llamafile is a product of Mozilla's innovation group and was developed by Justine Tunney, the creator of Cosmopolitan. Justine has recently been collaborating with Mozilla through MIECO, the program that funded her work on Cosmopolitan's 3.0 release (Hacker News discussion). With llamafile, she is now contributing directly to Mozilla projects.
llamafile is licensed under Apache 2.0, and contributions are encouraged. Our changes to llama.cpp itself are licensed under the MIT license (the same license used by llama.cpp) so as to make any potential future upstreaming easier. llama.cpp is highly regarded, and llamafile would not have been possible without it or Cosmopolitan.
We hope llamafile proves useful, and we welcome your feedback here.

