The new OpenAI Open-Weight models are here

Yesterday, OpenAI released its new open-weight models, gpt-oss, under the Apache 2.0 licence. They come in two sizes: *gpt-oss-120b* is comparable to OpenAI o4-mini, *gpt-oss-20b* to OpenAI o3-mini. gpt-oss-20b can run on devices with 16 GB of RAM or more, making it suitable for local inference or rapid iteration without costly infrastructure. Even the large gpt-oss-120b model runs fast enough on my Mac laptop with 64 GB of RAM.
Note
There are a number of ways to run these models. For my first attempts, I used LM Studio to install openai/gpt-oss-20b. It consumes just over 11 GB with reasoning=medium and processes approximately 55 tokens/second.
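Once the model is loaded, LM Studio serves it through an OpenAI-compatible HTTP API. A minimal sketch using only the Python standard library (the port 1234 is LM Studio's default; whether the server honours the `reasoning_effort` field is an assumption on my part):

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "openai/gpt-oss-20b") -> dict:
    """Build an OpenAI-style chat completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Matches the reasoning=medium setting mentioned above
        # (assumption: support depends on the server/runtime).
        "reasoning_effort": "medium",
    }


def ask_local_model(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send the prompt to the local OpenAI-compatible endpoint, return the reply."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The same sketch works against any other OpenAI-compatible runtime (Ollama, vLLM, etc.) by changing `base_url`.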
The publication on how the models were trained also provides interesting insights: gpt-oss-120b & gpt-oss-20b Model Card (PDF, 5.1 MB). The models were specifically trained to use web browsers and Python tools more effectively:
- A browsing tool lets the model search for and open content on the web.
- A Python tool executes code in a stateful Jupyter Notebook environment.
There is also a section on using Python tools in the openai/gpt-oss repository.
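"Stateful" here means that variables survive across tool calls, like cells in a notebook. A toy sketch of my own to illustrate the idea (not OpenAI's actual implementation):

```python
import contextlib
import io


class StatefulPython:
    """Toy sketch of a stateful Python tool: every call runs in the same
    namespace, so a variable defined in one call is visible in the next,
    just like cells in a Jupyter notebook."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return captured stdout."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)  # state persists between calls
        return buf.getvalue()
```

Usage: after `tool.run("x = 21")`, a later `tool.run("print(x * 2)")` still sees `x` and prints `42`.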
Finally, OpenAI Harmony has also been released under the Apache 2 licence. It is inspired by their new Responses API. The format is described in OpenAI Harmony Response Format. It contains some exciting concepts:
- A fine-grained role model with the roles `system`, `developer`, `user`, `assistant`, and `tool`.
- Three output channels: `analysis`, `commentary`, and `final`. In the graphical user interface, usually only the `final` channel is visible; `analysis` is used for the chain of thought, and `commentary` for tool calls.
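The channel separation can be illustrated with a small filter that keeps only what a chat UI would display. The data structure is my own simplification; the actual wire format is defined by the openai-harmony library:

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # system, developer, user, assistant, or tool
    channel: str   # analysis, commentary, or final
    content: str


def visible_in_ui(messages: list[Message]) -> list[str]:
    """Return only final-channel content, as a chat UI typically would."""
    return [m.content for m in messages if m.channel == "final"]


conversation = [
    Message("assistant", "analysis", "Let me think step by step ..."),
    Message("assistant", "commentary", '{"tool": "browser.search"}'),
    Message("assistant", "final", "Here is the answer."),
]
```

Here `visible_in_ui(conversation)` drops the chain of thought and the tool call, leaving only the final answer.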
I have not yet tested how well tool calls work with these local models. So far, I have been rather disappointed in this regard, probably because I only managed to execute individual calls, whereas with Claude dozens of tools are called in a single session.