ModelMeta
452 model profiles · 22 providers · Refresh cadence hourly
OpenAI · GPT-4o

GPT-4o Mini Audio Preview

New · Preview

GPT-4o mini Audio: a smaller model capable of audio inputs and outputs.

Context window

16K

16,384 tokens

Input price

$10

Per 1M input tokens

Output price

$20

Per 1M output tokens

Access

Closed

Registry access classification

Overview

Where this model fits best

A premium detail page should answer the top-level selection question first, then fan out into technical specifics.

Best fit

Use this model when you need a smaller model that accepts and produces audio alongside text, with streaming and function calling available through the listed endpoints. This page collects its pricing, capabilities, and operational limits in one place.

Use cases

text-to-speech · vision · text-chat · speech-to-text · function-calling

Capabilities

What you can actually do with it

Grouped by features, endpoints, and tools so the page reads like a product brief instead of a flat checklist.

Capabilities

High-level features exposed by the model runtime.

Streaming · Function Calling · Structured Output · JSON Mode · System Prompt

Endpoints

APIs and surfaces this model can be called through.

Chat Completions · Batch · Fine Tuning · Vision

Pricing

Commercial shape at a glance

Pricing is presented as carded offers instead of a plain table, which makes the hierarchy easier to scan on desktop and mobile.

Standard API

Input

$10 / 1M

Output

$20 / 1M

Default request pricing when using the primary endpoint.
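
To make the per-million pricing concrete, here is a minimal sketch of a cost estimate for a single request. It assumes the two prices on the Standard API card apply uniformly to all input and output tokens; the helper name `estimate_cost_usd` is hypothetical and not part of any provider SDK.

```python
from decimal import Decimal

# Prices copied from the Standard API card (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("10")
OUTPUT_PRICE_PER_M = Decimal("20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD (hypothetical helper, flat-rate assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 2,000 input tokens and 500 output tokens.
    print(estimate_cost_usd(2_000, 500))
```

Decimal arithmetic is used here so that small per-request amounts do not pick up floating-point rounding noise when summed over many requests.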

Controls

Runtime knobs worth knowing

Supported parameters are surfaced as control cards so sampling behavior reads like an operational surface, not a spreadsheet.

Temperature

1

0 to 2

Top P

1

0 to 1

Frequency penalty

0

-2 to 2

Presence penalty

0

-2 to 2

Max stop sequences

4

Maximum number of stop values accepted
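
The control cards can be turned into a small client-side check before a request is sent. The sketch below assumes that the single value on each card (1, 1, 0, 0) is the default and that the listed ranges are inclusive; the function name `build_sampling_params` and the dictionary layout are illustrative, not a provider API.

```python
# Ranges and (assumed) defaults copied from the control cards.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
PARAM_DEFAULTS = {
    "temperature": 1.0,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}
MAX_STOP_SEQUENCES = 4


def build_sampling_params(stop=None, **overrides):
    """Merge overrides into the defaults, rejecting out-of-range values."""
    params = dict(PARAM_DEFAULTS)
    for name, value in overrides.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unsupported parameter: {name}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        params[name] = float(value)
    stop_list = list(stop or [])
    if len(stop_list) > MAX_STOP_SEQUENCES:
        raise ValueError(f"at most {MAX_STOP_SEQUENCES} stop sequences are accepted")
    if stop_list:
        params["stop"] = stop_list
    return params


if __name__ == "__main__":
    print(build_sampling_params(temperature=0.2, stop=["END"]))
```

Validating locally keeps obviously invalid requests from reaching the endpoint, though the provider remains the authority on what it accepts.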

Specifications

Technical reference

Dense metadata is moved into a card grid so the bottom of the page still feels deliberate and easy to skim.

Model ID

gpt-4o-mini-audio-preview

Provider

OpenAI

Family

GPT-4o

Access type

Closed

Input modalities

text, image, audio

Output modalities

text, audio

Max output tokens

16,384
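
The context window and max output figures listed on this page can be combined into a simple token budget check. This is a sketch under the assumption that prompt and completion tokens share the 16,384-token context window shown above; the helper name `remaining_output_budget` is hypothetical.

```python
# Values copied from this page's cards.
MODEL_ID = "gpt-4o-mini-audio-preview"
CONTEXT_WINDOW_TOKENS = 16_384
MAX_OUTPUT_TOKENS = 16_384


def remaining_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still fit after the prompt.

    Assumes the prompt and the completion share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens >= CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt does not leave room for any output")
    return min(MAX_OUTPUT_TOKENS, CONTEXT_WINDOW_TOKENS - prompt_tokens)


if __name__ == "__main__":
    print(MODEL_ID, remaining_output_budget(4_000))
```

With the figures listed here, the output cap equals the context window, so the prompt length is what limits the completion in practice.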