Get started with the new Advanced Audio Models in Azure OpenAI Service

Public Preview of Azure OpenAI Audio models

We’re excited to announce the preview availability of Azure OpenAI’s advanced audio models: GPT-4o-Transcribe, GPT-4o-Mini-Transcribe, and GPT-4o-Mini-TTS. This guide gives developers the essential insights and steps needed to use these audio capabilities effectively in their applications.

What’s New in Azure OpenAI Audio Models?

Azure OpenAI introduces three powerful new audio models, available for deployment today in East US2 on Azure AI Foundry.

  • GPT-4o-Transcribe and GPT-4o-Mini-Transcribe: Speech-to-text models outperforming previous benchmarks.
  • GPT-4o-Mini-TTS: A customizable text-to-speech model enabling detailed instructions on speech characteristics.
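As a rough illustration of what "detailed instructions on speech characteristics" might look like in practice, the sketch below builds the URL and JSON body for a text-to-speech request without sending it. It assumes the usual Azure OpenAI REST path pattern (`/openai/deployments/<deployment>/audio/speech`) and an `instructions` field in the request body. The helper name `build_speech_request`, the deployment name, the resource name, and the voice are hypothetical, so check the service reference for the fields supported by your API version.

```python
import json
from urllib.parse import quote, urlencode


def build_speech_request(endpoint, deployment, api_version, text, voice, instructions):
    """Return (url, json_body) for a text-to-speech call. Nothing is sent."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    # Assumed path pattern for Azure OpenAI audio speech deployments.
    url = f"{base}/openai/deployments/{quote(deployment, safe='')}/audio/speech?{query}"
    body = {
        "model": deployment,           # hypothetical deployment name
        "input": text,                 # the text to be spoken
        "voice": voice,                # assumed voice name
        "instructions": instructions,  # free-text guidance on tone and delivery
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_speech_request(
        endpoint="https://example-resource.openai.azure.com/",
        deployment="gpt-4o-mini-tts",
        api_version="2025-04-14",
        text="Your order has shipped.",
        voice="alloy",
        instructions="Speak warmly and slowly, like a friendly support agent.",
    )
    print(url)
    print(body)
```

Keeping request construction separate from sending makes it easy to inspect or log exactly which instructions are passed to the model.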

Model Comparison

Feature | GPT-4o-Transcribe | GPT-4o-Mini-Transcribe | GPT-4o-Mini-TTS
Performance | Best Quality | Great Quality | Best Quality
Speed | Fast | Fastest | Fastest
Input | Text, Audio | Text, Audio | Text
Output | Text | Text | Audio
Streaming | | |
Ideal Use Cases | Accurate transcription for challenging environments like customer call centers and automated meeting notes | Rapid transcription for live captioning, quick-response apps, and budget-sensitive scenarios | Customizable interactive voice outputs for chatbots, virtual assistants, accessibility tools, and educational apps
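For the two transcription models, the audio-in, text-out flow in the comparison above could be sketched as follows. This is a minimal example, not code from the demo repository: it assumes `client` is an `openai.AzureOpenAI` instance from the `openai` Python package, and the function name `transcribe_file` and the deployment name passed to it are hypothetical.

```python
from pathlib import Path


def transcribe_file(client, deployment, audio_path):
    """Send one audio file for transcription and return the recognized text.

    `client` is assumed to be an openai.AzureOpenAI instance; `deployment` is
    the name you gave your GPT-4o-Transcribe or GPT-4o-Mini-Transcribe
    deployment (hypothetical here).
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=deployment,
            file=audio_file,
        )
    # With the default response format, the result exposes the text as `.text`.
    return result.text
```

Because the deployment name is just a parameter, the same function can be pointed at either transcription model to compare quality and speed on your own recordings.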

Technical Innovations

  • Targeted Audio Pretraining: OpenAI’s GPT-4o audio models leverage extensive pretraining on specialized audio datasets, significantly enhancing understanding of speech nuances.
  • Advanced Distillation Techniques: Sophisticated distillation methods transfer knowledge from larger models to smaller, more efficient ones while preserving high performance.
  • Reinforcement Learning: Integrated RL techniques dramatically improve transcription accuracy and reduce misrecognition, achieving state-of-the-art performance for the speech-to-text models in complex speech recognition tasks.

Getting Started Guide for Developers

Use the Azure OpenAI TTS Demo repository to explore GPT-4o audio models through practical, hands-on examples.
Get started: https://ift.tt/hMLoE7v

Step 1: Clone the Repository

git clone https://github.com/Azure-Samples/azure-openai-tts-demo.git
cd azure-openai-tts-demo

Step 2: Configure Your Environment

Create your virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows
pip install -r requirements.txt

Set up your Azure credentials by creating a .env file:

cp .env.example .env
# Edit .env with your Azure OpenAI endpoint and API key

Example .env:

AZURE_OPENAI_ENDPOINT="https://<your-resource-name>.openai.azure.com/"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
AZURE_OPENAI_API_VERSION="2025-04-14"
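Before launching anything, it can help to confirm that the file defines all three settings. The sketch below is a standard-library-only check and is not part of the demo repository (which may load the file differently). The helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles simple `KEY="value"` lines like the ones shown above.

```python
REQUIRED_KEYS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still hold a <placeholder>."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or "<" in values[key]
    ]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as env_file:
        problems = missing_keys(parse_env(env_file.read()))
    print("Missing or placeholder settings:", problems or "none")
```

The placeholder check only looks for an angle bracket, so it catches an unedited `<your-resource-name>` endpoint but not an unedited API key string.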

Step 3: Run the Interactive Gradio Soundboard

Launch the demo to experiment interactively:

python soundboard.py

Select different voices and vibes, and listen to the generated speech.

Step 4: Explore Additional Sample Scripts

Run sample scripts for specific audio tasks:

  • Streaming audio to a file
python streaming-tts-to-file-sample.py
  • Asynchronous streaming and playback
python async-streaming-tts-sample.py
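To give a feel for what streaming speech to a file involves, here is a minimal sketch using the `openai` Python package. It is not the contents of the sample scripts above, which may be structured differently. It assumes `client` is an `openai.AzureOpenAI` instance; the function names, the deployment name you pass in, and the default voice are hypothetical.

```python
def stream_speech_to_file(client, deployment, text, out_path,
                          voice="alloy", instructions=None):
    """Write synthesized audio to out_path chunk by chunk; return bytes written."""
    params = {"model": deployment, "voice": voice, "input": text}
    if instructions:
        params["instructions"] = instructions  # optional guidance on delivery
    written = 0
    # The streaming variant yields audio as it arrives instead of buffering it all.
    with client.audio.speech.with_streaming_response.create(**params) as response:
        with open(out_path, "wb") as out_file:
            for chunk in response.iter_bytes():
                out_file.write(chunk)
                written += len(chunk)
    return written


def make_client(endpoint, api_key, api_version):
    """Build the SDK client; imported lazily so the helper above has no hard dependency."""
    from openai import AzureOpenAI
    return AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=api_key,
        api_version=api_version,
    )
```

A call such as `stream_speech_to_file(make_client(endpoint, key, version), "my-tts-deployment", "Hello there", "hello.mp3")` would then write the audio as it is received, assuming the deployment exists and the chosen output file extension matches the response format.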

Developer Impact

Integrating Azure OpenAI advanced audio models allows developers to:

  • Easily incorporate advanced transcription and TTS functionality.
  • Create highly interactive, intuitive voice-driven applications.
  • Enhance user experience with customizable and expressive audio interactions.

Further Exploration

We encourage developers to leverage these innovative audio models and share their insights and feedback!

via Microsoft for Developers https://ift.tt/1v2hGLk

April 16, 2025 at 06:06PM