Exploring Hugging Face: Audio Classification
Audio Classification Using Models From Hugging Face
The audio classification task in Hugging Face involves categorizing audio data into predefined classes or labels.
Audio files are transformed into a format (such as waveforms or spectrograms) that the chosen model can process.
A waveform is a visual representation of an audio signal’s amplitude over time. It shows how the amplitude of the sound wave changes. In audio processing, waveforms are crucial for analyzing the characteristics of a sound, such as its loudness, pitch, and duration.
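To make these characteristics concrete, here is a minimal sketch that builds a synthetic waveform and measures it. The 440 Hz tone, the 16 kHz sample rate, and the variable names are assumptions chosen for illustration, and a plain Python list stands in for the NumPy array you would normally get from an audio loader.

```python
import math

# Hypothetical test signal: a 440 Hz sine tone, 0.5 s long, sampled at 16 kHz.
sample_rate = 16000        # samples per second (Hz)
frequency = 440.0          # pitch of the tone in Hz
peak_amplitude = 0.5       # loudness of the tone (full scale is 1.0)
num_samples = sample_rate // 2

# The waveform is the amplitude of the signal at each sample instant.
waveform = [
    peak_amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
    for n in range(num_samples)
]

# Duration follows from the number of samples and the sample rate.
duration = len(waveform) / sample_rate
# Peak is the largest absolute amplitude in the signal.
peak = max(abs(x) for x in waveform)
# RMS (root mean square) is a common proxy for perceived loudness.
rms = math.sqrt(sum(x * x for x in waveform) / len(waveform))

print(f"duration: {duration:.2f} s")
print(f"peak amplitude: {peak:.3f}")
print(f"RMS level: {rms:.3f}")
```

Duration comes from the sample count, while loudness is read from the amplitude values themselves; pitch is set here by the `frequency` of the generated tone.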
import librosa
audio_path = 'speech.wav'
waveform, sample_rate = librosa.load(audio_path, sr=None)
We can use the librosa package for this transformation. The load function from the librosa library reads the audio file specified by audio_path.
waveform is a NumPy array that represents the audio signal’s amplitude over time. It is a sequence of floating-point numbers that describe the sound wave.
sample_rate is the number of audio samples per second, measured in hertz (Hz). It defines the number of data points used to represent each second of audio. The sr parameter specifies the sample rate. By setting sr=None, we tell librosa to use the original sample rate of the audio file instead of resampling to its default of 22,050 Hz, so the audio keeps its original quality.
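The relationship between sample rate and duration can be sketched with Python’s built-in wave module. This is only an illustration: the helper names write_tone and describe, the tone parameters, and the temporary file names are all hypothetical and do not come from librosa.

```python
import math
import os
import struct
import tempfile
import wave

def write_tone(path, sample_rate, seconds=1.0, frequency=440.0):
    """Write a mono 16-bit sine tone so there is a file to inspect."""
    num_frames = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * frequency * n / sample_rate))
        for n in range(num_frames)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"".join(struct.pack("<h", s) for s in samples))

def describe(path):
    """Return (sample_rate, num_frames, duration_in_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return rate, frames, frames / rate

with tempfile.TemporaryDirectory() as tmp_dir:
    info = {}
    for rate in (8000, 44100):
        path = os.path.join(tmp_dir, f"tone_{rate}.wav")
        write_tone(path, rate)
        info[rate] = describe(path)
        print(rate, info[rate])
```

Both files hold one second of audio, but the 44.1 kHz file stores more than five times as many data points as the 8 kHz file. Keeping the original rate avoids discarding that detail.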
import matplotlib.pyplot as plt
# hop_length=1 gives one timestamp per sample; the default of 512 is meant for frame-based features
time_axis = librosa.times_like(waveform, sr=sample_rate, hop_length=1)
plt.figure(figsize=(10, 4))
plt.plot(time_axis, waveform)
plt.title('Waveform of Audio')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()
Now, let’s use the MIT/ast-finetuned-audioset-10-10-0.4593 model from Hugging Face, an Audio Spectrogram Transformer fine-tuned on AudioSet.
from transformers import pipeline
pipe = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
# Bundle the sampling rate with the array so the pipeline knows the input rate
results = pipe({"raw": waveform, "sampling_rate": sample_rate})
print(results)
"""
[{'score': 0.7925717830657959, 'label': 'Speech'},
{'score': 0.03275119513273239, 'label': 'Speech synthesizer'},
{'score': 0.02389572374522686, 'label': 'Narration, monologue'},
{'score': 0.019056597724556923, 'label': 'Sound effect'},
{'score': 0.01026979461312294, 'label': 'Female speech, woman speaking'}]
"""
The model assigns the highest score to “Speech,” indicating that it believes the audio is most likely speech. The other labels are the model’s next best guesses, but with significantly lower confidence scores.
Let’s use another model:
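One way to act on these scores in code is to take the top label only when it clears a confidence threshold, and to check how far it is ahead of the runner-up. The sketch below uses the output shown above; the helper names and the 0.5 threshold are assumptions for illustration, not part of the transformers API.

```python
# Output copied from the run above: a list of {"score", "label"} dicts.
results = [
    {"score": 0.7925717830657959, "label": "Speech"},
    {"score": 0.03275119513273239, "label": "Speech synthesizer"},
    {"score": 0.02389572374522686, "label": "Narration, monologue"},
    {"score": 0.019056597724556923, "label": "Sound effect"},
    {"score": 0.01026979461312294, "label": "Female speech, woman speaking"},
]

def top_prediction(results, min_score=0.5):
    """Return the best label, or None if the model is not confident enough."""
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None

def margin(results):
    """Gap between the two highest scores; a small gap means an ambiguous result."""
    ranked = sorted((r["score"] for r in results), reverse=True)
    return ranked[0] - ranked[1]

print(top_prediction(results))
print(round(margin(results), 3))
```

A large margin, as here, suggests a clear winner. When the top scores sit close together, the top label deserves less trust.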
pipe = pipeline("audio-classification", model="superb/wav2vec2-base-superb-sid")
results = pipe({"raw": waveform, "sampling_rate": sample_rate})
print(results)
"""
[{'score': 0.47217562794685364, 'label': 'id10652'},
{'score': 0.23792167007923126, 'label': 'id10335'},
{'score': 0.10524415224790573, 'label': 'id10856'},
{'score': 0.08934732526540756, 'label': 'id10651'},
{'score': 0.022524842992424965, 'label': 'id10396'}]
"""
Each label corresponds to a unique speaker identifier (ID). The score represents the model’s confidence that the audio segment belongs to the speaker associated with that ID. You can get the mappings from the documentation of this model.
Another model predicts the emotion in the audio file:
pipe = pipeline("audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
results = pipe({"raw": waveform, "sampling_rate": sample_rate})
print(results)
"""
[{'score': 0.13225293159484863, 'label': 'disgust'},
{'score': 0.12851978838443756, 'label': 'neutral'},
{'score': 0.12753769755363464, 'label': 'calm'},
{'score': 0.1254863142967224, 'label': 'angry'},
{'score': 0.12439820170402527, 'label': 'fearful'}]
"""
Sources
https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english
https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
HI-FI News
via Artificial Intelligence on Medium https://ift.tt/FwUSseI
March 17, 2024 at 12:48AM