
Learning semantic consistency for audio-visual zero-shot learning


Audio-visual zero-shot learning requires understanding the relationship between audio and visual information in order to recognize unseen classes. Despite considerable effort and significant progress in the field, many existing methods tend to focus on learning strong representations while neglecting the semantic consistency between audio and video, as well as the inherent hierarchical structure of the data. To address these issues, we propose Learning Semantic Consistency for Audio-Visual Zero-shot Learning. Specifically, we employ an attention mechanism to enhance cross-modal information interaction, aiming to capture the semantic consistency between audio and visual data. Meanwhile, we introduce a hyperbolic space to model the hierarchical structure of the data itself. Moreover, the proposed approach includes a novel loss function that considers the relationships between input modalities, reducing the distance between features of different modalities. To evaluate the proposed method, we test it on three benchmark datasets, $$\hbox {VGGSound-GZS}^{cls}$$, $$\hbox {UCF-GZS}^{cls}$$, and $$\hbox {ActivityNet-GZS}^{cls}$$. Extensive experimental results show that the proposed method achieves state-of-the-art performance on all three datasets. For example, on the $$\hbox {UCF-GZS}^{cls}$$ dataset, the harmonic mean is improved by 5.7%. Code and data are available at https://github.com/ybyangjing/LSC-AVZSL.
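The abstract says hierarchical structure is modeled in a hyperbolic space. A common concrete choice for this (not confirmed by the abstract itself, so treat it as an illustrative assumption) is the Poincaré ball, where the geodesic distance between two embeddings grows rapidly near the boundary and thus naturally encodes tree-like hierarchies. A minimal sketch of that distance:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincare ball
    (unit ball, curvature -1). Illustrative only; the paper may use a
    different hyperbolic model or curvature."""
    sq_norm = lambda x: sum(xi * xi for xi in x)
    diff = [ui - vi for ui, vi in zip(u, v)]
    num = 2 * sq_norm(diff)
    den = (1 - sq_norm(u)) * (1 - sq_norm(v))
    return math.acosh(1 + num / den)

# Distance from the origin reduces to 2 * artanh(|x|):
d = poincare_distance([0.0, 0.0], [0.5, 0.0])
print(round(d, 4))  # 1.0986
```

Points near the ball's boundary are exponentially far apart, which is why hyperbolic embeddings fit hierarchical data better than Euclidean ones of the same dimension.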
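The reported 5.7% gain is in the harmonic mean, the standard generalized zero-shot learning score that balances seen- and unseen-class accuracy. A small sketch of how that metric is typically computed (function and variable names are illustrative, not from the paper's code):

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean (HM) of seen- and unseen-class accuracies,
    as commonly used to score generalized zero-shot learning."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# Example: 60% accuracy on seen classes, 40% on unseen classes.
print(round(harmonic_mean(0.60, 0.40), 2))  # 0.48
```

Because the harmonic mean is dominated by the smaller of the two accuracies, a model cannot inflate this score by performing well on seen classes alone.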


via Latest Results https://ift.tt/mTpFLlI

April 17, 2025 at 11:48PM
