Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

Abstract: Audio large language models (LLMs) are considered experts at recognizing sound objects, yet their performance relative to LLMs in other sensory modalities, such as visual or audio-visual ...

IEEE

Multi-Dimensional Quality Assessment for UGC Videos via Modular Multi-Modal Vision-Language Models

Abstract: Recent advances in video processing and the growth of social media have led to a surge in user-generated content (UGC) videos. However, various factors can degrade their quality, ...
