Choose Language Modal Design

Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

Abstract: Audio large language models (LLMs) are considered experts at recognizing sound objects, yet their performance relative to LLMs in other sensory modalities, such as visual or audio-visual ...

IEEE

Multi-Dimensional Quality Assessment for UGC Videos via Modular Multi-Modal Vision-Language Models

Abstract: Recent advances in video processing and the growth of social media have led to a surge in user-generated content (UGC) videos. However, various factors can degrade their quality, ...

Scientific Research Publishing

Multimodal Digital Phenotyping for Bipolar Disorder: Robust Mood-State Classification and Early Relapse Risk Monitoring

Keywords: Bipolar Disorder, Digital Phenotyping, Multimodal Learning, Face/Voice/Phone, Mood Classification, Relapse Prediction, t-SNE, Ablation
Cite: de Filippis, R. and Al Foysal, A. (2025) ...
