AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head - Summary
The paper proposes a multi-modal AI system named AudioGPT that complements Large Language Models (LLMs) with foundation models to process complex audio information and solve numerous understanding and generation tasks. AudioGPT is also connected to an input/output interface, automatic speech recognition (ASR) and text-to-speech (TTS), to support spoke