ASAM
This is the code and dataset for our paper [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment, AAAI 2018].
AAAI2018-Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment
=======================================================================
Our demo code is implemented in Keras (written in Python, with Theano as the backend).
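If the code is imported from a script outside the project directory, a system-wide Keras may be picked up instead of the bundled copy, or a different backend may be selected. The sketch below shows one way to guard against both. It is a hypothetical helper, not part of the ASAM code, and it assumes the bundled `keras/` package sits directly under the project root. It relies on the `KERAS_BACKEND` environment variable, which Keras 1.x reads when it is first imported.

```python
import os
import sys


def use_bundled_keras(project_root, backend="theano"):
    """Hypothetical helper (not part of the ASAM code): prefer the Keras copy
    bundled under `project_root` and select the Keras backend.

    Call it before the first `import keras`, because Keras 1.x reads the
    KERAS_BACKEND environment variable when the package is imported.
    """
    root = os.path.abspath(project_root)
    # Move the project root to the front of sys.path so its `keras/` package
    # shadows any Keras installed in site-packages (assumed directory layout).
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)
    # Select the backend through the environment variable Keras checks at import.
    os.environ["KERAS_BACKEND"] = backend
    return root
```

For example, call `use_bundled_keras("/path/to/ASAM")` at the top of an external script, then `import keras`. Running `python main_run.py` from the project directory already puts that directory first on `sys.path`, so the helper matters only when importing from elsewhere.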
Usage:
$ python main_run.py
or run it in the background from a terminal:
$ bash run.sh
Notice:
(1). To avoid Keras version mismatches, we include a fork of Keras version 1.2.2 in this project.
(2). We use the MATLAB version of BSS_eval to evaluate NSDR.
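NSDR (normalized SDR) measures how much a separated estimate improves the signal-to-distortion ratio over the unprocessed mixture: NSDR = SDR(estimate, target) - SDR(mixture, target). The NumPy sketch below illustrates that arithmetic only. As a simplifying assumption, `simple_sdr` treats the whole difference between estimate and target as distortion. BSS_eval instead projects the estimate onto the target, allowing a short distortion filter, before measuring the error. The numbers it gives are therefore not comparable to the reported BSS_eval results.

```python
import numpy as np


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: all of (estimate - reference) counts as distortion.
    (BSS_eval's SDR first projects the estimate onto the reference.)"""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = estimate - reference
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def nsdr(reference, estimate, mixture):
    """Normalized SDR: SDR gain of the estimate over the raw mixture."""
    return simple_sdr(reference, estimate) - simple_sdr(reference, mixture)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)
    interferer = rng.standard_normal(16000)
    mixture = target + interferer
    estimate = target + 0.1 * interferer  # interference attenuated by 20 dB
    print(f"NSDR: {nsdr(target, estimate, mixture):.2f} dB")  # prints: NSDR: 20.00 dB
```

Because the same target energy appears in both SDR terms, it cancels. The NSDR here depends only on how much the residual interference was attenuated (a factor of 0.1 in amplitude, i.e. 20 dB).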

Figure 1: Two specific attention tasks for auditory selection in an environment where three speech signals are mixed: top-down task-specific attention and bottom-up stimulus-driven attention.

Figure 2: An illustration of our Auditory Selection with Attention and Memory (ASAM) model. (a): The overall architecture of the proposed ASAM. (b): The life-long memory module, which memorizes prior knowledge. In the top-down attention setting, the parts drawn with dashed boxes and arrows are used only in the training phase and are removed at evaluation time.

Figure 3: Effects of attention with different amounts of stimulus on a male-female mixture sample from WSJ0. (a) shows the SIR (Signal-to-Interference Ratio), SAR (Signal-to-Artifacts Ratio) and NSDR results; (b)-(d) are the auditory stimuli, with magnitudes divided by the maximum magnitude; (e) is the mixture input spectrogram; (i) is the target spectrogram; (f)-(h) are the attention maps based on the corresponding auditory stimuli; and (j)-(l) are the corresponding predictions with their NSDR scores.
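For reference, the SIR, SAR and SDR scores follow the standard BSS_eval definitions (Vincent, Gribonval and Févotte, 2006). An estimate $\hat{s}$ of the target source $s$ is decomposed into a target component and interference, noise and artifact errors. NSDR, as commonly defined, is the SDR improvement of $\hat{s}$ over the mixture $x$. The exact evaluation settings are those of the MATLAB BSS_eval toolkit. The equations below restate the textbook formulas and are not taken from the ASAM code.

```latex
\begin{align}
\hat{s} &= s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \\
\mathrm{SDR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \rVert^2} \\
\mathrm{SIR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} \rVert^2} \\
\mathrm{SAR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} \rVert^2}{\lVert e_{\text{artif}} \rVert^2} \\
\mathrm{NSDR}(\hat{s}, x, s) &= \mathrm{SDR}(\hat{s}, s) - \mathrm{SDR}(x, s)
\end{align}
```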
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
