ASAM

This is the code and dataset for our paper [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment. AAAI 2018]

AAAI2018-Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment
=======================================================================

Our demo code is implemented in Keras (written in Python, with Theano as the backend).

Usage:
$ python main_run.py
or run it in the background from a terminal:
$ bash run.sh
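
Since the code expects the Theano backend, it can help to pin the backend explicitly at launch. The following is a minimal sketch, assuming Keras honors the `KERAS_BACKEND` environment variable (as stock Keras releases do; whether the bundled fork behaves the same has not been verified here). The helper name `build_launch` is hypothetical; only `main_run.py` comes from this project.

```python
import os
import sys


def build_launch(script="main_run.py", backend="theano", base_env=None):
    """Return (command, environment) for running `script` with a chosen Keras backend.

    Assumption: Keras reads the KERAS_BACKEND environment variable at import time.
    """
    # Copy the environment so the caller's own environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["KERAS_BACKEND"] = backend
    command = [sys.executable, script]
    return command, env


# Usage (not executed here):
#   import subprocess
#   cmd, env = build_launch()
#   subprocess.call(cmd, env=env)   # same as `python main_run.py`, backend pinned
```

Passing the environment to the child process keeps the setting local to this run instead of changing a global Keras configuration file.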

Notice:
(1). To avoid Keras version mismatches, we include a fork of Keras version 1.2.2 in this project.
(2). We use the Matlab version of BSS_eval to evaluate NSDR.
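
For readers unfamiliar with the metric, NSDR (normalized SDR) is the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the target. The sketch below illustrates only that subtraction. It is not BSS_eval: the `sdr` function here is a simplified energy ratio that treats all residual error as distortion, so its values will not match the Matlab toolbox. The function names and the toy samples are assumptions made for illustration.

```python
import math


def sdr(reference, estimate, eps=1e-12):
    """Simplified signal-to-distortion ratio in dB.

    Assumption: all residual error counts as distortion. BSS_eval instead
    decomposes the error into interference and artifact terms.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def nsdr(reference, estimate, mixture):
    """Normalized SDR: SDR gain of the estimate over the unprocessed mixture."""
    return sdr(reference, estimate) - sdr(reference, mixture)


# Toy example with made-up samples (not data from the paper).
target = [1.0, -1.0, 0.5, -0.5]
mixture = [1.5, -0.5, 1.0, 0.0]     # target plus a constant 0.5 interferer
estimate = [1.1, -0.9, 0.6, -0.4]   # target plus a constant 0.1 residual
print(round(nsdr(target, estimate, mixture), 2))
```

In this toy case the estimate's error energy is 25 times smaller than the mixture's, so the NSDR is 10·log10(25), about 13.98 dB.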

Figure 1: Auditory Attention

Figure 1: Two specific attention tasks for auditory selection in a three speech mixture environment. One is top-down task-specific attention, and the other is bottom-up stimulus-driven attention.

Figure 2: Framework

Figure 2: An illustration of our Auditory Selection with Attention and Memory (ASAM). (a): The overall architecture of the proposed ASAM. (b): Life-long memory module that memorizes prior knowledge. In the top-down attention scenario, the dashed boxes and arrow are used only in the training phase and are removed at evaluation time.

Figure 3: Attention Heat Map

Figure 3: Effects of attention with different amounts of stimulus on one male and female mixture sample from WSJ0. (a) shows the SIR (Signal-to-Interference Ratio), SAR (Signal-to-Artifacts Ratio) and NSDR results, (b)-(d) are the auditory stimuli whose magnitudes are divided by the maximum magnitude, (e) is the mixture input spectrogram, (i) is the target spectrogram, (f)-(h) are attention maps based on the corresponding auditory stimuli and (j)-(l) are the corresponding predictions with their NSDR performances.
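
The caption states that the stimuli in panels (b)-(d) are shown with magnitudes divided by the maximum magnitude. A minimal sketch of that normalization follows, assuming the maximum is taken over the whole spectrogram of each stimulus (the caption does not say whether it is per stimulus or shared across panels). `normalize_by_max` is a hypothetical helper, not a function from this project.

```python
def normalize_by_max(magnitudes):
    """Scale a 2-D magnitude array (list of frames) so its largest value is 1."""
    peak = max(max(frame) for frame in magnitudes)
    if peak == 0:
        # Silent input: nothing to scale, return a copy unchanged.
        return [list(frame) for frame in magnitudes]
    return [[value / peak for value in frame] for frame in magnitudes]


# Made-up 2x3 magnitude spectrogram (frames x frequency bins).
stimulus = [[0.0, 2.0, 4.0], [1.0, 8.0, 2.0]]
print(normalize_by_max(stimulus))
# prints [[0.0, 0.25, 0.5], [0.125, 1.0, 0.25]]
```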

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
