Moozonian

arxiv.org arXiv
arxiv.org › abs › 2305.15266v3
Diffusion-Based Audio Inpainting
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gap lengths are short, but struggle to reconstruct ga...
arxiv.org arXiv
arxiv.org › abs › 1908.02590v3
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic pr...
arxiv.org arXiv
arxiv.org › abs › 1905.06148v2
A general-purpose deep learning approach to model time-varying audio effects
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling this type of effect unit...
arxiv.org arXiv
arxiv.org › abs › 2411.18222v1
Towards Improved Objective Perceptual Audio Quality Assessment -- Part 1: A Novel Data-Driven Cognitive Model
Efficient audio quality assessment is vital for streamlining audio codec development. Objective assessment tools have been developed over time to algorithmically predict quality ratings from subjectiv...
arxiv.org arXiv
arxiv.org › abs › 2505.20166v3
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Audio-aware large language models (ALLMs) have recently made great strides in understanding and processing audio inputs. These models are typically adapted from text-based large language models (LLMs)...
arxiv.org arXiv
arxiv.org › abs › 2105.01531v2
VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image dat...
arxiv.org arXiv
arxiv.org › abs › gr-qc › 9810059v1
Space-time distributions
The space-time foliation Sigma compatible with the gravitational field g on a 4-manifold M determines a fibration pi of M, pi : M -> N is a surjective submersion over the 1-dimensional leaves space N....
arxiv.org arXiv
arxiv.org › abs › 2501.04116v3
dCoNNear: An Artifact-Free Neural Network Architecture for Closed-loop Audio Signal Processing
Recent advances in deep neural networks (DNNs) have significantly improved various audio processing applications, including speech enhancement, synthesis, and hearing-aid algorithms. DNN-based closed-...
arxiv.org arXiv
arxiv.org › abs › 2311.08396v1
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audi...
arxiv.org arXiv
arxiv.org › abs › 2102.01243v3
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Audio tagging is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing model performance, which mostly comes from the d...