Hiroshi Saruwatari Laboratory

Principal investigator: Hiroshi Saruwatari

The University of Tokyo

AI summary (research output from the past 5 years)

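The Saruwatari Laboratory focuses on speech and audio signal processing and its applications. A central theme is the development of advanced source separation methods, i.e., techniques for extracting and enhancing a target voice from recordings in which multiple sounds are mixed. The lab has proposed methods designed for a range of real-world conditions, including source separation that exploits the spatial arrangement of sources, signal processing in environments where multiple microphone arrays are deployed, and improved sound capture using microphones mounted on drones. The lab also works on improving speech generation systems, including techniques for fine-grained control of emotional expression and speaker characteristics in text-to-speech synthesis, and the design of neural networks that can handle multiple sampling frequencies. This work makes effective use of knowledge from pre-trained deep learning models while incorporating measures to suppress overfitting.

Across the lab, there is a clear emphasis on performance on real devices and in real environments. The research targets practical problems such as handling different sampling frequencies, processing under limited computational resources, and exploiting prior information, and in doing so contributes to applied research in speech processing.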

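※ This summary was generated automatically by AI (Claude), which extracted and reorganized the research questions, methods, and main findings from publicly available paper abstracts. It may contain errors; please check the laboratory's official information for accuracy.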


Publications (189)

[2026] DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
[2026] Spatial-CLAP: Learning Spatially-Aware Audio–Text Embeddings for Multi-Source Conditions
DOI: https://doi.org/10.1109/icassp55912.2026.11460767
[2025] Active noise cancellation in space containing scattering objects based on kernel interpolation
DOI: https://doi.org/10.1121/10.0037383
[2025] Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
DOI: https://doi.org/10.1109/icassp49660.2025.10887678
[2025] TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models From Dark Data
DOI: https://doi.org/10.1109/taslpro.2025.3633089
[2025] Real-Time Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation and Spatially Regularized Independent Low-Rank Matrix Analysis With Fast Demixing Matrix Estimation
DOI: https://doi.org/10.1109/access.2025.3569590
[2025] Music Bleeding-sound Reduction Based on Time-channel Nonnegative Matrix Factorization
DOI: https://doi.org/10.1561/116.20250020
[2025] Spatial Active Noise Control Based on Kernel Interpolation With Individual Directional Weighting
DOI: https://doi.org/10.1561/116.20250034
[2025] Toward Data-Efficient Speech Synthesis: Active Learning-Based Corpus Construction for Multi-Speaker Text-to-Speech Synthesis
DOI: https://doi.org/10.1109/access.2025.3645005
[2025] Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model
DOI: https://doi.org/10.1016/j.specom.2025.103331
[2025] Language-queried target speech extraction using para-linguistic and non-linguistic prompts
DOI: https://doi.org/10.1250/ast.e25.27
[2025] Stride conversion algorithms for convolutional layers and its application to sampling-frequency-independent deep neural networks
DOI: https://doi.org/10.1016/j.sigpro.2025.110420
[2025] Sound Source Enhancement Using Power Spectral Density Estimation in Beamspace for a Dual Unmanned Aerial Vehicle System
DOI: https://doi.org/10.1109/apsipaasc65261.2025.11248984
[2025] Active Learning for Text-to-Speech Synthesis with Informative Sample Collection
DOI: https://doi.org/10.1109/apsipaasc65261.2025.11249057
[2025] Auxiliary-Function-Based Decentralized Independent Vector Analysis for Distributed Microphone Arrays
DOI: https://doi.org/10.1109/apsipaasc65261.2025.11249098
[2025] Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
DOI: https://doi.org/10.1109/apsipaasc65261.2025.11248982
[2025] Local Equivariance Error-Based Metrics for Evaluating Sampling-Frequency-Independent Property of Neural Network
DOI: https://doi.org/10.23919/eusipco63237.2025.11226337
[2025] Semi-blind source separation for unmanned aerial vehicle audition
DOI: https://doi.org/10.1016/j.apacoust.2025.111019
[2025] RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
DOI: https://doi.org/10.21437/interspeech.2025-1830
[2025] Initiation of apremilast treatment decreases prescribed topical therapy amount in patients with psoriasis: a health insurance claims study in Japan
DOI: https://doi.org/10.1080/09546634.2025.2535686
[2025] Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
[2025] Design of new auxiliary function for fully blind spatially regularized independent low-rank matrix analysis
DOI: https://doi.org/10.1121/10.0037821
[2025] Hearing-aids system using distributed assistive device and blind speech extraction method under diffuse noise
DOI: https://doi.org/10.1121/10.0037481
[2024] Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence
DOI: https://doi.org/10.1561/116.00000242
[2024] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis
DOI: https://doi.org/10.1109/taslp.2024.3369537
[2024] Cross-Dialect Text-to-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
DOI: https://doi.org/10.1109/slt61566.2024.10832155
[2024] JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions
DOI: https://doi.org/10.1109/access.2024.3360885
[2024] Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment
DOI: https://doi.org/10.1109/taslp.2024.3467951
[2024] NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10849090
[2024] The T05 System for the VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
DOI: https://doi.org/10.1109/slt61566.2024.10832315
[2024] DNN-Based Ensemble Singing Voice Synthesis With Interactions Between Singers
DOI: https://doi.org/10.1109/slt61566.2024.10832340
[2024] Effect of Multipole Dictionary in Sparse Sound Field Decomposition for Super-Resolution in Recording and Reproduction
DOI: https://doi.org/10.25144/24542
[2024] Design of Spectrogram-Consistency Regularization Term Dependent on Observation in Independent Low-Rank Matrix Analysis for Blind Source Separation
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848900
[2024] Beamforming informed independent low-rank matrix analysis for sound source enhancement in unmanned aerial vehicles
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10849300
[2024] Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human–Avatar Dialogue Systems
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848599
[2024] Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression
DOI: https://doi.org/10.1109/icassp48485.2024.10448224
[2024] Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer
DOI: https://doi.org/10.1561/116.20230082
[2024] Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach
DOI: https://doi.org/10.1186/s13636-024-00362-6
[2024] SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
DOI: https://doi.org/10.21437/interspeech.2024-1508
[2024] SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
DOI: https://doi.org/10.21437/interspeech.2024-1554
[2024] Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
DOI: https://doi.org/10.21437/interspeech.2024-972
[2024] Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
DOI: https://doi.org/10.21437/interspeech.2024-1107
[2024] SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
DOI: https://doi.org/10.21437/interspeech.2024-388
[2024] Real-Time Speech Extraction Using Spatially Regularized Independent Low-Rank Matrix Analysis and Rank-Constrained Spatial Covariance Matrix Estimation
DOI: https://doi.org/10.1109/icasspw62465.2024.10627448
[2024] Do Learned Speech Symbols Follow Zipf’s Law?
DOI: https://doi.org/10.1109/icassp48485.2024.10448331
[2024] Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features
DOI: https://doi.org/10.1109/icassp48485.2024.10448068
[2023] Kernel Interpolation of Acoustic Transfer Functions with Adaptive Kernel for Directed and Residual Reverberations
DOI: https://doi.org/10.1109/icassp49357.2023.10095429
[2023] jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
DOI: https://doi.org/10.1109/icassp49357.2023.10095569
[2023] Text-to-Speech Synthesis from Dark Data with Evaluation-in-the-Loop Data Selection
DOI: https://doi.org/10.1109/icassp49357.2023.10095161
[2023] ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings
DOI: https://doi.org/10.21437/interspeech.2023-1095
[2023] COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control
DOI: https://doi.org/10.1109/asru57964.2023.10389693
[2023] JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions
DOI: https://doi.org/10.1016/j.specom.2023.103004
[2023] Blind Source Separation Using Independent Low-Rank Matrix Analysis with Spectrogram-Consistency Regularization
DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317156
[2023] Kernel Interpolation of Incident Sound Field in Region Including Scattering Objects
DOI: https://doi.org/10.1109/waspaa58266.2023.10248156
[2023] Perceptual Quality Enhancement of Sound Field Synthesis Based on Combination of Pressure and Amplitude Matching
DOI: https://doi.org/10.1109/waspaa58266.2023.10248106
[2023] Multichannel Active Noise Control with Exterior Radiation Suppression Based on Riemannian Optimization
DOI: https://doi.org/10.23919/eusipco58844.2023.10289919
[2023] Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides
DOI: https://doi.org/10.23919/eusipco58844.2023.10289819
[2023] NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction
DOI: https://doi.org/10.23919/eusipco58844.2023.10289863
[2023] Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion
DOI: https://doi.org/10.21437/ssw.2023-15
[2023] Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
DOI: https://doi.org/10.21437/ssw.2023-10
[2023] Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus
DOI: https://doi.org/10.21437/interspeech.2023-806
[2023] How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
DOI: https://doi.org/10.21437/interspeech.2023-981
[2023] CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center
DOI: https://doi.org/10.21437/interspeech.2023-1098
[2023] HumanDiffusion: diffusion model using perceptual gradients
DOI: https://doi.org/10.21437/interspeech.2023-1680
[2023] Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
DOI: https://doi.org/10.24963/ijcai.2023/575
[2023] Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts
DOI: https://doi.org/10.1109/icassp49357.2023.10096247
[2023] Spatial Active Noise Control Method Based on Sound Field Interpolation from Reference Microphone Signals
DOI: https://doi.org/10.1109/icassp49357.2023.10097189
[2023] Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images
DOI: https://doi.org/10.1109/icassp49357.2023.10096517
[2023] Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech
DOI: https://doi.org/10.1109/icassp49357.2023.10096402
[2023] MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models
DOI: https://doi.org/10.1109/icassp49357.2023.10097113
[2023] VTTS: Visual-Text To Speech
DOI: https://doi.org/10.1109/slt54892.2023.10022739
[2023] PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation
DOI: https://doi.org/10.1109/taslp.2023.3293044
[2023] SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources
DOI: https://doi.org/10.1109/access.2023.3345027
[2022] Region-Restricted Sensor Placement Based on Gaussian Process for Sound Field Estimation
DOI: https://doi.org/10.1109/tsp.2022.3156012
[2022] Head-Related Transfer Function Interpolation From Spatially Sparse Measurements Using Autoencoder With Source Position Conditioning
DOI: https://doi.org/10.1109/iwaenc53105.2022.9914751
[2022] Personalized Filled-pause Generation with Group-wise Prediction Models
DOI: https://doi.org/10.63317/5esa64wph3ou
[2022] Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds
DOI: https://doi.org/10.1109/icassp43922.2022.9746399
[2022] Spatial Active Noise Control Based on Individual Kernel Interpolation of Primary and Secondary Sound Fields
DOI: https://doi.org/10.1109/icassp43922.2022.9746065
[2022] Region-to-Region Kernel Interpolation of Acoustic Transfer Function with Directional Weighting
DOI: https://doi.org/10.1109/icassp43922.2022.9746842
[2022] Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties
DOI: https://doi.org/10.1109/taslp.2022.3201368
[2022] Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation
DOI: https://doi.org/10.1109/taslp.2022.3203907
[2022] Adaptive End-to-End Text-to-Speech Synthesis Based on Error Correction Feedback from Humans
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9979876
[2022] Amplitude Matching for Multizone Sound Field Control
DOI: https://doi.org/10.1109/taslp.2022.3231715
[2022] Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980158
[2022] Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9979895
[2022] Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980331
[2022] Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction
DOI: https://doi.org/10.1186/s13634-022-00905-z
[2022] UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
DOI: https://doi.org/10.21437/interspeech.2022-439
[2022] SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
DOI: https://doi.org/10.21437/interspeech.2022-298
[2022] Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
DOI: https://doi.org/10.21437/interspeech.2022-403
[2022] STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent
DOI: https://doi.org/10.21437/interspeech.2022-300
[2022] J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis
DOI: https://doi.org/10.21437/interspeech.2022-444
[2022] Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
DOI: https://doi.org/10.21437/interspeech.2022-257
[2022] Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis
DOI: https://doi.org/10.21437/interspeech.2022-638
[2022] Physics-Informed Convolutional Neural Network with Bicubic Spline Interpolation for Sound Field Estimation
DOI: https://doi.org/10.1109/iwaenc53105.2022.9914792
[2021] Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis
DOI: https://doi.org/10.21437/interspeech.2021-897
[2021] Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator
DOI: https://doi.org/10.21437/interspeech.2021-583
[2021] DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching
DOI: https://doi.org/10.1587/transinf.2021edp7041
[2021] Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis with Prior Information on Desired Field
DOI: https://doi.org/10.1109/waspaa52581.2021.9632799
[2021] Speaker adaptation of speech synthesis using human perceptual evaluation feedback
[2021] Binaural rendering from microphone array signals of arbitrary geometry
DOI: https://doi.org/10.1121/10.0006538
[2021] Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
DOI: https://doi.org/10.1109/asru51503.2021.9687904

KAKENHI grants (0)

No data yet (to be displayed after KAKEN data is imported).

Society memberships and positions (0)

No data yet (to be displayed after society data is linked).
