Hiroshi Saruwatari Laboratory
Principal investigator: Hiroshi Saruwatari
The University of Tokyo
AI summary (research output from the past five years)
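The Saruwatari Laboratory's research centers on speech signal processing and its applications. It concentrates on developing advanced methods for source separation, that is, extracting and enhancing target speech from an environment in which multiple voices are mixed. The lab proposes methods aimed at a variety of real-world settings, including source separation that exploits spatial arrangement information, signal processing in environments with multiple microphone arrays, and improved speech recording with drone-mounted microphones.
The lab also works on improving speech generation systems. It studies techniques for fine-grained control of emotional expression and speaker characteristics in text-to-speech synthesis, as well as neural network designs that can handle multiple sampling frequencies. This work makes effective use of the knowledge in pretrained deep learning models while incorporating measures to suppress overfitting.
Across the lab's work there is a visible emphasis on operation on real devices and in real environments. The lab targets practically motivated problems such as handling different sampling frequencies, processing under limited computational resources, and exploiting prior information, which contributes to the advancement of applied research in speech processing.
Note: This summary was generated automatically by an AI (Claude), which extracted and reorganized the research questions, methods, and main findings from publicly available paper abstracts as factual information. It may contain errors, so please verify its accuracy against the laboratory's official information.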
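Related laboratories (8)
- Takanori Fukao Laboratory, The University of Tokyo. Field: Agriculture and Biological Sciences. Papers: 31. Shared topics: Engineering, Mechanical Engineering and Robotics, Environment, Earth Science and Environment, +11 more
- Kenichi Furuhashi Laboratory, The University of Tokyo. Field: Energy. Papers: 21. Shared topics: Engineering, Mechanical Engineering and Robotics, Environment, Earth Science and Environment, +11 more
- Katsumi Watanabe Laboratory, Waseda University. Field: Neuroscience. Papers: 25. Shared topics: Learning, Biology, Neuroscience, Cognition and Behavior, +8 more
- Kazuhiro Fujiwara Laboratory, The University of Tokyo. Field: Agriculture and Biological Sciences. Papers: 21. Shared topics: Biology, Systems, Information Engineering, Computer Science, +8 more
- Ryohei Kanzaki Laboratory, The University of Tokyo. Field: Neuroscience. Papers: 33. Shared topics: Biology, Neuroscience, Cognition and Behavior, Control, +6 more
- Ryohei Sugita Laboratory, The University of Tokyo. Field: Agriculture and Biological Sciences. Papers: 16. Shared topics: Biology, Systems, Information Engineering, Computer Science, +6 more
- Yutaka Suzuki Laboratory, The University of Tokyo. Field: Medicine. Papers: 100. Shared topics: Biology, Control, Engineering, Mechanical Engineering and Robotics, +5 more
- Tadao Asami Laboratory, The University of Tokyo. Field: Agriculture and Biological Sciences. Papers: 55. Shared topics: Biology, Control, Engineering, Mechanical Engineering and Robotics, +5 more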
Research outputs (187)
- DOI: https://doi.org/10.1109/icassp55912.2026.11460767
- [2025] Active noise cancellation in space containing scattering objects based on kernel interpolation. DOI: https://doi.org/10.1121/10.0037383
- DOI: https://doi.org/10.1109/icassp49660.2025.10887678
- DOI: https://doi.org/10.1109/taslpro.2025.3633089
- DOI: https://doi.org/10.1109/access.2025.3569590
- DOI: https://doi.org/10.1561/116.20250020
- [2025] Spatial Active Noise Control Based on Kernel Interpolation With Individual Directional Weighting. DOI: https://doi.org/10.1561/116.20250034
- DOI: https://doi.org/10.1109/access.2025.3645005
- DOI: https://doi.org/10.1016/j.specom.2025.103331
- DOI: https://doi.org/10.1250/ast.e25.27
- DOI: https://doi.org/10.1016/j.sigpro.2025.110420
- DOI: https://doi.org/10.1109/apsipaasc65261.2025.11248984
- DOI: https://doi.org/10.1109/apsipaasc65261.2025.11249057
- [2025] Auxiliary-Function-Based Decentralized Independent Vector Analysis for Distributed Microphone Arrays. DOI: https://doi.org/10.1109/apsipaasc65261.2025.11249098
- DOI: https://doi.org/10.1109/apsipaasc65261.2025.11248982
- DOI: https://doi.org/10.23919/eusipco63237.2025.11226337
- DOI: https://doi.org/10.1016/j.apacoust.2025.111019
- [2025] RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio. DOI: https://doi.org/10.21437/interspeech.2025-1830
- DOI: https://doi.org/10.1080/09546634.2025.2535686
- DOI: https://doi.org/10.1121/10.0037821
- DOI: https://doi.org/10.1121/10.0037481
- DOI: https://doi.org/10.1561/116.00000242
- DOI: https://doi.org/10.1109/taslp.2024.3369537
- [2024] Cross-Dialect Text-to-Speech In Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level Bert. DOI: https://doi.org/10.1109/slt61566.2024.10832155
- DOI: https://doi.org/10.1109/access.2024.3360885
- [2024] Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment. DOI: https://doi.org/10.1109/taslp.2024.3467951
- DOI: https://doi.org/10.1109/apsipaasc63619.2025.10849090
- DOI: https://doi.org/10.1109/slt61566.2024.10832315
- DOI: https://doi.org/10.1109/slt61566.2024.10832340
- DOI: https://doi.org/10.25144/24542
- DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848900
- DOI: https://doi.org/10.1109/apsipaasc63619.2025.10849300
- [2024] Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human–Avatar Dialogue Systems. DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848599
- DOI: https://doi.org/10.1109/icassp48485.2024.10448224
- DOI: https://doi.org/10.1561/116.20230082
- DOI: https://doi.org/10.1186/s13636-024-00362-6
- DOI: https://doi.org/10.21437/interspeech.2024-1508
- DOI: https://doi.org/10.21437/interspeech.2024-1554
- DOI: https://doi.org/10.21437/interspeech.2024-972
- [2024] Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals. DOI: https://doi.org/10.21437/interspeech.2024-1107
- DOI: https://doi.org/10.21437/interspeech.2024-388
- DOI: https://doi.org/10.1109/icasspw62465.2024.10627448
- DOI: https://doi.org/10.1109/icassp48485.2024.10448331
- DOI: https://doi.org/10.1109/icassp48485.2024.10448068
- DOI: https://doi.org/10.24963/ijcai.2023/575
- DOI: https://doi.org/10.1109/icassp49357.2023.10095569
- DOI: https://doi.org/10.1109/icassp49357.2023.10095161
- DOI: https://doi.org/10.21437/interspeech.2023-1095
- DOI: https://doi.org/10.1109/asru57964.2023.10389693
- DOI: https://doi.org/10.1016/j.specom.2023.103004
- DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317156
- DOI: https://doi.org/10.1109/waspaa58266.2023.10248156
- DOI: https://doi.org/10.1109/waspaa58266.2023.10248106
- DOI: https://doi.org/10.23919/eusipco58844.2023.10289919
- DOI: https://doi.org/10.23919/eusipco58844.2023.10289819
- DOI: https://doi.org/10.23919/eusipco58844.2023.10289863
- DOI: https://doi.org/10.21437/ssw.2023-15
- DOI: https://doi.org/10.21437/ssw.2023-10
- [2023] Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus. DOI: https://doi.org/10.21437/interspeech.2023-806
- DOI: https://doi.org/10.21437/interspeech.2023-981
- DOI: https://doi.org/10.21437/interspeech.2023-1098
- DOI: https://doi.org/10.21437/interspeech.2023-1680
- DOI: https://doi.org/10.1109/icassp49357.2023.10095429
- [2023] Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts. DOI: https://doi.org/10.1109/icassp49357.2023.10096247
- DOI: https://doi.org/10.1109/icassp49357.2023.10097189
- DOI: https://doi.org/10.1109/icassp49357.2023.10096517
- [2023] Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech. DOI: https://doi.org/10.1109/icassp49357.2023.10096402
- DOI: https://doi.org/10.1109/icassp49357.2023.10097113
- [2023] VTTS: Visual-Text To Speech. DOI: https://doi.org/10.1109/slt54892.2023.10022739
- DOI: https://doi.org/10.1109/taslp.2023.3293044
- DOI: https://doi.org/10.1109/access.2023.3345027
- [2022] Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders. DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980158
- DOI: https://doi.org/10.1109/icassp43922.2022.9746399
- DOI: https://doi.org/10.1109/icassp43922.2022.9746065
- [2022] Region-to-Region Kernel Interpolation of Acoustic Transfer Function with Directional Weighting. DOI: https://doi.org/10.1109/icassp43922.2022.9746842
- DOI: https://doi.org/10.1109/taslp.2022.3201368
- [2022] Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation. DOI: https://doi.org/10.1109/taslp.2022.3203907
- DOI: https://doi.org/10.1109/tsp.2022.3156012
- DOI: https://doi.org/10.1109/taslp.2022.3231715
- DOI: https://doi.org/10.23919/apsipaasc55919.2022.9979876
- DOI: https://doi.org/10.23919/apsipaasc55919.2022.9979895
- DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980331
- DOI: https://doi.org/10.1186/s13634-022-00905-z
- DOI: https://doi.org/10.21437/interspeech.2022-439
- DOI: https://doi.org/10.21437/interspeech.2022-298
- DOI: https://doi.org/10.21437/interspeech.2022-403
- DOI: https://doi.org/10.21437/interspeech.2022-300
- DOI: https://doi.org/10.21437/interspeech.2022-444
- DOI: https://doi.org/10.21437/interspeech.2022-257
- DOI: https://doi.org/10.21437/interspeech.2022-638
- DOI: https://doi.org/10.1109/iwaenc53105.2022.9914792
- DOI: https://doi.org/10.1109/iwaenc53105.2022.9914751
- DOI: https://doi.org/10.63317/5esa64wph3ou
- DOI: https://doi.org/10.1121/10.0006538
- [2021] DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching. DOI: https://doi.org/10.1587/transinf.2021edp7041
- DOI: https://doi.org/10.1109/waspaa52581.2021.9632799
- DOI: https://doi.org/10.1109/asru51503.2021.9687904