Kazuhiro Nakadai Laboratory

Principal Investigator: Kazuhiro Nakadai

Tokyo Institute of Technology

AI Summary (Research Output from the Past 5 Years)

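This laboratory conducts research on "robot audition," which enables robots and drones to understand and recognize the acoustic environment around them. Using microphone arrays (sensors built from multiple microphones), the lab develops methods for localizing sound sources, extracting a target sound from a mixture of several sounds, and improving speech recognition accuracy in noisy environments. These problems are approached with both signal processing and neural networks, with an emphasis on system designs that can adapt to changes in real-world environments. Applications focus on situations where visual information is limited, such as multi-drone sound source search systems intended to find people awaiting rescue at disaster sites, and the use of auditory information inside smoke-filled or collapsed buildings where cameras cannot capture usable images. Social applications of robot audition are also expanding, including a sign language expression system for people with hearing impairments and the analysis of animal vocalizations in forest ecosystems. In parallel, the lab works on faster, lower-power implementations on edge computing devices and FPGAs, and on automatic calibration among multiple sensors; this focus on making the technology deployable in the real world is a distinctive feature of its work.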

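Note: This summary was generated automatically by an AI (Claude), which extracted and restructured the research questions, methods, and main findings from publicly available paper abstracts as factual information. It may contain errors; please verify its accuracy against the laboratory's official information.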

Research Output (73 items)

[2025] Single-Channel Target Speech Extraction Utilizing Distance and Room Clues
DOI: https://doi.org/10.23919/eusipco63237.2025.11226413
[2025] Towards Online Sign Language Expression for Real-Time Human-Robot Interaction
DOI: https://doi.org/10.1109/ro-man63969.2025.11217908
[2025] Observability-Aware Active Calibration of Multisensor Extrinsics for Ground Robots via Online Trajectory Optimization
DOI: https://doi.org/10.1109/jsen.2025.3580427
[2025] An LCMV-based Scan-and-Sum Beamformer to Extract In-region Sound Sources
DOI: https://doi.org/10.7210/jrsj.43.537
[2025] From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution
DOI: https://doi.org/10.1109/apsipaasc65261.2025.11249079
[2025] Swarm Active Audition with Robots and Drones: Real-World Performance Validation
DOI: https://doi.org/10.1109/iros60139.2025.11247372
[2025] SignFlow: End-to-End Sign Language Generation for One-to-Many Modeling using Conditional Flow Matching
DOI: https://doi.org/10.1145/3716553.3750765
[2024] HARK 3.6 and Its Application to Active Drone Audition
[2024] A Swarm Active Audition System Based on Cooperation Between Multiple Drones and Robots
[2024] Special issue on robot and human interactive communication
DOI: https://doi.org/10.1080/01691864.2024.2410825
[2024] Construction of a ROS Network for Sound Source Search Using Multiple Drones
[2024] A Video Vision Transformer for Sound Source Localization
DOI: https://doi.org/10.23919/eusipco63174.2024.10715427
[2024] Online adaptation of Fourier series-based acoustic transfer function model and its application to sound source localization and separation
DOI: https://doi.org/10.1080/01691864.2024.2379384
[2024] A Performance Assessment on Rotor Noise-Informed Active Multidrone Sound Source Tracking Methods
DOI: https://doi.org/10.3390/drones8060266
[2024] FPGA-based Low Power Acceleration of HARK Sound Source Localization
DOI: https://doi.org/10.1109/coolchips61292.2024.10531180
[2024] Real Time Sound Source Localization Using von-Mises ResNet
DOI: https://doi.org/10.1109/sii58957.2024.10417224
[2024] SLAM-Based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization
DOI: https://doi.org/10.1109/tro.2024.3410456
[2024] Improving Noise Robustness of Automatic Speech Recognition with Speech Enhancement and Adapters
DOI: https://doi.org/10.7210/jrsj.42.920
[2024] Performance Improvement and Acceleration of Surface Source Extraction based on Multiple Constraint MVDR Beamforming and Woodbury Matrix Identity
DOI: https://doi.org/10.7210/jrsj.42.584
[2024] Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?
DOI: https://doi.org/10.1186/s13636-024-00387-x
[2024] Advancing Applications of Robot Audition Systems: Efficient HARK Deployment with GPU and FPGA Implementations
DOI: https://doi.org/10.3390/chips4010002
[2024] Swarm Active Audition System with Robots and Drones for a Search and Rescue Task
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848937
[2024] Implementation of a Robot Operation System-based network for sound source localization using multiple drones
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10849321
[2024] LCMV-based Scan-and-Sum Beamforming for Region Source Extraction
DOI: https://doi.org/10.1109/apsipaasc63619.2025.10848984
[2024] Special issue on robot and human interactive communication (Part II)
DOI: https://doi.org/10.1080/01691864.2024.2440161
[2023] miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge
DOI: https://doi.org/10.21437/interspeech.2023-1162
[2023] Improving Sign Language Understanding Introducing Label Smoothing
DOI: https://doi.org/10.1109/ro-man57019.2023.10309531
[2023] Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
DOI: https://doi.org/10.21437/interspeech.2023-1320
[2023] Visualization and Quantification of the Activities of Animal Vocalizations in Forest Species Using Robot Audition Techniques
DOI: https://doi.org/10.35995/jea7010002
[2023] Performance evaluation of sound source localisation and tracking methods using multiple drones
DOI: https://doi.org/10.3397/in_2023_0291
[2023] Is the Ideal Ratio Mask Really the Best? — Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers
DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317440
[2023] Unsupervised Domain Adaptation of Universal Source Separation Based on Neural Full-Rank Spatial Covariance Analysis
DOI: https://doi.org/10.1109/mlsp55844.2023.10285999
[2023] Online Adaptation of Fourier Series Based Acoustic Transfer Function Model to Improve Sound Source Localization and Separation
DOI: https://doi.org/10.1109/ro-man57019.2023.10309550
[2023] Placement Planning for Sound Source Tracking in Active Drone Audition
DOI: https://doi.org/10.3390/drones7070405
[2023] Low power implementation of Geometric High-order Decorrelation-based Source Separation on an FPGA board
DOI: https://doi.org/10.1109/coolchips57690.2023.10121954
[2023] Monitoring the courtship flight trajectory of Latham's snipe (Gallinago hardwickii) using microphone arrays
DOI: https://doi.org/10.1002/ece3.9938
[2023] Estimating the Soundscape Structure and Dynamics of Forest Bird Vocalizations in an Azimuth-Elevation Space Using a Microphone Array
DOI: https://doi.org/10.3390/app13063607
[2023] Observability Analysis of Graph SLAM-Based Joint Calibration of Multiple Microphone Arrays and Sound Source Localization
DOI: https://doi.org/10.1109/sii55687.2023.10039204
[2023] Assessment of Simultaneous Calibration for Positions, Orientations, and Time Offsets in Multiple Microphone Arrays Systems
DOI: https://doi.org/10.1109/sii55687.2023.10039440
[2023] Metric-Based Multimodal Meta-Learning for Human Movement Identification Via Footstep Recognition
DOI: https://doi.org/10.1109/sii55687.2023.10039089
[2023] Reconstruction of Depth Scenes Based on Echolocation
DOI: https://doi.org/10.1109/sii55687.2023.10039271
[2023] An Ensemble Method for Multiple Speech Enhancement Using Deep Learning
DOI: https://doi.org/10.1109/sii55687.2023.10039167
[2023] Extracting Bird Vocalizations from a Complex Natural Soundscape in Forests Using Robot Audition Techniques
DOI: https://doi.org/10.1109/sii55687.2023.10039198
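[2022] A Study of Action Planning for Drone Swarms in Multiple Sound Source Tracking
[2022] Calibration of Microphone Array Geometry Using Arbitrary Sound Mixtures as Input
[2022] Integration of Blockwise Streaming Speech Recognition and Voice Activity Detection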
[2022] Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition
DOI: https://doi.org/10.21437/interspeech.2022-11077
[2022] Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection
DOI: https://doi.org/10.21437/interspeech.2022-11216
[2022] Empirical Sampling from Latent Utterance-wise Evidence Model for Missing Data ASR based on Neural Encoder-Decoder Model
DOI: https://doi.org/10.21437/interspeech.2022-576
[2022] Auditory Survey of Endangered Eurasian Bittern Using Microphone Arrays and Robot Audition
DOI: https://doi.org/10.3389/frobt.2022.854572
[2022] 3D Convolution Recurrent Neural Networks for Multi-Label Earthquake Magnitude Classification
DOI: https://doi.org/10.3390/app12042195
[2022] Visual Scene Reconstruction based on Echolocation with a Generative Adversarial Network
DOI: https://doi.org/10.7210/jrsj.40.351
[2022] Evaluation of a Speech Enhancement Method Combining Ensemble Time-Frequency Masking and Beamforming
DOI: https://doi.org/10.7210/jrsj.40.631
[2022] An FPGA off-loading of HARK sound source localization
DOI: https://doi.org/10.1109/candarw57323.2022.00057
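[2022] Evaluation of Distant Speech Recognition Based on Deep Blind Source Separation and Transfer Learning
[2022] A Study of Methods for Detecting Small Objects in Low-Resolution Images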
[2021] Observing Nocturnal Birds Using Localization Techniques
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382665
[2021] Assessment of a Beamforming Implementation Developed for Surface Sound Source Separation
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382648
[2021] Visualizing Directional Soundscapes of Bird Vocalizations Using Robot Audition Techniques
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382639
[2021] Proposal and Evaluation of Spatial Sound Source Separation using NMF with Multiple Microphone Arrays
DOI: https://doi.org/10.7210/jrsj.39.669
[2021] Two-Dimensional Environment Recognition by Audible Sound with Weighted Likelihood Function and Standing Wave
DOI: https://doi.org/10.7210/jrsj.39.271
[2021] Sound Source Tracking Using Integrated Direction Likelihood for Drones with Microphone Arrays
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382619
[2021] CASE: CNN Acceleration for Speech-Classification in Edge-Computing
DOI: https://doi.org/10.1109/ieeecloudsummit52029.2021.00018
[2021] Fully-Online Always-Adaptation of Transfer Functions and Its Application to Sound Source Localization and Separation
DOI: https://doi.org/10.1109/iros51168.2021.9636631
[2021] Assessment of Sound Source Tracking Using Multiple Drones Equipped with Multiple Microphone Arrays
DOI: https://doi.org/10.3390/ijerph18179039
[2021] Assessment of von Mises-Bernoulli Deep Neural Network in Sound Source Localization
DOI: https://doi.org/10.21437/interspeech.2021-1050
[2021] Simultaneous Calibration of Positions, Orientations, and Time Offsets, Among Multiple Microphone Arrays
DOI: https://doi.org/10.1109/icas49788.2021.9551166
[2021] Non-Invasive Monitoring of the Spatio-Temporal Dynamics of Vocalizations among Songbirds in a Semi Free-Flight Environment Using Robot Audition Techniques
DOI: https://doi.org/10.3390/birds2020012
[2021] Detecting earthquakes: a novel deep learning-based approach for effective disaster response
DOI: https://doi.org/10.1007/s10489-021-02285-7
[2021] Multichannel environmental sound segmentation
DOI: https://doi.org/10.1007/s10489-021-02314-5
[2021] Investigation of Node Pruning Criteria for Neural Networks Model Compression with Non-Linear Function and Non-Uniform Network Topology
DOI: https://doi.org/10.1109/slt48900.2021.9383593
[2021] EMC: Earthquake Magnitudes Classification on Seismic Signals via Convolutional Recurrent Networks
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382696
[2021] Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net
DOI: https://doi.org/10.1109/ieeeconf49454.2021.9382730

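KAKENHI Grants (0 items)

No data yet (to be displayed once KAKEN data has been imported).

Society Memberships and Positions (0 items)

No data yet (to be displayed once society data has been linked).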