Last updated: February 1, 2017

Name

Hiroshi Okuno (奥乃 博)

Title

Professor (fixed-term)

Affiliation

Faculty of Science and Engineering
(Graduate School of Creative Science and Engineering)

Contact Information

Email
okuno@aoni.waseda.jp

Email (other)
okuno@nue.org

Address, Telephone, and Fax

Address
Lambdax Bldg. 3F, 2-4-12 Okubo, Shinjuku-ku, Tokyo 169-0072
Telephone
03-6233-7864
Fax
03-5285-0028

URLs

Web page URL

http://www.aoni.waseda.jp/okuno (homepage)

University Affiliations Other Than Primary

University research institutes

Human-Robot Co-Creation Research Institute

Researcher, 2017-

Education and Degrees

Education

-1972: The University of Tokyo, College of Arts and Sciences, Department of Basic Science

Degrees

Doctor of Engineering (by dissertation), The University of Tokyo (intelligent informatics)

Bachelor of Liberal Arts (by coursework), The University of Tokyo

Career

April 2014 - present: Waseda University, Graduate Program for Embodiment Informatics, Professor (fixed-term)
April 2001 - March 2014: Kyoto University, Graduate School of Informatics, Professor
1999 - 2001: Tokyo University of Science, Faculty of Science and Technology, Department of Information Sciences, Professor
1998 - 1999: Japan Science and Technology Corporation: Researcher, Technical Counselor, and Group Leader
1972 - 1998: Nippon Telegraph and Telephone Corporation (NTT)

Academic Society Memberships

Institute of Electrical and Electronics Engineers (IEEE), Fellow (2012)

Japanese Society for Artificial Intelligence (JSAI), Director; Fellow (2013)

Information Processing Society of Japan (IPSJ), Director; Fellow (2015)

Japan Society for Software Science and Technology (JSSST), Director

Robotics Society of Japan (RSJ), Fellow (2015)

Association for Computing Machinery (ACM)

Association for the Advancement of Artificial Intelligence (AAAI)

Acoustical Society of America (ASA)

Committee and Board Positions (External)

Director: Japan Society for Software Science and Technology; Japanese Society for Artificial Intelligence; Information Processing Society of Japan
Degree examination committee member: National Institution for Academic Degrees and University Evaluation

Awards

2016 Advanced Robotics Best Paper Award

October 2016; awarding organization: Robotics Society of Japan

Title: Posture Estimation of Hose-shaped Robot by using Active Microphone Array

Recipients (group): Yoshiaki Bando, Takuma Otsuka, Kazuhiro Nakadai, Satoshi Tadokoro, Masashi Konyo, Katsutoshi Itoyama, Hiroshi G. Okuno

Best Innovative Paper Award

October 2015; awarding organization: IEEE-RAS SSRR-2015

Title: Human-Voice Enhancement based on Online RPCA for a Hose-shaped Rescue Robot with a Microphone Array

Recipients (group): Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

People's Choice Demo

October 2015; awarding organization: IEEE-RAS SSRR-2015

Title: Hose-shaped Rescue Robot

Recipients (group): Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

Fellow, Robotics Society of Japan

September 2015

Fellow, Information Processing Society of Japan

June 2015

JSAI Achievement Award for FY2013, Japanese Society for Artificial Intelligence

June 2014

Commendation for Grants-in-Aid for Scientific Research (KAKENHI) reviewers, FY2014, Japan Society for the Promotion of Science

October 2014

Advanced Robotics, 2nd Best Paper Award

September 2014

JSAI SIG Research Award for FY2013, Japanese Society for Artificial Intelligence

June 2014

Professor Emeritus, Kyoto University

April 2014

The Best Paper Award, IEA/AIE-2013

June 2013

The Best Paper Award, The 8th International Workshop on Security (IWSEC-2013)

November 2013

Fellow, Japanese Society for Artificial Intelligence

June 2013

Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, FY2013: Prize for Science and Technology (Research Category)

April 2013

Fellow

January 2012; awarding organization: IEEE

Title: Fellow for contributions to robot audition technology

Recipient: Hiroshi G. Okuno

JSAI SIG Research Award for FY2010, Japanese Society for Artificial Intelligence

June 2011

NTF Award for Entertainment Robots and Systems, IEEE/RSJ IROS-2010

October 2010

Award for a Best Paper, IEA/AIE-2010, International Society for Applied Intelligence

June 2010

RSJ/SICE Award for IROS-2006 Best Paper Nomination Finalist

October 2007

FIT2004 Paper Award

August 2006

International Society for Applied Intelligence, IEA/AIE-2005 Best Paper Award

October 2005

International Society for Applied Intelligence, IEA/AIE-2005 Best Paper Award

June 2005

Best Paper Award, 16th Annual Conference of the Japanese Society for Artificial Intelligence

June 2003

Best Session Award, 3rd SICE System Integration Division Annual Conference (SI2002), Society of Instrument and Control Engineers

October 2003

2nd Information Science Promotion Award, Funai Foundation for Information Technology

October 2003

JSAI SIG Research Award for FY2002, Japanese Society for Artificial Intelligence

June 2003

Nakamura Award for IROS-2001 Best Paper Nomination Finalist, IEEE and RSJ

August 2002

IS-2000 Best Paper Award

June 2001

17th Telecommunications Advancement Foundation Award

December 2001

IEA/AIE-2001 Best Paper Award, International Society of Applied Intelligence

June 2001

JSAI Research Encouragement Award, Japanese Society for Artificial Intelligence

December 2001

Best Paper Award, 11th Annual Conference of the Japanese Society for Artificial Intelligence

June 1998

JSAI Paper Award for FY1990, Japanese Society for Artificial Intelligence

June 1991

Research Fields

Keywords

Computational auditory scene analysis, robot audition, artificial intelligence

KAKENHI classification

Informatics / Human informatics / Intelligent robotics

Informatics / Computing technologies / Multimedia and database

Informatics / Human informatics / Intelligent informatics

Research Topic History

Artificial intelligence

Individual research

Computational auditory scene analysis

Individual research

Robot audition

Domestic joint research

Papers

Development and Applications of HARK, Open-Source Software for Robot Audition (Cutting-Edge Research)

Kazuhiro Nakadai; Hiroshi Okuno; Takeshi Mizumoto; Keisuke Nakamura

Simulation, 35(1), pp. 32-38, March 2016

CiNii

Details

ISSN:02859947

Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

SSRR 2015 - 2015 IEEE International Symposium on Safety, Security, and Rescue Robotics, March 2016

DOI / Scopus

Details

Abstract: © 2015 IEEE. This paper presents an online real-time method that enhances human voices included in severely noisy audio signals captured by microphones of a hose-shaped rescue robot. To help a remote operator of such a robot pick up a weak voice of a human buried under rubble, it is crucial to suppress the loud ego-noise caused by the movements of the robot in real time. We tackle this task by using online robust principal component analysis (ORPCA) for decomposing the spectrogram of an observed noisy signal into the sum of low-rank and sparse spectrograms that are expected to correspond to periodic ego-noise and human voices. Using a microphone array distributed on the long body of a hose-shaped robot, ego-noise suppression can be further improved by combining the results of ORPCA applied to the observed signal captured by each microphone. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method improves the performance of conventional ego-noise suppression using only one microphone by 7.4 dB in SDR and 17.2 dB in SIR.
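
The heart of the method is the decomposition of a magnitude spectrogram into a low-rank part (periodic ego-noise) and a sparse part (voice). The sketch below, a minimal batch principal component pursuit in NumPy, only illustrates that decomposition idea: the paper's algorithm is an online RPCA, and the function name, parameter choices, and toy data here are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rpca_pcp(X, n_iter=200):
    """Batch robust PCA via inexact ALM: X ~= L (low-rank) + S (sparse)."""
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))                 # standard PCP sparsity weight
    mu = 0.25 * m * n / (np.abs(X).sum() + 1e-12)  # step-size heuristic
    L, S, Y = np.zeros_like(X), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank update: singular-value thresholding of (X - S + Y/mu).
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: elementwise soft-thresholding of (X - L + Y/mu).
        R = X - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (X - L - S)                      # multiplier (dual) update
    return L, S

# Toy spectrogram: rows = frequency bins, columns = time frames.
rng = np.random.default_rng(0)
ego_noise = np.outer(rng.random(64), np.ones(100))  # rank-1 "periodic" noise
voice = (rng.random((64, 100)) < 0.02) * 5.0        # sparse "voice" events
L, S = rpca_pcp(ego_noise + voice)                  # S recovers the voice part
```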

Microphone-accelerometer based 3D posture estimation for a hose-shaped rescue robot

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

IEEE International Conference on Intelligent Robots and Systems, 2015-December, pp. 5580-5586, December 2015

DOI / Scopus

Details

ISSN:21530858

Abstract: © 2015 IEEE. 3D posture estimation for a hose-shaped robot is critical in rescue activities due to complex physical environments. Conventional sound-based posture estimation assumes rather flat physical environments and focuses only on 2D, resulting in poor performance in real-world environments with rubble. This paper presents novel 3D posture estimation by exploiting microphones and accelerometers. The idea of our method is to compensate for the lack of posture information obtained by sound-based time-difference-of-arrival (TDOA) with the tilt information obtained from accelerometers. This compensation is formulated as a nonlinear state-space model and solved by the unscented Kalman filter. Experiments are conducted by using a 3-m hose-shaped robot with eight units of a microphone and an accelerometer and seven units of a loudspeaker and a vibration motor deployed in a simple 3D structure. Experimental results demonstrate that our method reduces the errors of initial states to about 20 cm in the 3D space. If the initial errors are less than 20%, our method can estimate the correct 3D posture in real time.
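
A full unscented Kalman filter is too long to sketch here, but the accelerometer-derived tilt that the filter fuses with the TDOA measurements comes from a standard computation. A minimal sketch, assuming a static sensor and an x-forward, z-up axis convention (conventions vary by device, so this is illustrative only):

```python
import math

def tilt_from_accel(ax, ay, az):
    """Pitch and roll (radians) from a static accelerometer reading.

    Assumes a sensor at rest measures gravity as (0, 0, +g) in an
    x-forward, y-left, z-up frame; real devices differ.
    """
    pitch = math.atan2(-ax, math.hypot(ay, az))
    roll = math.atan2(ay, az)
    return pitch, roll

# A unit at rest, pitched 30 degrees (g = 9.81 m/s^2):
g = 9.81
pitch, roll = tilt_from_accel(g * math.sin(math.radians(30)), 0.0,
                              g * math.cos(math.radians(30)))
print(math.degrees(pitch), math.degrees(roll))  # approx. -30.0, 0.0
```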

Unified inter- and intra-recording duration model for multiple music audio alignment

Maezawa, Akira; Itoyama, Katsutoshi; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2015), November 2015

DOI / Scopus

Details

Abstract: © 2015 IEEE. This paper presents a probabilistic audio-to-audio alignment method that focuses on the relationship among the note durations of different performances of a piece of music. A key issue in probabilistic audio alignment methods is expressing how the durations of notes in the underlying piece of music are interrelated. Existing studies focus either on the duration of adjacent notes within a recording (intra-recording duration model) or the duration of a given note across different recordings (inter-recording duration model). This paper unifies these approaches through a simple modification to them. Furthermore, the paper extends the unified model, allowing the dynamics of the note duration to change sporadically. Experimental evaluation demonstrated that the proposed models decrease the alignment error.

Audio-visual speech recognition using deep learning

Noda, Kuniaki; Yamaguchi, Yuki; Nakadai, Kazuhiro; Okuno, Hiroshi G.; Ogata, Tetsuya

Applied Intelligence, 42(4), pp. 722-737, June 2015

DOI / Scopus

Details

ISSN:0924669X

Abstract: © 2014, The Author(s). An audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, cautious selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to demonstrate revolutionary generalization capabilities under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is utilized for acquiring noise-robust audio features. By preparing the training data for the network with pairs of consecutive multiple steps of deteriorated audio features and the corresponding clean features, the network is trained to output denoised audio features from the corresponding features deteriorated by noise. Second, a convolutional neural network (CNN) is utilized to extract visual features from raw mouth-area images. By preparing the training data for the CNN as pairs of raw images and the corresponding phoneme label outputs, the network is trained to predict phoneme labels from the corresponding mouth-area input images. Finally, a multi-stream HMM (MSHMM) is applied for integrating the acquired audio and visual HMMs independently trained with the respective features. By comparing the cases when normal and denoised mel-frequency cepstral coefficients (MFCCs) are utilized as audio features to the HMM, our unimodal isolated word recognition results demonstrate that approximately 65% word recognition rate gain is attained with denoised MFCCs under 10 dB signal-to-noise ratio (SNR) for the audio signal input. Moreover, our multimodal isolated word recognition results utilizing MSHMM with denoised MFCCs and acquired visual features demonstrate that an additional word recognition rate gain is attained for the SNR conditions below 10 dB.
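
The first stage, the deep denoising autoencoder, is trained on pairs of noise-corrupted and clean audio features. A minimal PyTorch sketch of that training setup; the layer sizes, feature dimensions, and random stand-in data below are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 13 MFCCs over 11 consecutive frames per example.
N_MFCC, N_FRAMES = 13, 11
DIM = N_MFCC * N_FRAMES

model = nn.Sequential(
    nn.Linear(DIM, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),   # bottleneck: denoised latent features
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in (noisy, clean) feature pairs; in practice the noisy features
# are the clean ones deteriorated by added noise.
clean = torch.randn(512, DIM)
noisy = clean + 0.3 * torch.randn_like(clean)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)  # reconstruct clean from noisy
    loss.backward()
    optimizer.step()
```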

Beat Tracking for Interactive Dancing Robots

Joao Lobato Oliveira, Gokhan Ince, Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno, Fabien Gouyon, Luis Paulo Reis

International Journal of Humanoid Robotics, 12(1), pp. 1-24, May 2015

DOI

Preferential Training of Neuro-Dynamical Model Based on Predictability of Target Dynamics

Shun Nishide, Harumitsu Nobuta, Hiroshi G. Okuno, Tetsuya Ogata

Advanced Robotics, 29(9), pp. 587-596, May 2015

DOI

Bayesian Audio-to-Score Alignment Based on Joint Inference of Timbre, Volume, Tempo, and Performer-Dependent Note Onset Timings

Akira Maezawa, Hiroshi G. Okuno

Computer Music Journal, 39(1), pp. 74-87, May 2015

DOI

Horizontal Deployment of Sound-Discrimination Technology (Lecture Series: Innovation Emerging from Connections, Part 1)

Hiroshi Okuno

Journal of the Japanese Society for Artificial Intelligence, 30(3), pp. 366-376, May 2015

CiNii

Details

ISSN:21882266

Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(2), pp. 373-382, February 2015

DOI

A Recipe for Empathy: Integrating the mirror system, insula, somatosensory cortex and motherese

Angelica Lim, Hiroshi G. Okuno

International Journal of Social Robotics, 7(1), pp. 35-49, February 2015

DOI

Posture Estimation of Hose-shaped Robot by using Active Microphone Array

Yoshiaki Bando, Takuma Otsuka, Kazuhiro Nakadai, Satoshi Tadokoro, Masashi Konyo, Katsutoshi Itoyama, Hiroshi G. Okuno

Advanced Robotics, 29(1), pp. 35-49, January 2015

DOI

Improved Sound Source Localization in Horizontal Plane for Binaural Robot Audition

Ui-Hyun Kim, Kazuhiro Nakadai, Hiroshi G. Okuno

Applied Intelligence, 42(1), pp. 63-74, January 2015

DOI

Multichannel Sound Source Dereverberation and Separation for Arbitrary Number of Sources based on Bayesian Nonparametrics

Takuma Otsuka, Katsuhiko Ishiguro, Hiroshi Sawada, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(12), pp. 2218-2232, December 2014

DOI / WoS

Details

ISSN:2329-9290

Nonparametric Bayesian Dereverberation of Power Spectrograms Based on Infinite-Order Autoregressive Processes

Akira Maezawa, Katsutoshi Itoyama, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(12), pp. 1918-1930, December 2014

DOI

Audio Signal Alignment Based on a Coupled Dynamic Model

Akira Maezawa; Katsutoshi Itoyama; Kazuyoshi Yoshii; Hiroshi Okuno; Tatsuya Kawahara

IPSJ SIG Technical Report (Music and Computer), 2014(13), pp. 1-7, August 2014

CiNii

Details

Abstract: This paper describes a method for audio-to-audio alignment that maps the time axis of each of multiple recordings of the same piece, played by different performers, onto positions within the piece, in order to support comparison of the recordings. Models of tempo dynamics have been shown to be useful for performance analysis, but conventional audio alignment methods have no tempo estimation mechanism and cannot exploit tempo information. To model tempo dynamics indirectly, we model, at each position in the piece, the ratio between the instantaneous tempi of the recordings. Specifically, by assuming that the instantaneous tempo ratio is continuous and that its increments are correlated across recordings, we jointly model the continuity of the tempo trajectory and the similarity among performers. The covariance matrix that generates the increments is formulated probabilistically as a Markov sequence composed of a small number of representative covariance matrices. The model thus learns, at the same time, where characteristic tempo ratios occur throughout the piece and how they fluctuate, which also yields information useful for performance analysis. Experimental evaluation showed improved alignment accuracy and suggested the method's utility for analyzing differences in interpretation.

A Singing-Expression Transfer System for Vocal F0 Trajectories in Polyphonic Music

Yukara Ikemiya; Katsutoshi Itoyama; Kazuyoshi Yoshii; Hiroshi Okuno

IPSJ SIG Technical Report (Music and Computer), 2014(23), pp. 1-6, August 2014

CiNii

Details

Abstract: This paper proposes a system that transfers singing expressions (vibrato, glissando, and kobushi) onto the fundamental frequency (F0) trajectory of the singing voice contained in polyphonic music audio. Active music listening interfaces are a research approach aimed at interactive music appreciation by end users, including support for editing existing songs; for singing voices, prior work has addressed voice conversion and singing-voice separation. This work deals with editing singing style, providing an interface for freely editing the vocal F0 trajectory in a mixture. The user specifies an arbitrary segment of the vocal part and transfers a desired singing expression onto it. A database of professional singers' expressions, extracted in advance from commercial recordings, lets the user transfer expressions intuitively by reference. Transfer is performed by selectively shifting only the vocal spectrum along the log-frequency axis, manipulating the pitch of the singing voice while suppressing the effect on the accompaniment; the spectral envelope is used to correct the timbre so that the phonemes are preserved. A graphical user interface (GUI) lets the user specify where to transfer expressions and displays the range of the F0 trajectory. Experiments confirmed the effectiveness of the timbre correction and the robustness of F0 estimation using user input.

Mixed-Dialect Speech Recognition by Mixing Multiple Pseudo-Generated Dialect Language Models

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi Okuno

Journal of Information Processing Society of Japan, 55(7), pp. 1681-1694, July 2014

The MEI Robot: Towards Using Motherese to Develop Multimodal Emotional Intelligence

Angelica Lim, Hiroshi G. Okuno

IEEE Transactions on Autonomous Mental Development, 6(2), pp. 126-138, June 2014

DOI

Development of an Annotation Tool for Multi-Directional Speech Localized and Separated by HARK (Real-World Interaction Using Agents, and General Topics)

Osamu Sugiyama; Katsutoshi Itoyama; Kazuhiro Nakadai; Hiroshi Okuno

IEICE Technical Report, CNR (Cloud Network Robotics), 114(85), pp. 23-26, June 2014

CiNii

Details

ISSN:0913-5685

Abstract: This paper describes the development of an annotation tool for multi-directional speech based on the robot audition software HARK. Existing work visualizes multi-directional speech information and presents it in an easy-to-understand form, but no tool has been proposed that also covers labeling the presented information and annotating it semantically in a single workflow. We developed a tool for annotating multi-directional speech localized and separated by HARK, and implemented an SVM-based auto-completion function to reduce the annotation workload. Its effectiveness was verified through subject experiments.
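
The SVM-based auto-completion can be pictured as: train a classifier on the segments the annotator has already labeled, then propose labels for the remaining segments, keeping only confident suggestions. A hypothetical scikit-learn sketch; the features, labels, and confidence threshold are stand-ins, not the tool's actual design.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in features for separated speech segments (e.g., duration,
# energy, direction of arrival) with labels the annotator assigned.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 3))
y_labeled = (X_labeled[:, 0] > 0).astype(int)  # toy annotation classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_labeled, y_labeled)

# Suggest labels for unlabeled segments; low-confidence suggestions are
# left for the human annotator to fill in.
X_unlabeled = rng.normal(size=(5, 3))
proba = clf.predict_proba(X_unlabeled)
suggestions = proba.argmax(axis=1)
confident = proba.max(axis=1) > 0.8
print(suggestions, confident)
```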

A Selective Index Integration Method Based on Out-of-Vocabulary Segment Estimation for Spoken Term Detection of Arbitrary Query Terms

Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi Okuno

Journal of Information Processing Society of Japan, 55(3), pp. 1201-1211, March 2014

Bayesian Nonparametrics for Microphone Array Processing

Takuma Otsuka, Katsuhiko Ishiguro, Hiroshi Sawada, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(2), pp. 493-504, February 2014

DOI

Spatio-Temporal Dynamics in Collective Frog Choruses Examined by Mathematical Modeling and Field Observation

Ikkyu Aihara, Takeshi Mizumoto, Takuma Otsuka, Hiromitsu Awano, Kohei Nagira, Hiroshi G. Okuno, Kazuyuki Aihara

Scientific Reports, 4(3891), p. 1, January 2014

DOI

The Interaction between a Robot and Multiple People based on Spatially Mapping of Friendliness and Motion Parameters

Tsuyoshi Tasaki, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics, 28(1), pp. 39-51, January 2014

DOI / WoS

Details

ISSN:0169-1864

Robot Motion Control for Interaction Using Estimation of the Listener's Understanding State by Backchannel Recognition

Tsuyoshi Tasaki, Tetsuya Ogata, Hiroshi Okuno

Transactions of the Human Interface Society, 15(4), pp. 363-374, November 2013

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition

Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno

Advanced Robotics, 27(12), pp. 933-945, May 2013

DOI

Robust Multipitch Analyzer against Initialization based on Latent Harmonic Allocation using Overtone Corpus

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing, 21(2), pp. 246-256, January 2013

DOI

Nonparametric Bayesian Sparse Factor Analysis for Frequency-Domain Blind Source Separation without Permutation Ambiguity

Kohei Nagira, Takuma Otsuka, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing, 2013(3), January 2013

DOI

Automatic Allocation of Training Data for Speech Understanding based on Multiple Model Combinations

Kazunori Komatani, Mikio Nakano, Masaki Katsumaru, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

IEICE Transactions on Information and Systems, E95-D(9), pp. 2298-2307, September 2012

Automated Violin Fingering Transcription Through Analysis of an Audio Recording

Akira Maezawa, Katsutoshi Itoyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Computer Music Journal, 36(3), pp. 57-72, September 2012

DOI

Tool-Body Assimilation of Humanoid Robot Using a Neuro-Dynamical System

Shun Nishide, Jun Tani, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata

IEEE Transactions on Autonomous Mental Development, 4(2), pp. 139-149, June 2012

DOI

A musical robot that synchronizes with a co-player using non-verbal cues

Angelica Lim, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics, 26(4), pp. 363-381, January 2012

DOI

Towards expressive musical robots: A cross-modal framework for emotional gesture, voice and music

Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing, 2012(3), January 2012

DOI

A multi-modal tempo and beat tracking system based on audio-visual information from live guitar performance

Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing, 2012(6), January 2012

DOI

Efficient Blind Dereverberation and Echo Cancellation based on Independent Component Analysis for Actual Acoustic Signals

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Neural Computation, 24(1), pp. 234-272, January 2012

DOI

Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

European Signal Processing Conference, 2016-November, pp. 1018-1022, November 2016

DOI / Scopus

Details

ISSN:22195491

Abstract: © 2016 IEEE. This paper presents a human-voice enhancement method for a deformable and partially-occluded microphone array. Although microphone arrays distributed on the long bodies of hose-shaped rescue robots are crucial for finding victims under collapsed buildings, human voices captured by a microphone array are contaminated by non-stationary actuator and friction noise. Standard blind source separation methods cannot be used because the relative microphone positions change over time and some of them are occasionally shaded by rubble. To solve these problems, we develop a Bayesian model that separates multichannel amplitude spectrograms into sparse and low-rank components (human voice and noise) without using phase information, which depends on the array layout. The voice level at each microphone is estimated in a time-varying manner for reducing the influence of the shaded microphones. Experiments using a 3-m hose-shaped robot with eight microphones show that our method outperforms conventional methods by a signal-to-noise ratio of 2.7 dB.

Sound-based online localization for an in-pipe snake robot

Bando, Yoshiaki; Suhara, Hiroki; Tanaka, Motoyasu; Kamegawa, Tetsushi; Itoyama, Katsutoshi; Yoshii, Kazuyoshi; Matsuno, Fumitoshi; Okuno, Hiroshi G.

SSRR 2016 - International Symposium on Safety, Security and Rescue Robotics, pp. 207-213, December 2016

DOI / Scopus

Details

Abstract: © 2016 IEEE. This paper presents a sound-based online localization method for an in-pipe snake robot with an inertial measurement unit (IMU). In-pipe robots, in particular snake robots, need online localization for autonomous inspection and for remote operator support. GPS is unavailable inside a pipeline, and conventional odometry-based localization may deteriorate due to slippage and sudden unintended movements. By putting a microphone on the robot and a loudspeaker at the entrance of the pipeline, their distance can be estimated by measuring the time of flight (ToF) of a reference sound emitted from the loudspeaker. Since the sound propagation path in the pipeline is necessary for estimating the robot location, the proposed sound-based online localization method simultaneously estimates the robot location and the pipeline map by combining the distance obtained by the ToF and the orientation estimated by the IMU. The experimental results showed that the error of the distance estimation was less than 7% and the accuracy of the pipeline map was more than 68.0%.
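
The distance estimate rests on a simple relation: distance = c x ToF, where the ToF is read off the peak of the cross-correlation between the reference sound and the recorded signal. A minimal sketch under the simplifying assumption that recording starts exactly when the reference is emitted (the paper additionally has to account for the in-pipe propagation path):

```python
import numpy as np

def tof_distance(reference, recorded, fs, c=343.0):
    """Distance from time of flight: lag of the cross-correlation peak."""
    corr = np.correlate(recorded, reference, mode="full")
    lag = corr.argmax() - (len(reference) - 1)  # delay in samples
    return c * lag / fs

# Toy check: a chirp delayed by 10 ms at fs = 16 kHz gives about 3.43 m.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
ref = np.sin(2 * np.pi * (200 + 2000 * t) * t)  # reference chirp
rec = np.concatenate([np.zeros(160), ref])      # 160 samples = 10 ms delay
print(tof_distance(ref, rec, fs))               # approx. 3.43
```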

Swarm of sound-to-light conversion devices to monitor acoustic communication among small nocturnal animals

Mizumoto, Takeshi; Aihara, Ikkyu; Otsuka, Takuma; Awano, Hiromitsu; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 255-267, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. While many robots have been developed to monitor environments, most studies are dedicated to navigation and locomotion and use off-the-shelf sensors. We focus on a novel acoustic device and its processing software, which is designed for a swarm of environmental monitoring robots equipped with the device. This paper demonstrates that a swarm of monitoring devices is useful for biological field studies, i.e., understanding the spatio-temporal structure of acoustic communication among animals in their natural habitat. The following processes are required in monitoring acoustic communication to analyze the natural behavior in the field: (1) working in their habitat, (2) automatically detecting multiple and simultaneous calls, (3) minimizing the effect on the animals and their habitat, and (4) working with various distributions of animals. We present a sound-imaging system using sound-to-light conversion devices called "Fireflies" and a data analysis method that satisfies these requirements. We can easily collect data by placing a swarm (dozens) of Fireflies and recording their light intensities using an off-the-shelf video camera. Because each Firefly converts sound in its vicinity into light, we can easily obtain when, how long, and where animals call using temporal analysis of the Firefly light intensities. The device is evaluated in terms of three aspects: volume-to-light-intensity characteristics and battery life through indoor experiments, and water resistance via field experiments. We also present the visualization of a chorus of Japanese tree frogs (Hyla japonica) recorded in their habitat, that is, paddy fields.

HARKBird: Exploring acoustic interactions in bird communities using a microphone array

Suzuki, Reiji; Matsubayashi, Shiho; Hedley, Richard W.; Nakadai, Kazuhiro; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 213-223, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. Understanding auditory scenes is important when deploying intelligent robots and systems in real-world environments. We believe that robot audition can better recognize acoustic events in the field as compared to conventional methods such as human observation or recording using a single-channel microphone. We are particularly interested in acoustic interactions among songbirds. Birds do not always vocalize at random, for example, but may instead divide a soundscape so that they avoid overlapping their songs with those of other birds. To understand such complex interaction processes, we must collect much spatiotemporal data in which multiple individuals and species are singing simultaneously. However, it is costly and difficult to annotate many or long recorded tracks manually to detect their interactions. In order to solve this problem, we are developing HARKBird, an easily available and portable system consisting of a laptop PC with the open-source robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) together with a low-cost and commercially available microphone array. HARKBird enables us to extract the songs of multiple individuals from recordings automatically. In this paper, we introduce the current status of our project and report preliminary results of recording experiments in two different types of forests, one in the USA and the other in Japan, using this system to automatically estimate the direction of arrival of the songs of multiple birds, and separate them from the recordings. We also discuss asymmetries among species in terms of their tendency to partition temporal resources.

Size effect on call properties of Japanese tree frogs revealed by audio-processing technique

Aihara, Ikkyu; Takeda, Ryu; Mizumoto, Takeshi; Otsuka, Takuma; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 247-254, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. Sensing the external environment is a core function of robots and autonomous mechanics. This function is useful for monitoring and analyzing the ecosystem for our deeper understanding of nature and for maintaining a sustainable ecosystem. Here, we investigate the calling behavior of male frogs by applying audio-processing techniques to multiple audio recordings. In general, male frogs call from their breeding site, and a female frog approaches one of the males by hearing their calls. First, we conducted an indoor experiment to record the spontaneous calling behavior of three male Japanese tree frogs, and then separated their call signals by independent component analysis. The analysis of the separated signals shows that chorus size (i.e., the number of calling frogs) has a positive effect on call number, inter-call intervals, and chorus duration. We speculate that competition in a large chorus encourages the male frogs to make their call properties more attractive to conspecific females.
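
The separation step uses independent component analysis to unmix simultaneously recorded calls. A toy sketch with scikit-learn's FastICA on synthetic mixtures; the three "callers" and the mixing matrix are stand-ins for the real multi-microphone recordings.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Three synthetic "frog call" sources mixed into three channels.
rng = np.random.default_rng(1)
t = np.linspace(0, 8, 8000)
sources = np.c_[np.sin(2 * np.pi * 3 * t),           # periodic caller 1
                np.sign(np.sin(2 * np.pi * 5 * t)),  # pulsed caller 2
                rng.laplace(size=t.size)]            # irregular caller 3
mixing = rng.uniform(0.5, 1.5, size=(3, 3))
observed = sources @ mixing.T                        # "microphone" signals

# ICA recovers statistically independent call signals (up to permutation
# and scaling); per-caller statistics can then be measured on each column.
ica = FastICA(n_components=3, max_iter=1000, random_state=0)
separated = ica.fit_transform(observed)              # shape (8000, 3)
```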

Acoustic monitoring of the great reed warbler using multiple microphone arrays and robot audition

Matsubayashi, Shiho; Suzuki, Reiji; Saito, Fumiyuki; Murate, Tatsuyoshi; Masuda, Tomohisa; Yamamoto, Koichi; Kojima, Ryosuke; Nakadai, Kazuhiro; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 224-235, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. This paper reports the results of our field test of HARKBird, a portable system that consists of robot audition, a laptop PC, and omnidirectional microphone arrays. We assessed its localization accuracy in monitoring songs of the great reed warbler (Acrocephalus arundinaceus) in time and two-dimensional space by comparing locational and temporal data collected by human observers and HARKBird. Our analysis revealed that the stationarity of the singing individual affected the spatial accuracy. Temporally, HARKBird successfully captured the exact song duration in seconds, which cannot be easily achieved by human observers. The data derived from HARKBird suggest that one of the warbler males dominated the sound space. Given the assumption that the cost of the singing activity is represented by song duration in relation to the total recording session, this particular male paid a higher cost of singing, possibly to win the territory of best quality. Overall, this study demonstrated the high potential of HARKBird as an effective alternative to the point-count method to survey bird songs in the field.

Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

Bando, Yoshiaki; Saruwatari, Hiroshi; Ono, Nobutaka; Makino, Shoji; Itoyama, Katsutoshi; Kitamura, Daichi; Ishimura, Masaru; Takakusaki, Moe; Mae, Narumi; Yamaoka, Kouei; Matsui, Yutaro; Ambe, Yuichi; Konyo, Masashi; Tadokoro, Satoshi; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 198-212, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphone-equipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent low-rank matrix analysis. The two enhancement methods are integrated with the Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

Development of a robotic pet using sound source localization with the HARK robot audition system

Suzuki, Ryo; Takahashi, Takuto; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 146-153, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. We have developed a self-propelling robotic pet, in which the robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was installed to equip it with sound source localization functions, thus enabling it to move in the direction of sound sources. The developed robot, which is not equipped with cameras or speakers, can communicate with humans by using only its own movements and the surrounding audio information obtained using a microphone. We have confirmed through field experiments, during which participants could gain hands-on experience with our developed robot, that participants behaved or felt as if they were touching a real pet. We also found that its high-precision sound source localization could contribute to the promotion and facilitation of human-robot interactions.

Influence of different impulse response measurement signals on MUSIC-based sound source localization

Suzuki, Takuya; Otsuka, Hiroaki; Akahori, Wataru; Bando, Yoshiaki; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics, 29(1), pp. 72-82, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. Two major functions, sound source localization and sound source separation, provided by the robot audition open-source software HARK exploit the acoustic transfer functions of a microphone array to improve performance. The acoustic transfer functions are calculated from the measured acoustic impulse responses. In the measurement, special signals such as the Time Stretched Pulse (TSP) are used to improve the signal-to-noise ratio of the measurement. Recent studies have identified the importance of selecting a measurement signal according to the application. In this paper, we investigate how six measurement signals (up-TSP, down-TSP, M-Series, Log-SS, NW-SS, and MN-SS) influence the performance of the MUSIC-based sound source localization provided by HARK. Experiments with simulated sounds, with up to three simultaneous sound sources, demonstrate no significant difference among the six measurement signals in MUSIC-based sound source localization.
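
For orientation, MUSIC scans candidate directions and measures how nearly orthogonal each steering vector is to the noise subspace of the spatial covariance matrix; the steering vectors are where the measured transfer functions enter, which is why the measurement signal matters. A minimal narrowband sketch for an ideal free-field uniform linear array (HARK uses measured transfer functions instead, so everything below is a simplifying assumption):

```python
import numpy as np

def music_spectrum(X, mic_pos, freq, angles_deg, n_src, c=343.0):
    """Narrowband MUSIC pseudo-spectrum for far-field sources.

    X: (n_mics, n_snapshots) complex snapshots at one frequency bin.
    mic_pos: microphone positions (meters) along a linear array.
    """
    R = X @ X.conj().T / X.shape[1]            # spatial covariance
    _, eigvec = np.linalg.eigh(R)              # eigenvalues ascending
    En = eigvec[:, : X.shape[0] - n_src]       # noise subspace
    out = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * freq * mic_pos * np.sin(theta) / c)
        out.append((np.abs(a) ** 2).sum()
                   / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.asarray(out)

# Toy usage: one 1 kHz source at +20 degrees, 8 mics with 4 cm spacing.
rng = np.random.default_rng(0)
mics = np.arange(8) * 0.04
a0 = np.exp(-2j * np.pi * 1000.0 * mics * np.sin(np.deg2rad(20)) / 343.0)
s = rng.normal(size=200) + 1j * rng.normal(size=200)
X = np.outer(a0, s) + 0.1 * (rng.normal(size=(8, 200))
                             + 1j * rng.normal(size=(8, 200)))
angles = np.arange(-90, 91)
print(angles[music_spectrum(X, mics, 1000.0, angles, 1).argmax()])  # ~20
```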

Development, deployment and applications of robot audition open source software HARK

Nakadai, Kazuhiro; Okuno, Hiroshi G.; Mizumoto, Takeshi

Journal of Robotics and Mechatronics, 29(1), pp. 16-25, February 2017

DOI / Scopus

Details

ISSN:09153942

Abstract: © 2017, Fuji Technology Press. All rights reserved. Robot audition is a research field that focuses on developing technologies so that robots can hear sound through their own ears (microphones). By compiling robot audition studies performed over more than 10 years, open-source software for research purposes called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was released to the public in 2008. HARK is updated every year, and free tutorials are often held for its promotion. In this paper, the major functions of HARK, such as sound source localization, sound source separation, and automatic speech recognition, are explained. In order to promote HARK, HARK-Embedded for embedding purposes and HARK-SaaS used as Software as a Service (SaaS) have been actively studied and developed in recent years; these technologies are also described in the paper. In addition, applications of HARK are introduced as case studies.

Special issue on robot audition technologies

Okuno, Hiroshi G.; Nakadai, Kazuhiro

Journal of Robotics and Mechatronics, 29(1), February 2017

DOI / Scopus

Details

ISSN:09153942

Books and Other Publications

インターネット活用術 (Making Use of the Internet), Iwanami Science Library 44

Hiroshi Okuno

Iwanami Shoten, November 1996

Details

ISBN:4000065440

Computational Auditory Scene Analysis

David F. Rosenthal and Hiroshi G. Okuno (Eds.)

Lawrence Erlbaum Associates, April 1998

Details

ISBN:0805822836

Advanced Lisp Technology

Taiichi Yuasa and Hiroshi G. Okuno (Eds.)

Taylor and Francis Publishers, May 2002

Details

ISBN:1455778818

Works, Software, Teaching Materials, Fieldwork, etc.

External Research Funding

KAKENHI (Grants-in-Aid for Scientific Research) Awards

Research category: Scientific Research (S)

Multifaceted Deployment of Robot Audition toward Understanding Real Acoustic Environments

June 2012 - March 2017

Research category: Scientific Research (S)

Building Robot Audition from Research on Computational Auditory Scene Analysis

June 2007 - March 2012

Other Research Funding

Funding agency: Japan Science and Technology Agency (JST); program: Cabinet Office ImPACT; type: commissioned research

Development of Fundamental Technologies for Extreme Audition, September 2014 - March 2019

Funding agency: Japan Science and Technology Agency (JST); program: Japan-France Research Exchange; type: joint research

Research on Active Binaural Audition for Humanoid Robots, September 2009 - March 2014

Courses Currently Taught

Course title | School / Graduate School | Year | Semester
Introduction to Embodiment Informatics | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Introduction to Embodiment Informatics | Graduate School of Creative Science and Engineering | 2017 | Spring
Introduction to Embodiment Informatics | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics A | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics A | Graduate School of Creative Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics A | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics B | Graduate School of Fundamental Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics B | Graduate School of Creative Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics B | Graduate School of Advanced Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics C | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics C | Graduate School of Creative Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics C | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics D | Graduate School of Fundamental Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics D | Graduate School of Creative Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics D | Graduate School of Advanced Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics E | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics E | Graduate School of Creative Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics E | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics F | Graduate School of Fundamental Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics F | Graduate School of Creative Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics F | Graduate School of Advanced Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics G | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics G | Graduate School of Creative Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics G | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics H | Graduate School of Fundamental Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics H | Graduate School of Creative Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics H | Graduate School of Advanced Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics I | Graduate School of Fundamental Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics I | Graduate School of Creative Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics I | Graduate School of Advanced Science and Engineering | 2017 | Spring
Seminar on Embodiment Informatics J | Graduate School of Fundamental Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics J | Graduate School of Creative Science and Engineering | 2017 | Fall
Seminar on Embodiment Informatics J | Graduate School of Advanced Science and Engineering | 2017 | Fall
Special Seminar on Embodiment Informatics | Graduate School of Fundamental Science and Engineering | 2017 | Full year
Special Seminar on Embodiment Informatics | Graduate School of Creative Science and Engineering | 2017 | Full year
Special Seminar on Embodiment Informatics | Graduate School of Advanced Science and Engineering | 2017 | Full year
Overseas/Domestic English Training | Graduate School of Fundamental Science and Engineering | 2017 | Full year
Overseas/Domestic English Training | Graduate School of Creative Science and Engineering | 2017 | Full year
Overseas/Domestic English Training | Graduate School of Advanced Science and Engineering | 2017 | Full year
Overseas Internship | Graduate School of Fundamental Science and Engineering | 2017 | Full year
Overseas Internship | Graduate School of Creative Science and Engineering | 2017 | Full year
Overseas Internship | Graduate School of Advanced Science and Engineering | 2017 | Full year
Master's Thesis (Modern Mechanical Engineering) | Graduate School of Creative Science and Engineering | 2017 | Full year
Research on Human-Robot Interfaces | Graduate School of Creative Science and Engineering | 2017 | Full year