Name

OKUNO, Hiroshi G.

Official Title

Professor (without tenure)

Affiliation

(Graduate School of Creative Science and Engineering)

Contact Information

Mail Address
okuno@aoni.waseda.jp
Mail Address (Others)
okuno@nue.org

Address / Phone Number / Fax Number

Address
Rambducks Bldg 3rd Floor, 2-4-12 Okubo, Shinjuku-City, Tokyo 169-0072, Japan
Phone Number
+81-3-6233-7801
Fax Number
+81-3-5285-0028

URL

Web Page URL

http://www.aoni.waseda.jp/okuno

Grants-in-Aid for Scientific Research: Researcher Number
60318201

Sub-affiliation

Affiliated Institutes

Human-Robot Co-Creation Research Institute

Researcher 2017-

Waseda Research Institute for Science and Engineering

Adjunct Researcher 2018-

Educational Background / Degree

Educational background

-1972 The University of Tokyo, Faculty of Liberal Arts, Department of Pure and Applied Sciences

Degree

Ph.D. (Thesis), The University of Tokyo, Intelligent Informatics

B.A. (Coursework), The University of Tokyo

Career

2014/04-present Professor (without tenure), Graduate Program for Embodiment Informatics, Waseda University
2001/04-2014/03 Professor, Graduate School of Informatics, Kyoto University
1999-2001 Professor, School of Science and Technology, Tokyo University of Science
1998-1999 Research Manager and Group Leader, ERATO Kitano Symbiotic Systems Project, Japan Science and Technology Corporation
1972-1998 Researcher, Group Leader, NTT Basic Research Laboratories, NTT

Academic Society Joined

IEEE

Japanese Society for Artificial Intelligence

Information Processing Society of Japan

Japanese Society for Software Science and Technology

Robotics Society of Japan

ACM

AAAI

ASA

Award

Amity Research Award

2019/01 Conferment Institution: Amity University

Title: For Significant Contribution in the Field of Robot Audition

Award Winner (Group): Hiroshi G. Okuno

2013 Research Achievement Award, Japanese Society for Artificial Intelligence

2014/06

2013 Minister of MEXT Award

2013/04

Honorary Professor, Amity University

2019/01 Conferment Institution: Amity University

Title: at Amity School of Engineering and Technology

Award Winner (Group): Hiroshi G. Okuno

Professor Emeritus

2014/04

Fellow, Robotics Society of Japan

2015/09

Fellow, Information Processing Society of Japan

2015/06

Fellow, Japanese Society for Artificial Intelligence

2013/06

IEEE Fellow

2012/01 Conferment Institution: IEEE

Title: Fellow for contributions to robot audition technology

Award Winner (Group): Hiroshi G. Okuno

2014 Best Reviewer Award of Kakenhi, JSPS

2014/10

2016 Advanced Robotics Best Paper Award

2016/10 Conferment Institution: Robotics Society of Japan

Title: Posture Estimation of Hose-shaped Robot by using Active Microphone Array

Award Winner (Group): Yoshiaki Bando, Takuma Otsuka, Kazuhiro Nakadai, Satoshi Tadokoro, Masashi Konyo, Katsutoshi Itoyama, Hiroshi G. Okuno

Advanced Robotics Second Best Paper Award

2014/09

IEEE SSRR-2015 Best Innovative Paper Award

2015/10 Conferment Institution: IEEE-RAS SSRR-2015

Title: Human-Voice Enhancement based on Online RPCA for a Hose-shaped Rescue Robot with a Microphone Array

Award Winner (Group): Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

IEEE SSRR-2015 People's Choice Demo

2015/10 Conferment Institution: IEEE-RAS SSRR-2015

Title: Hose-shaped Rescue Robot

Award Winner (Group): Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

2013 SIG Award, Japanese Society for Artificial Intelligence

2014/06

The Best Paper Award, IEA/AIE-2013

2013/06

The Best Paper Award, IWSEC-2013

2013/11

NTF Award for Entertainment Robots and Systems, IEEE/RSJ IROS-2010

2010/10

Award for a Best Paper, IEA/AIE-2010, International Society for Applied Intelligence

2010/06

RSJ/SICE Award for IROS 2006 Best Paper Nomination Finalist

2007/10

International Society for Applied Intelligence, IEA/AIE-2005 Best Paper Award

2005/10

Nakamura Award for IROS 2001 Best Paper Finalist Nomination, IEEE and RSJ

2002/08

IEA/AIE-2001 Best Paper Award, International Society of Applied Intelligence

2001/06

Finalist of IROS Best Paper Awards on Safety, Security, and Rescue Robotics

2017/10 Conferment Institution: IEEE/RSJ IROS-2017

Title: Development of Microphone-Array-Embedded UAV for Search and Rescue Task

Award Winner (Group): Hiroshi G. Okuno et al.

Outstanding reviewer - Robotics and Autonomous Systems

2018/04 Conferment Institution: Elsevier

Award Winner (Group): Hiroshi G. Okuno

Research Field

Keywords

Computational Auditory Scene Analysis, Robot Audition, Artificial Intelligence

Grants-in-Aid for Scientific Research classification

Informatics / Human informatics / Intelligent robotics

Informatics / Computer science / Multimedia database

Informatics / Human informatics / Intelligent informatics

Research Interests

Artificial Intelligence

Individual research allowance

Computational Auditory Scene Analysis

Individual research allowance

Robot Audition

Cooperative Research within Japan

Paper

Visualizing Phonotactic Behavior of Female Frogs in Darkness

Ikkyu Aihara

Scientific Reports (peer-reviewed) 7(10539) p.1, 2017/09-

DOI

Swarm of sound-to-light conversion devices to monitor acoustic communication among small nocturnal animals

Mizumoto, Takeshi; Aihara, Ikkyu; Otsuka, Takuma; Awano, Hiromitsu; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.255 - 267, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. While many robots have been developed to monitor environments, most studies are dedicated to navigation and locomotion and use off-the-shelf sensors. We focus on a novel acoustic device and its processing software, which is designed for a swarm of environmental monitoring robots equipped with the device. This paper demonstrates that a swarm of monitoring devices is useful for biological field studies, i.e., understanding the spatio-temporal structure of acoustic communication among animals in their natural habitat. The following processes are required in monitoring acoustic communication to analyze the natural behavior in the field: (1) working in their habitat, (2) automatically detecting multiple and simultaneous calls, (3) minimizing the effect on the animals and their habitat, and (4) working with various distributions of animals. We present a sound-imaging system using sound-to-light conversion devices called “Fireflies” and their data analysis method that satisfies the requirements. We can easily collect data by placing a swarm (dozens) of Fireflies and record their light intensities using an off-the-shelf video camera. Because each Firefly converts sound in its vicinity into light, we can easily obtain when, how long, and where animals call using temporal analysis of the Firefly light intensities. The device is evaluated in terms of three aspects: volume to light-intensity characteristics, battery life through indoor experiments, and water resistance via field experiments. We also present the visualization of a chorus of Japanese tree frogs (Hyla japonica) recorded in their habitat, that is, paddy fields.
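The temporal analysis mentioned above reduces, at its core, to detecting when each Firefly's light intensity exceeds a threshold. A minimal sketch of that step is given below; it is not the authors' code, and the frame rate, threshold, and pre-extracted intensity trace are assumptions.

```python
# Minimal sketch: recover call onsets/durations from one device's light-intensity
# trace, assuming it was already extracted from the video as a 1-D array at fps.
import numpy as np

def detect_calls(intensity, fps, threshold):
    """Return (onset_sec, duration_sec) pairs where intensity exceeds threshold."""
    active = intensity > threshold              # frames in which the device lit up
    edges = np.diff(active.astype(int))         # +1 = onset frame, -1 = offset frame
    onsets = np.flatnonzero(edges == 1) + 1
    offsets = np.flatnonzero(edges == -1) + 1
    if active[0]:
        onsets = np.r_[0, onsets]
    if active[-1]:
        offsets = np.r_[offsets, active.size]
    return [(on / fps, (off - on) / fps) for on, off in zip(onsets, offsets)]

# Example with synthetic data: two "calls" starting at 1 s and 3 s.
fps = 30.0
trace = np.zeros(int(5 * fps))
trace[30:45] = 1.0
trace[90:120] = 0.8
print(detect_calls(trace, fps, threshold=0.5))   # [(1.0, 0.5), (3.0, 1.0)]
```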

Harkbird: Exploring acoustic interactions in bird communities using a microphone array

Suzuki, Reiji; Matsubayashi, Shiho; Hedley, Richard W.; Nakadai, Kazuhiro; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.213 - 223, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. Understanding auditory scenes is important when deploying intelligent robots and systems in real-world environments. We believe that robot audition can better recognize acoustic events in the field as compared to conventional methods such as human observation or recording using single-channel microphone array. We are particularly interested in acoustic interactions among songbirds. Birds do not always vocalize at random, for example, but may instead divide a soundscape so that they avoid overlapping their songs with those of other birds. To understand such complex interaction processes, we must collect much spatiotemporal data in which multiple individuals and species are singing simultaneously. However, it is costly and difficult to annotate many or long recorded tracks manually to detect their interactions. In order to solve this problem, we are developing HARKBird, an easily-available and portable system consisting of a laptop PC with open-source software for robot audition HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) together with a low-cost and commercially available microphone array. HARKBird enables us to extract the songs of multiple individuals from recordings automatically. In this paper, we introduce the current status of our project and report preliminary results of recording experiments in two different types of forests – one in the USA and the other in Japan – using this system to automatically estimate the direction of arrival of the songs of multiple birds, and separate them from the recordings. We also discuss asymmetries among species in terms of their tendency to partition temporal resources.

Size effect on call properties of japanese tree frogs revealed by audio-processing technique

Aihara, Ikkyu; Takeda, Ryu; Mizumoto, Takeshi; Otsuka, Takuma; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.247 - 254, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. Sensing the external environment is a core function of robots and autonomous mechanics. This function is useful for monitoring and analyzing the ecosystem for our deeper understanding of the nature and accomplishing the sustainable ecosystem. Here, we investigate calling behavior of male frogs by applying audio-processing technique on multiple audio data. In general, male frogs call from their breeding site, and a female frog approaches one of the males by hearing their calls. First, we conducted an indoor experiment to record spontaneous calling behavior of three male Japanese tree frogs, and then separated their call signals according to independent component analysis. The analysis of separated signals shows that chorus size (i.e., the number of calling frogs) has a positive effect on call number, inter-call intervals, and chorus duration. We speculate that a competition in a large chorus encourages the male frogs to make their call properties more attractive to conspecific females.
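The separation step mentioned above relies on independent component analysis applied to multi-microphone recordings. The following is a minimal sketch under stated assumptions: synthetic three-source mixtures and scikit-learn's FastICA stand in for the paper's actual recordings and ICA implementation.

```python
# Minimal sketch: separate three overlapping "calls" recorded by three
# microphones using FastICA on synthetic mixtures.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs, t = 16000, np.linspace(0, 1, 16000)
# Three synthetic call sources with different frequencies and time supports.
sources = np.c_[np.sin(2 * np.pi * 440 * t) * (t < 0.3),
                np.sin(2 * np.pi * 660 * t) * ((t > 0.4) & (t < 0.7)),
                np.sin(2 * np.pi * 880 * t) * (t > 0.8)]
mixing = rng.uniform(0.5, 1.5, size=(3, 3))      # unknown acoustic mixing
observations = sources @ mixing.T                # what the three microphones record

ica = FastICA(n_components=3, random_state=0)
separated = ica.fit_transform(observations)      # estimated per-frog call signals
print(separated.shape)                           # (16000, 3)
```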

Acoustic monitoring of the great reed warbler using multiple microphone arrays and robot audition

Matsubayashi, Shiho; Suzuki, Reiji; Saito, Fumiyuki; Murate, Tatsuyoshi; Masuda, Tomohisa; Yamamoto, Koichi; Kojima, Ryosuke; Nakadai, Kazuhiro; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.224 - 235, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. This paper reports the results of our field test of HARKBird, a portable system that consists of robot audition, a laptop PC, and omnidirectional microphone arrays. We assessed its localization accuracy to monitor songs of the great reed warbler (Acrocephalus arundinaceus) in time and two-dimensional space by comparing locational and temporal data collected by human observers and HARKBird. Our analysis revealed that stationarity of the singing individual affected the spatial accuracy. Temporally, HARKBird successfully captured the exact song duration in seconds, which cannot be easily achieved by human observers. The data derived from HARKBird suggest that one of the warbler males dominated the sound space. Given the assumption that the cost of the singing activity is represented by song duration in relation to the total recording session, this particular male paid a higher cost of singing, possibly to win the territory of best quality. Overall, this study demonstrated the high potential of HARKBird as an effective alternative to the point count method to survey bird songs in the field.

Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

Bando, Yoshiaki; Saruwatari, Hiroshi; Ono, Nobutaka; Makino, Shoji; Itoyama, Katsutoshi; Kitamura, Daichi; Ishimura, Masaru; Takakusaki, Moe; Mae, Narumi; Yamaoka, Kouei; Matsui, Yutaro; Ambe, Yuichi; Konyo, Masashi; Tadokoro, Satoshi; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.198 - 212, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphone-equipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent low-rank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

Development of a robotic pet using sound source localization with the HARK robot audition system

Suzuki, Ryo; Takahashi, Takuto; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.146 - 153, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. We have developed a self-propelling robotic pet, in which the robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was installed to equip it with sound source localization functions, thus enabling it to move in the direction of sound sources. The developed robot, which is not installed with cameras or speakers, can communicate with humans by using only its own movements and the surrounding audio information obtained using a microphone. We have confirmed through field experiments, during which participants could gain hands-on experience with our developed robot, that participants behaved or felt as if they were touching a real pet. We also found that its high-precision sound source localization could contribute to the promotion and facilitation of human-robot interactions.

Influence of different impulse response measurement signals on MUSIC-based sound source localization

Suzuki, Takuya; Otsuka, Hiroaki; Akahori, Wataru; Bando, Yoshiaki; Okuno, Hiroshi G.

Journal of Robotics and Mechatronics 29(1) p.72 - 82, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. Two major functions, sound source localization and sound source separation, provided by robot audition open source software HARK exploit the acoustic transfer functions of a microphone array to improve the performance. The acoustic transfer functions are calculated from the measured acoustic impulse response. In the measurement, special signals such as Time Stretched Pulse (TSP) are used to improve the signal-to-noise ratio of the measurement signals. Recent studies have identified the importance of selecting a measurement signal according to the applications. In this paper, we investigate how six measurement signals – up-TSP, down-TSP, M-Series, Log-SS, NW-SS, and MN-SS – influence the performance of the MUSIC-based sound source localization provided by HARK. Experiments with simulated sounds, up to three simultaneous sound sources, demonstrate no significant difference among the six measurement signals in the MUSIC-based sound source localization.
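For readers unfamiliar with MUSIC-based localization, the sketch below illustrates its core computation: forming a spatial correlation matrix, extracting its noise subspace, and scanning candidate steering vectors. It is a narrowband, uniform-linear-array simplification written for illustration, not HARK's implementation; the microphone spacing d, analysis frequency f, and synthetic data are assumptions.

```python
# Minimal sketch: narrowband MUSIC pseudospectrum for a uniform linear array.
import numpy as np

def music_spectrum(X, n_sources, d, f, c=343.0, angles=np.linspace(-90, 90, 181)):
    """X: (n_mics, n_frames) complex STFT snapshots at frequency f."""
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]                  # spatial correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)             # ascending eigenvalues
    En = eigvecs[:, : n_mics - n_sources]            # noise subspace
    spectrum = []
    for theta in np.deg2rad(angles):
        delays = np.arange(n_mics) * d * np.sin(theta) / c
        a = np.exp(-2j * np.pi * f * delays)         # steering vector
        denom = np.linalg.norm(En.conj().T @ a) ** 2
        spectrum.append(1.0 / max(denom, 1e-12))
    return angles, np.asarray(spectrum)

# Synthetic example: one source at 30 degrees observed by an 8-mic array.
rng = np.random.default_rng(1)
n_mics, d, f = 8, 0.03, 2000.0
a0 = np.exp(-2j * np.pi * f * np.arange(n_mics) * d * np.sin(np.deg2rad(30.0)) / 343.0)
S = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = np.outer(a0, S) + 0.05 * (rng.standard_normal((n_mics, 200))
                              + 1j * rng.standard_normal((n_mics, 200)))
angles, p = music_spectrum(X, n_sources=1, d=d, f=f)
print(angles[np.argmax(p)])                          # close to 30
```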

Development, deployment and applications of robot audition open source software HARK

Nakadai, Kazuhiro; Okuno, Hiroshi G.; Mizumoto, Takeshi

Journal of Robotics and Mechatronics 29(1) p.16 - 25, 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Outline:© 2017, Fuji Technology Press. All rights reserved. Robot audition is a research field that focuses on developing technologies so that robots can hear sound through their own ears (microphones). By compiling robot audition studies performed over more than 10 years, open source software for research purposes called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was released to the public in 2008. HARK is updated every year, and free tutorials are often held for its promotion. In this paper, the major functions of HARK – such as sound source localization, sound source separation, and automatic speech recognition – are explained. In order to promote HARK, HARK-Embedded for embedding purposes and HARK-SaaS used as Software as a Service (SaaS) have been actively studied and developed in recent years; these technologies are also described in the paper. In addition, applications of HARK are introduced as case studies.

Special issue on robot audition technologies

Okuno, Hiroshi G.; Nakadai, Kazuhiro

Journal of Robotics and Mechatronics 29(1) 2017/02-2017/02

DOIScopus

Detail

ISSN:09153942

Sound-based online localization for an in-pipe snake robot

Bando, Yoshiaki; Suhara, Hiroki; Tanaka, Motoyasu; Kamegawa, Tetsushi; Itoyama, Katsutoshi; Yoshii, Kazuyoshi; Matsuno, Fumitoshi; Okuno, Hiroshi G.

SSRR 2016 - International Symposium on Safety, Security and Rescue Robotics p.207 - 213, 2016/12-2016/12

DOIScopus

Detail

Outline:© 2016 IEEE.This paper presents a sound-based online localization method for an in-pipe snake robot with an inertial measurement unit (IMU). In-pipe robots, in particular, snake robots need online localization for autonomous inspection and for remote operator supports. The GPS is denied in a pipeline, and conventional odometry-based localization may deteriorate due to slippage and sudden unintended movements. By putting a microphone on the robot and a loudspeaker at the entrance of the pipeline, their distance can be estimated by measuring the time of flight (ToF) of a reference sound emitted from the loudspeaker. Since the sound propagation path in the pipeline is necessary for estimating the robot location, the proposed sound-based online localization method simultaneously estimates the robot location and the pipeline map by combining the distance obtained by the ToF and orientation estimated by the IMU. The experimental results showed that the error of the distance estimation was less than 7% and the accuracy of the pipeline map was more than 68.0%.
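The distance estimation described above rests on measuring the time of flight of a known reference sound. A minimal sketch of that idea follows; it is not the paper's implementation, the chirp reference, sampling rate, and simulated 5 m delay are assumptions, and compensation for playback/recording latency is omitted.

```python
# Minimal sketch: estimate loudspeaker-to-microphone distance from the ToF of a
# known reference chirp via cross-correlation, using a simulated recording.
import numpy as np
from scipy.signal import chirp, correlate

fs, c = 16000, 343.0
t = np.arange(0, 0.1, 1 / fs)
reference = chirp(t, f0=500, f1=4000, t1=t[-1])     # emitted reference sound

# Simulate a recording in which the chirp arrives after a 5 m propagation delay.
delay_samples = int(round(5.0 / c * fs))
recording = np.zeros(fs)                            # 1 s of microphone signal
recording[delay_samples:delay_samples + reference.size] += reference
recording += 0.05 * np.random.default_rng(2).standard_normal(recording.size)

xcorr = correlate(recording, reference, mode="full")
lag = np.argmax(xcorr) - (reference.size - 1)       # lag of the correlation peak
print("estimated distance [m]:", lag / fs * c)      # about 5.0
```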

Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

European Signal Processing Conference 2016-November p.1018 - 1022, 2016/11-2016/11

DOIScopus

Detail

ISSN:22195491

Outline:© 2016 IEEE.This paper presents a human-voice enhancement method for a deformable and partially-occluded microphone array. Although microphone arrays distributed on the long bodies of hose-shaped rescue robots are crucial for finding victims under collapsed buildings, human voices captured by a microphone array are contaminated by non-stationary actuator and friction noise. Standard blind source separation methods cannot be used because the relative microphone positions change over time and some of them are occasionally shaded by rubble. To solve these problems, we develop a Bayesian model that separates multichannel amplitude spectrograms into sparse and low-rank components (human voice and noise) without using phase information, which depends on the array layout. The voice level at each microphone is estimated in a time-varying manner for reducing the influence of the shaded microphones. Experiments using a 3-m hose-shaped robot with eight microphones show that our method outperforms conventional methods by the signal-to-noise ratio of 2.7 dB.

Computational Creation of Footsteps Illusion Art and Its Practical Applications (State-of-the-art Research)

Nakadai Kazuhiro; Okuno Hiroshi G.; Mizumoto Takeshi; Nakamura Keisuke

Journal of the Japan Society for Simulation Technology 35(1) p.32 - 38, 2016/03-2016/03

CiNii

Detail

ISSN:02859947

Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

SSRR 2015 - 2015 IEEE International Symposium on Safety, Security, and Rescue Robotics 2016/03-2016/03

DOIScopus

Detail

Outline:© 2015 IEEE.This paper presents an online real-time method that enhances human voices included in severely noisy audio signals captured by microphones of a hose-shaped rescue robot. To help a remote operator of such a robot pick up a weak voice of a human buried under rubble, it is crucial to suppress the loud ego-noise caused by the movements of the robot in real time. We tackle this task by using online robust principal component analysis (ORPCA) for decomposing the spectrogram of an observed noisy signal into the sum of low-rank and sparse spectrograms that are expected to correspond to periodic ego-noise and human voices. Using a microphone array distributed on the long body of a hose-shaped robot, ego-noise suppression can be further improved by combining the results of ORPCA applied to the observed signal captured by each microphone. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method improves the performance of conventional ego-noise suppression using only one microphone by 7.4 dB in SDR and 17.2 in SIR.
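The decomposition used above separates a spectrogram into a low-rank part (periodic ego-noise) and a sparse part (voice). The sketch below shows the batch robust PCA version of that idea via an inexact augmented Lagrangian scheme; the paper's method is an online variant, and the synthetic spectrogram and parameter choices here are assumptions.

```python
# Minimal sketch: batch robust PCA splitting a magnitude spectrogram M into a
# low-rank part L (ego-noise) and a sparse part S (voice).
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(M, lam=None, n_iter=100):
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(M, 2)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank update: singular-value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: element-wise soft thresholding.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
    return L, S

# Synthetic spectrogram: rank-1 "noise" plus a few strong sparse "voice" bins.
rng = np.random.default_rng(3)
noise = np.outer(rng.random(64), rng.random(200))
voice = np.zeros_like(noise)
voice[10:14, 80:90] = 2.0
L, S = rpca(noise + voice)
# The sparse part should concentrate on the voice bins.
print(np.abs(S[10:14, 80:90]).mean(), np.abs(S).mean())
```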

Microphone-accelerometer based 3D posture estimation for a hose-shaped rescue robot

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

IEEE International Conference on Intelligent Robots and Systems 2015-December p.5580 - 5586, 2015/12-2015/12

DOIScopus

Detail

ISSN:21530858

Outline:© 2015 IEEE. 3D posture estimation for a hose-shaped robot is critical in rescue activities due to complex physical environments. Conventional sound-based posture estimation assumes rather flat physical environments and focuses only on 2D, resulting in poor performance in real world environments with rubble. This paper presents novel 3D posture estimation by exploiting microphones and accelerometers. The idea of our method is to compensate the lack of posture information obtained by sound-based time-difference-of-arrival (TDOA) with the tilt information obtained from accelerometers. This compensation is formulated as a nonlinear state-space model and solved by the unscented Kalman filter. Experiments are conducted by using a 3-m hose-shaped robot with eight units of a microphone and an accelerometer and seven units of a loudspeaker and a vibration motor deployed in a simple 3D structure. Experimental results demonstrate that our method reduces the errors of initial states to about 20 cm in the 3D space. If the initial errors of initial states are less than 20 %, our method can estimate the correct 3D posture in real-time.

Unified inter- and intra-recording duration model for multiple music audio alignment

Maezawa, Akira; Itoyama, Katsutoshi; Yoshii, Kazuyoshi; Okuno, Hiroshi G.

2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2015 2015/11-2015/11

DOIScopus

Detail

Outline:© 2015 IEEE.This paper presents a probabilistic audio-to-audio alignment method that focuses on the relationship among the note durations of different performances of a piece of music. A key issue in probabilistic audio alignment methods is in expressing how interrelated are the durations of notes in the underlying piece of music. Existing studies focus either on the duration of adjacent notes within a recording (intra-recording duration model), or the duration of a given note across different recordings (inter-recording duration model). This paper unifies these approaches through a simple modification to them. Furthermore, the paper extends the unified model, allowing the dynamics of the note duration to change sporadically. Experimental evaluation demonstrated that the proposed models decrease the alignment error.

Audio-visual speech recognition using deep learning

Noda, Kuniaki; Yamaguchi, Yuki; Nakadai, Kazuhiro; Okuno, Hiroshi G.; Ogata, Tetsuya

Applied Intelligence 42(4) p.722 - 737, 2015/06-2015/06

DOIScopus

Detail

ISSN:0924669X

Outline:© 2014, The Author(s). Audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, cautious selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to demonstrate revolutionary generalization capabilities under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is utilized for acquiring noise-robust audio features. By preparing the training data for the network with pairs of consecutive multiple steps of deteriorated audio features and the corresponding clean features, the network is trained to output denoised audio features from the corresponding features deteriorated by noise. Second, a convolutional neural network (CNN) is utilized to extract visual features from raw mouth area images. By preparing the training data for the CNN as pairs of raw images and the corresponding phoneme label outputs, the network is trained to predict phoneme labels from the corresponding mouth area input images. Finally, a multi-stream HMM (MSHMM) is applied for integrating the acquired audio and visual HMMs independently trained with the respective features. By comparing the cases when normal and denoised mel-frequency cepstral coefficients (MFCCs) are utilized as audio features to the HMM, our unimodal isolated word recognition results demonstrate that approximately 65 % word recognition rate gain is attained with denoised MFCCs under 10 dB signal-to-noise-ratio (SNR) for the audio signal input. Moreover, our multimodal isolated word recognition results utilizing MSHMM with denoised MFCCs and acquired visual features demonstrate that an additional word recognition rate gain is attained for the SNR conditions below 10 dB.

Beat Tracking for Interactive Dancing Robots

Joao Lobato Oliveira, Gokhan Ince, Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno, Fabien Gouyon, Luis Paulo Reis

International Journal of Humanoid Robotics 12(1) p.1 - 24, 2015/05-

DOI

Preferential Training of Neuro-Dynamical Model Based on Predictability of Target Dynamics

Shun Nishide, Harumitsu Nobuta, Hiroshi G. Okuno, Tetsuya Ogata

Advanced Robotics 29(9) p.587 - 596, 2015/05-

DOI

Bayesian Audio-to-Score Alignment Based on Joint Inference of Timbre, Volume, Tempo, and Performer-Dependent Note Onset Timings

Akira Maezawa, Hiroshi G. Okuno

Computer Music Journal 39(1) p.74 - 87, 2015/05-

DOI

Horizontal Deployment of Listening to Several Things at Once

Okuno Hiroshi G.

Journal of the Japanese Society for Artificial Intelligence 30(3) p.366 - 376, 2015/05-2015/05

CiNii

Detail

ISSN:21882266

Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing 23(2) p.373 - 382, 2015/02-

DOI

A Recipe for Empathy: Integrating the mirror system, insula, somatosensory cortex and motherese

Angelica Lim, Hiroshi G. Okuno

International Journal of Social Robotics 7(1) p.35 - 49, 2015/02-

DOI

Audio-Visual Speech Recognition using Deep Learning

Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata

Applied Intelligence 42(1) p.722 - 737, 2015/01-

DOI

Posture Estimation of Hose-shaped Robot by using Active Microphone Array

Yoshiaki Bando, Takuma Otsuka, Kazuhiro Nakadai, Satoshi Tadokoro, Masashi Konyo, Katsutoshi Itoyama, Hiroshi G. Okuno

Advanced Robotics 29(1) p.35 - 49, 2015/01-

DOI

Improved Sound Source Localization in Horizontal Plane for Binaural Robot Audition

Ui-Hyun Kim, Kazuhiro Nakadai, Hiroshi G. Okuno

Applied Intelligence 42(1) p.63 - 74, 2015/01-

DOI

A Recipe for Empathy Integrating the Mirror System, Insula, Somatosensory Cortex and Motherese

Lim, Angelica;Okuno, Hiroshi G.

INTERNATIONAL JOURNAL OF SOCIAL ROBOTICS 7(1) p.35 - 49, 2015-2015

DOIWoS

Detail

ISSN:1875-4791

Preferential training of neurodynamical model based on predictability of target dynamics

Nishide, Shun;Nobuta, Harumitsu;Okuno, Hiroshi G.;Ogata, Tetsuya

ADVANCED ROBOTICS 29(9) p.587 - 596, 2015-2015

DOIWoS

Detail

ISSN:0169-1864

Multichannel Sound Source Dereverberation and Separation for Arbitrary Number of Sources based on Bayesian Nonparametrics

Takuma Otsuka, Katsutoshi Ishiguro, Hiroshi Sawada, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing 22(12) p.2218 - 2232, 2014/12-2014

DOIWoS

Detail

ISSN:2329-9290

Nonparametric Bayesian dereverberation of power spectrograms based on infinite-order autoregressive processes and interpretation

Akira Maezawa, Katsutoshi Itoyama, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing 22(12) p.1918 - 1930, 2014/12-

DOI

Mixed Dialect Speech Recognition by Mixing Multiple Pseudo-Generated Dialect Language Models (in Japanese)

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

Transactions of Information Processing Society of Japan (IPSJ Journal) 55(7) p.1681 - 1694, 2014/07-

The MEI Robot: Towards Using Motherese to Develop Multimodal Emotional Intelligence

Angelica Lim, Hiroshi G. Okuno

IEEE Transactions on Autonomous Mental Development 6(2) p.126 - 138, 2014/06-

DOI

Design and Implementation of Multidirectional Sound Annotation Tool with HARK

SUGIYAMA Osamu;ITOYAMA Katsutoshi;NAKADAI Kazuhiro;OKUNO Hiroshi G.

114(85) p.23 - 26, 2014/06-2014/06

CiNii

Detail

ISSN:0913-5685

Outline:In this study we designed and developed the multidirectional sound source annotation tool with the robot audition software, HARK. With the rise of inexpensive microphone array products and the robot audition software called HARK, we can record and analyze multidirectional sound sources easily. The combination of microphone array and the software enables us to separate, localize, and track multidirectional sound sources. Most of the solutions for accessing these separated sound source information provide clients for interpreting simplified information about the separated sources, but not to directly execute the semantic annotations. Our proposed sound annotation tool provides drag & drop operation of annotation with a 3D sound source view and also provides annotation autocompletion with an SVM trained with the user's annotation history. The proposed features enable users to do the annotation task intuitively and confirm its result. We also conducted an evaluation demonstrating the efficiency of annotation done using the tool.
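The annotation autocompletion described above amounts to training a classifier on the user's past annotations and predicting a label for a newly separated source. A minimal sketch with scikit-learn's SVM follows; the toy two-dimensional features and labels are assumptions, not the tool's actual feature set.

```python
# Minimal sketch: suggest an annotation label for a newly separated source with
# an SVM trained on past annotations (toy features: duration, mean pitch).
import numpy as np
from sklearn.svm import SVC

history_features = np.array([[0.4, 300.0], [0.5, 320.0], [2.0, 120.0], [1.8, 110.0]])
history_labels = ["bird", "bird", "car", "car"]

clf = SVC(kernel="rbf", gamma="scale").fit(history_features, history_labels)
new_source = np.array([[0.45, 310.0]])
print(clf.predict(new_source))        # suggested annotation: 'bird'
```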

Selective Index Integration Based on Out-of-Vocabulary Segment Estimation for Spoken Term Detection (in Japanese)

Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno

Transactions of Information Processing Society of Japan (IPSJ Journal) 55(3) p.1201 - 1211, 2014/03-

Bayesian Nonparametrics for Microphone Array Processing

Takuma Otsuka, Katsutoshi Ishiguro, Hiroshi Sawada, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech and Language Processing 22(2) p.493 - 504, 2014/02-

DOI

Spatio-Temporal Dynamics in Collective Frog Choruses Examined by Mathematical Modeling and Field Observation

Ikkyu Aihara, Takeshi Mizumoto, Takuma Otsuka, Hiromitsu Awano, Kohei Nagira, Hiroshi G. Okuno, Kazuyuki Aihara

Scientific Reports 4(3891) p.1, 2014/01-

DOI

The Interaction between a Robot and Multiple People based on Spatially Mapping of Friendliness and Motion Parameters

Tsuyoshi Tasaki, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 28(1) p.39 - 51, 2014/01-2014

DOIWoS

Detail

ISSN:0169-1864

Nonparametric Bayesian Dereverberation of Power Spectrograms Based on Infinite-Order Autoregressive Processes

Maezawa, Akira;Itoyama, Katsutoshi;Yoshii, Kazuyoshi;Okuno, Hiroshi G.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 22(12) p.1918 - 1930, 2014-2014

DOIWoS

Detail

ISSN:2329-9290

Robot Motion Control for Interaction Based on Estimating the Listener's Understanding from Back-Channel Recognition (in Japanese)

Tsuyoshi Tasaki, Tetsuya Ogata, Hiroshi G. Okuno

Transactions of the Human Interface Society 15(4) p.363 - 374, 2013/11-

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition

Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno

Advanced Robotics 27(12) p.933 - 945, 2013/05-

DOI

Robust Multipitch Analyzer against Initialization based on Latent Harmonic Allocation using Overtone Corpus

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing 21(2) p.246 - 256, 2013/01-

DOI

Nonparametric Bayesian Sparse Factor Analysis for Frequency Domain Blind Source Separation without Permutation Ambiguity

Kohei Nagira, Takuma Otsuka, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing 2013(3) 2013/01-

DOI

Automatic Allocation of Training Data for Speech Understanding based on Multiple Model Combinations

Kazunori Komatani, Mikio Nakano, Masaki Katsumaru, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

IEICE Transactions on Information and Systems E95-D(9) p.2298 - 2307, 2012/09-

Automated Violin Fingering Transcription Through Analysis of an Audio Recording

Akira Maezawa, Katsutoshi Itoyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Computer Music Journal 36(3) p.57 - 72, 2012/09-

DOI

Tool-Body Assimilation of Humanoid Robot using Neuro-Dynamical System

Shun Nishide, Jun Tani, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata

IEEE Transactions on Autonomous Mental Development 4(2) p.139 - 149, 2012/06-

DOI

A musical robot that synchronizes with a co-player using non-verbal cues

Angelica Lim, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 26(4) p.363 - 381, 2012/01-

DOI

Towards expressive musical robots: A cross-modal framework for emotional gesture, voice and music

Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing 2012(3) 2012/01-

DOI

A multi-modal tempo and beat tracking system based on audio-visual information from live guitar performance

Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

EURASIP Journal on Audio, Speech, and Music Processing 2012(6) 2012/01-

DOI

Efficient Blind Dereverberation and Echo Cancellation based on Independent Component Analysis for Actual Acoustic Signals

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Neural Computation 24(1) p.234 - 272, 2012/01-

DOI

Visualizing Phonotactic Behavior of Female Frogs in Darkness

Aihara, Ikkyu; Bishop, Phillip J.; Ohmer, Michel E.B.; Awano, Hiromitsu; Mizumoto, Takeshi; Okuno, Hiroshi G.; Narins, Peter M.; Hero, Jean Marc

Scientific Reports 7(1) 2017/12-2017/12

DOIScopus

Detail

Outline:© 2017 The Author(s). Many animals use sounds produced by conspecifics for mate identification. Female insects and anuran amphibians, for instance, use acoustic cues to localize, orient toward and approach conspecific males prior to mating. Here we present a novel technique that utilizes multiple, distributed sound-indication devices and a miniature LED backpack to visualize and record the nocturnal phonotactic approach of females of the Australian orange-eyed tree frog (Litoria chloris) both in a laboratory arena and in the animal's natural habitat. Continuous high-definition digital recording of the LED coordinates provides automatic tracking of the female's position, and the illumination patterns of the sound-indication devices allow us to discriminate multiple sound sources including loudspeakers broadcasting calls as well as calls emitted by individual male frogs. This innovative methodology is widely applicable for the study of phonotaxis and spatial structures of acoustically communicating nocturnal animals.
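The LED-based tracking described above can be approximated, in the simplest case, by finding the brightest pixel in each frame of a dark-scene video. The sketch below illustrates this with OpenCV; the file name "arena.mp4", the blur kernel, and the brightness threshold are assumptions, not the authors' tracker.

```python
# Minimal sketch: locate the LED backpack in each frame as the brightest pixel,
# assuming a dark scene recorded to a video file.
import cv2

cap = cv2.VideoCapture("arena.mp4")
trajectory = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (11, 11), 0)      # suppress single-pixel noise
    _, max_val, _, max_loc = cv2.minMaxLoc(gray)    # brightest spot = LED
    trajectory.append(max_loc if max_val > 100 else None)
cap.release()
print(len(trajectory), "frames processed")
```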

Design of UAV-embedded microphone array system for sound source localization in outdoor environments

Hoshiba, Kotaro; Washizaki, Kai; Wakabayashi, Mizuho; Ishiki, Takahiro; Kumon, Makoto; Bando, Yoshiaki; Gabriel, Daniel; Nakadai, Kazuhiro; Okuno, Hiroshi G.

Sensors (Switzerland) 17(11) 2017/11-2017/11

DOIScopus

Detail

ISSN:14248220

Outline:© 2017 by the authors. Licensee MDPI, Basel, Switzerland. In search and rescue activities, unmanned aerial vehicles (UAV) should exploit sound information to compensate for poor visual information. This paper describes the design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments. Four critical development problems included water-resistance of the microphone array, efficiency in assembling, reliability of wireless communication, and sufficiency of visualization tools for operators. To solve these problems, we developed a spherical microphone array system (SMAS) consisting of a microphone array, a stable wireless network communication system, and intuitive visualization tools. The performance of SMAS was evaluated with simulated data and a demonstration in the field. Results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.

Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms

Bando, Yoshiaki; Itoyama, Katsutoshi; Konyo, Masashi; Tadokoro, Satoshi; Nakadai, Kazuhiro; Yoshii, Kazuyoshi; Kawahara, Tatsuya; Okuno, Hiroshi G.

IEEE/ACM Transactions on Audio Speech and Language Processing 26(2) p.215 - 230, 2018/02-2018/02

DOIScopus

Detail

ISSN:23299290

Outline:© 2017 IEEE. This paper presents a blind multichannel speech enhancement method that can deal with the time-varying layout of microphones and sound sources. Since nonnegative tensor factorization (NTF) separates a multichannel magnitude (or power) spectrogram into source spectrograms without phase information, it is robust against the time-varying mixing system. This method, however, requires prior information such as the spectral bases (templates) of each source spectrogram in advance. To solve this problem, we develop a Bayesian model called robust NTF (Bayesian RNTF) that decomposes a multichannel magnitude spectrogram into target speech and noise spectrograms based on their sparseness and low rankness. Bayesian RNTF is applied to the challenging task of speech enhancement for a microphone array distributed on a hose-shaped rescue robot. When the robot searches for victims under collapsed buildings, the layout of the microphones changes over time and some of them often fail to capture target speech. Our method robustly works under such situations, thanks to its characteristic of time-varying mixing system. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method outperforms conventional blind methods in enhancement performance by the signal-to-noise ratio of 1.03 dB.

Development of microphone-array-embedded UAV for search and rescue task

Nakadai, Kazuhiro; Kumon, Makoto; Okuno, Hiroshi G.; Hoshiba, Kotaro; Wakabayashi, Mizuho; Washizaki, Kai; Ishiki, Takahiro; Gabriel, Daniel; Bando, Yoshiaki; Morito, Takayuki; Kojima, Ryosuke; Sugiyama, Osamu

IEEE International Conference on Intelligent Robots and Systems 2017-September p.5985 - 5990, 2017/12-2017/12

DOIScopus

Detail

ISSN:21530858

Outline:© 2017 IEEE. This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV). In addition to sound source localization, sound source enhancement and robust communication method are also described. This system is one instance of deployment of our continuously developing open source software for robot audition called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University). To improve the robustness against outdoor acoustic noise, we propose to combine two sound source localization methods based on MUSIC (multiple signal classification) to cope with trade-off between latency and noise robustness. The standard Eigenvalue decomposition based MUSIC (SEVD-MUSIC) has smaller latency but less noise robustness, whereas the incremental generalized singular value decomposition based MUSIC (iGSVD-MUSIC) has higher noise robustness but larger latency. A UAV operator can use an appropriate method according to the situation. A sound enhancement method called online robust principal component analysis (ORPCA) enables the operator to detect a target sound source more easily. To improve the stability of wireless communication, and robustness of the UAV system against weather changes, we developed data compression based on free lossless audio codec (FLAC) extended to support a 16 ch audio data stream via UDP, and developed a water-resistant microphone array. The resulting system successfully worked in an outdoor search and rescue task in ImPACT Tough Robotics Challenge in November 2016.

Books And Publication

インターネット活用術 (Making the Most of the Internet), Iwanami Science Library 44

Hiroshi Okuno

Iwanami Shoten 1996/11-

Detail

ISBN:4000065440

Computational Auditory Scene Analysis

David F. Rosenthal and Hiroshi G. Okuno (Eds)

Lawrence Erlbaum Associates 1998/04-

Detail

ISBN:0805822836

Advanced Lisp Technology

Taiichi Yuasa and Hiroshi G. Okuno (Eds)

Taylor and Francis Publishers 2002/05-

Detail

ISBN:1455778818

Work / Software / Teaching Material / Field Work etc.

Research Grants & Projects

Grants-in-Aid for Scientific Research: Adoption Status

Research Classification:

Deployment of Robot Audition Toward Understanding Real World

2012-2017

Allocated Amount: ¥218,140,000

Research Classification:

Music Co-player Robot that plays in ensemble with people based on the frogs' chorus

2011-2013

Allocated Amount: ¥3,640,000

Research Classification:

Integrated research on augmentation and assisting technologies for auditory and speech functions

Allocated Amount: ¥48,620,000

Research Classification:

Development of Robot Audition based on Computational Auditory Scene Analysis

Allocated Amount: ¥119,340,000

Research Classification:

Study on Computational Auditory Scene Analysis for Humanoids by Active Audition

Allocated Amount: ¥51,350,000

Research Classification:

Automatic Transformation of GDA Document Tag and Development of Its Applications

Allocated Amount: ¥13,300,000

Research Classification:

Studies on intelligent information processings for robot systems integrated with environments

Allocated Amount: ¥41,860,000

Research Classification:

Musical Information Processing by using Sound Ontology

Allocated Amount: ¥9,500,000

Research Classification:

Complex systems approaches on the evolution of diversity in phenotypic plasticity and their ecological analyses and applications

2015-2018

Allocated Amount: ¥4,680,000

Lecture Course

Course Title | School | Year | Term
Embodiment Informatics | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Embodiment Informatics | Graduate School of Creative Science and Engineering | 2019 | spring semester
Embodiment Informatics | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics A | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics A | Graduate School of Creative Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics A | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics B | Graduate School of Fundamental Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics B | Graduate School of Creative Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics B | Graduate School of Advanced Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics C | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics C | Graduate School of Creative Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics C | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics D | Graduate School of Fundamental Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics D | Graduate School of Creative Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics D | Graduate School of Advanced Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics E | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics E | Graduate School of Creative Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics E | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics F | Graduate School of Fundamental Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics F | Graduate School of Creative Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics F | Graduate School of Advanced Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics G | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics G | Graduate School of Creative Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics G | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics H | Graduate School of Fundamental Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics H | Graduate School of Creative Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics H | Graduate School of Advanced Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics I | Graduate School of Fundamental Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics I | Graduate School of Creative Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics I | Graduate School of Advanced Science and Engineering | 2019 | spring semester
Practice on Embodiment Informatics J | Graduate School of Fundamental Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics J | Graduate School of Creative Science and Engineering | 2019 | fall semester
Practice on Embodiment Informatics J | Graduate School of Advanced Science and Engineering | 2019 | fall semester
Intensive Seminar on Embodiment Informatics | Graduate School of Fundamental Science and Engineering | 2019 | full year
Intensive Seminar on Embodiment Informatics | Graduate School of Creative Science and Engineering | 2019 | full year
Intensive Seminar on Embodiment Informatics | Graduate School of Advanced Science and Engineering | 2019 | full year
On-Site Training in English | Graduate School of Fundamental Science and Engineering | 2019 | full year
On-Site Training in English | Graduate School of Creative Science and Engineering | 2019 | full year
On-Site Training in English | Graduate School of Advanced Science and Engineering | 2019 | full year
Overseas Internship | Graduate School of Fundamental Science and Engineering | 2019 | full year
Overseas Internship | Graduate School of Creative Science and Engineering | 2019 | full year
Overseas Internship | Graduate School of Advanced Science and Engineering | 2019 | full year
Master's Thesis (Department of Modern Mechanical Engineering) | Graduate School of Creative Science and Engineering | 2019 | full year
Research on Human-Robot Interface | Graduate School of Creative Science and Engineering | 2019 | full year
Research on Human-Robot Interface | Graduate School of Creative Science and Engineering | 2019 | full year