Multimodal Data Recognition
Research Team

Research Summary

Our goal is to give a robot a comprehensive understanding of its surrounding environment. This includes object recognition, 3D scene recognition, and human activity recognition through signal processing and pattern recognition on multimodal sensory data. In particular, we focus on recognizing unknown events and objects.

Main Research Fields

Computer Vision
Robot Vision
Multimodal Recognition

Keywords

Object Recognition
Activity Recognition
Spatio-temporal Environmental Understanding
Perception of Unknown Event/Object
Scene Graph Generation

Research themes

Unknown Object Recognition
Recognition from a Skeleton Sequence
Scene Change Detection
Human Behavior Change Detection

Yasutomo Kawanishi

History

2006: Bachelor of Engineering, Kyoto University
2008: Master of Informatics, Kyoto University
2011: Ph.D. in Informatics, Kyoto University

Awards

2009: Best Paper Award
2016: IEEE ITS Society Nagoya Chapter Young Researcher Award

Members

Motoharu Sonogashira: Research Scientist
Itthisak Phueaksri: Postdoctoral Researcher
Christiane Mietzsch: Special technical staff
Daiju Kanaoka: Research Associate
Shohei Nobuhara: Visiting Scientist
Tomohiro Fujita: Visiting Scientist
Tingwei Liu: Junior Research Associate and Student Trainee
Akira Kohjin: Administrative Part-time Worker I and Student Trainee
Taiyo Tamaki: Research Part-time Worker II
Da Huo: Student Trainee
Nguyen Trung Thanh: Student Trainee
Yuga Yano: Student Trainee
Hirakawa Hayato: Student Trainee
Ziqi Li: Student Trainee
Wang Juan: Student Trainee
Yu Xinmeng: Student Trainee
Ting-Ru LIU: Student Trainee
Tri Duc Tran: Student Trainee
Tsung-Chih Chiang: Student Trainee

Former members

Vijay John: Research Scientist (2021/09-2025/08)
Yu-chen Lai: Student Trainee (2024/06-2025/01)
Hao-yu Hou: Student Trainee (2024/06-2025/01)
Jia-yi Chen: Student Trainee (2024/06-2025/01)
Yo-Hsin Fang: Student Trainee (2024/05-2024/10)
Diego Hernandez Rodriguez: Student Trainee (2023/06-2025/03)
Ozaki Airi: Student Trainee (2024/07-2025/03)
Murakawa Toshikazu: Student Trainee (2024/07-2025/03)
Hiei Satoshi: Student Trainee (2024/07-2025/03)
Yamada Shion: Student Trainee (2024/07-2025/03)
Joy Battocchio: Research Intern (2023/09-2023/10)
Hayato Yumiya: Research Intern (2021/07-2021/08)
Masaya Mizuno: Research Intern (2021/08-2021/09)
Thomas Reolon: Research Intern (2022/12-2023/01)
Kotaro Fujishiro: Research Intern (2023/09)
Haruto Kugo: Research Intern (2023/09)
Daijiro Suzuki: Research Intern (2023/09)

Research results

Unknown object recognition and description

When we humans see an unknown object, we can recognize it as some kind of object even if we don't know what it is. We can also describe its relationship with other objects, e.g., an unknown object is on the table and beside the laptop.

In contrast, robots can only detect objects that their object detectors have been trained on, and cannot estimate the relationships between an unknown object and other objects. Our team is researching the topic "object recognition including unknown objects and relationship estimation".

The problem of recognition in the presence of unknown objects is called the open-set recognition problem, which has recently attracted much attention in the computer vision field. Separately, the problem of recognizing relations among objects and describing them in a graph structure is called scene graph generation (SGG). Our team has named the problem of describing a scene containing unknown objects in a graph structure open-set scene graph generation (Open-set SGG).

We have formulated the problem setup, proposed experimental protocols and evaluation metrics, and proposed a baseline method for the problem.
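As a rough illustration of what a scene graph containing unknown objects might look like as a data structure, the following is a minimal sketch. It is not the team's method or evaluation protocol; the class names, the `UNKNOWN` label, and the rule of mapping any label outside a known set to `UNKNOWN` are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder label for objects outside the known classes.
UNKNOWN = "unknown"


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    label: str


@dataclass
class SceneGraph:
    # obj_id -> SceneObject
    objects: dict = field(default_factory=dict)
    # list of (subject_id, predicate, object_id) triples
    relations: list = field(default_factory=list)

    def add_object(self, obj_id, label, known_labels):
        # Assumption: anything not in the known label set is kept as a node
        # but labeled UNKNOWN instead of being discarded.
        final_label = label if label in known_labels else UNKNOWN
        self.objects[obj_id] = SceneObject(obj_id, final_label)

    def add_relation(self, subject_id, predicate, object_id):
        if subject_id not in self.objects or object_id not in self.objects:
            raise KeyError("both ends of a relation must be objects in the graph")
        self.relations.append((subject_id, predicate, object_id))

    def triples(self):
        # Human-readable (subject label, predicate, object label) triples.
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relations
        ]


def build_example_graph():
    # The scene from the text: an unknown object on the table, beside the laptop.
    known = {"table", "laptop"}
    graph = SceneGraph()
    graph.add_object(0, "table", known)
    graph.add_object(1, "laptop", known)
    graph.add_object(2, "gadget", known)  # not a known class, stored as UNKNOWN
    graph.add_relation(2, "on", 0)
    graph.add_relation(2, "beside", 1)
    return graph
```

Calling `build_example_graph().triples()` yields the triples for the unknown object relative to the table and the laptop. The point of the sketch is only that the unknown object remains a node with relations rather than being dropped.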

Human pose prediction from short time observations

Observing a person's activities and predicting the person's current state and future pose a few seconds later are important for many applications, such as proactive support by robots. Our team is working on predicting a person's future poses by observing the short-term behavior of the person.

Recent developments in pose estimation techniques have led to many studies on human behavior using sequences of human skeletons. A human skeleton is often treated as a graph whose vertices hold the locations of body joints and whose edges represent the connectivity between joints. Thus, graph-convolution-based methods have been proposed. However, in the future pose prediction task, some body motions cannot be distinguished from the skeleton sequence alone. In our study, we have proposed a method that predicts future motions using additional information, such as the person's surroundings.
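To make the skeleton-as-graph idea concrete, here is a minimal sketch of one generic graph-convolution step over a single skeleton frame, written in plain Python. It is not the method proposed by the team: the 5-joint skeleton, the edge list, and the mean-aggregation rule (row-normalized adjacency with self-loops, followed by a linear map) are assumptions chosen for illustration.

```python
# Hypothetical 5-joint skeleton: 0=head, 1=torso, 2=left hand, 3=right hand, 4=feet.
NUM_JOINTS = 5
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, edges):
    """Build a row-normalized adjacency matrix with self-loops."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0  # self-loop so a joint keeps its own feature
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0  # bones are undirected
    return [[value / sum(row) for value in row] for row in adj]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(joints, weight, adj):
    """One step: average each joint with its neighbors, then apply a linear map."""
    return matmul(matmul(adj, joints), weight)


def graph_conv_sequence(frames, weight, adj):
    """Apply the same graph convolution to every frame of a skeleton sequence."""
    return [graph_conv(frame, weight, adj) for frame in frames]
```

With an identity weight matrix, each output joint is simply the mean of its own position and those of its connected joints. In a learned model the weight would be trained and several such layers would be stacked, typically together with some form of temporal modeling across frames.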

Selected Publications

Yasutomo Kawanishi, Hitoshi Nishimura, Hiroshi Murase
“Human Pose Estimation from an Extremely Low-Resolution Image Sequence by Pose Transition Embedding Network”
Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, (2025).
Diego Hernández Rodríguez, Motoharu Sonogashira, Kazuya Kitano, Yuki Fujimura, Takuya Funatomi, Yasuhiro Mukaigawa, Yasutomo Kawanishi
“An Event Camera Simulator for Arbitrary Viewpoints based on Neural Radiance Fields”
Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, (2025).
Yasutomo Kawanishi, Yutaka Nakamura, Taiken Shintani, Carlos T. Ishi, Seiya Kawano, Koichiro Yoshino, Takashi Minato, Michihiko Minoh
“RoboDJ: Live Commentary Robots System Driven by Physical- and Cyber-world Observations”
The 31st International Conference on Multimedia Modeling, (2025). (Best Demo Honorable Mention)
Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“Towards Visual Storytelling by Understanding Narrative Context through Scene-Graphs”
Proceedings of the 31st International Conference on Multimedia Modeling, (2025)
Vijay John, Yasutomo Kawanishi
“Generating Pseudo-Strong Labels from Weak Labels for Multi-Source Sound Event Detection”
Proceedings of the 27th International Conference on Pattern Recognition, pp.98-113, (2024)
Tomohiro Fujita, Yasutomo Kawanishi
“Recurrent Graph Convolutional Network for Sequential Pose Prediction from 3D Human Skeleton Sequence”
Proceedings of the 27th International Conference on Pattern Recognition, pp. 342-358, (2024)
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“Action Selection Learning for Multi-label Multi-view Action Recognition”
Proceedings of the ACM Multimedia Asia 2024, (2024)
Akira Kohjin, Motoharu Sonogashira, Masaaki Iiyama, Yasutomo Kawanishi
“Incremental Learning for Panoptic Lifting with Camera Viewpoints Selection”
Proceedings of the 21st International Conference on Automation Technology (Automation2024), (2024).
Motoharu Sonogashira, Masaaki Iiyama, Yasutomo Kawanishi
“Relationship-Aware Unknown Object Detection for Open-Set Scene Graph Generation”
IEEE Access, vol.12, pp.122513 - 122523, (2024) (open access).
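Nobuhiro Ueda, Hideko Habe, Yoko Matsui, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi, Koichiro Yoshino
“J-CRe3: A Japanese Dialogue Dataset for Real-World Reference Resolution” (in Japanese)
Journal of Natural Language Processing, vol. 31, no. 3, (2024) (open access).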
Vijay John, Yasutomo Kawanishi
“Frame-Level Latent Embedding using Weak Labels for Multi-view Action Recognition”
IEEE International Conference on Multimedia Information Processing and Retrieval, (2024).
Tingwei Liu, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association”
CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling, In conjunction with Computer Vision and Pattern Recognition 2024, (2024).
Yoshimitsu Kajiwara, Wanwan Zheng, Yasutomo Kawanishi
“Iconographic analysis of ancient roof tiles using a data science approach”
The Indonesian Journal of Social Studies, vol. 7, no. 2, pp.41-49, (2024) (open access).
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“One-stage open-vocabulary temporal action detection leveraging temporal multi-scale and action label features”
Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, (2024).
Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino
“A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions”
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, (2024).
Nobuhiro Ueda, Hideko Habe, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi, Koichiro Yoshino
“J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution”
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, (2024).
Yukinori Kawae, Yasutomo Kawanishi, Ichiroh Kanaya, Yoshihiro Yasumuro
“3D Survey of the Menkaure Pyramid”
Virtual Annual Meeting, American Research Center in Egypt, (2024).
Trung Thanh Nguyen, Phi Le Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“Zero-Shot Pill-Prescription Matching With Graph Convolutional Network and Contrastive Learning”
IEEE Access, vol. 12, pp. 55889-55904, (2024) (open access).
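Ryusei Hata, Daisuke Deguchi, Takatsugu Hirayama, Yasutomo Kawanishi, Hiroshi Murase
“Eye-contact Transformer: Eye-Contact Detection of Distant Pedestrians Considering Scene Context” (in Japanese)
IEICE Transactions on Information and Systems (Japanese Edition), Vol. J107-D, No. 04, pp. 231-242, (2024).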
Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide
“Interpolating the Text-to-Image Correspondence Based on Phonetic and Phonological Similarities for Nonword-to-Image Generation”
IEEE Access, vol.12, pp.41299 -41316, (2024) (open access).
Masaya Mizuno, Tomohiro Fujita, Yasutomo Kawanishi, Daisuke Deguchi, Hiroshi Murase
“Subjective Baggage-Weight Estimation based on Human Walking Behavior”
IEEE Access, Vol. 12, pp. 39390 - 39398, (2024) (open access)
Hiroki Tatemichi, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
“Category-level Object Pose Estimation in Heavily Cluttered Scenes by Generalized Two-stage Shape Reconstructor”
IEEE Access, vol. 12, pp. 33440-33448, (2024) (open access).
Naoya Kawamura, Wataru Sato, Koh Shimokawa, Tomohiro Fujita, Yasutomo Kawanishi
“Machine learning-based interpretable modeling for subjective emotional dynamics sensing using facial EMG”
Sensors, vol. 24, no. 5, 1536, (2024) (open access).
Angel Garcia Contreras, Seiya Kawano, Yasutomo Kawanishi, Yutaka Nakamura, Saito Satoru, Koichiro Yoshino
“Examining the Impact of a Forgetful Multi-store Memory System in a Cognitive Assistive Robot”
The 14th International Workshop on Spoken Dialogue Systems Technology, (2024).
Hiroto Murakami, Jialei Chen, Daisuke Deguchi, Takatsugu Hirayama, Yasutomo Kawanishi, Hiroshi Murase
“Pedestrian's Gaze Object Detection in Traffic Scene”
Proceedings of the 19th International Conference on Computer Vision Theory and Applications (VISAPP), (2024).
Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“Image-Collection Summarization Using Scene-Graph Generation With External Knowledge”
IEEE Access, vol.12, pp. 17499 - 17512, (2024) (open access)
Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
“An Approach to Generate a Caption for an Image Collection Using Scene Graph Generation”
IEEE Access, vol.11, pp. 128245 - 128260, (2023) (open access)
Daiju Kanaoka, Hakaru Tamukoh, Motoharu Sonogashira, Yasutomo Kawanishi
“ManifoldNeRF: View-dependent Image Feature Supervision for Few-shot Neural Radiance Fields”
In Proceedings of the 34th British Machine Vision Conference, (2023)
Shu Nakamura, Yasutomo Kawanishi, Shohei Nobuhara, Ko Nishino
“DeePoint: Visual Pointing Recognition and Direction Estimation”
In Proceedings of the 19th International Conference on Computer Vision, (2023)
Tomohiro Fujita, Yasutomo Kawanishi
“Human Pose Prediction by Progressive Generation in Multi-scale Frequency Domain”
In Proceedings of the 18th International Conference on Machine Vision Applications, (2023)
Vijay John, Yasutomo Kawanishi
“Combining Knowledge Distillation and Transfer Learning for Sensor Fusion in Visible and Thermal Camera-based Person Classification”
In Proceedings of the 18th International Conference on Machine Vision Applications, (2023)
Vijay John, Yasutomo Kawanishi
“Multimodal Cascaded Framework with Metric Learning Robust to Missing Modalities for Person Classification”
In Proceedings of the 14th ACM Multimedia Systems Conference, (2023) (open access)
Vijay John, Yasutomo Kawanishi
"Progressive Learning of a Multimodal Classifier Accounting for Different Modality Combinations"
Sensors 2023, 23(10), 4666 (2023) (open access)
Masaya Mizuno, Tomohiro Fujita, Yasutomo Kawanishi, Daisuke Deguchi, Hiroshi Murase
"Subjective Baggage-Weight Estimation from Gait ---Can you estimate how heavy the person feels?---"
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), (2023)
Hayato Yumiya, Yasutomo Kawanishi, Daisuke Deguchi, Hiroshi Murase
"End-to-End Gaze Grounding of a Person Pictured from Behind"
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), (2023)
Tomohiro Fujita, Yasutomo Kawanishi
"Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation"
Sensors 2023, 23(2), 876 (2023) (open access)
Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
"Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach"
In Proceedings of the 29th International Conference on MultiMedia Modeling (2023)
Vijay John, Yasutomo Kawanishi
"Audio-Visual Sensor Fusion Framework using Person Attributes Robust to Missing Visual Modality for Person Recognition"
In Proceedings of the 29th International Conference on MultiMedia Modeling (2023)
Jiaxin Li, Yasutomo Kawanishi, Daisuke Deguchi, Hiroshi Murase
"A Preliminary Study on View Independent Panoptic Scene Change Detection"
In Proceedings of the 2023 International Workshop on Advanced Image Technology (2023)
Vijay John, Yasutomo Kawanishi
"A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition"
In Proceedings of the ACM Multimedia Asia 2022 (2022)
Yasutomo Kawanishi, Ichiro Ide, Baidong Chu, Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Daisuke Deguchi
"Detection of Birds in a 3D Environment Referring to Audio-Visual Information"
In Proceedings of the 18th IEEE International Conference on Advanced Video and Signal-based Surveillance (2022)
Vijay John, Yasutomo Kawanishi
"Audio and Video-Based Emotion Recognition Using Multimodal Transformers"
In Proceedings of the 26th International Conference on Pattern Recognition (2022).
Yasutomo Kawanishi
"Label-Based Multiple Object Ensemble Tracking with Randomized Frame Dropping"
In Proceedings of the 26th International Conference on Pattern Recognition (2022).
Tomohiro Fujita, Yasutomo Kawanishi
"Toward Surroundings-aware Temporal Prediction of 3D Human Skeleton Sequence"
In Proceedings of the 26th ICPR Workshop: Towards a Complete Analysis of People: From Face and Body to Clothes (2022).
Motoharu Sonogashira, Masaaki Iiyama, Yasutomo Kawanishi
"Towards Open-Set Scene Graph Generation with Unknown Objects"
IEEE Access, Vol.10, pp.11574-11583 (2022) (open access)
Mahmud Dwi Sulistiyo, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Takatsugu Hirayama, Hiroshi Murase
"ColAtt-Net: In Reducing the Ambiguity of Pedestrian Orientations on Attribute-aware Semantic Segmentation Task"
IEEJ Transactions on Electronics, Information and Systems, Vol. 16, Issue 2, (2021).
Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
"Ω-GAN: Object Manifold Embedding GAN for Image Generation by Disentangling Parameters into Pose and Shape Manifolds"
In Proceedings of the 25th International Conference on Pattern Recognition (2020).
Hiroki Tatemichi, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
"Median-shape Representation Learning for Category-level Object Pose Estimation in Cluttered Environments"
In Proceedings of the 25th International Conference on Pattern Recognition (2020).
Saki Iwata, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
"LFIR2Pose: Pose Estimation from an Extremely Low-Resolution FIR Image Sequence"
In Proceedings of the 25th International Conference on Pattern Recognition (2020).
Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase
"Multiple Human Tracking with Alternately Updating Trajectories and Multi-Frame Action Features"
ITE Transactions on Media Technology and Applications, Vol. 8, No.4, pp. 269-279, (2020).
Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase
"Multiple Human Tracking using an Omnidirectional Camera with Local Rectification and World Coordinates Representation"
IEICE Transactions on Information and Systems, Vol. E103-D, No. 6, pp.1745-1361, (2020).
Naoki Nishida, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Jun Piao
"SOANets: Encoder-Decoder based Skeleton Orientation Alignment Network for White Cane User Recognition from 2D Human Skeleton Sequence"
In Proceedings of the 15th International Conference on Computer Vision Theory and Applications, pp. 435-443, 2020.
Yasutomo Kawanishi, Hiroshi Murase, Jianfeng Xu, Kazuyuki Tasaka, Hiromasa Yanagihara
"Which Content is he/she Reading? --Reading Content Estimation using an Indoor Surveillance Camera--"
In Proceedings of the 24th International Conference on Pattern Recognition, pp. 1731-1736, (2018).
Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
"Trajectory Ensemble: Multiple Persons Consensus Tracking across Non-overlapping Multiple Cameras over Randomly Dropped Camera Networks"
In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshop, pp. 56-62, (2017).
Brahmastro Kresnaraman, Yasutomo Kawanishi, Daisuke Deguchi, Tomokazu Takahashi, Yoshito Mekada, Ichiro Ide, Hiroshi Murase
"Human Wearable Attribute Recognition using Probability-Map-based Decomposition of Thermal Infrared Images"
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E100-A Issue 3, pp.854-864, (2017).

Links

Yasutomo Kawanishi
Multimodal Data Recognition Research Team (RIKEN)

Contact Information

yasutomo.kawanishi [at] riken.jp

Multimodal Data Recognition Research Team