AI, ML and Friends is a weekly seminar series within the School of Computing on Artificial Intelligence, Machine Learning, and related topics. We are open to attendees and presenters external to the school. Please sign up to the mailing list to receive weekly announcements, including Zoom details, and email the seminar organiser to schedule a talk.
Upcoming Seminars #
02 February 2023, 11:00 #
Robust Human Action Modelling #
Speaker: Lei Wang
Abstract: Human action recognition is currently one of the most active research areas in computer vision. Various studies indicate that recognition performance depends heavily on the type of features extracted and on how actions are represented. We revive old-fashioned handcrafted video representations for action recognition, e.g., IDT-based BoW/FV representations, and put new life into these techniques via a CNN-based hallucination step. We also design and hallucinate two costly but powerful descriptors: one leveraging four popular object detectors applied to training videos, and the other leveraging image- and video-level saliency detectors. These hallucination-based models build on self-supervision: they take RGB frames as input and learn to predict both action concepts and auxiliary descriptors, which leads to state-of-the-art performance for video-based action recognition. For skeleton-based action recognition, inspired by Dynamic Time Warping (DTW) and its differentiable variant soft-DTW for matching pairs of sequences, we (i) introduce uncertainty-DTW, dubbed uDTW, which accounts for the uncertainty in frame-wise (or block-wise) features by selecting the path that maximizes the likelihood under Maximum Likelihood Estimation (MLE), and (ii) propose an advanced variant of DTW that jointly models each smooth path between the query and support frames of human skeleton sequences, achieving the best alignment simultaneously in the temporal and simulated camera-viewpoint spaces for end-to-end learning under limited few-shot training data. Both alignment methods are applied to few-shot skeletal action recognition.
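For attendees unfamiliar with DTW, the following minimal textbook sketch shows the basic dynamic-programming alignment that uDTW and soft-DTW build on. This is a generic illustration, not the speaker's uncertainty-aware or differentiable variants:

```python
import numpy as np

def dtw(x, y):
    """Classic Dynamic Time Warping cost between two 1-D sequences.

    A textbook sketch: frame-wise absolute distance, cumulative cost
    table, and the minimum over the three admissible predecessor moves.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # frame-wise distance
            # extend the cheapest of the three allowed predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical sequences align at zero cost; warping lets DTW match
# sequences that differ only in local tempo.
print(dtw([0, 1, 2, 3], [0, 1, 2, 3]))  # 0.0
```

soft-DTW replaces the hard `min` with a differentiable soft-minimum so the alignment cost can be used as a training loss; uDTW, as described above, additionally weights the path choice by feature uncertainty.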
As human actions in video sequences are characterized by the complex interplay between spatial features and their temporal dynamics, we (i) propose tensor representations that compactly capture such higher-order relationships between visual features for action recognition, and (ii) form hypergraphs to model hyper-edges between graph nodes (which help capture higher-order motion patterns of groups of body joints); embeddings of hyper-edges of different orders are then fused through our Multi-order Multi-mode Transformer (3Mformer), which achieves joint-mode attention on joint-mode tokens. Both models yield state-of-the-art results compared to GCN-, transformer- and existing hypergraph-based counterparts for skeletal action recognition.
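As a rough intuition for "higher-order" pooling of per-frame features, the sketch below averages outer products of frame descriptors into a second-order tensor that records which feature pairs co-occur over time. This is a generic second-order pooling example, not the specific tensor representation presented in the talk:

```python
import numpy as np

def second_order_pooling(features):
    """Aggregate per-frame feature vectors into a second-order tensor.

    features: (T, d) array of frame descriptors. Returns a (d, d)
    matrix: the average of per-frame outer products, capturing
    co-occurrences between pairs of feature dimensions across time.
    """
    F = np.asarray(features, dtype=float)
    return F.T @ F / len(F)  # mean of outer products f_t f_t^T

# 8 frames of 4-D descriptors (random stand-ins for real features).
frames = np.random.default_rng(0).normal(size=(8, 4))
pooled = second_order_pooling(frames)
print(pooled.shape)  # (4, 4)
```

The resulting matrix is symmetric by construction; third- and higher-order analogues stack further outer products, which is what makes compact tensor representations attractive.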
Bio: Lei Wang, https://cecc.anu.edu.au/people/lei-wang, received the M.E. degree in software engineering from the University of Western Australia (UWA) in 2018. He is currently pursuing a PhD degree with the Australian National University (ANU) and Data61/CSIRO. Since 2018, he has been a full-time Computer Vision Researcher with iCetana Pty Ltd. He was a Visiting Researcher in the Machine Learning Research Group at Data61/CSIRO (formerly NICTA), and a Visiting Researcher with the Department of Computer Science and Software Engineering, UWA. His research interests include action recognition in videos, anomaly detection, video image processing, and time series and sequences. He is an IEEE Student Member and an ACM Student Member.
Where: Building 145, Room 3.41