Bayes Innovation Fellow: Michael Mistry
School of Informatics
Published: 30 September 2024

What is your research focus?

Industries such as manufacturing and fulfilment have employed robotic manipulation in limited ways, and adoption is growing in domains such as construction, agriculture and recycling. However, contact-rich tasks like material handling, picking, packing and assembly remain a challenge because of their demands on sensing and control. These tasks require contact forces to be controlled, explicitly or implicitly, to prevent damage to the robot, its environment and, especially, humans. Ideally, such manipulation requires precise sensing (e.g. tactile or force sensing) at the point of contact and/or high-speed cameras recording the movement and deformation of objects at a rate fast enough for the robot to react. These sensing demands come at significant cost: not only an increased bill of materials, but also the added burdens of weight, space, cabling, data processing and network traffic. Moreover, every additional sensor is a potential point of failure in an already complex system.

What is your innovation idea?

My vision is to make contact-rich robotic manipulation faster, cheaper and more reliable by leveraging research in data-driven predictive modelling and control. The recent explosion of deep learning has demonstrated that essential features can be extracted automatically from data, particularly for visual tasks like image classification. In contact-rich tasks, however, the most salient information often lies in the correlations between sensory modes (vision, tactile, kinematics). We therefore employ models that learn a compact, action-oriented latent representation of multi-modal input (see the illustrative sketch at the end of this article). The models are trained to reproduce the rich sensory consequences of robot action, even when given only a reduced (lower-cost) sensor suite. They can then be used to detect anomalies in real time, pre-empt failures and automatically annotate outcomes. The models will also adapt online as processes change and share their knowledge amongst a confederation of deployments.

Why does this matter?

Initial data collection and training of such models may be expensive. For an existing process, I imagine installing additional (possibly redundant) sensors, including cameras, tactile and force/torque sensors, and microphones. Once trained, however, the compact latent representation will help determine the minimal viable sensing suite, yielding considerable cost savings in deployment. Moreover, the models may serve as a digital twin (especially if purposely trained to do so), allowing simulation and low-cost experimentation with alternative configurations or integrations.

What is the future of your research?

The promise of our technology is to unlock cost savings by automating challenging contact-rich tasks, while minimising sensing requirements, providing condition monitoring and data visualisation, and enabling continuous improvement over time.
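
To make the modelling idea concrete, here is a minimal sketch in PyTorch of the kind of model described above, not a description of the actual system: it encodes a reduced (lower-cost) sensor suite into a compact latent state, rolls that state forward under a robot action, and decodes the rich sensor suite the action should produce; the prediction error against measured outcomes then serves as an anomaly signal. All names, dimensions, layer sizes and the threshold are illustrative assumptions.

import torch
import torch.nn as nn

class LatentSensoryModel(nn.Module):
    """Sketch of a multi-modal predictive model: reduced sensors in,
    rich sensory consequences of an action out."""

    def __init__(self, reduced_dim=20, rich_dim=200, action_dim=7, latent_dim=16):
        super().__init__()
        # Encoder: reduced sensor readings -> compact latent state
        self.encoder = nn.Sequential(
            nn.Linear(reduced_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))
        # Dynamics: (latent state, action) -> predicted next latent state
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))
        # Decoder: latent state -> reconstruction of the rich sensor suite
        # (e.g. vision + tactile + force/torque, available during training)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, rich_dim))

    def forward(self, reduced_obs, action):
        z = self.encoder(reduced_obs)
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        return self.decoder(z_next)

def is_anomalous(model, reduced_obs, action, rich_obs_next, threshold=0.05):
    """Flag a step whose measured rich sensory outcome deviates from the
    model's prediction by more than a threshold calibrated on nominal data."""
    with torch.no_grad():
        predicted = model(reduced_obs, action)
        error = torch.mean((predicted - rich_obs_next) ** 2, dim=-1)
    return error > threshold

Once the latent state is validated against the full (redundant) sensor suite, decoder targets corresponding to individual sensors can be dropped one at a time to probe which sensors the latent representation actually needs, which is one plausible route to the "minimal viable sensing suite" mentioned above.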