Talk: Building Safer Dialogue Agents via Targeted Human Judgements - Rory Greig (DeepMind)

About the event

Join AI Safety Hub Edinburgh (AISHED) for a talk by Rory Greig about his team's work on the Sparrow model, a dialogue agent trained to be more helpful, correct, and harmless. This event will take place on the 17th of February at 3pm.

Talk Abstract

Dialogue models are becoming more prevalent and capable; however, serious challenges remain around reliability, biased and toxic output, and exploitability. How can we make effective use of human judgement, for example through techniques like Reinforcement Learning from Human Feedback (RLHF), to avoid these failures and ensure dialogue models behave in line with human preferences? And could the dialogue setting play an important role in achieving longer-term AI safety goals?
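As background for the abstract's mention of RLHF: a central ingredient is a reward model trained on pairwise human comparisons of model outputs. The sketch below is a minimal, illustrative toy of that reward-modelling step only, not Sparrow's actual implementation; the feature vectors, the hidden preference direction, and all names are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "response" is a feature vector, and a hidden preference
# direction stands in for the human rater (purely illustrative).
dim = 8
true_w = rng.normal(size=dim)

def sample_comparison():
    """Return a (preferred, rejected) pair according to the hidden preference."""
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    return (a, b) if a @ true_w > b @ true_w else (b, a)

# Linear reward model r(x) = w . x, trained with the Bradley-Terry
# pairwise loss: -log sigmoid(r(preferred) - r(rejected)).
w = np.zeros(dim)
lr = 0.1
for step in range(2000):
    chosen, rejected = sample_comparison()
    margin = (chosen - rejected) @ w
    p = 1.0 / (1.0 + np.exp(-margin))        # P(model agrees with the rater)
    grad = -(1.0 - p) * (chosen - rejected)   # gradient of the loss w.r.t. w
    w -= lr * grad

# The learned reward direction should align with the hidden preference.
cosine = (w @ true_w) / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine(learned, true) = {cosine:.3f}")
```

In full RLHF pipelines this learned reward model then provides the training signal for fine-tuning the dialogue policy with reinforcement learning; the talk covers how Sparrow builds on this with more targeted human judgements.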

Speaker Bio

Rory Greig is a Research Engineer at DeepMind working on AI safety. He is currently a member of the Scalable Alignment Team, where he works on aligning Large Language Models with human preferences.