Talk: Building Safer Dialogue Agents via Targeted Human Judgements - Rory Greig (DeepMind)

About the event
Join AI Safety Hub Edinburgh (AISHED) for a talk by Rory Greig about his team's work on the Sparrow model, a dialogue agent trained to be more helpful, correct, and harmless. This event will take place on the 17th of February at 3pm.

Talk Abstract
Dialogue models are becoming more prevalent and capable; however, serious challenges remain with reliability, biased and toxic output, and exploitability. How can we make effective use of human judgement, for example through techniques such as Reinforcement Learning from Human Feedback (RLHF), to avoid these failures and ensure dialogue models behave in line with human preferences? And could the dialogue setting play an important role in achieving longer-term AI Safety goals?

Speaker Bio
Rory Greig is a Research Engineer at DeepMind working on AI safety. He is currently a member of the Scalable Alignment Team, working on aligning Large Language Models with human preferences.

Feb 17 2023

Attendance is free. The event will take place in G.03, Bayes Centre, The University of Edinburgh, 47 Potterrow, Edinburgh EH8 9BT.

Registration