AIRS in the AIR

Overview
Date
Mar 22, 2022
16:00 - 18:00
Venue

Huodongxing (活动行)

AIRS in the AIR | Multi-Agent Reinforcement Learning: Human-AI Coordination and Cognition


The “AIRS in the AIR” series is driven by a shared purpose: to make a better society through research, communication and innovation. Its mission is to become a professional platform that gathers top scholars and experienced experts in AI and robotics from around the world to generate, disseminate, and preserve knowledge, and to work with others to bring this knowledge to bear on the world’s great challenges.

Join the event on March 22 through this link: http://hdxu.cn/qbKEd

  • Hongyuan Zha
    Vice President at AIRS; Professor at The Chinese University of Hong Kong, Shenzhen
    Executive Chair

    Prof. Hongyuan Zha is an X.Q. Deng Presidential Chair Professor at The Chinese University of Hong Kong, Shenzhen, and the Executive Dean of its School of Data Science. He is also the Vice President of the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS).

    Prof. Hongyuan Zha received his B.S. degree in Mathematics from Fudan University in 1984 and his Ph.D. in Scientific Computing from Stanford University in 1993. He was a faculty member in the Department of Computer Science and Engineering at Pennsylvania State University from 1992 to 2006 and in the College of Computing at Georgia Institute of Technology from 2006 to 2020. He also worked at Inktomi Corporation from 1999 to 2001. His current research interests lie in machine learning and its applications.

    Professor Zha has published over 300 papers in top journals and conferences in computer science and related fields. According to Google Scholar, as of April 2021 his work has been cited over 25,100 times and his h-index is 79. He has won many prominent academic awards, including the Leslie Fox Prize (second prize, 1991) awarded by the Institute of Mathematics and its Applications (IMA), the Best Student Paper Award (as advising professor) at the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), and the 26th NeurIPS Outstanding Paper Award (2013).

  • Joel Z. Leibo
    Research Scientist at DeepMind
    Reverse engineering the social-cognitive capacities, representations, and motivations that underpin human cooperation to help build cooperative artificial general intelligence

    Joel Z. Leibo is a research scientist at DeepMind. He obtained his PhD in 2013 from MIT where he worked on the computational neuroscience of face recognition with Tomaso Poggio. Nowadays, Joel's research is aimed at the following questions:

    ● How can we get deep reinforcement learning agents to perform complex cognitive behaviors like cooperating with one another in groups?

    ● How should we evaluate the performance of deep reinforcement learning agents?

    ● How can we model processes like cumulative culture that gave rise to unique aspects of human intelligence?

    As a route to building cooperative artificial general intelligence, I propose we try to reverse engineer human cooperation. As humans, we employ a set of social-cognitive capacities, representations, and motivations which underlie our critical ability to cooperate with one another.

    Here I will argue that we need to figure out how human cooperation works so that we can build general artificial intelligence that cooperates like humans do. Specifically, in this talk I will describe how to use Melting Pot, an evaluation methodology and suite of test scenarios for multi-agent reinforcement learning, to further this goal of reverse engineering human cooperation in order to build cooperative artificial general intelligence.

  • Jakob Foerster
    Associate Professor in the Department of Engineering Science at the University of Oxford
    Zero-shot coordination and off-belief

    Jakob Foerster started as an Associate Professor in the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind.

    After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI chair in 2019.

    His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently he has been formulating and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.

    His work has been cited over 6,409 times, and his h-index is 32.

    There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, yet this is a crucial prerequisite from a human-AI coordination and AI-safety point of view.

    The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep OBL agents follow a policy pi_1 that is optimized assuming past actions were taken by a given, fixed policy, pi_0, but assuming that future actions will be taken by pi_1. When pi_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.
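The role of pi_0 in the paragraph above can be illustrated with a minimal sketch. It covers only the belief step (interpreting a past action as if it had been drawn from pi_0), not the full OBL training procedure, and it is not the speaker's implementation. The cat/dog hidden state, the two signalling actions, and the function name `posterior_over_state` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def posterior_over_state(prior, pi0, action):
    """Bayes posterior P(state | action), assuming the observed action
    was drawn from pi0[state]. Assumes the action has nonzero probability."""
    joint = {s: prior[s] * pi0[s][action] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Hypothetical hidden state seen only by the partner, with a uniform prior.
prior = {"cat": Fraction(1, 2), "dog": Fraction(1, 2)}

# pi_0 as uniform random: both actions are equally likely in every state.
uniform_pi0 = {s: {0: Fraction(1, 2), 1: Fraction(1, 2)} for s in prior}

# An arbitrary self-play convention for contrast: action 0 means "cat".
convention_pi0 = {
    "cat": {0: Fraction(1), 1: Fraction(0)},
    "dog": {0: Fraction(0), 1: Fraction(1)},
}

belief_uniform = posterior_over_state(prior, uniform_pi0, action=0)
belief_convention = posterior_over_state(prior, convention_pi0, action=0)

print(belief_uniform)     # the action carries no information about the state
print(belief_convention)  # the action is read as a signal under the convention
```

Under the uniform pi_0 the posterior equals the prior, so a policy optimized against this belief cannot depend on what the partner's action "meant". Under the convention policy the same action fully reveals the state, which is the kind of inference that breaks when the partner follows a different convention.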

    OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).

    OBL can be scaled to high-dimensional settings with a fictitious transition mechanism, and it shows strong performance both in a toy setting and in Hanabi, a benchmark problem for human-AI coordination and ZSC.

Time | Session | Speaker & Topic

16:00-17:00 | Keynote Speech | Joel Z. Leibo, Research Scientist at DeepMind
Topic: Reverse engineering the social-cognitive capacities, representations, and motivations that underpin human cooperation to help build cooperative artificial general intelligence

17:00-18:00 | Keynote Speech | Jakob Foerster, Associate Professor in the Department of Engineering Science at the University of Oxford
Topic: Zero-shot coordination and off-belief
