Overview
Date
August 18, 2023
09:00 - 10:00
Venue
Online via Bilibili

Warm-Start Reinforcement Learning: From Function Approximation Error to Sub-optimality Gap


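Reinforcement learning (RL) suffers from high sample complexity and a heavy computation load, and Warm-Start RL is emerging as a promising new paradigm for addressing both. Its basic idea is to accelerate online learning by starting from an initial policy trained offline. Warm-Start RL has already been applied successfully in AlphaZero and ChatGPT, which demonstrates the great potential of warm-start policies for speeding up online learning. To understand Warm-Start RL in depth, it is essential to quantify how function approximation errors affect its sub-optimality gap.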

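For the ninth session of the IEEE TNSE Distinguished Seminar Series, we are honored to host Professor Junshan Zhang of the University of California, Davis, who will introduce Warm-Start RL and share his research results and interesting findings in this area.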

The AIRS-TNSE Joint Distinguished Seminar Series is co-sponsored by IEEE Transactions on Network Science and Engineering (TNSE) and the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), with joint support from The Chinese University of Hong Kong, Shenzhen, the Network Communication and Economics Laboratory (NCEL), and IEEE. The series aims to bring together top international experts and scholars in network science and engineering to share cutting-edge scientific and technological achievements.

Join the seminar on August 18 through Bilibili (http://live.bilibili.com/22587709).

  • Jianwei Huang
    Vice President, AIRS; Presidential Chair Professor, CUHK-Shenzhen; Editor-in-Chief, IEEE TNSE; IEEE Fellow; AAIA Fellow
    Executive Chair
  • Junshan Zhang
    Professor, Department of Electrical and Computer Engineering, University of California, Davis; IEEE Fellow
    Warm-Start Reinforcement Learning: From Function Approximation Error to Sub-optimality Gap

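    Junshan Zhang is a professor in the Department of Electrical and Computer Engineering at the University of California, Davis. He received his Ph.D. from Purdue University in 2000 and was on the faculty of Arizona State University from 2000 to 2021. His research interests span information networks and data science, including edge AI, reinforcement learning, continual learning, network optimization and control, and game theory, with applications in connected and automated vehicles, 5G and beyond, wireless networks, the Internet of Things (IoT), and smart grids. Professor Zhang is an IEEE Fellow. He received the ONR Young Investigator Award in 2005, the NSF CAREER Award in 2003, and the IEEE Wireless Communications Technical Committee Recognition Award in 2016. His papers have won several awards, including the WiOPT 2018 Best Student Paper Award, the Kenneth C. Sevcik Outstanding Student Paper Award at ACM SIGMETRICS/IFIP Performance 2016, Best Paper Runner-up Awards at IEEE INFOCOM 2009 and IEEE INFOCOM 2014, and Best Paper Awards at IEEE ICC 2008 and IEEE ICC 2017. Building on his research, he co-founded Smartiply Inc. in 2015, an edge computing startup that provides enhanced network connectivity and embedded artificial intelligence for IoT applications.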

    Conventional reinforcement learning (RL) techniques face the formidable challenge of high sample complexity and intensive computation load, which hinders RL's applicability in real-world tasks. To tackle this challenge, Warm-Start RL is emerging as a promising new paradigm, with the basic idea being to accelerate online learning by starting with an initial policy trained offline. Indeed, owing to the knowledge transfer from an initial policy, Warm-Start RL has been successfully applied in AlphaZero and ChatGPT, demonstrating its great potential to speed up online learning. Despite these remarkable successes, a fundamental understanding of Warm-Start RL is lacking. The primary objective of this study is to quantify the impact of function approximation errors on the sub-optimality gap for Warm-Start RL. We consider the widely used 'Actor-Critic' method for RL. For the unbiased case, we give sufficient conditions that answer the question of how good the warm-start policy needs to be to achieve fast convergence. For the biased case, our findings reveal that a 'good' warm-start policy (obtained by offline training) may be insufficient, and that bias reduction in online learning also plays an essential role in lowering the sub-optimality gap. We then investigate bias reduction using adaptive ensemble learning and planning.