Yuyuan Management Forum, 2022 No. 28 (Issue 797 overall)
Topic: Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management Problems: Enhancing Profits and Alleviating Bullwhip Effect
Speaker: Yijie Peng (彭一杰), Associate Professor, Guanghua School of Management, Peking University
Host: Jianbin Li (李建斌), Professor, Department of Production Operations and Logistics Management, School of Management
Time: Friday, July 1, 2022, 14:00-15:30
Venue: Room 119, Management Building
About the speaker:
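Associate Professor and doctoral supervisor at the Guanghua School of Management, Peking University, and recipient of a national high-level talent program. He is an adjunct researcher at Peking University's Institute for Artificial Intelligence and National Institute of Health Data Science. His main research areas include simulation modeling and optimization, financial engineering and risk management, artificial intelligence, and healthcare. He has led several funded research projects, including a National Young Scientists Fund project and a Beijing Young Backbone Individual Project. He has published more than 20 papers in high-quality journals such as Operations Research, INFORMS Journal on Computing, and IEEE Transactions on Automatic Control. He currently serves on the editorial boards of Asia-Pacific Journal of Operational Research and Journal of Systems & Management (《系统管理学报》), and on the IEEE Control Systems Society Conference Editorial Board. He is also an executive council member of the Financial Engineering and Financial Risk Management Branch of the Operations Research Society of China, a member of the Artificial Society Committee of the China Simulation Association, a council member of the Social Computing Branch of the Chinese Association for Artificial Intelligence, a member of the Risk Management Committee of the Chinese Society for Management Modernization, and Deputy Secretary-General of the Beijing Operations Research Society.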
About the talk:
We apply Multi-Agent Deep Reinforcement Learning (MADRL) to inventory management problems with multiple echelons (multi-echelon inventory management problems) and evaluate MADRL’s performance in maximizing the overall profits of the supply chain. We also examine whether the information-sharing mechanism used in MADRL helps alleviate the Bullwhip Effect in the supply chain. Methodology/results: We apply Heterogeneous-Agent Proximal Policy Optimization (HAPPO) to the Partially Observable Markov Games formulated from multi-echelon inventory management problems in a serial supply chain and a supply chain network. Our results show that policies constructed with HAPPO achieve higher overall profits than policies constructed with single-agent deep reinforcement learning and state-of-the-art heuristic policies. In addition, applying HAPPO results in a smaller Bullwhip Effect than policies constructed with single-agent deep reinforcement learning, where information is not shared among actors. Managerial implications: Our results buttress the empirical finding that information sharing inside the supply chain helps alleviate the Bullwhip Effect when decisions are made not by human beings but by policies constructed with MADRL. Our results also verify MADRL’s potential for solving various multi-echelon inventory management problems with complex supply chain structures and non-stationary environments.
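For readers unfamiliar with the Bullwhip Effect mentioned in the abstract, the following is a minimal sketch of how it is commonly quantified: the variance of the orders an echelon places divided by the variance of the demand faced by the chain. It is not the speaker's HAPPO method. It uses a classical moving-average order-up-to rule in a serial chain where each echelon sees only its downstream neighbour's orders (no information sharing). All names and parameters (`bullwhip_ratio`, `simulate_serial_chain`, `window`, `lead_time`, the demand distribution) are hypothetical choices made for illustration.

```python
import random
import statistics


def bullwhip_ratio(orders, demand):
    """Variance of the orders placed divided by the variance of demand."""
    return statistics.pvariance(orders) / statistics.pvariance(demand)


def simulate_serial_chain(demand, n_echelons=3, window=4, lead_time=1):
    """Return the order stream of each echelon, from the retailer upstream.

    Assumption: every echelon forecasts with a moving average over the
    orders it receives and orders up to forecast * (lead_time + 1).
    """
    streams = []
    incoming = list(demand)
    for _ in range(n_echelons):
        orders = []
        prev_target = None
        for t, d in enumerate(incoming):
            history = incoming[max(0, t - window + 1): t + 1]
            forecast = sum(history) / len(history)
            target = forecast * (lead_time + 1)  # order-up-to level
            if prev_target is None:
                prev_target = target
            # Replace what was consumed plus the change in the target level;
            # orders cannot be negative.
            orders.append(max(0.0, d + target - prev_target))
            prev_target = target
        streams.append(orders)
        incoming = orders  # the next echelon only sees these orders
    return streams


rng = random.Random(0)  # fixed seed so the run is reproducible
demand = [rng.gauss(100, 5) for _ in range(2000)]
streams = simulate_serial_chain(demand)
ratios = [bullwhip_ratio(orders, demand) for orders in streams]
for level, ratio in enumerate(ratios, start=1):
    print(f"echelon {level}: order variance / demand variance = {ratio:.2f}")
```

A ratio above 1 that grows from echelon to echelon is the amplification pattern the talk aims to reduce through information sharing among agents.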