Yuyuan Management Forum, 2024 No. 55 (No. 987 overall)
Talk Title: Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Speaker: Weiguo Fan, Chair Professor, University of Iowa
Host: Zhaohua Deng, Professor, Department of Information Management and Data Science
Time: Saturday, May 25, 2024, 14:30–16:00
Venue: Room 406, School of Management Building
Speaker Bio:
Weiguo Fan is a Chair Professor at the University of Iowa. His research interests include artificial intelligence, information retrieval, data mining, text analytics, business intelligence, and social media analytics. He has published more than 300 papers in top academic journals and conferences, including MIS Quarterly, Information Systems Research, Journal of Management Information Systems, Production and Operations Management, INFORMS Journal on Computing, IEEE Transactions on Knowledge and Data Engineering, Information Systems, Information Systems Journal, Communications of the ACM, Information and Management, Journal of the American Society for Information Science and Technology, Information Processing and Management, and Decision Support Systems; seven of these have been designated ESI Highly Cited Papers by Web of Science. As of May 5, 2024, his work had received more than 18,000 Google Scholar citations (h-index = 64, i10-index = 184). Professor Fan frequently consults for Fortune 500 companies and serves as a review expert for many government and funding agencies, and he is widely influential in both academia and industry.
Abstract:
Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing task generalizability. Despite these advances, current methods that rely on trigger phrases such as "Let's think step by step" remain limited. This study introduces PRoMPTed, an approach that optimizes the zero-shot prompt for each individual task instance through an innovative "LLMs in the loop" process. A comprehensive evaluation across 13 datasets and 10 task types based on GPT-4 reveals that PRoMPTed significantly outperforms both naive zero-shot approaches and a strong baseline ("Output Refinement") that refines the task output instead of the input prompt. The experimental results also confirm that this advantage generalizes to the relatively weaker GPT-3.5. Even more intriguingly, leveraging GPT-3.5 to rewrite prompts for the stronger GPT-4 not only matches but occasionally exceeds the efficacy of using GPT-4 itself as the prompt rewriter. This research thus offers significant value, not only for enhancing zero-shot LLM performance but also for potentially enabling weaker LLMs to supervise their stronger counterparts, a capability that has attracted much interest recently.
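The core loop described above, rewriting the prompt for each individual instance before solving it, can be sketched in a few lines. This is a minimal illustration only: the function names, the meta-prompt wording, and the toy stand-in models below are assumptions for demonstration, not the authors' actual implementation, and the stand-ins would be replaced by real LLM API calls (e.g., GPT-3.5 as rewriter, GPT-4 as solver) in practice.

```python
from typing import Callable

# Hypothetical meta-prompt asking a rewriter LLM to improve an instance's prompt.
META_PROMPT = (
    "Rewrite the following task prompt so that a language model can answer it "
    "correctly in a zero-shot setting. Keep the task intent, but make the "
    "instructions explicit and unambiguous.\n\nPrompt: {prompt}"
)

def prompted_zero_shot(prompt: str,
                       rewriter: Callable[[str], str],
                       solver: Callable[[str], str]) -> str:
    """Rewrite the prompt for this specific instance, then solve with it.

    `rewriter` and `solver` are any text-to-text callables (in practice,
    two LLM calls; the rewriter may be a weaker model than the solver).
    """
    rewritten = rewriter(META_PROMPT.format(prompt=prompt))
    return solver(rewritten)

# Toy stand-ins so the sketch runs without any API access:
def toy_rewriter(meta: str) -> str:
    # Pretend to "improve" the prompt by appending an explicit format instruction.
    original = meta.rsplit("Prompt: ", 1)[-1]
    return original + " Answer with a single number."

def toy_solver(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unknown"

print(prompted_zero_shot("What is 2 + 2?", toy_rewriter, toy_solver))  # → 4
```

Note that the rewriting happens per instance, not per task: each new input gets its own tailored prompt, which is what distinguishes this setup from fixed task-level trigger phrases.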