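Will I be able to download my work after completing the project?

Yes, you can download and keep any files you created in the course. To do so, make sure you have saved all files and work to your device before you exit the product environment.

How much experience do I need to do this project?

The recommended experience level for this course is shown at the top of the page.

Can I complete this project directly in my web browser, without installing special software?

Yes, everything you need to complete the course is available in your browser.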

Reinforcement Learning from Human Feedback

Instructor: Nikita Namjoshi

3,388 already enrolled

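Project

Build in-demand job skills with step-by-step instructions

Rating: 4.7 (33 reviews)

Intermediate level (recommended experience)

1 hour

Learn at your own pace

Hands-on learning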

What you'll learn

Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF), as well as the datasets needed for this technique.
Fine-tune the Llama 2 model using RLHF with the open source Google Cloud Pipeline Components Library.
Evaluate the tuned model against the base model by comparing loss curves and using the Side-by-Side (SxS) method.

Details to know

Taught in English

No downloads or installation required

Only available on desktop

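Learn, practice, and apply job-ready skills in less than 2 hours

Receive training from industry experts
Gain hands-on experience solving real-world job tasks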

About this project

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences.

Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used to further tune a base LLM so that it aligns with values and preferences specific to your use case. In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:

1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
2. Use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
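
To make the two datasets and the side-by-side comparison a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the course's actual data format or evaluation code: the field names (`input_text`, `candidate_0`, `candidate_1`, `choice`), the helper functions, and the example sentences are all hypothetical, and the win-rate tally is only one simple way to summarize side-by-side judgments.

```python
import json

# Hypothetical field names and example text, chosen for illustration only.
# A preference record pairs one prompt with two candidate completions and
# records which one a human labeler preferred.
preference_example = {
    "input_text": "Summarize: The city council voted to extend library hours on weekends.",
    "candidate_0": "The council extended weekend library hours.",
    "candidate_1": "There was a vote.",
    "choice": 0,  # index of the preferred candidate
}

# A prompt record holds only the prompt; completions are generated during tuning.
prompt_example = {
    "input_text": "Summarize: The museum will open a new wing next spring.",
}


def is_valid_preference(record):
    """Check that a record has a prompt, two candidates, and a 0/1 choice."""
    required = {"input_text", "candidate_0", "candidate_1", "choice"}
    return required.issubset(record) and record["choice"] in (0, 1)


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def side_by_side_win_rate(judgments):
    """Fraction of decided side-by-side comparisons won by the tuned model.

    `judgments` is a list of strings, each "tuned", "base", or "tie".
    Ties are excluded from the denominator.
    """
    decided = [j for j in judgments if j != "tie"]
    if not decided:
        return 0.0
    return sum(j == "tuned" for j in decided) / len(decided)
```

For example, `side_by_side_win_rate(["tuned", "base", "tuned", "tie"])` counts two wins for the tuned model out of three decided comparisons. In practice the judgments would come from human raters or an automated rater comparing the two models' completions for the same prompt.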