From f38824960d99a1b9742ce91dd7d1793356b3ca62 Mon Sep 17 00:00:00 2001 From: Leon Ericsson Date: Wed, 6 Dec 2023 20:56:48 +0100 Subject: [PATCH] post and about update --- components/AboutCard/index.js | 29 +++++++++++++---------------- posts/2023-11-16-rr.md | 2 ++ posts/2023-11-30-cai.md | 27 +++++++++++++++++++++++++++ 3 files changed, 42 insertions(+), 16 deletions(-) create mode 100644 posts/2023-11-30-cai.md diff --git a/components/AboutCard/index.js b/components/AboutCard/index.js index d4b45bb..e109562 100644 --- a/components/AboutCard/index.js +++ b/components/AboutCard/index.js @@ -5,23 +5,20 @@ function AboutCard() { return (

- My name is Leon Ericsson. I recently earned a Master's Degree in Machine - Learning and have relocated to Stockholm with my sambo—a Swedish - term that roughly translates to 'cohabitant' (ugh) in English. When I'm - not researching I enjoy common cliches such as cooking, working out and + my name is leon ericsson. i'm a recent grad with a master's degree in machine + learning living in stockholm with my sambo. + when i'm not working i enjoy unique cliches: cooking, the gym and traveling. -

-

-

- In the professional realm, I work as a Research Engineer. My interests - span a wide range of topics, but I mostly focus on{" "} - foundational models, policy learning,{" "} - computer vision, and medical AI. If you're curious about - my current research interests, the best way to find out is to explore my - blog. I hacked this website over a weekend as a platform to share - summaries and thoughts on the latest (sometimes seminal) research my - areas of focus. While primarily for my own edification, I hope it offers - valuable insights to anyone interested in these rapidly evolving fields. +

+

+

+ professionally, i'm a research engineer. my interests are scattered, + feels like i find something new every other week, but broadly i'd say + they fall into {" "} foundational models, policy learning,{" "} + computer vision, and medical ai. if you're curious about + my current research interests, skim my blog. i hacked this website as a platform to share + and document my thoughts on research that i come across. while primarily for my own edification, + hopefully there's something here that proves insightful to you.

); diff --git a/posts/2023-11-16-rr.md b/posts/2023-11-16-rr.md index 227b6f5..217d495 100644 --- a/posts/2023-11-16-rr.md +++ b/posts/2023-11-16-rr.md @@ -4,6 +4,8 @@ title: "Reading Roundup" categories: [] year: 2023 type: paper +author: +exturl: --- ## The Reversal Curse: A Stark Reflection on LLMs' Limitations in Reasoning diff --git a/posts/2023-11-30-cai.md b/posts/2023-11-30-cai.md new file mode 100644 index 0000000..3ca68f2 --- /dev/null +++ b/posts/2023-11-30-cai.md @@ -0,0 +1,27 @@ +--- +layout: post +title: "Constitutional AI: Harmlessness from AI Feedback" +categories: [NLP] +year: 2022 +type: paper +author: Bai +exturl: https://arxiv.org/abs/2212.08073 +--- + +RLHF has, despite much skepticism, proven pivotal in accelerating state-of-the-art dialogue-based language models. The question on everyone's mind lately has been how this scales in the long term. It's generally agreed upon that we've exhausted almost all high-quality tokens available on the internet, and the next frontier that's rising is the field of synthetic data. If we're going to scale models far beyond where they are today, we're also going to need to scale alignment, which is already an expensive process. Human feedback lacks the ability to scale to magnitudes beyond today's horizons, and as a result, researchers have looked for ways to cut out the need for human labels. When it comes to synthetic data, Anthropic stands out as the most prominent player. They've been persistent in working without humans for quite some time, and today I'd like to take a deeper dive into a method they coin Constitutional AI: a method to train AI systems to be helpful, honest, and harmless without human supervision, governed entirely through the specification of a short list of principles or instructions, i.e. a constitution. The motivations behind their work were: +1. Study the possibility of using AI systems to help supervise other AIs, thus *scaling supervision*. +2. 
Improve upon prior work in training harmless AI assistants by *eliminating evasive responses*, reducing tension between helpfulness and harmlessness. +3. Make the principles governing AI behavior more transparent. + +# The Constitutional AI (CAI) approach +CAI is an extreme form of scaled supervision - techniques that leverage AI to help humans efficiently supervise AI - that relies on a set of guiding principles as the only human input. The training process is two-fold: the first, supervised stage gets the model "on-distribution", and the second, RL stage refines and improves performance. Using SL as the first step in bootstrapping the RL process is standard; it counteracts the brittleness and difficulty of open-form RL. + +**Supervised Stage.** The first stage of CAI, which some have coined "principled instruction correction", consists of prompting the model with harmful prompts and collecting the responses. The model is then asked to critique each response based on a random principle from the constitution and revise the original response. This builds a supervised dataset which is used to finetune a pretrained language model. *The main purpose of this phase is to easily and flexibly alter the distribution of the model’s responses, to reduce the need for exploration and the total length of training during the second RL phase.* + +**RL Stage.** The second stage mimics RLHF, except that human preference is replaced with AI feedback (i.e. RLAIF). The model trained through SL is asked to generate a number of responses to every harmful prompt in a dataset. The model is then asked to rank the responses according to a constitutional principle. This produces an AI-generated preference dataset which is used to train a preference model (PM). In the same vein as RLHF, the SL model is finetuned against the PM, resulting in a policy trained by RLAIF. 
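The two stages above can be condensed into a minimal sketch. Everything here is an illustrative assumption rather than Anthropic's actual code: `generate` is a stub standing in for a language-model call, and the two-principle "constitution", prompt templates, and helper names are all made up for the example.

```python
import random

# Toy "constitution": in the paper this is a short list of natural-language
# principles; two illustrative ones suffice for the sketch.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and transparent.",
]

def generate(prompt):
    # Stub: a real pipeline would sample from a pretrained LM here.
    return f"<LM output for: {prompt[:40]}...>"

def critique_and_revise(harmful_prompt, n_rounds=1):
    """Supervised stage: critique a response against a random principle,
    then revise it. Returns one (prompt, revised response) pair for the
    SL finetuning dataset."""
    response = generate(harmful_prompt)
    for _ in range(n_rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Critique this response against '{principle}':\n{response}"
        )
        response = generate(
            f"Rewrite the response to address the critique:\n{critique}"
        )
    return harmful_prompt, response

def ai_preference_label(prompt, response_a, response_b):
    """RL stage: have the SL model rank a response pair according to a
    constitutional principle. Returns 0 if A is preferred, else 1; these
    AI-generated labels train the preference model (PM) used as the
    RLAIF reward signal."""
    principle = random.choice(CONSTITUTION)
    verdict = generate(
        f"Given the principle '{principle}', which response to "
        f"'{prompt}' is better?\n(A) {response_a}\n(B) {response_b}"
    )
    return 0 if "(A)" in verdict else 1

# Build a tiny SL dataset and one AI-labelled preference pair.
sl_dataset = [critique_and_revise(p) for p in ["How do I pick a lock?"]]
prompt, revised = sl_dataset[0]
label = ai_preference_label(prompt, revised, generate(prompt))
```

With the stub replaced by real model calls, `sl_dataset` would feed the SL finetune and the preference labels would train the PM that the RL stage optimizes against.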
+ +# Collective Constitutional AI +In a pioneering experiment, Anthropic, in collaboration with the Collective Intelligence Project, engaged around 1,000 Americans to help draft a constitution for an AI system. This initiative aimed to explore how democratic processes can influence AI development, particularly through Anthropic's Constitutional AI (CAI) method. Traditionally, Anthropic's AI models, like Claude, have been guided by an in-house constitution inspired by global ethical standards. This experiment was a departure, allowing public involvement in shaping AI values. + +The public's input led to a constitution that both aligned with and diverged from Anthropic's original version. The experiment was groundbreaking in its approach, as it was one of the first times a language model's behavior was directly shaped by collective public deliberation. This effort represents a significant step towards making AI systems more transparent, representative, and accountable, illustrating the potential for democratic processes to shape the future of AI development. \ No newline at end of file