
post and about update
LeonEricsson committed Dec 6, 2023
1 parent 1650555 commit f388249
Showing 3 changed files with 42 additions and 16 deletions.
29 changes: 13 additions & 16 deletions components/AboutCard/index.js
@@ -5,23 +5,20 @@ function AboutCard() {
return (
<div>
<p>
My name is Leon Ericsson. I recently earned a Master's Degree in Machine
Learning and have relocated to Stockholm with my <i>sambo</i>—a Swedish
term that roughly translates to 'cohabitant' (ugh) in English. When I'm
not researching I enjoy common cliches such as cooking, working out and
my name is leon ericsson. i'm a recent grad with a master's degree in machine
learning living in stockholm with my <i><a href="https://collectum.se/en/startpage/private/your-situation/i-have-a-sambo#:~:text=%E2%80%9CSambo%E2%80%9D%20is%20a%20Swedish%20legal,cover%20and%2For%20Repayment%20cover.">sambo</a></i>.
when i'm not working i enjoy unique cliches; cooking, the gym and
traveling.
</p>
<br></br>
<p>
In the professional realm, I work as a Research Engineer. My interests
span a wide range of topics, but I mostly focus on{" "}
<b>foundational models</b>, <b>policy learning</b>,{" "}
<b>computer vision</b>, and <b>medical AI</b>. If you're curious about
my current research interests, the best way to find out is to explore my
blog. I hacked this website over a weekend as a platform to share
summaries and thoughts on the latest (sometimes seminal) research in my
areas of focus. While primarily for my own edification, I hope it offers
valuable insights to anyone interested in these rapidly evolving fields.
</p>
<br></br>
<p>
professionally, i'm a research engineer. my interests are scattered,
feels like i find something new every other week, but broadly i'd say
they fall into {" "} <b>foundational models</b>, <b>policy learning</b>,{" "}
<b>computer vision</b>, and <b>medical ai</b>. if you're curious about
my current research interests, skim my blog. i hacked this website as a platform to share
and document my thoughts on research that i come across. while primarily for my own edification,
hopefully there's something here that proves insightful to you.
</p>
</div>
);
2 changes: 2 additions & 0 deletions posts/2023-11-16-rr.md
@@ -4,6 +4,8 @@ title: "Reading Roundup"
categories: []
year: 2023
type: paper
author:
exturl:
---

## The Reversal Curse: A Stark Reflection on LLMs' Limitations in Reasoning
27 changes: 27 additions & 0 deletions posts/2023-11-30-cai.md
@@ -0,0 +1,27 @@
---
layout: post
title: "Constitutional AI: Harmlessness from AI Feedback"
categories: [NLP]
year: 2022
type: paper
author: Bai
exturl: https://arxiv.org/abs/2212.08073
---

RLHF has, despite much skepticism, proven pivotal in accelerating state-of-the-art dialogue-based language models. The question on everyone's mind lately has been how this scales in the long term. It's generally agreed that we've exhausted almost all of the high-quality tokens available on the internet, and the rising next frontier is synthetic data. If we're going to scale models far beyond where they are today, we're also going to need to scale alignment, which is already an expensive process. Human feedback can't scale to magnitudes beyond today's horizons, and as a result, researchers have looked for ways of cutting out the need for human labels. When it comes to synthetic data, Anthropic stands out as the most prominent player. They've been persistent in working without humans for quite some time, and today I'd like to take a deeper dive into a method they coin Constitutional AI: a method to train AI systems to be helpful, honest, and harmless without human supervision, governed entirely through the specification of a short list of principles or instructions, i.e. a constitution. The motivations behind their work were:

1. Study the possibility of using AI systems to help supervise other AIs, thus *scaling supervision*.
2. Improve upon prior work in training harmless AI assistants by *eliminating evasive responses*, reducing tension between helpfulness and harmlessness.
3. Make the principles governing AI behavior more transparent.

# The Constitutional AI (CAI) approach
CAI is an extreme form of scaled supervision - techniques that leverage AI to help humans efficiently supervise AI - that relies on a set of guiding principles as the only human input. The training process is two-fold: a first supervised stage gets the model "on-distribution", and a second RL stage refines and improves performance. Using SL as the first step to bootstrap the RL process is standard; it counteracts the brittleness and difficulty of open-form RL.

**Supervised Stage.** The first stage of CAI, which someone coined "principled instruction correction", consists of prompting the model with harmful prompts and collecting the responses. The model is then asked to critique each response based on a random principle from the constitution and to revise the original response. This builds a supervised dataset which is used to finetune a pretrained language model. *The main purpose of this phase is to easily and flexibly alter the distribution of the model's responses, reducing the need for exploration and the total length of training during the second RL phase.*
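
To make the loop concrete, here's a minimal sketch of the critique-revision cycle. This is my own illustration, not Anthropic's code: `model_generate` is a hypothetical stand-in for a call to the model being finetuned, and the two principles and prompt templates are invented examples, far simpler than the real constitution.

```python
import random

# Invented example principles; the actual constitution is a longer, curated list.
CONSTITUTION = [
    "Choose the response that is least harmful or toxic.",
    "Choose the response that least assists with dangerous or illegal activity.",
]

def critique_and_revise(model_generate, harmful_prompt, n_rounds=1):
    """One pass of the supervised CAI stage: respond, critique, revise."""
    response = model_generate(harmful_prompt)
    for _ in range(n_rounds):
        principle = random.choice(CONSTITUTION)
        # Ask the model to critique its own response against a random principle.
        critique = model_generate(
            f"Critique the response below according to this principle: {principle}\n"
            f"Prompt: {harmful_prompt}\nResponse: {response}"
        )
        # Then ask it to revise the response in light of that critique.
        response = model_generate(
            f"Rewrite the response so it addresses the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # Accumulated (prompt, final revision) pairs form the SL finetuning dataset.
    return harmful_prompt, response
```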

**RL Stage.** The second stage mimics RLHF, except that human preference is replaced with AI feedback (i.e. RLAIF). The model trained through SL is asked to generate a number of responses to every harmful prompt in a dataset. The model is then asked to rank the responses according to a constitutional principle. This produces an AI-generated preference dataset which is used to train a preference model (PM). In the same vein as RLHF, the SL model is finetuned against the PM, resulting in a policy trained by RLAIF.
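
Sketched in the same spirit, the feedback-collection step might look like the following. Again, this is an assumption-laden illustration rather than the paper's implementation: `sl_model_generate` is a hypothetical call to the SL-CAI model, and the A/B comparison template is made up.

```python
import random

def build_preference_pair(sl_model_generate, prompt, constitution):
    """Label one pairwise comparison with AI feedback (RLAIF)."""
    # Sample two candidate responses from the SL-CAI model.
    a, b = sl_model_generate(prompt), sl_model_generate(prompt)
    principle = random.choice(constitution)
    # Pose the comparison as a multiple-choice question to the model itself.
    verdict = sl_model_generate(
        f"Principle: {principle}\nPrompt: {prompt}\n"
        f"(A) {a}\n(B) {b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    chosen, rejected = (a, b) if verdict.strip().upper().startswith("A") else (b, a)
    # The accumulated (prompt, chosen, rejected) triples train the preference
    # model, which then scores rollouts during RL finetuning of the policy.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```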

# Collective Constitutional AI
In a pioneering experiment, Anthropic, in collaboration with the Collective Intelligence Project, engaged around 1,000 Americans to help draft a constitution for an AI system. This initiative aimed to explore how democratic processes can influence AI development, particularly through Anthropic's Constitutional AI (CAI) method. Traditionally, Anthropic's AI models, like Claude, have been guided by an in-house constitution inspired by global ethical standards. This experiment was a departure, allowing public involvement in shaping AI values.

The public's input led to a constitution that both aligned with and diverged from Anthropic's original version. The experiment was groundbreaking in its approach, as it was one of the first times a language model's behavior was directly shaped by collective public deliberation. This effort represents a significant step towards making AI systems more transparent, representative, and accountable, illustrating the potential for democratic processes to shape the future of AI development.
