Commit 2bc1a91 (1 parent: 8681def)
Pelayo Arbues committed Jan 21, 2025
Showing 3 changed files with 83 additions and 0 deletions (the third file's diff did not render).
30 changes: 30 additions & 0 deletions
...literature-notes/Articles/AI Mistakes Are Very Different From Human Mistakes.md
@@ -0,0 +1,30 @@
---
author: [[Bruce Schneier]]
title: "AI Mistakes Are Very Different From Human Mistakes"
date: 2025-01-21
tags:
- articles
- literature-note
---


## Metadata
- Author: [[Bruce Schneier]]
- Full Title: AI Mistakes Are Very Different From Human Mistakes
- URL: https://www.schneier.com/blog/archives/2025/01/ai-mistakes-are-very-different-from-human-mistakes.html

## Highlights
- Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes. ([View Highlight](https://read.readwise.io/read/01jj4r9a29649dbsj8g3rddfwv))
- Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like [large language models](https://spectrum.ieee.org/tag/llms) (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. It seems [ridiculous](https://www.buzzfeed.com/carleysuthers/weird-and-wrong-ai-responses) when chatbots tell you to eat rocks or add glue to pizza. But it’s not the frequency or severity of AI systems’ mistakes that differentiates them from human mistakes. It’s their weirdness. AI systems do not make mistakes in the same ways that humans do. ([View Highlight](https://read.readwise.io/read/01jj4r9mm1gvrs5qb2303qe39g))
- Life experience makes it fairly easy for each of us to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane, predictably depending on factors such as fatigue and distraction. And mistakes are often accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions. ([View Highlight](https://read.readwise.io/read/01jj4rae59e1newrbka1sbcgxw))
- AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that [cabbages](https://arxiv.org/html/2405.19616v1) eat goats. ([View Highlight](https://read.readwise.io/read/01jj4raqkd9wzkfa5848c5yym3))
- And AI mistakes aren’t accompanied by ignorance. An LLM will be [just as confident](https://spectrum.ieee.org/chatgpt-reliability) when saying something completely wrong—and obviously so, to a human—as it will be when saying something true. The seemingly random [inconsistency](https://arxiv.org/pdf/2305.14279) of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it’s not enough to see that it understands what factors make a product profitable; you need to be sure it won’t forget what money is. ([View Highlight](https://read.readwise.io/read/01jj4rb5x2vh6r5px76xp91y5t))
- This situation indicates two possible areas of research. The first is to engineer LLMs that make more human-like mistakes. The second is to build new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make. ([View Highlight](https://read.readwise.io/read/01jj4rbeabqmd335e98mdr3kt6))
- We already have some tools to lead LLMs to act in more human-like ways. Many of these arise from the field of “[alignment](https://arxiv.org/abs/2406.18346)” research, which aims to make models [act in accordance](https://spectrum.ieee.org/the-alignment-problem-openai) with the goals and motivations of their human developers. One example is the technique that was [arguably](https://venturebeat.com/ai/how-reinforcement-learning-with-human-feedback-is-unlocking-the-power-of-generative-ai/) responsible for the breakthrough success of [ChatGPT](https://spectrum.ieee.org/tag/chatgpt): [reinforcement learning with human feedback](https://arxiv.org/abs/2203.02155). In this method, an AI model is (figuratively) rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make more human-like mistakes, particularly by penalizing them more for mistakes that are less intelligible. ([View Highlight](https://read.readwise.io/read/01jj4rbpjzsc49wgrwe0wbht16)) — see the first sketch after these highlights.
- When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to [double-check](https://arxiv.org/pdf/2308.00436) their own work can help prevent errors. But LLMs can also [confabulate](https://arxiv.org/pdf/2406.02061) seemingly plausible, but truly ridiculous, explanations for their flights from reason. ([View Highlight](https://read.readwise.io/read/01jj4rbym56kb7g4wwxjs0553m))
- Other mistake mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated in the way that humans do, it can help to ask an LLM the same question repeatedly in slightly different ways and then [synthesize](https://arxiv.org/abs/2210.02441) its multiple responses. Humans won’t put up with that kind of annoying repetition, but machines will. ([View Highlight](https://read.readwise.io/read/01jj4rcmfdxcp49s1cv5csebq2)) — see the second sketch after these highlights.
- Small changes to a query to an LLM can result in wildly different responses, a problem known as [prompt sensitivity](https://arxiv.org/pdf/2311.07230). But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic [impacts](https://psycnet.apa.org/record/1992-97329-001) on the answers. ([View Highlight](https://read.readwise.io/read/01jj4rf211k4h861wq2vhvh6nc))
- LLMs also seem to have a bias towards [repeating](http://proceedings.mlr.press/v139/zhao21c/zhao21c.pdf) the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “[availability heuristic](https://arxiv.org/pdf/2305.04400)” manifesting in LLMs, with machines spitting out the first thing that comes to mind rather than reasoning through the question. ([View Highlight](https://read.readwise.io/read/01jj4rfjhv6cb141tkhtjwetq4))
- some LLMs seem to get [distracted](https://arxiv.org/html/2404.08865v1) in the middle of long documents; they’re better able to remember facts from the beginning and end. There is already progress on improving this error mode, as researchers have found that LLMs trained on [more examples](https://www.anthropic.com/news/claude-2-1-prompting) of retrieving information from long texts seem to do better at retrieving information uniformly. ([View Highlight](https://read.readwise.io/read/01jj4rfw17qd2srd0aajarpqf0))
- what’s bizarre about LLMs is that they act more like humans than we think they should. For example, some researchers have tested the [hypothesis](https://minimaxir.com/2024/02/chatgpt-tips-analysis/) that LLMs perform better when offered a cash reward or threatened with death. ([View Highlight](https://read.readwise.io/read/01jj4rg9h9w5rcz3gmy4mnfvj3))
- It also turns out that some of the best ways to “[jailbreak](https://www.usenix.org/system/files/sec24fall-prepub-1500-yu-zhiyuan.pdf)” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. ([View Highlight](https://read.readwise.io/read/01jj4rgp477jbcb3c4atj6x9pz))
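
The RLHF highlight above describes rewarding a model for responses that get a human thumbs-up. Below is a minimal sketch of the reward-modeling step only; the toy embeddings, model size, and pairwise loss are illustrative assumptions, not the setup from the cited paper.

```python
# Sketch of the reward-modeling step of RLHF: train a scorer so that
# responses humans preferred score higher than rejected ones. Everything
# here (toy embeddings, model size, loss) is an illustrative assumption,
# not the implementation from the cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response embedding; a real system scores (prompt, response) text."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy preference pairs: embeddings of a preferred and a rejected response,
# standing in for human thumbs-up/thumbs-down judgments.
preferred = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for _ in range(200):
    # Bradley-Terry pairwise loss: push preferred scores above rejected ones.
    # Penalizing unintelligible mistakes more heavily, as the highlight
    # suggests, would amount to weighting these pairs differently.
    loss = -F.logsigmoid(model(preferred) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```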
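
The repeat-and-synthesize mitigation quoted above can be sketched as a paraphrase-and-vote loop. Here `ask_llm` is a hypothetical placeholder for whatever chat API is in use, and a simple majority vote stands in for the synthesis step.

```python
# Sketch of the repeat-and-synthesize mitigation: pose the same question
# several ways, then aggregate the answers. ask_llm() is a hypothetical
# placeholder for a real chat API; majority voting is the simplest
# possible synthesis step.
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice here and return its answer."""
    raise NotImplementedError

def ask_with_rephrasings(question: str, rephrasings: list[str]) -> str:
    # Machines don't tire of repetition, so every variant gets asked.
    answers = [ask_llm(template.format(q=question)) for template in rephrasings]
    # Synthesize by majority vote over normalized answers.
    counts = Counter(a.strip().lower() for a in answers)
    best_answer, _ = counts.most_common(1)[0]
    return best_answer

# Example variants; each {q} slot receives the same underlying question.
variants = [
    "{q}",
    "Answer briefly: {q}",
    "Think step by step, then answer: {q}",
]
# answer = ask_with_rephrasings("What year did the Berlin Wall fall?", variants)
```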
18 changes: 18 additions & 0 deletions
...tes/Articles/Ask Me Anything A Simple Strategy for Prompting Language Models.md
@@ -0,0 +1,18 @@
---
author: [[arXiv.org]]
title: "Ask Me Anything: A Simple Strategy for Prompting Language Models"
date: 2025-01-21
tags:
- articles
- literature-note
---


## Metadata
- Author: [[arXiv.org]]
- Full Title: Ask Me Anything: A Simple Strategy for Prompting Language Models
- URL: https://arxiv.org/abs/2210.02441

## Highlights
- Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task. To mitigate the high degree of effort involved in prompt-design, we instead ask whether producing multiple effective, yet imperfect, prompts and aggregating them can lead to a high quality prompting strategy. ([View Highlight](https://read.readwise.io/read/01jj4re8swgxe5fy7p1am6mcce))
- our proposed prompting method, ASK ME ANYTHING (AMA). We first develop an understanding of the effective prompt formats, finding that question-answering (QA) prompts, which encourage open-ended generation ("Who went to the park?") tend to outperform those that restrict the model outputs ("John went to the park. Output True or False."). Our approach recursively uses the LLM itself to transform task inputs to the effective QA format. We apply the collected prompts to obtain several noisy votes for the input's true label. We find that the prompts can have very different accuracies and complex dependencies and thus propose to use weak supervision, a procedure for combining the noisy predictions, to produce the final predictions for the inputs. ([View Highlight](https://read.readwise.io/read/01jj4regzjxmv3vcgzyaqmtntb)) — see the sketch below.
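
A rough sketch of the AMA pipeline described above: several imperfect QA-style prompts each cast a noisy vote on an input's label, and the votes are combined. The paper aggregates votes with weak supervision; this sketch substitutes a plain majority vote, and `ask_llm` plus the templates are hypothetical placeholders.

```python
# Sketch of an AMA-style pipeline: multiple open-ended QA prompts each cast
# a noisy vote on the label, and the votes are combined. The paper uses weak
# supervision for aggregation; a majority vote stands in for it here.
# ask_llm() and the templates are hypothetical placeholders.
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice here and return its answer."""
    raise NotImplementedError

# Open-ended QA reformattings of one classification input ("Is this review
# positive?"), rather than a single restrictive True/False prompt.
QA_TEMPLATES = [
    "Read the review and answer: how does the reviewer feel?\n{x}",
    "Question: Is the reviewer happy with the product?\n{x}",
    "{x}\nWhat sentiment does this review express?",
]

def classify(x: str) -> str:
    votes = []
    for template in QA_TEMPLATES:
        answer = ask_llm(template.format(x=x)).lower()
        # Map each open-ended answer onto a label; crude keyword matching
        # stands in for the paper's answer-mapping step.
        votes.append("positive" if "positiv" in answer or "happy" in answer
                     else "negative")
    # Weak supervision would weight prompts by estimated accuracy; a majority
    # vote treats them all equally.
    return Counter(votes).most_common(1)[0][0]
```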