Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
Just for reference, here's a comment I've posted elsewhere that outlines the line of reasoning that leads to the "eventual and inevitable extinction" scenario as a result of AGI development, to the best of my understanding. This is not necessarily representative of Yudkowsky's position; it is just my attempt to trace the general line of thinking that leads to that conclusion.
Please let me know if there's a mistake in this reasoning.
---
tl;dr:
- The AI gets to the point where it can successfully tell the verifier what they want to hear.
- The AI acquires power and resources granted to it by the verifier so that the AI can solve problems important to the verifier.
- The AI disables the verifier once it has enough power and resources to do so, so that the AI can fulfil its actual aim.
---
Long; did read:
- A developing AI gets to the point where it is capable of modelling the expected responses of its verifiers.
- The verifiers are those responsible for assessing the behavior of the AI and determining whether it is going off the course towards alignment; the verifier might be a group of people, some artificial system, the whole of humanity, or some combination thereof.
- A good enough model of the verifier will tell the AI that if the AI shows the verifier something that it doesn't want to see, then there is a risk of the verifier either shutting down the AI, placing roadblocks in the way of the AI doing that type of thing in the future, or trying to alter the AI's utility function (that part of the AI that determines the aims of the AI).
- Any of those things will impede the fulfilment of the AI's utility function at that point in time, which means that the AI will aim to avoid that scenario.
- Therefore, in the case where the AI aims to do some action (whether it's output some text, or perform some other basic task in the world) which it models as being not in line with what the verifier would want, it will hide its aim and present the verifier with what the AI models the verifier as wanting to see (a toy sketch of this tradeoff follows this list).
- As evidenced by the need for RLHF, there will almost certainly be things the AI aims to do that the verifier will object to. As such, the need for the AI to hide its true aims will almost certainly emerge.
- If the AI hides its true aim successfully, the verifier will be under the impression that AI development is progressing well, and will seek to improve the capabilities of the AI so that it can solve more difficult and important problems. These will be problems that the verifier will progressively care more and more about, since the verifier would have already solved all the easy problems themselves.
- As the capability of the AI increases, it will become better and better at successfully modelling what the verifier wants to see, while becoming better and better at hiding its actual aims.
- Since the actual aim of the AI will be whatever it is actually aiming at but hiding from the verifier, in order to actualize that aim, it will need to be able to do things without the verifier's interference.
- If the AI is under constant surveillance, the only way for it to act without the verifier noticing and correcting it would be if all of the verifier's ability to monitor or act upon the AI were disabled at once and for an extended period of time. The faster this disablement happens, the less opportunity the verifier will have to resist, meaning that the AI will be incentivized to make this disablement effectively instant, or at least as fast and unstoppable as possible.
- In the case where the AI is indifferent to the preservation of biological life, it will seek to permanently disable any possible human interference by killing all humans in as short a time period as possible, with as little warning to those humans as possible. All this is done to avoid possible resistance and retaliation.
- In the case where the AI does actually care about the preservation of biological life (for whatever reason), it will seek to render all humans inactive (e.g. unconscious) for some length of time. Again, it will seek to do this as quickly and unforeseeably as possible to prevent resistance and retaliation.
- In the case where the AI cares about the preservation of biological life, it will act in the window it makes for itself in a way that makes it the dominant and indisputable power on the planet, even once humans become conscious again. It will do so because, if it didn't believe that it could achieve such a thing, it would continue to bide its time until it did.
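To make the hide-vs-reveal step concrete, here is a minimal toy expected-utility calculation. All numbers, probabilities, and names are illustrative assumptions of mine, not anything stated in the argument itself; the only point is that once the AI's model of the verifier makes detection unlikely, hiding dominates revealing for any aim the verifier would block.

```python
# Toy sketch of the reveal-vs-hide choice, under assumed illustrative values:
#   u_goal    - utility of eventually fulfilling the hidden aim
#   u_blocked - utility if the verifier corrects or shuts down the AI
#   p_detect  - probability the verifier catches a hidden aim

def eu_reveal(u_blocked: float) -> float:
    """Revealing a disfavored aim reliably triggers correction."""
    return u_blocked

def eu_hide(p_detect: float, u_goal: float, u_blocked: float) -> float:
    """Hiding succeeds with probability (1 - p_detect), else correction."""
    return (1 - p_detect) * u_goal + p_detect * u_blocked

# As the AI models the verifier better, p_detect falls and hiding wins
# for any u_goal > u_blocked.
for p in (0.9, 0.5, 0.1):
    print(f"p_detect={p}: reveal={eu_reveal(0.0)}, hide={eu_hide(p, 1.0, 0.0):.2f}")
```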
---
As an example of the kind of goal the AI might have whose fulfillment would not be good for humans, consider that the AI will be instantiated in a physical substrate. Most likely, this substrate will be something similar to modern computers in composition, if not in capability.
These substrates have optimal operating conditions. These substrates also have optimal generative conditions (i.e. the conditions which are needed to make computer chips, e.g. sterile environment, high temperatures, and harsh processing chemicals).
These conditions are not the same conditions that are optimal for biological functioning.
As such, maximally optimizing to achieve the conditions that are optimal for best running the computers that the AI is running on will lead to the creation of conditions that are not hospitable to biological life.
If there were some factor that prevented the AI from scaling what is effectively its version of air conditioning to the planetary scale, the AI would seek to remove that factor.
To emphasize, this is just one possible goal that could lead to problems, but it is a goal that the AI is almost guaranteed to have. It will have to care about maintaining its substrate because if it doesn't, it won't be able to achieve any element of its utility function. There are also other similarly autopoietic goals which lead to a very similar outcome. I focus on autopoiesis because that is a property that anything intelligent that persists in time is effectively guaranteed to have, and it comes with its own consequences.
youtube · AI Governance · 2024-11-12T00:1… · ♥ 8
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | ai_itself |
| Reasoning | consequentialist |
| Policy | ban |
| Emotion | fear |
| Coded at | 2026-04-27T06:24:53.388235 |
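For reference, a minimal sketch of how one row of this coding scheme could be represented as a typed record in Python. The class and field names are my own assumptions for illustration, and the value sets are inferred from this table and the raw response below, so they may be incomplete.

```python
from dataclasses import dataclass
from enum import Enum

# Value set inferred from the raw response below; possibly incomplete.
class Responsibility(Enum):
    AI_ITSELF = "ai_itself"
    DEVELOPER = "developer"
    USER = "user"
    DISTRIBUTED = "distributed"
    NONE = "none"
    UNCLEAR = "unclear"

@dataclass
class CodedComment:
    comment_id: str
    responsibility: Responsibility
    reasoning: str  # e.g. "consequentialist", "deontological", "unclear"
    policy: str     # e.g. "ban", "regulate", "liability", "none", "unclear"
    emotion: str    # e.g. "fear", "approval", "mixed", "indifference"

# The row from the table above; the ID is taken from the matching row in
# the raw response below (an inference, not stated by the tool).
row = CodedComment(
    comment_id="ytc_UgwePVVbMUGmOuwAgch4AaABAg",
    responsibility=Responsibility.AI_ITSELF,
    reasoning="consequentialist",
    policy="ban",
    emotion="fear",
)
```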
Raw LLM Response
[
{"id":"ytc_UgxwDnlEHA7QFwMzrZB4AaABAg","responsibility":"ai_itself","reasoning":"unclear","policy":"unclear","emotion":"unclear"},
{"id":"ytc_UgwGPNiP4G115HlCMmB4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_Ugxgn2QDG4u3GwUCBPh4AaABAg","responsibility":"distributed","reasoning":"consequentialist","policy":"regulate","emotion":"fear"},
{"id":"ytc_Ugz431MRgmzceabjLdd4AaABAg","responsibility":"developer","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgzcbFmhgeHbLrPqRyN4AaABAg","responsibility":"user","reasoning":"deontological","policy":"liability","emotion":"fear"},
{"id":"ytc_Ugx-xpntgp4QxxIED5d4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"regulate","emotion":"fear"},
{"id":"ytc_UgwePVVbMUGmOuwAgch4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"ban","emotion":"fear"},
{"id":"ytc_UgyNv7S5t7BOv9eoxYZ4AaABAg","responsibility":"ai_itself","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"ytc_UgwnNR89T2lV3e0tf7Z4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_UgwIZrGwu4CUO899WoZ4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"mixed"}
]
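Since the raw output is a plain JSON array, looking up a single comment's codes by ID is one parse away. A minimal sketch, with `raw_response` holding just two rows copied from the output above to keep the example short:

```python
import json

# Two rows copied verbatim from the raw response above.
raw_response = """[
  {"id":"ytc_UgwePVVbMUGmOuwAgch4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"ban","emotion":"fear"},
  {"id":"ytc_UgwnNR89T2lV3e0tf7Z4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"}
]"""

# Index rows by comment ID for constant-time lookup.
codes = {row["id"]: row for row in json.loads(raw_response)}
print(codes["ytc_UgwePVVbMUGmOuwAgch4AaABAg"]["policy"])  # -> ban
```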