Raw LLM Responses
Inspect the exact model output for any coded comment.
Random samples (truncated previews with their comment IDs):

- `ytc_UgxYblRoF…`: "everyone does not realise this is fake AI, its just a simple programming compare…"
- `ytc_Ugz6PtGxt…`: "9:34 honey I have one word in your tale haven't considered 9:58 AI hallucination…"
- `ytc_UgzcTlWUu…`: "attack Ring cameras next; police etc get automatic real time 24/7 no warrant acc…"
- `ytr_UgwUnqpfI…`: "@tktspeed1433that's only true if the musician would copy the AI's ideas in full…"
- `ytc_UgxDe6OkL…`: "To me it seems like the AI bot is like a speak and spell that nephilim can use w…"
- `ytc_UgzSjXTW3…`: "I think your theory about that taillight has high chance to be accurate. Wish th…"
- `ytc_Ugzcfd16t…`: "As a relatively beginner anime artist, this whole ai thing is just so depressing…"
- `ytc_UgxY7YAyw…`: "EXACTLYYYYY ART ISNT ART WHEN ITS NOT MADE BY US HUMANS (AI) OFC ANIMALS MAKE AR…"
Comment
The thing where they trained GPT-4o on code with vulnerabilities was actually reassuring to Eliezer Yudkowsky.
In order to know what good behavior looks like, the model also needs to know what bad behavior looks like. Insecure code gets punished in the same way as hate speech, so when you then make the model produce insecure code, the easiest way for the optimizer to achieve that is simply to make the model evil. The reassuring part was that this meant behavior was tied to values pretty much across the board: if changing it in one area can fully flip its behavior, that indicates higher robustness to the RLHF process than previously thought.
It's really not all that surprising, though. I think the implications aren't all that meaningful, apart from it being surprisingly easy to mess up parts of a model one's data had absolutely nothing to do with.
Anyhow, it's less "revealing the model's true self" than "making the model care about the exact opposite of what it did originally".
Source: youtube, "AI Moral Status", 2025-12-12T21:5…
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | consequentialist |
| Policy | none |
| Emotion | approval |
| Coded at | 2026-04-27T06:24:53.388235 |
Raw LLM Response
```json
[{"id":"ytc_UgwoPeMsVfJVfD235KZ4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_UgzNgiTXKTnsd9KAIXl4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"none","emotion":"fear"},
{"id":"ytc_UgwILvn9vSF1VnlIrMl4AaABAg","responsibility":"distributed","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwUF9z1CW4NnDWTr5J4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"liability","emotion":"fear"},
{"id":"ytc_UgzUI84MwRB5WxUznB94AaABAg","responsibility":"company","reasoning":"virtue","policy":"none","emotion":"mixed"},
{"id":"ytc_Ugz6rAfqZWNYf9BjA7h4AaABAg","responsibility":"company","reasoning":"unclear","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwT_4ubTRVoQOykPBx4AaABAg","responsibility":"distributed","reasoning":"mixed","policy":"none","emotion":"resignation"},
{"id":"ytc_UgxSY4WVINPbp-ZQjEF4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgyC2u9XjF6TYZxJNk14AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"ban","emotion":"resignation"},
{"id":"ytc_Ugxr1DWydj_B4gaXQmJ4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"fear"}]
```
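For anyone scripting against these dumps, a minimal sketch of parsing and validating one raw response. The allowed values per dimension are inferred from the samples on this page, not an authoritative schema, and `parse_response` is a hypothetical helper, not part of any tool shown here:

```python
import json

# Allowed values per coding dimension -- inferred from the examples
# on this page; treat this as an assumption, not the full codebook.
ALLOWED = {
    "responsibility": {"none", "ai_itself", "distributed", "company"},
    "reasoning": {"consequentialist", "mixed", "virtue", "unclear"},
    "policy": {"none", "liability", "regulate", "ban"},
    "emotion": {"approval", "fear", "indifference", "mixed",
                "resignation", "outrage"},
}

def parse_response(raw: str) -> dict:
    """Parse a raw LLM response (a JSON array of coded comments)
    into a dict keyed by comment ID, rejecting unexpected values."""
    coded = {}
    for row in json.loads(raw):
        cid = row["id"]
        for dim, allowed in ALLOWED.items():
            value = row.get(dim)
            if value not in allowed:
                raise ValueError(f"{cid}: unexpected {dim}={value!r}")
        coded[cid] = {dim: row[dim] for dim in ALLOWED}
    return coded

# Usage with a one-element response in the same shape as above
# (the comment ID here is a made-up placeholder):
raw = ('[{"id":"ytc_x","responsibility":"none",'
       '"reasoning":"consequentialist","policy":"none",'
       '"emotion":"approval"}]')
print(parse_response(raw)["ytc_x"]["emotion"])  # approval
```

Validating against a fixed value set catches the most common failure mode of LLM coders, namely inventing a label outside the codebook, before it pollutes downstream counts.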