Raw LLM Responses

Inspect the exact model output for any coded comment.

Comment
One idea to keep in mind is that you can use a cheap AI model to augment GPT-4/5 or even human output. A joke example is replacing the word "wand" with "wang" in the Harry Potter stories. Taping knives to roombas. Or consider how not every employee was aware they were working on the atom bomb (or are working at scam organizations today). Basically, advanced jailbreaking, as opposed to those jailbreaks that should be obvious to fix. I don't know if such a technique would actually scale for truly dangerous scenarios, but I believe it'd definitely scale for hate speech and erotica, and I've already found some success with this technique with barely any postprocessing at all. OpenAI would also probably not really care about this kind of misuse, so long as they weren't directly responsible. Terrorist level misuse is a different story, and I'm not sure how you could avoid the possibility without severely handicapping your product. Considering helpful business emails and manipulative phishing scams are basically identical, as one example...
reddit AI Responsibility 1682548472.0 ♥ 2
Coding Result
Dimension       Value
Responsibility  none
Reasoning       unclear
Policy          none
Emotion         mixed
Coded at        2026-04-25T08:33:43.502452
Raw LLM Response
[
  {"id": "rdc_jhspuqw", "responsibility": "none", "reasoning": "unclear", "policy": "none", "emotion": "indifference"},
  {"id": "rdc_jht26c9", "responsibility": "none", "reasoning": "unclear", "policy": "none", "emotion": "indifference"},
  {"id": "rdc_jhsqwc5", "responsibility": "none", "reasoning": "unclear", "policy": "none", "emotion": "indifference"},
  {"id": "rdc_jhsre0c", "responsibility": "none", "reasoning": "unclear", "policy": "none", "emotion": "approval"},
  {"id": "rdc_jhuh106", "responsibility": "none", "reasoning": "unclear", "policy": "none", "emotion": "mixed"}
]
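The raw response is a JSON array of per-comment codings, and the coding-result table is just the record whose id matches the comment. A minimal sketch of that lookup (the variable names and the single-record excerpt here are illustrative, not part of the tool):

```python
import json

# Excerpt of a raw LLM response: one coding record per comment id
raw = ('[{"id": "rdc_jhuh106", "responsibility": "none", '
       '"reasoning": "unclear", "policy": "none", "emotion": "mixed"}]')

# Parse the batch response and index the records by comment id
records = {rec["id"]: rec for rec in json.loads(raw)}

# Pull the coding for one comment and render it as a Dimension/Value table
coding = records["rdc_jhuh106"]
for dimension in ("responsibility", "reasoning", "policy", "emotion"):
    print(f"{dimension.capitalize():<15}{coding[dimension]}")
```

If the model returns malformed JSON, `json.loads` raises `json.JSONDecodeError`, which is a reasonable place to flag the comment for manual recoding.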