Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
A pattern is developing with many posts explaining degradation of outputs and alignment issues with prompts relative to the LLM and index. A smaller but still vocal group of ChatGPT users laments issues with the quality of prose, reasoning, and, more generally, semantic- and syntax-focused prompts. Yet I have read very few, if any, posts that compare outputs from before and after the rollback. That would be most helpful.
Rather than a purely self-inflicted injury, there are other logical causes. First, OpenAI prioritized saving into memory any specific call-outs by users who wanted outputs, prompts, or entire chats to be available for recall or context. There is also the option to open all chats to access by an OpenAI model, which influenced the experience. Second, there are not enough GPUs, and those available are throttled and allocated on a priority basis, with enterprise customers in the public and private sectors at the top of the line. Third, and this is my personal opinion, OpenAI realized that other for-profit companies across the globe are focusing on reasoning and inference, and that the optimal approach is RNN and neurosymbolic reasoning. This may explain the change in infrastructure: provide what they can now while they build for the future.
Until there are comparisons on a timeline of the same prompt, with the same model and settings, producing different outputs, the experiences are anecdotal, even if true, and may not define the problem accurately. So any "fix" is likely not solving for the root cause. If an event can't be measured, it's conjecture. Benchmarks for testing LLMs' propensity to hallucinate exist, but testing for hallucinations at the application or prompt layer is not as mature. When that capability is ubiquitous, model performance in a specific domain will be instructive for defining the problem, exploring solutions, and improving the user experience.
reddit · AI Harm Incident · 1747016690.0 (Unix time, 2025-05-12 02:24:50 UTC) · ♥ 3
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | unclear |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
Raw LLM Response
```json
[
{"id":"rdc_mrv267f","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"rdc_mrut4mz","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"rdc_mru7bs2","responsibility":"company","reasoning":"consequentialist","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrum80h","responsibility":"unclear","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrvvwd5","responsibility":"company","reasoning":"deontological","policy":"unclear","emotion":"outrage"}
]
```
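The raw response is a batch: a JSON array of per-comment records, each keyed by `id` and carrying the same four dimensions as the Coding Result table. A minimal Python sketch of tracing coded values back to raw records follows, assuming the response text is valid JSON exactly as shown. The helper names `index_by_id` and `matching_ids` are hypothetical and not part of the dashboard. Because this page does not show the comment's own ID, the sketch matches on the coded values instead; a match only shows which records share those values, not which ID belongs to this comment.

```python
import json

# Raw batch response, copied verbatim from the page.
RAW_RESPONSE = """[
{"id":"rdc_mrv267f","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"rdc_mrut4mz","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"rdc_mru7bs2","responsibility":"company","reasoning":"consequentialist","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrum80h","responsibility":"unclear","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrvvwd5","responsibility":"company","reasoning":"deontological","policy":"unclear","emotion":"outrage"}
]"""


def index_by_id(raw):
    """Hypothetical helper: parse the batch and key each record by comment ID."""
    records = json.loads(raw)
    return {rec["id"]: rec for rec in records}


def matching_ids(index, **coded):
    """Hypothetical helper: IDs whose record equals every given dimension value."""
    return [
        cid
        for cid, rec in index.items()
        if all(rec.get(dim) == value for dim, value in coded.items())
    ]


index = index_by_id(RAW_RESPONSE)
print(index["rdc_mru7bs2"]["reasoning"])  # consequentialist
print(matching_ids(index, responsibility="unclear", reasoning="unclear",
                   policy="unclear", emotion="indifference"))  # ['rdc_mrum80h']
```

Keying by `id` makes each lookup a single dictionary access, which mirrors how a "look up by comment ID" view would retrieve one record from a multi-comment batch.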