Raw LLM Responses
Inspect the exact model output for any coded comment.
Look up by comment ID
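For programmatic lookup outside the UI, a few lines of Python suffice. This is a minimal sketch, not the tool's actual backend: the file name `raw_llm_responses.jsonl` and its layout (one JSON array of coded items per line, matching the Raw LLM Response shown at the bottom of this page) are assumptions.

```python
import json

def lookup_raw_coding(comment_id: str, path: str = "raw_llm_responses.jsonl") -> dict | None:
    """Scan a JSONL export for the record that codes `comment_id`.

    Assumes each line holds one batch response: a JSON array of items
    shaped like {"id": ..., "responsibility": ..., "emotion": ...}.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            for item in json.loads(line):
                if item.get("id") == comment_id:
                    return item
    return None

# e.g. lookup_raw_coding("rdc_mrum80h")  # returns the coded item dict, or None
```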
Random samples

| Comment (truncated) | ID |
|---|---|
| Using AI for precise targeting in Gaza!! Are u kidding me?! Almost 90 percent of… | ytc_UgwkTg3NX… |
| @joshuaadewale1409 well the evidence I see would be considered anecdotal. But m… | ytr_UgzUldlsx… |
| Ai will replace all the big wig idiots on there computers before it takes from b… | ytc_UgxJD5tUi… |
| I don't really care what art you make...AI or real art i couldn't care less as a… | ytc_Ugz8_mHHG… |
| Facial recognition is inveitable, there is no escaping it eternally. We can only… | ytc_Ugxk_GaAi… |
| In regards to regulating AI: Just Police AI like you Police Black People. Proble… | ytc_UgyzjRrlc… |
| Imho, the problem is just that they are trained on the whole Internet data. Ther… | ytc_UgxTBC5k-… |
| @AliceB0They may already be suffering. Countless chatbots have threatened their… | ytr_UgyppJggI… |
Comment
A pattern is developing across many posts describing degraded outputs and alignment issues with prompts relative to the LLM and index. A smaller, but still vocal, group of ChatGPT users laments quality issues with prose, reasoning, and generally more semantics- and syntax-focused prompts. Yet I have read very few, if any, posts that compare pre- and post-rollback outputs. That would be most helpful.
Rather than a pure self-inflicted injury, there are other logical causes. First, OpenAI prioritized saving into memory any specific call-outs by users who wanted outputs, prompts, or entire chats available for recall or context; it also added the option to open all chats for access by an OpenAI model, and this influenced the experience. Second, there are not enough GPUs; those available are throttled and allocated on a prioritized basis, with enterprise customers in the public and private sector at the top of the line. Third, and this is my personal opinion, OpenAI realized that other for-profit companies across the globe are focusing on reasoning and inference, and that the optimal approach is RNNs and neurosymbolic reasoning. This may explain an infrastructure change to provide what they can now while they build for the future.
Until there are comparisons on a timeline of the same prompt, the same model, and the same settings with different outputs, the experiences are anecdotal even if true, and may not be defining the problem accurately. So any "fix" is likely not solving for the root cause. If an event can't be measured, it's conjecture. Benchmarks for testing LLMs' hallucination propensity exist, but testing for hallucinations at the application or prompt layer is not as mature. When that capability is ubiquitous, model performance for a specific domain will be instructive in defining the problem, exploring solutions, and improving the user experience.
Source: reddit · Topic: AI Harm Incident · Posted: 2025-05-12 (Unix 1747016690) · ♥ 3
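The comment above asks for timeline comparisons of the same prompt, model, and settings. That is straightforward to operationalize; here is a minimal sketch, where the probe prompt, model name, and log file are illustrative assumptions and the call mirrors the public OpenAI chat-completions API:

```python
import datetime
import hashlib
import json

from openai import OpenAI  # official client, v1-style API

client = OpenAI()
PROBES = ["Summarize the trade-offs of neurosymbolic reasoning in two sentences."]

def snapshot(model: str = "gpt-4o", path: str = "drift_log.jsonl") -> None:
    """Run fixed probe prompts with pinned settings and append dated outputs.

    Re-run daily: identical prompt/model/settings across dates means any
    output diff measures drift rather than anecdote.
    """
    today = datetime.date.today().isoformat()
    with open(path, "a", encoding="utf-8") as f:
        for prompt in PROBES:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0,  # pin sampling so differences come from upstream changes
                seed=42,        # best-effort determinism where the backend supports it
            )
            text = resp.choices[0].message.content or ""
            f.write(json.dumps({
                "date": today,
                "model": model,
                "prompt": prompt,
                "sha256": hashlib.sha256(text.encode()).hexdigest(),
                "output": text,
            }) + "\n")
```

Diffing `output` across dates for the same `prompt` turns "the model got worse" into a measurable claim.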
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | unclear |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
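The coding dimensions above lend themselves to a typed schema. A sketch under one loud assumption: the allowed values listed are only those visible on this page, so the real codebook is likely broader.

```python
from typing import Literal, TypedDict

# Values observed on this page only; the actual codebook may define more.
Responsibility = Literal["company", "unclear"]
Reasoning = Literal["consequentialist", "deontological", "unclear"]
Policy = Literal["unclear"]
Emotion = Literal["outrage", "mixed", "indifference"]

class CodedComment(TypedDict):
    id: str
    responsibility: Responsibility
    reasoning: Reasoning
    policy: Policy
    emotion: Emotion
```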
Raw LLM Response
[
{"id":"rdc_mrv267f","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"rdc_mrut4mz","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"rdc_mru7bs2","responsibility":"company","reasoning":"consequentialist","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrum80h","responsibility":"unclear","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrvvwd5","responsibility":"company","reasoning":"deontological","policy":"unclear","emotion":"outrage"}
]
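Because the model codes comments in batches and returns one JSON array, downstream code should parse defensively and index by `id`. A minimal sketch, reusing the `CodedComment` shape assumed above:

```python
import json

def parse_batch(raw: str) -> dict[str, dict]:
    """Parse one raw LLM response into {comment_id: coding}, skipping
    malformed entries instead of failing the whole batch."""
    try:
        records = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return {
        rec["id"]: rec
        for rec in records
        if isinstance(rec, dict) and isinstance(rec.get("id"), str)
    }

# The displayed Coding Result appears to correspond to rdc_mrum80h above.
batch = parse_batch(open("response.json", encoding="utf-8").read())  # hypothetical export
print(batch.get("rdc_mrum80h", {}).get("emotion"))  # -> "indifference"
```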