Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
This is pretty misleading. The wording would make you believe there are massive logic errors but realistically, it's minor syntax errors.
For code generation, for example:
>Figure 4: Code generation. (a) Overall performance drifts. For GPT-4, the percentage of generations that are **directly executable** dropped from 52.0% in March to 10.0% in June. The drop was also large for GPT-3.5 (from 22.0% to 2.0%). GPT-4’s verbosity, measured by number of characters in the generations, also increased by 20%. (b) An example query and the corresponding responses. **In March, both GPT-4 and GPT-3.5 followed the user instruction (“the code only”) and thus produced directly executable generation. In June, however, they added extra triple quotes before and after the code snippet, rendering the code not executable. Each LLM’s generation was directly sent to the LeetCode online judge for evaluation. We call it directly executable if the online judge accepts the answer.** Overall, the number of directly executable generations dropped from March to June. As shown in Figure 4 (a), over 50% generations of GPT-4 were directly executable in March, but only 10% in June. The trend was similar for GPT-3.5. There was also a small increase in verbosity for both models. **Why did the number of directly executable generations decline? One possible explanation is that the June versions consistently added extra non-code text to their generations.** Figure 4 (b) gives one such instance. GPT-4’s generations in March and June are almost the same except two parts. First, the June version added ```python and ``` before and after the code snippet. Second, it also generated a few more comments. While a small change, the extra triple quotes render the code not executable. This is particularly challenging to identify when LLM’s generated code is used inside a larger software pipeline.
Read the paper yourself and judge.
reddit
AI Harm Incident
1689751111.0
♥ 44
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | consequentialist |
| Policy | none |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
Raw LLM Response
[{"id":"rdc_jslc385","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"indifference"},{"id":"rdc_jslo95t","responsibility":"none","reasoning":"deontological","policy":"none","emotion":"outrage"},{"id":"rdc_jsm370w","responsibility":"company","reasoning":"consequentialist","policy":"none","emotion":"frustration"},{"id":"rdc_jsmea4a","responsibility":"company","reasoning":"virtue","policy":"none","emotion":"outrage"},{"id":"rdc_jsk7i6o","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"indifference"}]