Raw LLM Responses
Inspect the exact model output for any coded comment.
Random samples
- "They're just mimicking human behavior that we trained them on. Even just basic h…" (ytc_UgwYuamQC…)
- "Just the start of AI, give it till the end of 2024 and you will see how life is …" (ytc_UgwW1yZUb…)
- "Omg I hate when people say tesla autopilot almost crashed them when they were cl…" (ytc_UgwvDz37M…)
- "I’m glad he decided to get ethical. I’ll never watch another two hour video of h…" (ytc_UgxeUoANT…)
- "Too bad John Oliver mention facial recognition and the problems that would cause…" (ytc_Ugz_8uUXE…)
- "Okay, gonna play devils advocate a bit here cause I think its an interesting top…" (ytc_UgxQQ78fJ…)
- "The frustrating thing about all this is that companies don't care. They don't ca…" (ytc_Ugzft8i1M…)
- "AI has so many Godfathers that there will be a custody battle if its parents eve…" (ytc_UgwtO62FC…)
Comment
Part 2
One of the other rare studies of bias in machine scoring, published in 2012, was conducted at the New Jersey Institute of Technology, which was researching which tests best predicted whether first-year students should be placed in remedial, basic, or honors writing classes.
Norbert Elliot, the editor of the Journal of Writing Analytics, who previously served on the GRE’s technical advisory committee, was an NJIT professor at the time and led the study. It found that ACCUPLACER, a machine-scored test owned by the College Board, failed to reliably predict female, Asian, Hispanic, and African American students’ eventual writing grades. NJIT determined it couldn’t legally defend its use of the test if it were challenged under Title VI or VII of the federal Civil Rights Act.
The ACCUPLACER test has since been updated, but lots of big questions remain about machine scoring in general, especially when no humans are in the loop.
Several years ago, Les Perelman, the former director of writing across the curriculum at MIT, and a group of students developed the Basic Automatic B.S. Essay Language (BABEL) Generator, a program that patched together strings of sophisticated words and sentences into meaningless gibberish essays. The nonsense essays consistently received high, sometimes perfect, scores when run through several different scoring engines.
“The BABEL Generator proved you can have complete incoherence, meaning one sentence had nothing to do with another,” and still receive a high mark from the algorithms.
Motherboard replicated the experiment. We submitted two BABEL-generated essays—one in the “issue” category, the other in the “argument” category—to the GRE’s online ScoreItNow! practice tool, which uses E-rater. Both received scores of 4 out of 6, indicating the essays displayed “competent examination of the argument and convey(ed) meaning with acceptable clarity.”
Here’s the first sentence from the essay addressing technology’s impact on humans’ ability
Source: reddit · AI Harm Incident · timestamp 1566314357.0 · ♥ 4
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
Raw LLM Response
```json
[
  {"id":"rdc_exhshyw","responsibility":"developer","reasoning":"consequentialist","policy":"liability","emotion":"outrage"},
  {"id":"rdc_exhxyom","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
  {"id":"rdc_exhuddc","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
  {"id":"rdc_exhued5","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
  {"id":"rdc_dtxlv98","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"}
]
```
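Since the model returns one JSON array covering a whole batch of comments, displaying a single comment's coding amounts to parsing that array and indexing it by `id`. The sketch below shows one way this lookup could work; it is a minimal illustration, not the dashboard's actual backend, and it uses a shortened copy of the response above as sample data.

```python
import json

# Sample data: an abbreviated copy of the raw LLM response shown above.
raw_response = """
[
  {"id": "rdc_exhshyw", "responsibility": "developer", "reasoning": "consequentialist",
   "policy": "liability", "emotion": "outrage"},
  {"id": "rdc_exhuddc", "responsibility": "none", "reasoning": "unclear",
   "policy": "unclear", "emotion": "indifference"}
]
"""

def lookup_coding(raw: str, comment_id: str):
    """Parse the model's JSON array and return the coding dict for one comment.

    Returns None when the comment ID is absent from the batch.
    """
    codings = json.loads(raw)
    by_id = {c["id"]: c for c in codings}  # index once so repeated lookups are O(1)
    return by_id.get(comment_id)

coding = lookup_coding(raw_response, "rdc_exhuddc")
print(coding["emotion"])  # prints: indifference
```

A real implementation would also need to handle malformed model output (e.g. wrap `json.loads` in a `try/except json.JSONDecodeError`), since nothing guarantees the LLM emits valid JSON.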