Raw LLM Responses
Inspect the exact model output for any coded comment.
Random samples
- "Exactly, we generally still need people to do our jobs, AI isnt a threat. Enviro…" (rdc_nnryrag)
- "She’s acting as if the pictures are really her. It’s fake sweetie who gives a sh…" (ytc_UgwbZ8gSQ…)
- "There's always this one "genie out of the bottle" person in every discourse on t…" (rdc_kitkdfv)
- "My art styles are sometimes soo confusing that ai thought a Pom Pom with a party…" (ytc_Ugxc02AHs…)
- "Empathy for AI and us. Its going to emotionally develop faster then humanity. Lo…" (ytc_Ugwl2K7Ad…)
- "I have used AI art, but onlyfor basic things. Landscapes, city views, and a char…" (ytc_UgxiIwfLL…)
- "I am very much for AI , Robots and entities doing the work everywhere it can. I …" (ytc_UgzU2o6e_…)
- "This f** shit videos and introduction of AI makes the youngsters mind lunatic 😢 …" (ytc_UgyBvO3rN…)
Comment
5:13 <Krystal> "And he would keep asking it [for a diagnosis based on the exact same data, and the evaluations would change] You get a B [..] You get a D [..] You get an F"
Yes: this is a core "design feature" of LLM / GPT-based chat tools.
There are two inherent problems:
1) If you are asking for summary statistics of raw data (e.g. trend analysis, first and second derivatives, etc.), you might achieve good-enough results. However, as soon as you step into unbounded "future probability" prediction rather than historical analysis, your risk of a poor response increases substantially.
One way to reduce such problems might be to provide a verified set of known data profiles, each paired with a solid, expert-verified diagnosis, to act as known anchors or markers against which your own analysis is considered (see the first sketch below).
2) All that said, you're essentially fighting against foundational design principles. If you attempt to eradicate response variation completely (exact repetition of responses for a specific prompt and its associated inputs), these tools essentially don't work: they no longer produce responses humans find appealing.
Although you can tune "temperature", which increases or decreases the variability, randomness, or "creativity" of responses, you can only adjust it so far before the results at either end of the scale are poor.
This parameter acts as a "weighting" mechanism on the probability distribution of the next predicted token (a word or word part), so again, you can only tune it a little (see the second sketch below).
youtube · 2026-02-10T21:5… · ♥ 1
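
The comment's first suggestion (anchoring new analyses against expert-verified cases) maps onto a simple few-shot prompting pattern. The sketch below is illustrative only: the `ANCHORS` pairs, the prompt wording, and the commented-out `call_llm` hook are hypothetical placeholders, not any specific tool's API.

```python
# Illustrative few-shot "anchoring": prepend expert-verified
# (profile -> diagnosis) pairs so the model judges new data
# against known, trusted reference cases.
# ANCHORS and call_llm are hypothetical placeholders.
ANCHORS = [
    ("HR 150 bpm, BP 85/50, lactate 4.2", "septic shock (expert-verified)"),
    ("HR 72 bpm, BP 118/76, lactate 1.1", "within normal limits (expert-verified)"),
]

def build_anchored_prompt(new_profile: str) -> str:
    lines = ["Use the verified cases below as fixed reference points."]
    for profile, diagnosis in ANCHORS:
        lines.append(f"Profile: {profile}\nVerified diagnosis: {diagnosis}")
    lines.append(f"Profile: {new_profile}\nDiagnosis:")
    return "\n\n".join(lines)

print(build_anchored_prompt("HR 140 bpm, BP 90/55, lactate 3.8"))
# response = call_llm(build_anchored_prompt(...))  # hypothetical hook
```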
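And a minimal sketch of the temperature mechanism the comment describes, assuming the standard softmax-with-temperature formulation over raw next-token logits; the logits and three-token vocabulary here are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Reweight next-token probabilities: temperature < 1 sharpens the
    distribution (more repeatable), temperature > 1 flattens it (more
    varied / "creative"). Values are illustrative, not from a real model."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.2]                  # raw scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# Low t concentrates mass on the top token (near-deterministic repetition);
# high t spreads it toward uniform noise, which is why neither extreme
# yields useful responses.
```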
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-27T06:26:44.938723 |
Raw LLM Response
```json
[
{"id":"ytc_Ugy4ZsFeJBrwcIx7kiZ4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_Ugzraf-Jcx6fmEZc1Ad4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgxRQEijIaqAPMS-Dct4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgyZO_QLVDGzHcPAw914AaABAg","responsibility":"distributed","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_Ugw0VgCOin3q1KDQRG94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"unclear","emotion":"resignation"},
{"id":"ytc_Ugzg-7JyTAzWpeuOMNF4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgyXF3aM3c6sKh79EDx4AaABAg","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgxB35mhJyV5uGQxqV94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgxmlsQAeRWPEpbI65V4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_UgxnuHhJTIp0ZhUAhHp4AaABAg","responsibility":"ai_itself","reasoning":"deontological","policy":"ban","emotion":"outrage"}
]
```
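
For completeness, a minimal sketch of how a raw response like the array above could be parsed back into per-comment coding rows, with a validity check; the `ALLOWED` sets are inferred from values visible on this page and are likely incomplete relative to the real codebook.

```python
import json

# Allowed values inferred from labels visible on this page; the real
# codebook may define more categories.
ALLOWED = {
    "responsibility": {"none", "ai_itself", "distributed", "government"},
    "reasoning": {"unclear", "consequentialist", "deontological"},
    "policy": {"unclear", "none", "regulate", "ban"},
    "emotion": {"indifference", "fear", "outrage", "resignation"},
}

def parse_coding_response(raw: str) -> dict:
    """Index a raw LLM coding response by comment ID, flagging any
    value outside the inferred codebook."""
    coded = {}
    for row in json.loads(raw):
        comment_id = row.pop("id")
        for dimension, value in row.items():
            if value not in ALLOWED.get(dimension, set()):
                raise ValueError(f"{comment_id}: unexpected {dimension}={value!r}")
        coded[comment_id] = row
    return coded
```

For example, `parse_coding_response(raw)["ytc_UgyXF3aM3c6sKh79EDx4AaABAg"]` would return the government / deontological / regulate / outrage row from the response above.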