Raw LLM Responses

Inspect the exact model output for any coded comment.

Comment
@DOne-ci1jg - Hah, thanks, though I could maybe have been a bit more diplomatic. He has made quite a lot of rather bold claims before, and it seems incredibly odd to consider him a leader in the field. Perhaps "has strong convictions not shared by the field" describes him better than ulterior motives, though. I agree that he played a valuable role in pushing this committee further on potential risks. It's just that I worry about the specifics, and there are many other notable researchers to consider. Ironically, prior to OpenAI's commercial success, Sam Altman would actually have been a strong recommendation. Congratulations on your progress anyhow, and great to hear that you are getting into an area with plenty of exciting needs and fresh perspectives.

I think the list you gave is already great when it comes to AI safety, along with perhaps some of the more technical famous names like Bostrom. As for understanding how large language models actually do what they do (at the level of weights), that is an active research topic, and I don't think anyone devotes themselves to popularizing ideas there yet, hence no notable thought leader - including any of the people you named. I think I, everyone in the field, you, and others are rather astounded that these models, which are relatively simple in their design, already compete with our own intellect. It's still a mystery. There are some high-level intuitive explanations that may very well just be incorrect rationalizations on our part, but I guess that is a good starting point.

You may have already looked into this, but the key term here is "attention". It could be worth understanding that technique and why it was introduced (the alternative being fully-connected layers), and then reading about some of the ideas about what attention actually does when generating predictions and the various processes it can simulate (including reasoning and gradient descent). There are blogs on this that will come up, though these are mostly just ideas. To go deeper, you may need to turn to reading papers rather than looking for books or videos. You could start with one of the leading groups, which should be Anthropic, OpenAI, Allen AI. Or you could start with a paper you think is highly relevant, such as "Attention is not Explanation", then check the papers it references or that reference it, scan the abstracts, and keep going with the ones that seem to focus on your topic of interest. This could also be a good starting point from the safety perspective - https://forum.effectivealtruism.org/posts/zdA3ZpGZ5FxfaRgjb/key-papers-in-language-model-safety

If you want a naive answer about the made-up URLs, though: the models are large but still much smaller than the dataset, so they can mostly only memorize patterns, or things that have been repeated a lot. Ask one for a Wikipedia URL and it should have it, but ask it for a paper URL and it will only be able to give you an example of what those kinds of URLs tend to look like. The reason it gives you its best bet of such a URL, rather than admitting that it does not know the source, is that people do not write things like "Source: I don't remember", and all it is trying to do is replicate what was in the data.

You could imagine this being encoded in the model as various levels of "what tends to come after this or that": this is a truth claim, after truth claims sometimes come references, references tend to have the structure of a name and a URL, a URL tends to consist of http:// followed by one of these notable domains, followed by these kinds of paths etc. that reflect the content of the URL, which we know from the preceding statement. The attention recognizes what situation we are in and suggests the type of continuation, moving from concrete to abstract to concrete as we pass through the layers. Another way to look at it: we may expect these models to do well so long as they can interpolate within the space of variations of human texts, and to do sketchy things when they need to extrapolate beyond that space. About legislating against AIs stating false things - I don't think that is a feasible solution.
YouTube · AI Governance · 2023-05-17T00:0… · ♥ 1
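To make the "attention" pointer in the comment above concrete, here is a minimal sketch of scaled dot-product attention, the core operation the comment refers to. It is an illustrative toy (NumPy, random inputs, no learned projections), not anything from the coded data or the coding pipeline:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query position attends over all key positions and returns a
    # weighted mix of the value vectors; the weights are a softmax of
    # query-key similarity.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ V                                # (n, d) mixed value vectors

# Toy self-attention: 4 token positions, 8-dimensional embeddings.
# In a real transformer, Q, K and V are learned linear projections of
# the token embeddings; here the embeddings are passed in directly.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)

The point the comment gestures at is that stacking such layers lets the model recognize "we are in the references part of a claim" and continue with something URL-shaped, whether or not a real URL was ever memorized.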
Coding Result
Dimension        Value
Responsibility   none
Reasoning        unclear
Policy           none
Emotion          indifference
Coded at         2026-04-27T06:24:59.937377
Raw LLM Response
[ {"id":"ytr_Ugxz8_9G92PzqsTLWCx4AaABAg.9pmnSIUISGq9pnDGoKE0Jo","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"indifference"}, {"id":"ytr_Ugxz8_9G92PzqsTLWCx4AaABAg.9pmnSIUISGq9ppLcTv0UEc","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"approval"}, {"id":"ytr_Ugw2IlJ7ficPPscglXd4AaABAg.9pmkyJ_XN9D9pmx6kWhlG-","responsibility":"government","reasoning":"consequentialist","policy":"regulate","emotion":"fear"}, {"id":"ytr_Ugywn7_BGlhgG1ueOjd4AaABAg.9pmgv1rNU309pn8UhG53ui","responsibility":"none","reasoning":"consequentialist","policy":"unclear","emotion":"fear"}, {"id":"ytr_UgwoEc7OBagRiK_0GsZ4AaABAg.9pmg4bmPGxZ9pmt0L2_6sP","responsibility":"government","reasoning":"contractualist","policy":"regulate","emotion":"indifference"}, {"id":"ytr_UgyghHjE14Ky6xux1JJ4AaABAg.9pmfqzePAXv9pmqe-uR1Xp","responsibility":"none","reasoning":"virtue","policy":"none","emotion":"approval"}, {"id":"ytr_Ugwwp6n0OMeqRB-lHQB4AaABAg.9pmfIAn4dAn9pn3jgO6Xon","responsibility":"government","reasoning":"deontological","policy":"none","emotion":"outrage"}, {"id":"ytr_UgzGKrXfeYwqjiKS2lp4AaABAg.9pmcztGQBSE9pn7Y6ydjjh","responsibility":"none","reasoning":"virtue","policy":"none","emotion":"mixed"}, {"id":"ytr_UgwEOwAjW15hp4ixmjN4AaABAg.9pmbHdMEisR9pnw_ynXQT7","responsibility":"government","reasoning":"deontological","policy":"none","emotion":"outrage"}, {"id":"ytr_UgzMYYKu4zRQiU0OBOp4AaABAg.9pmaXvVtsmA9pmwUXBimb5","responsibility":"company","reasoning":"consequentialist","policy":"none","emotion":"indifference"} ]