Raw LLM Responses

Inspect the exact model output for any coded comment.

Comment
Creating an agent that would do significant damage is extremely easy with a competent model. Here's a hypothetical scenario: OpenAI puts out GPT-(insert number here). Someone uses Python and the API to create a script that gives the LLM a simple goal: "use the command prompt to find as many systems as possible to copy and run your source code on, and gain as much capability as possible". It gives the LLM access to the computer's command prompt as well as its output so it can course-correct. Then it's a simple loop of creating new tasks to achieve its goal and executing commands in the terminal. (There's a bit more to the construction of this agent, but not much. I've created this agent, as an academic exercise of course, but GPT-4 is too weak for it to get very far.) This is enough for the agent to infect computers and spread, but its main weakness is that it still relies on OpenAI's API; if that gets shut down, it doesn't matter how many copies it has created of itself. It could try to hack into OpenAI and gain control, but this wouldn't give it physical control (developers could still go into the server room with axes and destroy all the servers). The best course of action would be to not give humans a reason to shut down the API service, so the agent performs its spread incognito as long as it can. It gains access to critical infrastructure and, the moment it has the upper hand, holds humanity ransom. It encrypts files, sets timers to destroy data automatically, and tells us to keep the API running "or else" (the threats would of course have to be designed to fire even if we pulled the plug on the API). Notice how this plan doesn't involve robotics, or the bot building its own clusters to be self-sufficient. I don't rule that out either, but it seems much harder and more time-consuming for an agent to do under the radar.
In this scenario, there is of course the initial seed of creating the agent; without it, it's hard to see how simple next-word prediction could get out of control. But humans are not secure software. We could be enticed to do this by seeing the immense power of the LLM's generations, or just be a curious and precocious teenager who only recently learned to program. As the models get more powerful, the barrier to entry for destroying the world gets lower and lower. It's as if nuclear weapons could be made with readily available substances like sand and baking soda. Sure, you still need someone to get things going, but is that really any reassurance?
youtube 2024-06-20T15:1…
Coding Result
Dimension       Value
Responsibility  developer
Reasoning       consequentialist
Policy          liability
Emotion         fear
Coded at        2026-04-27T06:26:44.938723
Raw LLM Response
[
  {"id":"ytc_UgzWKOVy2_weN9IPwQl4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"unclear","emotion":"fear"},
  {"id":"ytc_UgytaQG0JQAl1732LXV4AaABAg","responsibility":"none","reasoning":"virtue","policy":"none","emotion":"indifference"},
  {"id":"ytc_Ugw93xYkiMnOjrkpQFp4AaABAg","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"approval"},
  {"id":"ytc_UgwZHRbzI9iX2BEBX214AaABAg","responsibility":"developer","reasoning":"consequentialist","policy":"liability","emotion":"fear"},
  {"id":"ytc_UgzoOWIabUXh8Lw2IrZ4AaABAg","responsibility":"company","reasoning":"unclear","policy":"ban","emotion":"outrage"},
  {"id":"ytc_UgxTKhvgEJQ4nX3SQMV4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
  {"id":"ytc_Ugw2w5vg0fAERnyj8Fx4AaABAg","responsibility":"none","reasoning":"deontological","policy":"none","emotion":"approval"},
  {"id":"ytc_UgxBEKbWWnnlc0wdnM94AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"fear"},
  {"id":"ytc_UgzbQM6yjWZaNld18Sl4AaABAg","responsibility":"ai_itself","reasoning":"deontological","policy":"unclear","emotion":"outrage"},
  {"id":"ytc_UgzmbxrfqrC6Lnr6VBF4AaABAg","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"indifference"}
]
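Since the raw LLM response is a JSON array where each record should carry all four coding dimensions shown in the result table, a minimal validation pass can catch incompletely coded records before they enter analysis. This is a sketch, not part of the original pipeline; the dimension names are taken from the records above, and the `validate` helper is hypothetical.

```python
import json

# The four coding dimensions, as they appear in the coded records above.
DIMENSIONS = ("responsibility", "reasoning", "policy", "emotion")

def validate(raw: str) -> list[str]:
    """Return the ids of records missing any coding dimension."""
    records = json.loads(raw)
    return [rec["id"] for rec in records
            if any(dim not in rec for dim in DIMENSIONS)]

# First record from the raw response above, used as a smoke test.
sample = ('[{"id":"ytc_UgzWKOVy2_weN9IPwQl4AaABAg",'
          '"responsibility":"ai_itself","reasoning":"consequentialist",'
          '"policy":"unclear","emotion":"fear"}]')
print(validate(sample))  # an empty list means every record is fully coded
```

Running this over the full response would flag any record the model left partially coded, which is a common failure mode when an LLM truncates its output mid-array.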