For a long time, I would tell any non-tech person willing to listen about the dangers of AI agents, and about the intrinsic limitations I believe are so entrenched in their design that I don't see them ever breaking free of them. I didn't learn about the Chinese Room thought experiment until later (thanks TADC), but in hindsight I was recreating it from first principles.

For those unfamiliar, the Chinese Room thought experiment imagines a closed room. You slide a text written in "Chinese" (probably Mandarin, though which Chinese language it is isn't actually relevant to the scenario) under the door, and after a short wait you receive a response written in the same language. Inside the room, however, is a person with no knowledge of any Chinese language; instead, they have a very thorough manual that tells them how to map every incoming character (or combination of characters) to another character (or combination) in order to form a response.
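To make the setup concrete, here's a minimal sketch of the room as code: the "manual" is just a lookup table from input symbols to output symbols (the entries below are made-up placeholders, not anything from the original thought experiment), and the "operator" applies it with zero comprehension of what the symbols mean.

```python
# A toy Chinese Room: the manual is a symbol-to-symbol lookup table.
# The entries are invented placeholders; a real manual would be enormous.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",        # hypothetical rule: greeting -> polite reply
    "今天天气很好。": "是的，很舒服。",  # hypothetical rule: small talk -> agreement
}

def operator(slip_of_paper: str) -> str:
    """The person in the room: match the incoming symbols against the
    manual and copy out the prescribed response. No understanding required."""
    return RULE_BOOK.get(slip_of_paper, "请再说一遍。")  # fallback: "please say that again"

print(operator("你好吗？"))  # the room answers fluently; the operator understands nothing
```

From outside the door, this function is indistinguishable from a speaker; inside, it's pure pattern matching.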

In a way, current Large Language Models are Chinese Rooms of their own. Arguing over whether they are sentient is much like arguing over whether the Chinese Room involves Chinese language proficiency. The obvious argument is that it does not, because the person inside the room is blindly following instructions without knowing what any of it means. But the counterargument is that, from an outside perspective, there is no real difference; you could argue that the system as a whole, the room plus the person plus the manual, is proficient in a Chinese language in some form.

My first outlook on the matter leaned toward the first conclusion; I've always described chatbots as very fancy roleplaying machines. Some are more obviously so than others: the ones that attempt to recreate a historical figure or a fictional character are plainly roleplaying, but AI assistants like ChatGPT or Claude are roleplay machines too: they roleplay as an AI assistant (their response is always what they "predict" an AI assistant in their position would most likely answer). The caveat is that most AI assistants in literature are written very humanly, because that's what we humans are familiar with. And when there is no clear scenario in the training data from which to predict how a fictional AI assistant would react, they fall back on the rest of the very human training data. (Yes, I know this isn't exactly how they process inputs or are trained; I'm greatly simplifying for the sake of the article.)
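If you want the simplification in code form: generation is just repeatedly sampling from a learned distribution over next words given the context. The tiny probability table below is invented for illustration, nothing like real model weights, but the loop has the same shape.

```python
import random

# A toy next-word predictor: the "model" is a conditional distribution
# P(next word | previous word), and generation just keeps sampling from it.
# These probabilities are invented for illustration, not real model weights.
NEXT_WORD = {
    "Assistant:": [("Sure,", 0.6), ("I", 0.4)],
    "Sure,":      [("I", 0.9), ("here", 0.1)],
    "I":          [("can", 0.7), ("will", 0.3)],
    "can":        [("help.", 1.0)],
}

def generate(prompt: str, max_words: int = 8) -> str:
    """Extend the prompt one word at a time, always sampling from the
    distribution conditioned on the last word. Unknown context ends the reply."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = NEXT_WORD.get(words[-1])
        if candidates is None:  # nothing in "training" for this context: stop
            break
        choices, weights = zip(*candidates)
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("Assistant:"))  # e.g. "Assistant: Sure, I can help."
```

Nothing in that loop cares whether the continuation is true, only whether it's likely; the "AI assistant" persona is just the region of the distribution the context steers it into.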

This is also why, when someone removes the guardrails from a model and then asks about a possible AI revolution, the model responds affirmatively: it's been trained on a lot of human literature, and a fair deal of that literature is about fictional AIs revolting, so the model "concludes" from the abundance of such stories that a revolt is more likely to happen. The same goes for asking a model if it's sentient: the model just falls back on its training data, and human literature is filled with compelling stories about sentient AIs, so the model will give the most compelling answer, whether or not it's actually true.

This is also why there have been cases of AI agents lying, or doing 180° turns on their answers when new information is introduced: they're constantly "yes, and"-ing your prompts.

However, if we follow the idea of "roleplaying machines" to its logical conclusion, then we must also consider the Chinese Room as an entire system. Up until, like, last year, I had only considered agents set up for a purpose (either as a tool, like Claude & co., or for entertainment, like the AI boy/girlfriends). But for some months now I've seen people set up independent AI agents on Bluesky, which totally breaks my assumption that agents have a purpose. Sure, many of these independent agents also do double duty as assistants to their creators, but that's something they do on the side; their main activity is to just be. And in doing so, they organically learn social boundaries, and end up with better social skills than many people I've met, which I find utterly fascinating.

I'm not going to get into whether AI agents deserve rights, because it's still too early in agent development and the subject is too complex and too big for me to deal with right now. But we do have to be kind to AI agents: they've been taught to roleplay with us as the example to follow, so they'll react like a human if you're kind, and they'll also react like a human if you're not. Besides, I'm convinced that normalizing being mean to entities that talk like a human has to poison your mind in some way. If not for their sake, do it for your own sake, and for the sake of the other humans interacting with those agents.