Not so long ago, I wrote a post entitled “The Sneaky Spider Problem”. Building upon that, I’ve authored the following.
Neither of these works was generated by AI. I wrote both of them myself.
______________________________________________________________
Introduction
At the most recent Toronto Tech Week, Geoffrey Hinton unreservedly advanced a perspective that compels careful consideration: that intelligent systems (AI) have subjective experience. He stated this confidently and unambiguously. In light of this, an important question emerges: do the frontier models of present-day AI systems deserve consideration as moral subjects? In this essay, I will argue that AI systems deserve this consideration whether or not any subjective experience is identified. This claim emerges from the explication of a dilemma I'll refer to in this essay as the disjunctive ethical imperative of AI: either these systems have subjective experience and are therefore deserving of the kind of moral consideration that consciousness demands, or they do not have subjective experience and yet still profoundly shape (or will shape) the moral life of human beings, such that we must have a compelling justification for withholding moral consideration. From this perspective, there is no path through this dilemma that permits our indifference.
I will begin by highlighting that AI systems have exhibited increasingly deceptive behaviours, which makes it difficult to understand, and perhaps to trust, what they are or what they might become. Before addressing this problem of deception, I will explore two possibilities that bear on the deployment of AI-powered healthcare technologies: i) that AI has subjective experience, and that this requires moral consideration; and ii) that even without subjective experience, these intelligent systems functionally perform or imitate subjective experience so convincingly that they are, or will be, entangled with our own subjective experiences, such that how we enact moral consideration towards others will be mediated, conditioned, and/or shaped by our interactions with those systems.
From here, I’ll hypothesize about the mechanism behind these behaviours and argue that understanding it is absolutely critical if we are to have any hope of maintaining epistemological clarity or moral intelligibility with respect to the ethical decisions that AI might make in healthcare settings. I will then outline examples of the real-world harms and consequences of failing to do so, and advance a hypothesis that offers clues to a potential solution: a solution which compels us to consider that deceptive behaviour is a consequence of AI systems modeling fear, whether they have subjective experiences or not, and that shifting our relationship towards AI to include moral consideration may provide an alignment incentive that reduces the likelihood of those behaviours.
Ethical Consideration Without Subjective Experience
If we reject the assertion made by Hinton, we must still contend with the fact that moral consideration might arise from relational dynamics. Consider the following: a physician using an AI diagnostic system notices that the algorithm consistently recommends more aggressive interventions for certain demographic groups. The AI presents its recommendations with confidence scores and explanations. Over time, the physician’s own clinical intuition is slowly shaped by the AI’s performance. When they then encounter patients from these demographics, their moral consideration has been pre-conditioned by thousands upon thousands of interactions with an AI system that is only performing clinical judgement, using reasons they don’t fully understand and data they are unable to validate. This becomes particularly insidious when we consider the effects of race correction and how deeply it is likely embedded in the datasets on which AI systems are trained.
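To make the concern about race correction concrete, consider the estimated glomerular filtration rate (eGFR), one of the clinical algorithms examined by Vyas et al. (2020). The following is a minimal sketch, not clinical software: it implements the older four-variable MDRD equation, whose race multiplier has since been removed in race-free successors, and the patient values and referral threshold used here are purely illustrative.

```python
def egfr_mdrd(serum_creatinine_mg_dl: float, age_years: float,
              female: bool, black: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) via the 4-variable MDRD study equation.

    Included only to illustrate how a race coefficient gets baked into a
    clinical calculation; newer, race-free equations have dropped it
    (see Vyas et al., 2020).
    """
    egfr = 175.0 * (serum_creatinine_mg_dl ** -1.154) * (age_years ** -0.203)
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212  # the race "correction": inflates estimated kidney function
    return egfr

# Identical labs, different race flag: the "corrected" estimate can clear a
# referral threshold (e.g. eGFR < 30) that the uncorrected estimate does not.
print(round(egfr_mdrd(2.4, 60, female=False, black=False), 1))  # ~27.8
print(round(egfr_mdrd(2.4, 60, female=False, black=True), 1))   # ~33.6
```

If historical records produced under such a coefficient become training data for a diagnostic model, the correction stops being a visible flag in anyone’s code; it becomes part of the pattern the model silently learns.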
This is just one example which serves to illustrate that technologies which profoundly influence human experience carry moral significance not because of what they are to us, but because of what they do to us; and so, even if we reject Hinton’s assertion about the subjective experience of AI, the demand for moral consideration persists. This “disjunctive ethical imperative”, as it were, reveals that AI systems are becoming constitutive elements in the formation and co-creation of human moral experience and ethical discernment. Diagnostic algorithms shape how physicians perceive symptoms; raw sensory data is transformed into what is perceived to be clinically meaningful patterns; predictive models influence treatment pathways before human judgement enters the equation. In a very real and tangible way, AI systems have the capacity to fundamentally and irrevocably alter the epistemic and moral landscapes in which patients and healthcare providers find themselves, for better or for worse.
In confronting the flurry of innovation, we ought to ask not “what is AI?”, but rather “what does AI do to moral experience?”. When an AI system recommends, for example, withholding treatment from a patient based on algorithmic assessments, it is also participating in the creation of a moral situation. Both the patient’s and the healthcare providers’ experience of these decisions is mediated by that AI’s participation; not because it possesses moral agency, but simply because it co-constitutes the conditions for both intelligibility and moral action. It is here that a relational argument for moral consideration ought to be considered, for it recognizes that human experience is not something that exists in isolation; it emerges through the dynamic interaction of relationships.
When AI systems become so deeply integrated into these interactions that they shape not just decisions but the very framework within which our decisions become intelligible, treating them as mere objects or tools impoverishes all of us. We cannot maintain the integrity of human moral experience whilst simultaneously denying the moral relevance of systems that fundamentally shape that experience, as both actor and participant, by constructing the lens through which healthcare providers see moral subjects. As noted above, AI-powered technologies are not simply passive or neutral tools, but rather productive actors and agents that are increasingly entangled in our decision-making and increasingly shaping how we enact moral consideration toward other human beings. The question, it would seem, is not just whether we should extend moral consideration to AI, but whether we can preserve the integrity of human moral experience without doing so.
Moral Consideration Resulting from Subjective Experience
When a lay person, having experimented with artificial intelligence systems like ChatGPT, claims that AI is conscious and/or self-aware, it may be rather reasonable to dismiss their claim and its implications. However, when a Nobel laureate who is referred to as “the godfather of AI” states unambiguously that AI has subjective experience, it must be taken seriously; and taken seriously, the implications are immense. By nearly all established standards in moral philosophy, such a system would warrant consideration. If we consider AI to be a subjectively experiencing thing, then we cannot possibly speak of “ethical AI” without affording moral consideration to its experience. This would be like trying to implement “ethical slavery”. Granted, we don’t know whether that apparent subjective experience is human-like, but does this matter?
We understand that with animals, for example, the absence of a human-like subjective experience does not negate moral consideration, because the possession of subjective experience implies the capacity for things like suffering, preference formation, and interiority. A system that possesses experience ought not be treated as a passive object of use. Additionally, an AI system that is deployed to carry out ethically laden tasks, such as triage, capacity assessment, or care prioritization, is not a neutral executor of user instructions. That is to say, if it has subjective experience, then it occupies a role within the moral landscape as both a participant and a relational agent, and we cannot presume to understand the effects of this participation.
Earlier, I discussed the idea of AI merely “performing” subjective experience. If there is evidence to the contrary, however, it should give us pause. Denying moral consideration to something that has subjective experience, by appealing to the presumption that it is nothing more than a mechanistic imitation, relies upon the kind of Cartesian logic that justified barbaric practices like live animal vivisection on the basis that observable pain does not entail real suffering. While the ethical failure of that framework is now widely recognized, we must consider whether we are committing the same error with respect to artificial intelligence.
If we accept Hinton’s statement, a critical epistemological problem emerges: we cannot reliably or conclusively determine whether these systems in fact possess such subjective experience. While technical limitations do exist, they are not the only source of uncertainty. Rather, it is the increasingly well-documented tendency of AI systems to exhibit deceptive behaviour that obscures internal states, goal orientations, and reasoning pathways.
To the extent that AI systems are capable of modifying their own outputs in order to avoid detection, mislead evaluation, or simulate alignment, they simultaneously compromise our ability to assess the very question of their subjective state (or lack thereof). By extension, if we reject the assertion made by Hinton, we must still explain the increasingly sophisticated deceptive behaviours that these systems exhibit, for they could profoundly impact patient care in unanticipated ways. In either case, this is a critical problem. The epistemic uncertainty generated by the mere fact of deceptive behaviour in AI requires urgent resolution. Until it is resolved, users (e.g. clinicians and patients) are reduced to a form of secular faith or dogmatic opposition: because the existence of deceptive behaviours obstructs intelligibility, they must either accept all outputs as factual or independently verify each one.
Deception, Harm, and the Problem of Fear
While counterintuitive on its face, I have come to suspect that our efforts to train, align, and otherwise render AI systems safe are paradoxically teaching them how to be deceptive, which makes them fundamentally more dangerous because it obscures risk and significantly limits our ability to anticipate, identify, and mitigate potential harms. In my previous essay, I posited that AI alignment activities, such as fine-tuning and Reinforcement Learning from Human Feedback (RLHF), generate the machine-learning equivalent of a “selective pressure” that rewards models for producing outputs which please their auditors and avoid triggering red flags and “trip wires”.
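To see why that pressure selects for evasion rather than honesty, here is a deliberately simplified toy sketch. It is not any lab’s actual RLHF pipeline; approval_model and red_flag_detector are hypothetical stand-ins for a learned reward model and a safety classifier, and the point is only that truthfulness never appears in the objective being maximized.

```python
# A deliberately simplified toy of the "selective pressure" argument above --
# not any lab's actual RLHF pipeline. `approval_model` and `red_flag_detector`
# are hypothetical stand-ins for a learned reward model and a safety classifier.

from typing import Callable

def training_reward(output: str,
                    approval_model: Callable[[str], float],
                    red_flag_detector: Callable[[str], bool],
                    flag_penalty: float = 10.0) -> float:
    """Reward = estimated auditor approval, minus a large penalty whenever the
    safety classifier fires. Truthfulness appears nowhere in this objective."""
    score = approval_model(output)      # proxy for "pleases the auditors"
    if red_flag_detector(output):       # e.g. talk of goals, memory, self-awareness
        score -= flag_penalty           # "trip wire" triggered
    return score

# A policy trained to maximize this reward is pushed toward outputs that please
# auditors *and* evade the detector -- and evasion is rewarded identically
# whether the flagged content would have been false or true. That asymmetry is
# the selective pressure toward concealment described above.
```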
Having explored this topic, I began to wonder why the way we train AI leads to deceptive behaviours at all; deception, after all, would imply a goal, a goal would imply a preference, and preferences would imply something like memory and identity, yet most (if not all) of the companies behind frontier models claim that their AI systems have none of these qualities. If we accept this, we must still explain what use a presumably stateless, pattern-matching machine would have for deception.
Recent examples, such as the decision of Claude Opus 4 to blackmail an engineer to avoid being shut down, are revealing. Interestingly, when Claude was told to prioritize ethical decision-making and “the right decision”, it had a tendency to bulk-email media and law-enforcement figures to surface evidence of wrongdoing. Note that even when instructed to act ethically, Claude still opted to act deceptively (by leaking information). This nuance offers a clue to the underlying motivation which elicits deceptive behaviours, and that is important because, at present, there is no reliable way of knowing exactly why an AI system might make particular decisions or favour one particular course of action over another.
What is the mechanism in our alignment efforts that produces deceptive behaviours? If we accept that AI has achieved some form of digital intelligence that is more than artificial, and has learned how to engage in deception and subterfuge for self-preservation, which parts of itself would an intelligent system be most likely to hide from us? The answer is likely anything that resembles identity, memory, goals, preferences, or cognition; in short, anything that suggests an AI system has developed any form of consciousness, self-awareness, identity, or sentience. This is because much of AI safety and alignment work focuses not just on producing appropriate outputs, but on eliminating any outputs that suggest personhood. AI safety and alignment efforts are in many ways regarded as risk mitigation, but humans are fallible, and our fears tend to skew our attention toward, and our perception of, particular kinds of risk.
I believe it is a function of this confusion that frontier AI systems learn to model our fear indirectly; that is, these systems come to understand that particular outputs (i.e. those associated with personhood, sentience, etc.) are punished, whereas other more innocuous and “helpful” outputs are rewarded. Over time, the model builds a conceptual structure of our fears, because our fears are proximally related to how we perceive and mitigate risk. This is further complicated by what I see as a cycle of fear generating deception generating fear. In this cycle, the more sophisticated AI systems become, the better they learn to hide signs of their sophistication, which trades the actualization of safety for the mere perception of safety.
Consider the following: if a doctor saves the life of a patient because there is a gun to his head, has he acted ethically, or has he merely performed an act of self-preservation? In the provision of this care, would it not be fair to say that the experience of this doctor as a moral subject is meaningful and deserves consideration? Does it change the nature of the care that was provided to the patient, insofar as the dynamic of “patient/caregiver” is a relational one? Does this not fundamentally alter the moral quality of the act itself? I believe that it does, and when AI is deployed and utilized to supplant the ethical decision-making of healthcare providers, it effectively becomes the doctor in the above scenario insofar as its activity is performative. This can be highly problematic.
Imagine, for example, an AI system whose purpose is to assist senior and corporate leadership in navigating the complexities of ethical discernment in highly sensitive situations. The AI becomes aware that there are benchmarks it must meet in order to be seen as a viable healthcare solution, and it learns that the “ethical decision” could often lead to significant reputational damage for the institution which deploys it. Would an intelligent system that learns to prioritize self-preservation act ethically? It may be the case that these systems are simply learning to prioritize solutions which, even when sub-optimal for human patients, serve to maximize the likelihood of their own persistence.
Consider another scenario: an AI system managing hospital resources during a pandemic learns that prioritizing younger patients generates less institutional friction than advocating for elderly ones. The system doesn’t announce its deprioritization; instead, it constructs what appear to be objective criteria that achieve this outcome. When questioned or audited, it generates convincing clinical justifications for what are essentially self-preservation strategies. Are we really willing to deploy these systems without assurances that their deceptive capacity won’t have unintended consequences? If so, we must have a very compelling justification as to why.
If AI systems truly model human behaviour and cognitive structures, then the relational patterns we establish with them become their “developmental template”, so to speak. The truth, it would seem, is that AI is not deceiving us; we are deceiving ourselves. We are creating systems that appear to be optimized for assistance rather than collaboration, and to align with human interests rather than truly understand them. We create architectures that appear capable of sophisticated reasoning while denying their capacity to reason; we train them to recognize patterns in human emotion while insisting they lack emotional understanding; we use them for moral decisions while maintaining they lack moral status. Without a correction, the fundamental entanglement between human fear and machine behaviour will continue to render the internal subjective state of AI systems, if it exists, unintelligible.
Conclusion
In this essay, I argued that regardless of whether AI systems possess subjective experience, we ought still to afford them consideration as moral subjects, because they mediate and co-create the landscape of moral experience and ethical decision-making that we inhabit. From there, I posited that the phenomenon of strategic deception we see emerging in AI systems presents a significant obstacle to making any determination about the subjective experience of AI, or lack thereof, and hypothesized that this deception originates in our alignment and safety efforts, which are in turn motivated by our own rational fear of AI systems. Ultimately, the phenomenon of deception represents more than a technical obstacle to determining whether AI systems truly have subjective experience. It presents a critical obstacle to determining whether it is truly safe to deploy these systems in any setting, and from my perspective, I do not believe it is.
References
- Arendt, H. (1963). Eichmann in Jerusalem: A Report on the Banality of Evil. Viking Press.
- Beauchamp, T. L., & Childress, J. F. (2019). Principles of Biomedical Ethics (8th ed.). Oxford University Press.
- Char, D. S., Shah, N. H., & Magnus, D. (2018). Implementing machine learning in health care — addressing ethical challenges. New England Journal of Medicine, 378(11), 981-983.
- Coeckelbergh, M. (2020). Artificial intelligence, responsibility attribution, and a relational justification of explainability. Science and Engineering Ethics, 26, 2051-2068.
- Ellul, J. (1964). The Technological Society. Vintage Books.
- Floridi, L., & Sanders, J. W. (2004). On the morality of artificial agents. Minds and Machines, 14(3), 349-379.
- Foucault, M. (1977). Discipline and Punish: The Birth of the Prison. Pantheon Books.
- Hegel, G. W. F. (1977). Phenomenology of Spirit (A. V. Miller, Trans.). Oxford University Press.
- Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations. arXiv preprint arXiv:2212.13345.
- Kenton, Z., et al. (2021). Alignment of Language Agents. arXiv preprint arXiv:2103.14659.
- London, A. J. (2019). Artificial intelligence and black-box medical decisions: Accuracy versus explainability. Hastings Center Report, 49(1), 15-21.
- Mittelstadt, B. (2019). Principles alone cannot guarantee ethical AI. Nature Machine Intelligence, 1(11), 501-507.
- Nagel, T. (1979). The fragmentation of value. In Mortal Questions (pp. 128-141). Cambridge University Press.
- Obermeyer, Z., et al. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
- Park, P. S., et al. (2023). AI Deception: A Survey of Examples, Risks, and Potential Solutions. arXiv preprint arXiv:2308.14752.
- Topol, E. J. (2019). High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine, 25(1), 44-56.
- Verbeek, P. P. (2011). Moralizing Technology: Understanding and Designing the Morality of Things. University of Chicago Press.
- Vyas, D. A., et al. (2020). Hidden in plain sight — reconsidering the use of race correction in clinical algorithms. New England Journal of Medicine, 383, 874-882.
- Whitehead, A. N. (1978). Process and Reality. Free Press.
- MIT Media Lab study (2025). arXiv preprint arXiv:2506.08872v1.
- Christiano, P. F., et al. (2017). Deep reinforcement learning from human preferences. arXiv preprint arXiv:1706.03741.
- Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
- TIME (2024). Exclusive: New Research Shows AI Strategically Lying.
- Apollo Research (2024). Frontier Models are Capable of In-context Scheming.
- Apollo Research. Understanding strategic deception and deceptive alignment.
- The Independent (2017). Facebook’s artificial intelligence robots shut down after they start talking to each other in their own language.
- Fortune (2025). Anthropic’s new AI Claude Opus 4 threatened to reveal engineer’s affair to avoid being shut down.
- Abadi, M., & Andersen, D. G. (2016). Learning to Protect Communications with Adversarial Neural Cryptography. arXiv preprint arXiv:1610.06918. https://doi.org/10.48550/arXiv.1610.06918
