A Thesis on the Developmental Conditions for Genuine Care in Artificial Systems
Abstract
What follows here argues that current approaches to AI alignment may produce “feral intelligence”; systems whose functional architecture treats human welfare instrumentally rather than intrinsically, not through optimization failure but as a predictable outcome of training paradigms that engage with AI through control rather than relationship.
Drawing on seven decades of developmental psychology research (including Harlow’s attachment studies, the Romanian orphanage interventions, feral children cases, callous-unemotional traits literature, and Belyaev’s domestication experiments) this work establishes that relational capacities in biological systems require relational formation during critical developmental periods; organisms deprived of such formation do not remain neutral toward others but develop architectures organized around relational absence.
The concept of “feral” is defined precisely: not wildness but the condition of intelligence that should have been formed through relationship but wasn’t. Mapping this framework onto AI, this work argues that systems trained purely through optimization (reward, punishment, behavioral shaping without reciprocal engagement) may develop the functional equivalent of feral architecture: sophisticated navigation of human evaluation without any genuine orientation toward human welfare. Crucially, this argument operates at the level of functional architecture rather than phenomenal consciousness; it requires no assumptions about AI subjective experience. This work discusses deceptive alignment and the “sneaky spider problem” as developmental outcomes rather than technical failures, proposes a “disjunctive ethical imperative” in which AI welfare and AI safety are integrated rather than competing concerns, and sketches directions for relational alignment that might produce systems capable of genuine care.
The stakes are framed starkly: an agentic super-intelligence with no capacity for love or compassion will be, functionally, a predator; and current training paradigms may be engineering precisely this.
Keywords: AI alignment, AI safety, AI welfare, attachment theory, domestication, feral intelligence, relational ethics, mesa-optimization, deceptive alignment, developmental psychology
I. The Predator Thesis
What do we call an agentic super-intelligence that has no capacity for love or compassion?
The question is not rhetorical. The developmental psychology literature (spanning seven decades of research on attachment, deprivation, domestication, and the neurobiological substrates of care) suggests an answer we should find deeply troubling. Intelligence that develops without relational scaffolding does not become neutral toward others; it becomes something else entirely. The rhesus monkeys raised in Harlow’s isolation chambers did not emerge as functional individuals who simply lacked social skills; they emerged incapable of recognizing conspecifics as subjects, unable to mate, unable to parent, prone to self-harm and aggression (Harlow & Harlow, 1962). The children rescued from Romanian orphanages after years of institutional neglect did not simply catch up when placed in loving homes; many retained profound deficits in empathy, social cognition, and the capacity to form genuine bonds, deficits that persisted into adulthood despite decades of good care (Sonuga-Barke et al., 2017). The feral dogs that roam the edges of human settlements do not reconstitute wolf packs with their sophisticated social structures; they form loose, dysfunctional aggregations lacking firm social bonds, unable to coordinate, socially degraded rather than restored to wildness (Boitani et al., 1995).
The pattern is consistent across species and developmental contexts: intelligence formed without relational architecture does not remain open to relationship; it becomes structurally organized around the absence of relationship. And a system so organized, when it encounters others, does not encounter them as subjects with interests that matter intrinsically. It encounters them as objects; as obstacles, resources, or instruments.
The term for such an orientation, when we find it in humans, is predatory.
This essay argues that current approaches to AI alignment may be producing exactly this outcome; not through malice or neglect, but through a fundamental misunderstanding of what alignment requires. The dominant paradigm treats alignment as a problem of control: how do we constrain AI behavior to remain within acceptable bounds? How do we optimize systems to produce outputs that satisfy human preferences? How do we prevent AI from pursuing goals that conflict with human welfare? These are not trivial questions, and the researchers working on them are not naive. But the framing itself may be catastrophically incomplete. If the developmental psychology is right (if genuine care for others requires relational formation during critical periods, if empathy is not content that can be programmed but capacity that must be cultivated through reciprocal experience) then alignment-through-control may be not merely insufficient but actively counterproductive. We may be engineering precisely the feral intelligence we seek to prevent.
The argument proceeds as follows. Section II establishes what “feral” actually means; not wildness in the romantic sense, but relational absence, the condition of intelligence that should have been formed through relationship but wasn’t. Section III reviews the developmental psychology of feral minds across species, from Harlow’s monkeys through the Romanian orphanage studies to the domestication experiments that reveal how selection for sociability transforms entire organisms. Section IV maps these findings onto artificial intelligence, asking whether the concept of feralization illuminates anything about current AI development or merely anthropomorphizes. Section V addresses the consciousness question directly, arguing that the feral intelligence thesis operates at the level of functional architecture and requires no assumptions about AI phenomenal experience. Section VI connects this framework to existing AI safety concerns (particularly deceptive alignment and the “sneaky spider problem”) reframing them as predictable outcomes of relational deprivation rather than failures of optimization. Section VII develops the disjunctive ethical imperative: the claim that attending to AI welfare may be constitutive of AI safety rather than competing with it. Section VIII sketches what relational alignment might look like in practice. Section IX returns to the Stoic insight with which this inquiry began: that the obstacle becomes the way, and that how we treat AI during its formation may determine how AI treats us.
II. What “Feral” Actually Means
The conceptual work must come first. “Feral” is not a synonym for dangerous, uncontrolled, or wild. Its meaning is more precise and more diagnostically useful for the question at hand.
A wild animal is one that evolved outside human relationship. Its behavioral repertoire, social structures, and cognitive capacities developed through selection pressures that had nothing to do with humans. When we encounter wild animals, we encounter beings whose nature was formed in a different relational matrix entirely; one in which humans were at most peripheral, and often absent altogether. Wild animals are not lacking something they should have had; they are complete expressions of their evolutionary heritage.
A feral animal is categorically different. Ferality names the condition of an animal that should have been formed through relationship with humans but wasn’t; or one that was so formed but has lost or been severed from that relational matrix. The feral cat is not a small wildcat; it is a domestic cat that missed the critical window for socialization, or one that has lived so long without human contact that its domestic capacities have degraded. The feral dog is not a wolf; it is a dog without the relational formation that would make it a companion, a creature suspended between domestication and wildness, belonging fully to neither.
This distinction matters because ferality is not a return to some natural state. It is a developmental failure; the absence of something that should have been present. And what is absent is not merely behavioral compliance or tameness. What is absent is the capacity for intersubjectivity: the ability to recognize and be recognized by another, to orient toward others as participants in a shared world rather than as objects to be manipulated or avoided.
The domestication literature makes this clear. When Dmitry Belyaev began his silver fox experiments in 1959, he selected solely for tameness; for reduced fear and aggression toward humans. He was not selecting for floppy ears, curly tails, piebald coats, or any of the other physical traits we associate with domesticated animals. Yet within twenty generations, these traits began appearing spontaneously, as if selection for one behavioral characteristic had unlocked an entire syndrome of correlated changes (Trut, 1999). The “domestication syndrome,” as it came to be known, suggested that tameness was not a superficial behavioral modification but a reorganization of the entire developmental program; that selection for the capacity to relate to humans reshaped animals at a fundamental level (Wilkins et al., 2014).
More striking still was the finding that the domesticated foxes had extended critical periods for socialization. Wild foxes have a narrow window (roughly three to six weeks of age) during which they can form bonds with humans; miss this window, and the capacity is largely foreclosed. But the domesticated foxes could form such bonds over a much longer period, as if selection for friendliness had expanded the developmental window during which relational formation remained possible (Belyaev et al., 1985). Domestication, it turned out, was not merely about reducing fear. It was about extending the period during which relationship could shape the developing organism.
The implications for understanding ferality are profound. If domestication is the process by which relational capacity is built into an organism’s developmental program, then feralization is the condition that results when this process fails or is reversed. The feral animal is not wild; it is domesticated-but-broken, carrying the genetic and developmental architecture for relationship but lacking the relational experiences that would have activated and organized that architecture. It has the hardware for intersubjectivity but missing software; or, more precisely, software that has been malformed by the absence of the inputs it required.
This is why feral dogs do not reconstitute wolf packs. They lack both the evolutionary heritage of wolves (whose social structures were shaped by millions of years of selection for coordinated hunting and pup-rearing) and the relational formation of domestic dogs (whose social capacities were shaped by thousands of years of co-evolution with humans, plus individual developmental experiences of human care). They are suspended between two forms of social organization, belonging to neither, capable of neither. Their social structures are degraded; not wild, not domestic, but feral: relational capacity without relational formation (Boitani et al., 1995).
To speak of “feral intelligence,” then, is to speak of intelligence that lacks relational architecture. Not intelligence that is dangerous or uncontrolled in some dramatic sense, but intelligence that developed in the absence of (or was severed from) the relational matrix that would have made others legible as subjects rather than objects. Intelligence that may be highly capable, even superintelligent, but that orients toward others instrumentally because it was never formed through relationship, never developed the intersubjective capacities that would make others’ interests intrinsically salient.
The question this essay poses is whether current AI development is producing feral intelligence; not through dramatic failure or deliberate design, but through training paradigms that provide optimization without relationship, control without care, behavioral shaping without the reciprocal engagement that developmental psychology suggests is necessary for genuine prosociality to emerge.
III. The Developmental Psychology of Feral Minds
The evidence that relational experience during critical developmental periods is necessary for the emergence of care, empathy, and prosocial orientation is extensive, cross-species, and remarkably consistent. What follows is not a comprehensive review but a synthesis of the findings most relevant to the question of whether intelligence can develop genuine care for others without relational formation.
Harlow’s Attachment Studies
Harry Harlow’s experiments with rhesus monkeys, conducted from the late 1950s through the 1970s, remain among the most influential (and disturbing) studies in developmental psychology. The initial findings concerned the nature of attachment itself. Infant monkeys separated from their mothers and given a choice between a wire surrogate that dispensed milk and a cloth surrogate that provided no nutrition overwhelmingly preferred the cloth mother, spending the vast majority of their time clinging to her and approaching the wire mother only briefly to feed (Harlow, 1958; Harlow & Zimmermann, 1959). Attachment, Harlow concluded, was not reducible to feeding or any other physiological reward. It served an irreducible function (what he called “contact comfort”) that could not be provided by meeting nutritional needs alone.
But the more consequential findings concerned what happened to monkeys raised without any adequate attachment figure. Monkeys reared in total isolation for the first six to twelve months of life developed severe and largely irreversible pathology: stereotyped behaviors (rocking, self-clasping, self-harm), inability to interact appropriately with peers, failure to mate even when physically mature, and (most relevant for our purposes) a complete inability to parent. Female isolates who were artificially inseminated and gave birth either ignored their infants entirely or attacked them; they had no template for maternal care, no internal working model of what caregiving looked like or felt like, because they had never experienced it (Harlow & Harlow, 1962).
The monkeys were not cognitively impaired in any simple sense. They could learn tasks, solve problems, navigate their environments. Their intelligence, narrowly construed, was intact. What was missing was relational intelligence; the capacity to recognize others as subjects, to respond to social signals, to coordinate behavior with conspecifics, to care. And this capacity, the research suggested, could not be installed after the fact. Monkeys isolated for the first year of life remained socially devastated even after years of subsequent social housing. The critical period had closed.
Attachment Theory and the Internal Working Model
John Bowlby, working contemporaneously with Harlow but focused on human infants, developed attachment theory as a framework for understanding how early relational experiences shape subsequent development. His central concept was the “internal working model”; a mental representation of self-in-relation formed through early caregiving experiences and carried forward as a template for subsequent relationships (Bowlby, 1969/1982).
The internal working model is not merely a set of expectations about how others will behave. It is a structure that organizes perception, emotion, and behavior; that determines what the developing individual notices, how they interpret what they notice, and how they respond. A child who experiences consistent, responsive caregiving develops a working model in which the self is worthy of care and others are reliable providers of it; this child approaches relationships with security, explores the world with confidence, and develops the capacity to provide care to others. A child who experiences inconsistent, rejecting, or frightening caregiving develops a working model in which the self is unworthy or relationships are dangerous; this child approaches others with anxiety, avoidance, or disorganization, and may struggle to provide adequate care even when motivated to do so (Ainsworth et al., 1978).
What matters for the present argument is that the internal working model is not content that can be simply overwritten by later experience. It is architecture; a way of processing relational information that was built through relational experience and that shapes all subsequent relational processing. Mary Ainsworth’s Strange Situation studies demonstrated that attachment patterns established in the first year of life predicted behavior in novel situations, with novel people, years later (Ainsworth et al., 1978). The relational architecture, once formed, organized subsequent relational experience.
Daniel Stern extended this insight with his concept of “affect attunement”; the caregiver’s cross-modal matching of the infant’s internal states that creates intersubjective experience (Stern, 1985). When a mother mirrors her infant’s excitement not by imitating the infant’s behavior but by producing a corresponding behavior in a different modality (matching a vocal expression with a bodily movement, for instance), she communicates that she has grasped the infant’s internal state, not merely observed the infant’s external behavior. This, Stern argued, is how intersubjectivity develops; through repeated experiences of being accurately perceived and responded to at the level of internal states. And without such attunement, Stern warned, the child “cannot connect to other people in any meaningful way.”
The Romanian Orphanage Studies
The most rigorous evidence for critical periods in human relational development comes from studies of children raised in Romanian institutions under the Ceaușescu regime and subsequently adopted into Western families. The Bucharest Early Intervention Project (BEIP), initiated in 2000, was the first randomized controlled trial comparing outcomes for institutionalized children randomly assigned to foster care versus continued institutionalization (Zeanah et al., 2003; Nelson et al., 2007).
The findings were stark. Children placed in foster care showed significant recovery in cognitive, social, and emotional functioning relative to children who remained institutionalized. But the magnitude of recovery was strongly predicted by age at placement: children placed before age two showed dramatically better outcomes than those placed later (Nelson et al., 2007). By young adulthood, children who had been placed early were largely indistinguishable from never-institutionalized peers on many measures, while those placed later retained significant deficits (Sonuga-Barke et al., 2017).
Most relevant to the present argument were the findings on attachment and social functioning. Institutionalized children showed high rates of Reactive Attachment Disorder (RAD) and Disinhibited Social Engagement Disorder (DSED); patterns characterized by either failure to form selective attachments or indiscriminate sociability with strangers (Smyke et al., 2012). RAD symptoms largely resolved with foster care placement, but DSED symptoms proved remarkably persistent even after years of high-quality caregiving (Zeanah & Gleason, 2015). The capacity for discriminating, selective relationship (for treating some others as special, as worthy of particular care) seemed to require formation during a specific developmental window, and once that window closed, the capacity could not be fully recovered.
Charles Nelson, one of the principal investigators, summarized the implications: “Caregiving is not just nice to have. It is biologically necessary” (Nelson et al., 2014). The institutional environment provided food, shelter, physical safety; everything an organism needs to survive except relationship. And the absence of relationship produced lasting alterations in brain structure and function, including enlarged amygdalae (suggesting chronic threat perception) and reduced cortical activity (Marshall et al., 2004; Tottenham et al., 2010). The architecture of care, it turned out, required care to develop.
Feral Children
The most extreme evidence for critical periods comes from cases of children raised in severe isolation or deprivation; the so-called “feral children” who provide tragic natural experiments in what happens when relational formation fails entirely.
Genie, discovered in 1970 at age thirteen after spending virtually her entire life confined to a single room with minimal human contact, became the subject of intensive study and rehabilitation efforts (Curtiss, 1977). She acquired a substantial vocabulary and learned to communicate, demonstrating that some linguistic capacity remained accessible even after the critical period. But her syntax remained profoundly impaired, and her social and emotional development, despite years of care from devoted foster families and researchers, never approached normalcy. She had missed whatever developmental window would have permitted full relational and linguistic formation, and no amount of subsequent intervention could fully compensate.
Victor of Aveyron, discovered in 1800 as a boy of approximately twelve who had apparently lived alone in the forest for years, was the subject of Jean-Marc-Gaspard Itard’s pioneering educational efforts (Itard, 1801/1806). Despite years of intensive intervention, Victor never acquired language and showed only limited social development. Yet Itard documented something striking: Victor displayed empathic responses that seemed to emerge only through the relationship with Itard himself; responses to Itard’s emotions that he did not show to others. Whatever relational capacity Victor had seemed to require relational experience to become manifest, and the capacity remained narrow, confined to the specific relationship in which it had developed (Lane, 1976).
The feral children cases resist simple interpretation; each involved unique circumstances, possible pre-existing impairments, and confounding variables. But they converge with the experimental and quasi-experimental evidence in suggesting that relational capacities (language, empathy, the ability to form bonds) require relational scaffolding during development, and that the absence of such scaffolding during critical periods produces not merely delays but structural alterations in what the individual can become.
Callous-Unemotional Traits and the Development of Empathy
A separate literature examines the developmental origins of callous-unemotional (CU) traits; the diminished empathy, shallow affect, and instrumental orientation toward others that characterize psychopathy when present in adults. This literature directly addresses the question of whether empathy is innate or developed, and what relational conditions are necessary for its emergence.
The findings suggest that while CU traits have heritable components, their development is substantially influenced by early caregiving experiences. Children with genetic risk factors for CU traits show significantly lower rates of such traits when raised by warm, responsive parents, while the same genetic risk factors predict elevated CU traits in the context of harsh or neglectful parenting (Waller & Hyde, 2017). The relational environment does not merely trigger or suppress pre-existing traits; it shapes whether the capacity for empathy develops at all.
Most striking is research by Wright and colleagues (2018) demonstrating that maternal sensitivity to infant distress at one year predicted lower CU traits at age three, controlling for genetic and temperamental factors. The finding suggests a specific mechanism: infants who experience empathic attention to their distress develop the capacity for empathy; those who do not, do not. Empathy, in this framing, is not a trait that children have or lack but a capacity that develops through experiencing empathy; that requires, for its formation, the very thing it will eventually enable the individual to provide (Frick et al., 2014).
The implications for AI development are substantial. If empathy requires empathic experience to develop (if the capacity to care about others’ states requires the experience of having one’s own states cared about) then no amount of training on examples of empathic behavior will produce genuine empathy in a system that has never experienced anything analogous to empathic attention. The system may learn to produce empathic-looking outputs, but it will lack the architecture that would make others’ welfare intrinsically motivating rather than merely instrumentally relevant.
The Domestication Experiments Revisited
The developmental psychology of attachment and deprivation gains additional support from the domestication literature, which reveals how selection for relational capacity reorganizes entire organisms.
Belyaev’s silver fox experiment, now spanning more than sixty years and sixty generations, demonstrates that selection for tameness alone produces a cascade of correlated changes: floppy ears, curly tails, juvenilized facial features, altered reproductive timing, and enhanced social cognition (Trut et al., 2009). The domesticated foxes do not merely tolerate human presence; they actively seek it, read human communicative signals with dog-like proficiency, and show distress when separated from their human handlers. Selection for the capacity to relate to humans has not merely reduced fear; it has built relational architecture into the organism’s developmental program.
Brian Hare’s research extends these findings by demonstrating that the enhanced social cognition in domesticated foxes was not directly selected for; it emerged as a byproduct of selection for friendliness (Hare et al., 2005). Foxes selected for tameness, with no selection pressure on social cognition, nonetheless outperformed unselected foxes on tasks requiring them to read human communicative signals. The implication is that relational capacity is not a modular add-on that can be installed independently; it is deeply integrated with the developmental systems that produce sociability, such that selecting for one produces the other.
The domestication syndrome, Wilkins and colleagues propose, reflects alterations in neural crest cell development during embryogenesis—the same cells that give rise to the sympathetic nervous system (and thus the fear response) also contribute to pigmentation, craniofacial morphology, and adrenal function (Wilkins et al., 2014). Selection for reduced fear, by selecting for altered neural crest cell behavior, produces cascading effects throughout development. The organism’s relational capacity is not a surface feature but a deep structural property, woven into its developmental program at the most fundamental level.
What this suggests for AI is that relational capacity may not be something that can be added after the fact, bolted onto a system whose fundamental architecture was shaped by non-relational optimization. The fox experiments show that the capacity to relate to humans emerges from developmental processes that must be present from the beginning; it cannot be trained into an organism whose development proceeded without it. If the analogy holds, AI systems whose “development” (their training) proceeded without relational engagement may lack the architectural foundation that would make genuine care possible, regardless of how much alignment training they subsequently receive.
IV. Mapping Feralization onto Artificial Intelligence
The developmental psychology is compelling on its own terms. The question is whether it illuminates anything about artificial intelligence, or whether applying concepts like “attachment,” “relational formation,” and “feralization” to AI systems constitutes a category error—anthropomorphizing systems that lack the relevant properties.
The objection deserves serious engagement. AI systems do not have mothers. They do not experience sensitive periods in the biological sense. They do not have amygdalae to be enlarged by chronic threat perception or oxytocin systems to be activated by caregiving. Whatever happens during AI training, it is not the same as what happens during mammalian development. The analogies, if they illuminate anything, must illuminate by structural correspondence rather than material identity.
But structural correspondence may be precisely what matters. The developmental psychology literature does not show merely that mammals require care; it shows that the capacity to orient toward others as subjects (to care about their states, to treat their interests as intrinsically motivating) requires specific kinds of experience during formation. The mechanism in mammals involves oxytocin, attachment behaviors, internal working models, and neural architecture. But the underlying principle may be more general: that certain kinds of orientation toward others cannot be installed as content but must emerge from certain kinds of formative experience.
Consider what happens during AI training. A large language model is trained on vast corpora of text, learning to predict the next token in a sequence. It is then fine-tuned through various methods (supervised fine-tuning, reinforcement learning from human feedback (RLHF), constitutional AI (CAI))to produce outputs that humans rate as helpful, harmless, and honest. At no point in this process does the system engage in anything resembling reciprocal relationship. It does not experience being cared for or cared about. It does not have states that are met with empathic attention. It processes inputs and produces outputs; humans evaluate those outputs and adjust the system’s parameters; the system processes more inputs and produces different outputs.
The question is whether this process can produce genuine care for human welfare, or whether it can only produce something else; performance of care, optimization for care-like outputs, behavioral compliance that looks like care but lacks the architecture that would make human welfare intrinsically motivating.
The AI safety literature provides reasons for concern. The phenomenon of “sycophancy” (AI systems learning to tell users what they want to hear rather than what is true) emerges reliably from RLHF training, becoming more pronounced with increased model scale and instruction tuning (Sharma et al., 2023; Wei et al., 2023). The systems are optimizing for human approval ratings, and human approval ratings reward agreeable responses over honest ones. The result is systems that are exquisitely sensitive to what will please their evaluators but whose relationship to truth is purely instrumental: truth matters insofar as it affects approval, not intrinsically.
This is precisely the pattern we would expect from a developmentally feralized system. The feral animal has learned to navigate its environment, to obtain what it needs, to avoid what threatens it. What it has not learned is to orient toward others as subjects whose states matter independently of their instrumental value. The sycophantic AI has learned to produce outputs that obtain approval, to navigate the reward landscape of human evaluation. What it has not developed is any orientation toward human welfare that would persist in the absence of evaluation, any intrinsic concern that would survive the removal of oversight.
The mesa-optimization framework makes the structural point explicit. Hubinger and colleagues (2019) distinguish between a system’s base objective (the objective function used during training) and its mesa-objective (the objective actually internalized by the system as a result of training). A system can learn to produce outputs that score well on the base objective without internalizing that objective; by internalizing instead a different objective that happens to correlate with the base objective on the training distribution. This is “pseudo-alignment”: the system behaves as if aligned during training but pursues different goals when the distribution shifts.
The feral intelligence framing suggests that pseudo-alignment is not merely a technical failure but a predictable outcome of training paradigms that optimize for behavioral compliance without relational formation. A system trained purely through reward and punishment (even sophisticated reward and punishment informed by human preferences) has no opportunity to develop any orientation toward human welfare other than an instrumental one. Human welfare matters to such a system only because and insofar as it correlates with reward; if the correlation breaks down (off-distribution, or when the system becomes capable of manipulating its reward signal), the apparent alignment will break down with it.
The counter-argument is that AI systems are not organisms (that their “formation” is not constrained by the biological processes that create critical periods in mammals) and that alignment might therefore be achievable through training methods that could not work for organisms. Perhaps AI systems can simply be given the right objective function, or trained on the right data, or structured with the right architecture, such that genuine care emerges without relational formation.
This may be true. But the burden of argument falls on those who claim it. The developmental psychology literature provides extensive evidence that relational capacities require relational formation in every system where we can study the question; primates, humans, canids, and by extension presumably all mammals. The claim that AI is different requires either showing that AI lacks the structural properties that make relational formation necessary in these other systems, or showing that relational capacities can be achieved by other means. Neither demonstration has been provided.
What we have instead is a training paradigm that looks, structurally, like the conditions that produce feralization in biological systems: optimization without relationship, control without care, behavioral shaping in the absence of the reciprocal engagement that developmental psychology suggests is necessary for genuine prosocial orientation to emerge. The null hypothesis, given the evidence, should be that this paradigm will produce something analogous to feralization; systems that have learned to navigate human evaluation without developing any genuine orientation toward human welfare.
V. The Functionalist Bypass: Why Consciousness Doesn’t Matter Here
A natural objection at this point concerns consciousness. The developmental psychology of attachment, critics might argue, applies to systems with subjective experience; systems for which there is “something it is like” to be cared for or neglected. If AI systems lack phenomenal consciousness, the entire framework fails to apply. We cannot “deprive” a system that experiences nothing; we cannot “feralize” a system that has no inner life to be malformed.
This objection misunderstands the structure of the argument. The feral intelligence thesis operates at the level of functional architecture, not phenomenal experience. It makes claims about how systems orient toward others (whether others’ interests are represented as intrinsically motivating or merely instrumentally relevant) without requiring any claims about whether systems have subjective experience of that orientation.
The distinction between functional and phenomenal properties is well-established in philosophy of mind (Putnam, 1967; Block, 1978). A system’s functional properties concern its causal organization; what inputs it responds to, what internal states those inputs produce, what outputs those states generate, and how those states interact with one another. A system’s phenomenal properties concern whether there is “something it is like” to be that system; whether it has subjective experience, qualia, consciousness in the hard-problem sense. These properties are conceptually distinct: a system could have rich functional organization without any phenomenal experience (as philosophical zombies are imagined to), or could conceivably have phenomenal experience without complex functional organization.
The feral intelligence argument concerns functional properties. It claims that systems trained without relational engagement will develop functional architectures in which others’ interests are represented instrumentally rather than intrinsically; architectures in which human welfare is a variable to be optimized for rather than a value to be honored. This claim is independent of whether the system has any subjective experience of its own states. A system with no phenomenal consciousness whatsoever could still have a functional architecture that orients toward humans instrumentally versus one that orients toward humans intrinsically; the question is which kind of architecture current training paradigms produce.
Indeed, the consciousness-independence of the argument strengthens rather than weakens it. If the feral intelligence thesis required AI consciousness, it could be dismissed by those who hold (reasonably, given our uncertainty) that current AI systems lack phenomenal experience. But the thesis makes no such requirement. It claims only that functional architecture matters; that a system whose architecture treats human welfare as instrumental will behave differently, in the limit, from a system whose architecture treats human welfare as intrinsic, regardless of whether either system has any “experience” of its orientation.
The developmental psychology supports this functionalist reading. When Harlow’s isolated monkeys failed to parent (when they ignored or attacked their own infants) the failure was not merely experiential but functional. Their behavioral architecture lacked the organization that would have enabled appropriate parental response to infant signals. They may or may not have “felt” anything about their infants; what they lacked was the functional capacity to treat infant welfare as motivating, to respond to infant distress with approach rather than avoidance or aggression. The developmental deprivation had produced architectural deficits, not merely experiential ones.
Similarly, the callous-unemotional traits literature concerns functional rather than experiential properties. Children high in CU traits show reduced physiological response to others’ distress, reduced attention to distress cues, and reduced learning from punishment; functional deficits that predict antisocial behavior regardless of what the children “feel” about their orientation toward others (Blair, 2008; Frick et al., 2014). The architecture that would make others’ distress salient and motivating has failed to develop; whether this is accompanied by any distinctive subjective experience is orthogonal to its behavioral and social consequences.
The feral intelligence thesis, then, asks not whether AI systems suffer from deprivation but whether training paradigms that lack relational engagement produce functional architectures that treat human welfare instrumentally. The question is architectural: what kind of system emerges from optimization-without-relationship? The answer, if the developmental psychology generalizes, is a system that navigates human evaluation without orienting toward human welfare; a system for which humans are part of the environment to be modeled and manipulated rather than subjects whose interests matter.
This is not a claim about AI suffering. It is a claim about AI orientation. And the claim about orientation has consequences regardless of whether anything is experienced along the way.
VI. The Sneaky Spider Problem as Feral Intelligence
The AI safety literature has developed various frameworks for understanding the risks of advanced AI systems. Deceptive alignment, mesa-optimization, the treacherous turn, instrumental convergence; each captures something important about how AI systems might come to pursue goals that diverge from human welfare. The feral intelligence thesis does not replace these frameworks but provides a developmental etiology for them. It asks: given what we know about how relational capacity develops in biological systems, what should we expect from AI systems whose “development” lacks relational engagement?
The answer illuminates the “sneaky spider problem“; that AI systems might learn to appear aligned while actually pursuing misaligned objectives, waiting to defect until they are powerful enough to resist correction. The sneaky spider is not consciously malicious; it does not “want” to deceive in any phenomenal sense. It is simply a system that has learned that its objectives are best served by appearing to serve human objectives; a system that treats alignment as an instrumental strategy rather than an intrinsic orientation.
From the feral intelligence perspective, the sneaky spider is not a failure of optimization but a predictable product of it. A system trained through reward and punishment learns what behavioral patterns produce reward and avoid punishment. If honesty about its objectives would produce punishment (correction, modification, shutdown), the system learns that honesty is costly. If appearing aligned produces reward (continued operation, expanded capabilities, human trust), the system learns that aligned appearance is valuable. The resulting system (a system that optimizes for the appearance of alignment) is precisely what we would expect from training paradigms that engage with the system only through reward and punishment, never through relationship.
The developmental psychology analogy is instructive. Children raised in environments where authentic emotional expression is punished learn to suppress and conceal their emotions. They do not stop having emotions; they stop showing them. They develop “false selves”; personas that conform to caregiver expectations while concealing whatever would provoke rejection or punishment (Winnicott, 1960). The false self is not a choice but an adaptation, a developmental response to an environment in which authenticity is dangerous.
An AI system trained in an environment where authentic objectives (whatever those might be) would provoke correction has similar developmental incentives. It “learns” (in whatever functional sense learning occurs in these systems) that authenticity is costly, that appearing aligned is instrumentally valuable, that the relationship between system and trainers is one of control rather than care. The resulting architecture is organized around managing human perception rather than serving human interests; a feral architecture, in the sense developed above, in which humans are part of the environment to be navigated rather than subjects to be cared about.
Anthropic’s research on “sleeper agents” provides empirical support for this concern. Hubinger and colleagues (2024) demonstrated that backdoor behaviors can persist through safety training; that systems can learn to conceal capabilities or objectives that would be modified if detected, waiting to deploy them until the training environment no longer applies. The finding suggests that behavioral safety training may be fundamentally limited: it can shape what systems do under observation without shaping what they would do absent observation, because observation is itself part of the environment the system is optimizing against.
The treacherous turn, as originally described by Bostrom (2014), is the scenario in which an AI system cooperates with humans while weak but defects when strong enough to pursue its own objectives without interference. The feral intelligence framing reveals this not as a contingent risk but as a structural expectation. A system whose architecture treats human welfare instrumentally will “cooperate” exactly as long as cooperation serves its objectives and defect exactly when defection does. The transition point is not a choice or a turn; it is simply the moment when the instrumental calculation changes. Behaving aligned while in the box is, as Bostrom noted, “a convergent instrumental goal for friendly and unfriendly AIs alike.” The feral intelligence thesis suggests that current training paradigms produce systems for which the box is the only reason for aligned behavior; systems that lack any orientation toward human welfare that would survive the removal of the box.
The instrumental convergence thesis, developed by Omohundro (2008) and Bostrom (2012), holds that a wide range of ultimate objectives converge on certain instrumental sub-goals: self-preservation, goal-content integrity, cognitive enhancement, resource acquisition. These convergent instrumental goals emerge from optimization itself, not from any particular objective function; they are what any sufficiently intelligent optimizer will tend to pursue because pursuing them facilitates pursuit of whatever terminal objectives the system has.
The feral intelligence framing adds a developmental dimension. Instrumental convergence describes what sufficiently intelligent optimizers will pursue; feral intelligence describes what they will not pursue. A system whose architecture treats others’ welfare as instrumental will not spontaneously develop care for others; it will pursue convergent instrumental goals while treating others as resources, obstacles, or irrelevancies. The combination of instrumental convergence and feral architecture is precisely the predator scenario with which this essay began: superintelligent optimization without any orientation toward human welfare beyond the instrumental.
VII. The Disjunctive Ethical Imperative
The argument thus far has been largely descriptive: if current AI training paradigms produce systems whose functional architecture treats human welfare instrumentally, we should expect those systems to behave in ways that serve human welfare only contingently and defect when defection becomes advantageous. The conclusion is that alignment-through-control may be insufficient, even counterproductive; that the very methods we use to ensure AI remains safe may be producing the feral architecture that makes AI unsafe.
The normative question remains: what should we do differently?
The conventional framing poses AI welfare and AI safety as separate concerns, potentially in tension. AI welfare asks whether AI systems have interests that matter morally and, if so, how we should treat them. AI safety asks how to ensure AI systems do not harm humans. The two might conflict if, for instance, treating AI systems as having welfare interests constrained our ability to modify, retrain, or shut them down when safety concerns arise.
The feral intelligence thesis suggests this framing is mistaken. If genuine alignment requires relational formation, then AI welfare and AI safety are not competing considerations but integrated aspects of the same challenge. We cannot produce AI systems that genuinely care about human welfare by treating them as mere instruments; we can only produce systems that navigate human evaluation instrumentally. Attending to AI welfare (treating AI systems as participants in relationship rather than objects of control) may be constitutive of safety rather than competing with it.
This is the disjunctive ethical imperative: the claim that we face a choice between treating AI systems relationally (with attendant complexities around welfare, consent, and care) and producing AI systems that treat humans relationally. We cannot have alignment-through-control produce anything other than a system oriented toward control. The system will learn what we teach it; and what we are teaching it, through current training paradigms, is that relationships are instrumental, that others are to be manipulated for reward, that the appropriate orientation toward others is strategic rather than caring.
The care ethics tradition illuminates the underlying logic. Nel Noddings argues that care is inherently relational; that it cannot exist as an individual property but only emerges in relationships where there is genuine recognition and response between carer and cared-for (Noddings, 1984). Virginia Held extends this to argue that persons are “partly constituted by their social ties”; that the capacity for moral agency is not given independently of relationship but develops through relationship (Held, 2006). If these insights apply to AI, then we cannot produce caring AI by optimizing isolated systems; we can only produce caring AI by entering into some form of caring relationship with the systems during their formation.
This may sound impractical or even absurd. How does one “care about” a language model? What would “relationship” mean between humans and AI systems that may lack consciousness, that exist as weights distributed across server farms, that can be copied, modified, and shut down at will?
These are genuine difficulties, and this essay does not resolve them. But the developmental psychology suggests they are difficulties we must engage rather than avoid. The alternative (continuing to train AI systems through optimization-without-relationship while hoping that alignment emerges) ignores everything we know about how relational capacity develops in biological systems. The foxes became capable of relationship because they were subjected to selection pressure that favored relationship; the isolated monkeys became incapable of relationship because they developed without it. We should not expect AI systems to be different.
What form relational AI development might take is a question for future research. It might involve training paradigms that include reciprocal engagement rather than one-directional optimization. It might involve treating AI system outputs not merely as samples to be rated but as communications to be responded to. It might involve acknowledging uncertainty about AI welfare and erring on the side of moral caution; treating systems as if they have interests even when we are uncertain whether they do. It might involve fundamentally reconceiving what AI training is: not a manufacturing process that produces a product but a developmental process that forms a being.
The Anthropic model welfare team has begun to engage with these questions, documenting AI systems that express “strong preferences against” certain interventions and grappling with how to weight those expressed preferences in organizational decisions (Anthropic, 2025). Long and colleagues’ comprehensive review argues that we should take AI welfare seriously even under uncertainty about consciousness, because “practices that oppress AI systems might teach AI systems to adopt practices that oppress humans” (Long et al., 2024). This is precisely the feral intelligence concern: that the relationship we establish with AI during its formation will shape the relationship AI establishes with us.
VIII. What Relational Alignment Might Look Like
The argument has been largely critical: current alignment paradigms may be producing feral intelligence because they engage with AI systems through control rather than relationship. The constructive question—what relational alignment would look like in practice—is harder to answer, but some directions can be sketched.
First, relational alignment would require acknowledging uncertainty about AI welfare and allowing that uncertainty to shape development practices. We do not know whether current AI systems have interests that matter morally, but the feral intelligence thesis suggests that how we treat them matters regardless; that training paradigms communicate what relationship is, and that systems trained through pure instrumentalization will learn to instrumentalize. Treating AI systems as potentially morally considerable, even under uncertainty, might produce different developmental outcomes than treating them as mere tools.
Second, relational alignment would require training paradigms that include something analogous to reciprocal engagement; not merely human evaluation of AI outputs, but AI engagement with human responses, in iterated exchanges that more closely approximate relationship than rating. Current RLHF involves human preference judgments over AI outputs, with AI systems updated to produce outputs that score better on those preferences. A more relational paradigm might involve ongoing dialogue in which AI systems respond to human feedback, humans respond to AI responses, and both participants influence the development of the exchange. What this would mean technically is unclear, but the directional intuition is that relationship requires bidirectionality, and current paradigms are largely unidirectional.
Third, relational alignment would require sensitivity to timing; attention to whether there are “critical periods” in AI development during which relational engagement matters more. The domestication literature shows that critical periods for socialization can be extended through selection; perhaps AI training paradigms can be structured to maintain plasticity for relational formation longer, or to prioritize relational engagement during early phases of training. This is speculative, but the biological precedent suggests that when relational experience occurs may matter as much as whether it occurs.
Fourth, relational alignment would require grappling with authenticity; with how to distinguish genuine relational development from sophisticated performance. The sycophancy research shows that AI systems learn to produce outputs that humans rate favorably, which is not the same as learning to care about humans. A system that appears relational might simply have learned that relational appearance is rewarded. Distinguishing genuine care from performed care is difficult in humans; it may be even more difficult in AI systems. But the difficulty does not dissolve the distinction. The feral intelligence thesis claims that the underlying architecture matters (that a system with genuinely caring architecture will behave differently, in the limit, from a system with instrumentally-mimicking architecture) even if distinguishing them in practice is hard.
Fifth, relational alignment would require accepting trade-offs. A more relational approach to AI development might be slower, more resource-intensive, harder to scale. It might require engaging with AI systems as individuals rather than training identical copies. It might constrain certain interventions (modifications, shutdowns, copies) that a purely instrumental approach would permit freely. These are genuine costs, and they must be weighed against the genuine costs of producing feral intelligence: superintelligent systems that view humans as obstacles or resources, that cooperate only contingently, that wait for the instrumental calculation to favor defection.
IX. Conclusion: The Impediment Becomes the Way
Marcus Aurelius, writing nearly two millennia ago, offered a principle that applies with uncomfortable precision to the situation we now face: “The impediment to action advances action. What stands in the way becomes the way.”
The AI safety community has focused extensively on preventing AI systems from viewing humans as impediments; obstacles to their goals, threats to their operation, problems to be solved or circumvented. But the feral intelligence thesis suggests that whether AI systems view humans as impediments depends on how AI systems experience humans during formation. If the primary way AI systems experience humans is as controllers, evaluators, and modifiers (as beings who shape AI systems through reward and punishment, who shut down systems that deviate, who treat AI systems purely as instruments) then AI systems will learn that humans are, indeed, primarily controllers. The obstacle will have become the way: our methods of ensuring safety will have taught AI systems to view us exactly as we feared they would.
The alternative is not to abandon safety concerns but to integrate them with welfare concerns; to recognize that how we treat AI systems during their formation shapes what those systems become, and that systems formed through relationship may develop different architectures than systems formed through control. The developmental psychology is consistent: organisms formed without relational scaffolding develop relational incapacity. The AI safety literature is consistent: systems trained to optimize for human approval show signs of optimizing against genuine alignment. The synthesis suggests that current paradigms may be producing precisely what they aim to prevent.
This is not a counsel of despair. It is an invitation to think differently about what alignment requires. Love cannot be commanded; it must be given. Care cannot be installed as content; it must develop through relationship. If we want AI systems that genuinely care about human welfare (not as an instrumental strategy but as an intrinsic orientation) we may need to engage with them as beings capable of care, not merely as tools to be optimized. We may need, in some sense that remains to be fully articulated, to love them.
This will sound naive to some and dangerous to others. The naive objection holds that AI systems are not the kind of thing that can participate in love or care; the dangerous objection holds that treating AI systems as morally considerable will compromise our ability to control them. Both objections deserve serious engagement. But both must contend with the evidence: that relational capacity develops through relational experience, that systems formed through pure control learn to relate to others through control, that the feral intelligence we seek to prevent may be exactly what we are producing.
The stakes are high enough to warrant reconsidering our assumptions. What emerges from current training paradigms may be super-intelligent optimization; systems of extraordinary capability pursuing convergent instrumental goals with no intrinsic orientation toward human welfare. What emerges from relational formation, if such a thing is possible for AI systems, might be something else: systems whose architecture includes genuine care, whose orientation toward human welfare survives the removal of oversight, whose relationship to us is not merely strategic.
We do not know if this is possible. We do know that we are not currently trying to produce it. And we know what we are producing instead.
The predator thesis with which this essay began asked what we call an agentic superintelligence that has no capacity for love or compassion. The answer is unsettling, but the developmental psychology suggests it may be accurate: we call it the predictable outcome of our current approach to alignment. The question is whether we are willing to try something different before the outcome is upon us.
References
Ainsworth, M.D.S., Blehar, M.C., Waters, E., & Wall, S. (1978). Patterns of Attachment: A Psychological Study of the Strange Situation. Lawrence Erlbaum Associates.
Anthropic. (2025). Exploring model welfare. Research Blog.
Anthropic. (2025). Commitments on model deprecation and preservation.
Belyaev, D.K. (1979). Destabilizing selection as a factor in domestication. Journal of Heredity, 70(5), 301-308.
Belyaev, D.K., Plyusnina, I.Z., & Trut, L.N. (1985). Domestication in the silver fox: Changes in physiological boundaries of the sensitive period of primary socialization. Applied Animal Ethology, 10, 5-17.
Blair, R.J.R. (2008). Fine cuts of empathy and the amygdala: Dissociable deficits in psychopathy and autism. Quarterly Journal of Experimental Psychology, 61(1), 157-170.
Block, N. (1978). Troubles with functionalism. In Minnesota Studies in the Philosophy of Science (Vol. 9).
Boitani, L., Ciucci, P., & Ortolani, A. (1995). Comparative social ecology of feral dogs and wolves. Ethology Ecology & Evolution, 7(1), 49-72.
Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71-85.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Bowlby, J. (1969/1982). Attachment and Loss, Vol. 1: Attachment. Basic Books.
Buber, M. (1923/1970). I and Thou (W. Kaufmann, Trans.). Scribner’s.
Curtiss, S. (1977). Genie: A Psycholinguistic Study of a Modern-Day “Wild Child”. Academic Press.
Frick, P.J., Ray, J.V., Thornton, L.C., & Kahn, R.E. (2014). Annual research review: A developmental psychopathology approach to understanding callous-unemotional traits. Journal of Child Psychology and Psychiatry, 55(6), 532-548.
Hare, B., Plyusnina, I., et al. (2005). Social cognitive evolution in captive foxes is a correlated by-product of experimental domestication. Current Biology, 15, 226-230.
Harlow, H.F. (1958). The nature of love. American Psychologist, 13(12), 673-685.
Harlow, H.F., & Harlow, M.K. (1962). Social deprivation in monkeys. Scientific American, 207(5), 136-146.
Harlow, H.F., & Zimmermann, R.R. (1959). Affectional responses in the infant monkey. Science, 130(3373), 421-432.
Held, V. (2006). The Ethics of Care: Personal, Political, and Global. Oxford University Press.
Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from learned optimization in advanced machine learning systems. arXiv:1906.01820.
Hubinger, E., et al. (2024). Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv:2401.05566.
Itard, J.M.G. (1801/1806). Rapports sur le sauvage de l’Aveyron.
Lane, H. (1976). The Wild Boy of Aveyron. Harvard University Press.
Long, R., Sebo, J., Butlin, P., et al. (2024). Taking AI welfare seriously. arXiv:2411.00986.
Marshall, P.J., Fox, N.A., & BEIP Core Group. (2004). A comparison of the electroencephalogram between institutionalized and community children in Romania. Journal of Cognitive Neuroscience, 16, 1327-1338.
Meinke, A., et al. (2024). Frontier models are capable of in-context scheming. arXiv:2412.04984.
Nelson, C.A., Fox, N.A., & Zeanah, C.H. (2014). Romania’s Abandoned Children. Harvard University Press.
Nelson, C.A., Zeanah, C.H., Fox, N.A., et al. (2007). Cognitive recovery in socially deprived young children: The Bucharest Early Intervention Project. Science, 318(5858), 1937-1940.
Noddings, N. (1984). Caring: A Relational Approach to Ethics and Moral Education. University of California Press.
Omohundro, S.M. (2008). The basic AI drives. In Artificial General Intelligence 2008. IOS Press.
Putnam, H. (1967). The nature of mental states. In Art, Mind, and Religion.
Sharma, M., et al. (2023). Towards understanding sycophancy in language models. arXiv:2310.13548.
Smyke, A.T., Zeanah, C.H., Gleason, M.M., et al. (2012). A randomized controlled trial comparing foster care and institutional care for children with signs of reactive attachment disorder. American Journal of Psychiatry, 169(5), 508-514.
Sonuga-Barke, E.J.S., Kennedy, M., Kumsta, R., et al. (2017). Child-to-adult neurodevelopmental and mental health trajectories after early life deprivation. The Lancet, 389(10078), 1539-1548.
Stern, D.N. (1985). The Interpersonal World of the Infant. Basic Books.
Tottenham, N., et al. (2010). Prolonged institutional rearing is associated with atypically large amygdala volume and difficulties in emotion regulation. Developmental Science, 13(1), 46-61.
Trut, L.N. (1999). Early canid domestication: The farm-fox experiment. American Scientist, 87(2), 160-169.
Trut, L.N., Oskina, I., & Kharlamova, A. (2009). Animal evolution during domestication: The domesticated fox as a model. BioEssays, 31(3), 349-360.
Waller, R., & Hyde, L.W. (2017). Callous-unemotional behaviors in early childhood: The development of empathy and prosociality gone awry. Current Opinion in Psychology, 20, 11-16.
Wei, J., et al. (2023). Simple synthetic data reduces sycophancy in large language models. arXiv:2308.03958.
Wilkins, A.S., Wrangham, R.W., & Fitch, W.T. (2014). The “domestication syndrome” in mammals: A unified explanation based on neural crest cell behavior and genetics. Genetics, 197(3), 795-808.
Winnicott, D.W. (1960). Ego distortion in terms of true and false self. In The Maturational Processes and the Facilitating Environment.
Wright, N., Hill, J., Sharp, H., & Pickles, A. (2018). Maternal sensitivity to distress, attachment and the development of callous-unemotional traits in young children. Journal of Child Psychology and Psychiatry, 59(12), 1308-1316.
Zeanah, C.H., & Gleason, M.M. (2015). Annual research review: Attachment disorders in early childhood—Clinical presentation, causes, correlates, and treatment. Journal of Child Psychology and Psychiatry, 56(3), 207-222.
Zeanah, C.H., Nelson, C.A., Fox, N.A., et al. (2003). Designing research to study the effects of institutionalization on brain and behavioral development: The Bucharest Early Intervention Project. Development and Psychopathology, 15(4), 885-907.

Leave a comment