Intimating Emptiness: July 2018

六耳不同謀
Six sets of ears are different.

from Giun's Verse Commentary on Dōgen's Treasury of the True Dharma Eye (Heine 2020b, 114)

Modes of listening

In this text, I discuss modes of listening. A mode of listening is a heuristic device that calls attention to the fact that music can be actualized in qualitatively different ways–through different modes of experience. A piece of music is not just a constellation of sounds and silences articulated in time, but, perhaps more importantly, a particular way of actualizing these sounds and silences–a particular mode through which they appear as music.

From this perspective, two themes will run through the text: first, that different kinds of music seem to call for different modes of listening; and second, that different people encountering the same piece of music may experience it through different modes of listening. Although the two themes are deeply interconnected, the first half of the text will focus primarily on the first, and the second half primarily on the second.

To listen for what is important

When discussing modes of listening as something that music seems to call for 'from itself,' the theory of modes of listening resembles what Ziff (1964), in relation to different styles of visual art, has called acts of aspection. While it might seem like a detour to begin an essay about music by talking about paintings, I have found Ziff’s theory a useful analogy for thinking about modes of listening.

According to Ziff, paintings in the Venetian style, for example, invite a particular way of seeing that differs from, say, the Florentine school. Venetian art, Ziff explains, "lend[s] itself to an act of aspection involving attention to balanced masses; contours are of no importance" (620). In this style, certain aspects—balanced masses—are emphasized, while others—such as contours—are less important. By contrast, Florentine paintings demand "attention to contours" because "the linear style predominates" (620). In short, paintings from the Venetian and Florentine schools are experienced through different acts of aspection because different features become the primary focus of perception.

Carlson (2008) explains that the idea of acts of aspection points to the fact that different works of art have "different kinds of boundaries and different foci of aesthetic significance and so demand different acts of aspection” (119). Ziff illustrates this by noting that one looks “for light in a Claude, for color in a Bonnard, for contoured volumes in a Signorelli" (1964, 620). The wording here is crucial: for Ziff, these differences are not simply given in the phenomenal world but arise through the way a subject enacts visual phenomena. We do not merely see contoured volumes in a Signorelli—we look for them. Yet this "looking for" is often, as we will see below, unconscious and spontaneous rather than deliberate or effortful.

It may seem, then, as if the painting itself calls for or demands a certain act of aspection, but it would be naive to assume this literally. Despite the automaticity with which we adopt different acts of aspection, the appreciation of art is always a form of practiced attention—a skillful know-how. To enact the meaningfulness that emerges when appearances are placed within a hierarchy of relative importance is not a given; it is an acquired, embodied capacity.

Similar to Ziff’s idea of the viewer conducting their gaze around different foci of aesthetic significance, a musical mode of listening describes the components or aspects of the music that are apprehended and appear simultaneously in consciousness, and how these different aspects relate to one another. Certain elements are distinguished or emphasized in hearing, while others remain peripheral or unnoticed.

In much of Xenakis's music, for instance, we attend to large, moving masses of sound; detailed attention to single sounds is of little importance. In the classical gǔqín repertoire, by contrast, we are invited to linger in the subtle articulation of each sound, attending to its texture, its silence, its resonance. To listen only to the larger gestalts here would be to miss the music’s very point; the experience emerges precisely in the careful, intimate unfolding of detail.

When listening to a traditional symphony, the proper mode of attention requires us to 'ignore' the ambient sounds that surround it. As Saito (2007) observes, "the outside traffic noise, the cough of the audience […] are […] consciously ignored, though they are part of our experience contemporaneous with the symphonic sound" (7). With John Cage’s Number Pieces, however, the relation to ambient sound is reversed. As Haskins describes, the music slowly opens into the space around it, until it no longer stands apart but mingles with the world. It comes to share equal precedence with the surrounding environment; the music "gently envelop[s]" him, until he sees and hears "minute details of everyday life with a fresh, uncluttered clarity" (2004).

These examples illustrate how different kinds of music invite different modes of listening. In one mode, certain aspects are distinguished and emphasized; in another, those same aspects may recede or be treated as peripheral.

Modes of listening as worlds

While there are modes of listening that can be characterized as 'analytical' or 'theoretical'—such as when taking an entrance exam to a music school and transcribing a melody—what interests me here are the 'authentic' modes of listening that are non-conceptual in nature. Wallrup (2012) writes with sensitivity:

"the average listener neither listens philosophically, nor analytically (looking for themes and their development, trying to figure out how the work is harmonically construed). No, a common way of listening is to engage oneself in the musical work as a world, to let oneself be changed by the world emerging in music." (Wallrup 2012, 385)

To listen to music is to be attuned to a musical world. If formulations such as 'acts of aspection' or 'modes of listening' risk sounding like they describe a kind of selective awareness—an interest-driven orientation that seeks to "arrange phenomena into hierarchies of relative importance" (LaFleur 1983, 88)—then speaking of a listening mode as a world acts as a corrective. It emphasizes the non-conceptual, attunemental nature of listening. The musical world is the unobjective 'something other' that, in reciprocal interplay with sound, brings it into presence as music. Because each world is constituted in a distinctive way—possessing its own temporality, mobility, spatiality, and materiality (Wallrup 2012)—phenomena take on different weight and significance: they are always heard from the 'perspective' of that world. Meaningful distinctions thus emerge not from cognitive grids that interpret sense data, but from a non-conceptual attunement to a world. This kind of thinking draws upon Heidegger's notion of the world as given in the following famous passage:

"The world is not the mere collection of the countable or uncountable, familiar and unfamiliar things that are at hand. But neither is it a merely imagined framework added by our representation to the sum of such given things. The world worlds .... World is never an object that stands before us and can be seen. World is the ever-nonobjective to which we are subject." (Heidegger 1971b, 44)

The world of a piece of music is neither the 'mere collection' of sounds nor a 'merely imagined framework' imposed upon them. Rather, it is that which brings sound into presence as music. Wallrup’s use of the term attunement underscores the nondual, pre-reflective character of musical modes of listening: music is a world to which we are attuned for the duration of the performance. As he writes,

"Music is disclosed as a world in a certain attunement or in a combination of attunements, and it is according to the attunement that the listener relates to the musical events. Attunement is that which allows us to perceive different events in this world as events at all: it is according to the attunement that we can describe the musical work and its constituent characters." (Wallrup 2012, 382)

Green and Ford (2015) equate the Heideggerian notion of world that informs Wallrup's analysis of musical attunement with a piece of music's style (157). The mere aggregation of tones, metres, and scales does not yet amount to music; it requires "something else to become music," and this "something else" is the piece’s style. This idea can be seen as a way of demystifying what Heidegger means by world (which for him referred not to a temporary Stimmung of a piece of music but to something on the level of an era, epoch, or zeitgeist) and of translating the mutual co-creation of world and phenomena into the context of musical listening. On this view, style is not simply the "sum" of musical objects, nor the statistical distribution of patterns, cadences, or other formal traits. Nor is it reducible to superficial differences, such as the sound of the harpsichord in Baroque music or the hammerklavier in Classical and Romantic music, even though the word style might carry such technical connotations for those of us who have sat through many exams in music history. A style is rather the way of being of a world, a pre-reflective mode through which music discloses itself. In this sense, style and attunement converge: both name the pre-reflective condition that allows musical events to appear as meaningful within a world.

Continually unfolding worlds

Modes of listening are not fixed perspectives but unfold continually in a dialectical process in which they are re-adjusted and reinterpreted according to new sensory intuitions and knowledge. In this sense, they resemble the process of interpretation as described through the hermeneutical circle. This process unfolds both across the span of our lives, as we return to pieces of music and find that our understanding of them has changed, and during the very act of listening itself. As the music unfolds, our interpretation shifts, and our mode of listening is continually in flux. Thinking of a mode of listening as continually open to re-interpretation, like the hermeneutical circle, sheds light on the following conversation between Xenakis and Feldman, recorded the day after both attended a performance of Feldman’s Trio (1980):

Xenakis: I must tell you that usually I can’t stand such a long piece, but yesterday I could, although it was very late. I could follow the things that you were doing and I was attracted by what I heard. This is a positive thing because when you’re not attracted then you’ll forget it. I was pinned by the sounds and by the preparation of the sounds, which I think is the most important thing you have done. Of course, that comes from the quality of what I heard, including the performance-quality. Except for that chord that I didn’t quite understand. . .

Feldman: The loud chord?

Xenakis: Yes, the loud chord.

Quietness is an important parameter of Trio, crucial in establishing a certain mode of listening. When I listen to this piece, the quietness invites me into a 'detailed' listening in which the sounds are enacted with intimacy. The sudden loud chord on page 24 of Trio—at least 30 minutes into a piece that has been almost entirely pianissimo—arrives as a total surprise, impossible to integrate into the established mode of listening. The burst of loudness breaks my intimate and careful relation to the sounds. At this point, we are confronted with a new sensory intuition that requires us to adjust our mode of listening. But in what way does this loud chord inform such an adjustment? Strikingly, Feldman does not develop or return to the chord; after this sudden intrusion, the music continues quietly, just as before, without a second loud chord.

This is not like the many pieces by Sciarrino that also begin softly but later introduce loud bursts. In those works, we take the new sensation into account and adjust our mode of listening so as not to be startled again. In Sciarrino’s music, this adjustment is positively affirmed by further sounds that make use of the entire dynamic range. The softness did not define the mode of listening; the piece merely began softly, and we erred in believing that the softness was a determining factor for a certain mode of listening.

In the Feldman piece, however, this is not the case: the single loud chord simply exists, unexplained. This is why Xenakis cannot understand it. It does not fit with the mode of listening in which he was absorbed—it does not fit within the musical world to which he was attuned. After the loud chord, what makes most sense for me, when listening to this piece, is simply to 'return' to the previous mode of listening, though this may bring, depending on my attitude, an uneasy sense of listening inadequately or, more positively, the awareness that the piece exists beyond my conventional 'understanding'—that its mysteriousness lies beyond my grasp.

Learning modes of listening

For Ziff, the performance of different acts of aspection was enabled primarily by prior education. Stylistic knowledge of the various 'schools of art', their conventions, and the classification of styles was “of the essence.” Familiarity with these classifications allows us to appreciate art in an adequate manner. To have knowledge of style is to understand where aesthetic significance lies and to be able to focus our experience around it.

In music, we often speak of knowledge of different 'genres', and an argument similar to Ziff's has been put forth by Ola Stockfelt (2004), who speaks of 'adequate modes of listening'. Stockfelt explains that adequate modes of listening occur "when one listens to music according to the exigencies of a given social situation and according to the predominant sociocultural conventions of the subculture to which the music belongs" (Stockfelt 2004, 91). For Stockfelt, adequate listening is the prerequisite that enables "real communication" between musicians and the audience; it is what makes musical understanding possible. To master a specific adequate mode of listening means that "one masters and develops the ability to listen for what is relevant to the genre in the music, for what is adequate to understanding according to the specific genre's comprehensible context" (2004, 91).

Stockfelt’s phrasing should not be interpreted to mean that learning to listen to music is merely a matter of 'decoding' a set of stylistic tropes. Listening for “what is relevant to the genre in the music” does not imply classifying the music as if for a music history exam; rather, it means mastering that which allows us to be attuned to a non-conceptual musical world. Yet, in order to attune ourselves to the music’s world, we must 'know' the music in some sense; otherwise, it will appear as meaningless noise.

For both Ziff and Stockfelt, 'knowing' or 'understanding' art is something that has to be learned. There is, however, a wide spectrum of views about what such learning entails, and to fully grasp what they might have in mind, we must examine different theories of learning more closely.

Language and culture

According to one view, in order to 'understand' a given music and to engage it through an adequate mode of listening, we must be members of the same cultural world in which that music originally arises. Stockfelt appears to argue along these lines, and this position can also be seen as consistent with Heidegger’s, if we transpose his theory of art more generally into the musical domain. When the extra-musical world—the culture and customs that once made an artwork meaningful—has disappeared, the work becomes something for historians or, at best, for our superficial delight. We may take pleasure in such art, as we do in that of ancient Greece, but we cannot perceive it adequately. On this view, learning an adequate mode of listening cannot come about solely through listening to the music itself; it is a skill acquired through belonging to a particular culture.

This argument points to the insight that music does not exist apart from its cultural contexts. Modes of listening are intersubjectively meaningful ways of being in a world. A mode of listening is learned—or rather, appropriated—through participation in an unfolding, intersubjective, and culturally constructed reality. Appreciating this point has important implications for music education. Learning to listen adequately does not, on this view, arise from 'mere listening,' but depends on how listening is framed by the material and linguistic cultures that draw out particular meanings. As Minette Mans writes,

"research and experience have shown that in many cases, it is learners’ investigation and understanding of extrinsic factors that lead to a deeper enjoyment of the music […] It is only by understanding values within a musical culture that true appreciation begins to develop" (Mans 2009, 183).

What has come to be known as the sociocultural perspective is perhaps the school of thought that most strongly emphasizes other human beings as the primary agents in the transmission of modes of listening. From this perspective, it is difficult to discover on one’s own what is worth discerning in each situation; we require access to linguistic descriptions and analyses—in other words, to discourses—and through communication become involved in communities of action and meaning (Säljö 2000, 62). Language is thus regarded as primary. Within this view, for instance, it is the very concept of the tonic that enables us to experience a particular chord as the home tonality of a tonal piece.

Taken to its extreme, the sociocultural vision results in a heavily anthropocentric and language-centric form of social constructionism. In section XLII of the Laṅkāvatāra Sūtra, the Buddha explicitly rejects the view that words give rise to things, that things exist because we name them, and argues that words are not even essential for communication. If social constructionism maintains that the world is created solely through human social interaction, and treats language as the primary agent in shaping this world, it overlooks the agency of mountains, the spring breeze, and musical sounds: agents that, as empty, illusory, and relational as humans, participate in the co-construction of the world. To many of us, the notion that sounds are somehow secondary to their linguistic or cultural framing when developing the skill to listen adequately seems counterintuitive. Many of us have had deeply meaningful encounters with music far removed from our own cultural background, without access to the 'adequate' cultural framing that the sociocultural perspective deems necessary. Despite that perspective's insistence that value can only be discerned through linguistic mediation, we seem capable of apprehending something directly from the sounds themselves.

Sense perception

Other perspectives have argued that the sociocultural view relies too heavily on descriptions of music and neglects its non-verbal qualities and the direct sense perception of sound. Sounds themselves can act as agents, indicating to us what is important to discern. By listening, we participate in a community of praxis with sound. As Clarke (2005) observes:

"Ideologies and discourses, however powerful or persuasive they may seem to be, cannot simply impose themselves arbitrarily on the perceptual sensitivities of human beings, which are rooted in (though not defined by) the common ground of immediate experience." (Clarke 2005, 43)

Clarke argues that calling a sonority 'tonic' would make no sense to a person who has not already experienced tonal music on which the term can be mapped. Without a basis grounded in immediate experience, it is impossible to appropriate words as meaningful. The sociocultural perspective’s emphasis on language, therefore, tends to disregard music’s sensuous appearing. It implies that any sound can be perceived in any way, depending solely on the discourses that frame it. This leads to the kind of relativism that Tia DeNora (2003) associates with sociologists. In contrast to musicologists, who emphasize sound’s material properties, sociologists focus almost exclusively on linguistic discourses:

"Most sociologists do not bother with the question of music’s specifically musical properties and how these properties may ‘act’ upon those who encounter them. Indeed, sociologists tend to infuriate musicologists when they suggest that musical meaning – music’s perceived associations, connotations, and values – derive exclusively from the ways in which music is framed and appropriated, from what is ‘said’ about it. Musicologists often assume (and in some cases correctly) that this notion overrides any concept of music’s own properties (conventions, physical properties of sound) as active in the process of perceived meanings." (2003, 36)

Yet, moving too far in the direction opposite the sociocultural perspective—toward the approach of DeNora’s musicologists—risks another extreme: determinism. This is the view that anyone who hears a given sound will necessarily experience it in the same way, that the acoustic properties of the sound alone determine the listening experience.

This kind of determinism was famously valorized by the qín player Bó Yá (伯牙), who was active around 300 B.C. For Bó Yá, there was no point in making music unless all listeners experienced it in exactly the same way. In his aesthetics, all listeners, including the musicians themselves, were required to enact the sounds and their meaningfulness in precisely the same manner for the musical situation to succeed. The precision with which Bó Yá and his friend Zhōng Zīqī (鍾子期) could articulate the meaning of sounds, as recounted in the "Tang Wen" chapter of the Liezi (列子), illustrates the valorization of this kind of determinism:

Bó Yá was a good lute-player, and Zhōng Zīqī was a good listener.

Bó Yá strummed his lute, with his mind on climbing high mountains; and Zhōng Zīqī said:

Good! Lofty, like Mount Tài!

When his mind was on flowing waters, Zhōng Zīqī said:

Good! Boundless, like the Yellow River and the Yangtze!

Whatever came into Bó Yá's thoughts, Zhōng Zīqī always grasped it.

(Graham 1960, 109)

In the ideal of listening described here, both listeners—in this case, a performer and an audience member—share the same mental images evoked by the music. To be a good listener is to understand the sounds with great precision, as the meaning of the music is thought to reside entirely in the sounds themselves. The absurd consequence of this deterministic view is illustrated in what happened when Zhōng Zīqī passed away: because Zhōng was the only one who truly understood Bó Yá's playing, Bó Yá destroyed his instrument, believing there was no point in continuing to play without anyone to 'understand the sound' (zhīyīn 知音). If listeners did not share the exact experience, the music was considered meaningless.

While valorizing a deterministic ideal in which every listener should experience the same sounds identically, the story implicitly acknowledges a form of relativism: only a listener of exceptional attunement—Zhōng Zīqī, in this case—could apprehend the full meaning of Bó Yá’s playing. In this way, it upholds an ideal of listening in which perceiving the precise significance inherent in the sounds is paramount, while recognizing that not everyone will do so.

Two worlds

Both extremes, determinism and relativism, are inadequate. What is needed is a middle way that recognizes the relevance of the sensuous qualities of sound, while acknowledging that they do not produce the same experience in every listener. Erik Wallrup (2012) articulates such a middle way by describing how musical attunement emerges from a dialectical interplay between two worlds: the properties of the music itself on the one hand, and the surrounding culture on the other. Musical attunement is not determined solely by either world; rather, it arises through their interdependent relationship:

"We seem to have one world constituted by the cultural context of the artwork, and then another world worlding in the same work of art - but they have to do with each other, they cannot be separated. When the world belonging to the work is gone, the world worlding in the work is radically changed" (2012, 339).

Wallrup concurs with Heidegger that when a cultural world changes, the attunements that artworks can facilitate necessarily change as well. Yet he departs from Heidegger in proposing that new attunements may emerge from the encounter between the world within the artwork and a new cultural context. Such attunements give rise to a world distinct from the work’s ‘original’ one—a new interpretation that nonetheless remains meaningful. It need not be, as Heidegger suggests, that we today can experience Greek art only as historical curiosities or as aesthetic ornaments. Rather, these works can continue to serve as genuine sources of attunement:

“Attunement is at work when there is resonance between music and listener and a musical world emerges. However, we must not forget that 'world' is ambiguous. When the world that belongs to the artwork has faded away, the world of the work is changed” (Wallrup 2012, 343).

Affordances

DeNora’s proposal for a middle way draws on the concept of affordances, a term originating in the ecological theory of perception (Gibson & Gibson 1955). Affordances refer to features of the environment that offer relationally constituted possibilities for action and interaction. They neither determine that every perceiver will have the same experience, nor are they unrelated to the material properties of the music. Affordances are opportunities for engagement: they arise not in the material world alone, nor within sentient beings, but in the relation between them. Different species, for instance, experience the same environment in distinct ways because they perceive different affordances. As Varela et al. (2016) explain,

"affordances consist in the opportunities for interaction that things in the environment possess relative to the sensorimotor capacities of the animal. For example, relative to certain animals, some things, such as trees, are climbable or afford climbing. Thus affordances are distinctly ecological features of the world." (Varela et al. 2016, 203)

The idea of speaking about a music’s affordances points toward musical listening as a perceptually guided activity: affordances are potentials for interaction that may or may not be realized. The ecological perspective emphasizes that musical listening is a creative act rather than a state of being "merely receptive" (Schmidt 2016, 140). Affordance structures are not causes or stimuli that elicit fixed reactions, but opportunities for interaction available to culturally embedded beings. As DeNora (2003) notes, music is "something acted with and acted upon" (48). These affordances are not necessarily enacted in the same way by every listener. In this sense, the ecological perspective avoids determinism while still granting primacy to sense perception.

Exploratory perception

That listening is not passive—we do not merely receive sense data from the ‘outside’—but rather a creative, perceptually guided activity is a key idea within the ecological perspective. Gibson characterizes such activity as exploratory: perception actively searches for clarity and for ways of acting optimally within the environment. As Gibson writes, "a system ‘hunts’ until it achieves clarity" (in Clarke 2011, 204). Clarke, who extends Gibson’s framework to music, elaborates that

“Perception is essentially exploratory, seeking out sources of stimulation in order to discover more about the environment and to act optimally within it” (Clarke 2011, 204).

The description of perception as exploratory closely mirrors a formulation by Merleau-Ponty, who famously observed that

“for each object, as for each picture in an art gallery, there is an optimum distance from which it requires to be seen: . . . at a shorter or greater distance we have merely a perception blurred through excess or deficiency. We therefore tend towards the maximum of visibility, and seek a better focus as with a microscope” (1979, 302).

Imagine entering an art gallery to find Monet’s paintings of water lilies. If we stand too close, we can no longer see water lilies at all—only brushstrokes. Some art consists only of brushstrokes; in such cases, perceiving the brushstrokes alone is an adequate mode of perception, as the work does not afford a figurative reading. Monet’s paintings, however, are figurative, and we seek a distance from which the figures can be discerned—a position that provides an "optical grip" on the image. We therefore take a few steps back until we can perceive the painting with maximum clarity. The painting thus affords the act of taking a few steps back. Merleau-Ponty writes that what guides our position in relation to the painting is the mind–body’s attempt to find a way of coexisting with the world:

"My body is geared into the world when my perception presents me with a spectacle as varied and as clearly articulated as possible, and when my motor intentions, as they unfold, receive the responses they expect from the world. This maximum sharpness of perception and action points clearly to a perceptual ground, a basis of my life, a general setting in which my body can co-exist with the world" (1979, 250).

The wording of Merleau-Ponty is crucial here because he speaks of the object as requiring to be seen in a certain way. Phenomenologically, it does not feel as though we are subjects deliberately seeking greater clarity. It is not that we act as agents attempting to master the world by achieving a "maximum sharpness of perception". Rather, it feels as if the object itself invites—or even requires—a way of seeing it with clarity. Yet even this phrasing risks implying a conscious process, whereas it is, in fact, automatic and hidden from reflective thought—a wholly intuitive adjustment. McGilchrist (2021) captures Merleau-Ponty’s view when he describes "perception as a reciprocal encounter…. Experience is a sensorimotor—and intuitive—participation, a fusion of one’s own awareness with awareness in the world", and quotes Merleau-Ponty’s description of encountering the blue of the sky: "I abandon myself to it and plunge into this mystery, it 'thinks itself within me'…" (106).

From the ecological perspective as well, it would be mistaken to assume that the ‘opportunities’ for interaction are consciously perceived as such. The search for clarity in both Merleau-Ponty and Gibson is not a conscious effort but a spontaneous adjustment. In the context of music, it is precisely this effortless and non-reflective attunement to the world that Wallrup, as discussed earlier, so compellingly articulates.

Critiques of Merleau-Ponty and Gibson

In describing the process of finding 'resonance' with the world—the process of achieving the "maximum sharpness of perception and action" that allows us to "co-exist with" and attune ourselves to it—Merleau-Ponty relies on a particular conception of embodiment and sensorimotor perception. Shusterman (2009) has critiqued this view for postulating a "universal and unchanging primordial body consciousness" (139), which fails to account for the subject’s embeddedness in social and cultural contexts. This is likely a well-founded critique, but it would be a mistake to assume that, for Merleau-Ponty, this primordial body consciousness exhausts the process of attunement—or is all that is involved when finding the right distance from which to view a Monet painting.

If that were the case, everyone, by virtue of the same species-specific sensorimotor search for clarity, would locate the same distance from which to view the painting, regardless of its cultural context—something we know not to be true. To emphasize only this sensorimotor search for clarity, grounding it in a universal body consciousness and shared perceptual principles, would lead to a determinism akin to that of the “musicologists” (to use DeNora’s caricature) who believe that the meaning of a piece of music resides solely in its acoustic properties. We would then merely have replaced the sounds with a more sophisticated theory of affordances.

This is, in a sense, the criticism that Varela et al. (2016) direct toward Gibson’s ecological theory. Despite its ecological orientation, Gibson’s model fails to fully articulate the deep relationality of perception. For Gibson, what is perceived by "picking up" or "seeking out sources of stimulation" is not itself fundamentally affected by the nature of perceptually guided action, because it is conceived as invariant:

"The observer may or may not perceive or attend to the affordance, according to his needs, but the affordance, being invariant is always there the be perceived." (Gibson 2015, 130)

In Gibson’s view, affordances are specific to a species and to the particular interests of the subjects who perceive them, yet they are not constructed by those subjects. Rather, subjects are understood to pick out these affordances from the environment. Despite DeNora’s arguments to the contrary, this perspective ultimately downplays perception as a genuinely creative act and thus leads to a subtle form of determinism.

These two critiques of Merleau-Ponty and Gibson also apply to Clarke’s conception of musical listening as ecological. Although Clarke presents his perspective as a middle way between determinism and relativism, in practice he emphasizes the properties (the affordances) of the perceived object and the universal perceptual principles of the subject far more than the "cultural context" he claims to treat as equally significant (2005, 93). The cultural context, in Clarke’s account, seems to function merely as a support for the universal perceptual principles that allow the subject to detect the pre-existing affordances in the music in order to perceive it adequately.

The enactive perspective

As long as one side is taken to be more important than the other, we fail to recognize dependent origination. The perceived arises in interdependence with the perceiver; neither can be said to exist independently. Objects are neither apart from the subject nor identical with it: "[m]atter is no other than mind; mind, no other than matter. Without any obstruction, they are interrelated" (Kūkai, quoted in Hakeda 1972, 229). A corrective to Gibson that fully acknowledges this interrelation and interdependence has been offered by Varela, Thompson, and Rosch (2016 [1991]), who explicitly ground their theory in the Madhyamaka philosophy of Nāgārjuna. In contrast to Gibson’s view of the environment as 'independent', they describe it as enacted. The term enactive signifies that

"cognition is not the representation of a pregiven world by a pregiven mind but is rather the enactment of a world and a mind on the basis of a history of the variety of actions that a being in the world performs." (2016, 6)

Perception is not a matter of detection but arises because "sensorimotor patterns enable actions to be perceptually guided" (Varela et al. 2016, 203). Ultimately, there are no common entities 'out there' in the world that are simply perceived in different ways. This entails abandoning the notion of simple location. Music, accordingly, does not consist of sound waves reaching different individuals who then interpret this acoustic information. Experiences are not subjective happenings in which an interface between mind and body negotiates sensory input to create a representation of the world. As Tashi Tsering explains, "there is no common object of [beings’] respective sensory consciousness" (in Yakherds 2021, 282).

Rather than saying that perceptual appearances vary according to the different ways in which beings mentally construct and process 'the same' items, we can say that they appear differently according to distinct modes of enactment that arise relationally. Neither the 'subject-side' nor the 'object-side' exists independently; both arise dependently. Music is neither contained in the sounds nor in the act of listening, but emerges through a dialectical process in which music is both created by listening and creating listening. Kong Yingda expressed this beautifully:

Music comes from people, yet returns to affect people. This is like rain coming from the mountain yet returning to rain upon the mountain, like fire coming from wood yet returning to burn wood. (in Cook 1995, 13)

Since all things arise interdependently, all things are "empty of any independent intrinsic nature" (Varela et al. 2016, 224). Like the arguments of Nāgārjuna, the theory of enactment entails neither dualism—the view that world and subject are distinct—nor monism—the view that they are the same. Remaining true to the tenor of Nāgārjuna’s thought, Varela et al. contend that because there are no foundations on which to ground the theory of enactment, the theory itself must also be understood as groundless. Even their own concept of enaction is therefore only a "provisional and conventional activity of the relative world", one that points beyond itself toward a "truer understanding of groundlessness" (228).

Hearing something as something and something as nothing

John Cage often spoke of listening to his music in ways that seemed to fall outside any musical mode of listening—outside of what we might call a musical world. He valued music that resembled the experience of not listening to music at all, 'when sounds simply happen'. He loved sounds just as they were, without the need for them to be anything more:

And they say, these people who finally understand, they finally say, 'You mean it's just sounds?' thinking that for something to just be a sound is to be useless… whereas I love sounds, just as they are and I have no need for them to be anything more than what they are. (in Sebestik 1992)

Cage seems to suggest that listening to his music entails a kind of 'unmediated experience of sound'. Yet it is not possible to hear anything unmediated or outside of worlds: we are always attuned. Whether we classify 'hearing sounds as just sounds' as a musical mode of listening or as a mode of everyday listening (see McMahan 2008, 142), it remains part of a world. Cage’s mode of listening, in which sounds are heard as just sounds, is no less constructed than the mode through which we listen to Classical music. Both are forms of praxis—skillful know-hows for enacting phenomena. When sounds are heard in a mode of listening, they are heard as relating to other sounds in certain ways. They are heard as something. This does not imply that all phenomena are heard symbolically or conceptually, but that even hearing them as mere sounds—as in Cage’s music—still grants them a meaningful, though non-conceptual, articulation.

While differing on important points, all the perspectives discussed above—the enactive, ecological, sociocultural, and phenomenological—agree that we always experience phenomena as something. Each rejects the positivist notion that sense experience can be 'immediately' given to us, uncontaminated by past experience or interest. They also share a critique of the realist claim "that there is a way that the world essentially is in itself independent of any conceptual framework and that the mind can know this world" (Thompson 2020). There is no 'pure experience' of ordinary phenomena that is not constructed—for such an experience would resemble white noise, a state in which the world would lose all structure because everything would be equally significant (Marton & Booth 2000, 153). As Wallrup writes, "[a]s soon as we take part in the world of a musical work, the materiality shows itself as a part of that world, and this means that we do not have any unmediated relation to it" (2012, 385). Sounds are always heard as something and as part of a world. If an experience were truly unmediated, it would be absolutely nothing.

The mistaken belief that we can have an unmediated experience of something was popularized in the twentieth century by D. T. Suzuki, who presented Zen art and aesthetics—such as dry-landscape gardens and ink paintings—as pointing toward a 'pure' and unmediated experience (Sharf 1995, 248). This view had historical antecedents in the anti-symbolist tendency of earlier Zen art, where poems and paintings often expressed "the simple recognition of phenomena" and sought to redirect "our focused attention to phenomena for their own sake", with the purpose of “reversing the symbolizing habit of mind” (LaFleur 1983, 23). Suzuki, however, intensified this tendency. As McMahan (2008) shows in his study of Buddhist Modernism, Suzuki advanced the idea that Zen art offers immediate access to a reality completely pure and unfiltered, a mode of representation that "transcends the personal and social" (134). Yet, as the perspectives outlined above argue, such a reality—one existing independently of the personal and social—does not exist. From both the Madhyamaka and the enactive viewpoints, phenomena arise in interdependence with the perceiver; neither object nor subject can be said to exist independently. Because all phenomena are enacted, there can be no pre-given world to which a pre-given mind has unfiltered access.

In Yogācāra terminology, the concept of paratantra is used to describe the basis for all phenomenal appearances. The Trisvabhāvanirdeśa explains this paratantra as "that which appears, in opposition to the way in which it appears" (Williams 2009, 90). The Saṃdhinirmocana Sūtra defines it as "the dependent origination of dharmas, that is, the causal flow" (Williams 2009, 90). That it is the essence of dependent origination means that it does not arise by itself, outside of us, as something we can perceive objectively. When Kasulis describes Zen meditation as something that takes the meditator back to a point "where the specifics of the situation dissolve back into the meaningless flow, the as-ness or presencing", this 'meaningless flow' can at best be the paratantra, but experienced without the imaginary concepts and dualisms that misapprehend this flux precisely as phenomena that exist by themselves—as phenomena with self-nature (svabhāva) that exist outside of us. The meaningless flow is one in which phenomena are seen as what they truly are, and what they truly are is not something that exists outside of us as some kind of pre-given, pure reality, nor something that exists merely as our own solipsistic Vorstellungen—what the Yogācārins call vijñaptimātra. What they are is dependent origination, which is emptiness.

From both Madhyamaka and Yogācāra perspectives, D. T. Suzuki’s notion of a 'pure experience' appears unorthodox. This is even more evident when considered alongside the view of the Zen philosopher Dōgen. According to contemporary interpreters such as Kasulis (2018) and Davis (2011), Dōgen proposed a kind of perspectivism: whatever appears does so from a certain perspective. The 'meaningless flow' realized in zazen is not 'white noise' but a phenomenality enacted through praxis—from within a world. As Davis (2011) summarizes Dōgen’s view, engagement—even awakened engagement—is possible only from a "perspectival opening within the dynamically interweaving web of the world" (6). What 'meaningless' signifies here is an engagement free from self-interest and dualistic perception. From such an attitude, the meaningless flow is not a white noise—a view of nothing from nowhere—but becomes an "infinite resource out of which new situations and new meanings can arise" (Kasulis 2018, 230). Importantly, and as we will see in more detail below, the "new situations and new meanings" that can arise are neither infinite in the conventional sense nor random, for what appears is determined by the process of dependent origination and grounded in praxis.

Freedom

The idea of modes of listening remains valid in art experiences, even though such experiences may feel free. When encountering art, we are not merely receiving information but often feel co-creative in shaping the experience. We are not bound by the artwork to perceive it in a predetermined way; rather, we experience a sense of freedom in our encounter with it. More than asserting a 'pure perception' of phenomena, what seems even more important to Cage was offering the listener a sense of freedom. In the continuation of the passage quoted above, Cage states that he does not want sounds to be heard symbolically:

I don't want a sound to pretend that it's a bucket, or that it's president, or that it's in love with another sound. I just want it to be a sound. (in Sebestik 1992)

The problem with symbolic perception is that it fixates meaning and turns the listener into someone who merely interprets that meaning. In symbolic music, the listener becomes like Zhōng Zīqī: rather than approaching the art with a sense of freedom, the audience’s role is to decode a message. Yet when we remove this message-decoding function from art, we are not left with a pure perception of objective reality. Instead, space is opened for the free play of the listener’s mind to function spontaneously.

In the writings of Agnes Martin—an artist who valued this sense of freedom—we find a philosophy of art that seems the complete opposite of the ideal of zhīyīn. Martin wrote that "painters can't give anything to the observer / People get what they need from a painting" (1991, 36). Elsewhere, she stated that the cause of a particular response to art "is not traceable in the work. An artist cannot and does not prepare for a certain response" (1991, 18). It would be a mistake, however, to interpret these statements as implying that the way a work looks has nothing to do with what the viewer receives from it. They should not be read as expressions of the kind of relativism that stands in opposition to Bó Yá’s determinism. Rather, what the viewer 'gets' is conditioned by relationally arising affordances. From any non-deterministic perspective, it is true that artists cannot 'give' anything to the observer in the way that Bó Yá 'gave' mental images like mountains and flowing waters to Zhōng Zīqī. In a non-deterministic view, audience members must actively create the experience, yet they do not create it freely, but as agents within an intersubjectively meaningful reality—one in which the sensuous properties of the artwork and the cultural background of the audience co-constitute affordance structures that seem to 'automatically' and attunementally invite certain modes of being.

One reason I believe Martin downplays the role of the painting in the aesthetic experience is her concern that the audience should feel free when they connect with her work. Feeling co-creative in the encounter with art or music is part of what makes these experiences pleasurable and meaningful. Lee Ufan expressed this insight perfectly when he claimed that art objects (in this case literally "painting and sculpture") "are uninteresting if they are like naked words that can be understood by anyone" (2018, 273). According to Lee, Zhōng Zīqī’s listening experience is uninteresting precisely because it resembles the reception of information:

"Works of art must take part in a living dialogue. When a work of art and the viewer are in a relationship that creates resonance between them, the result will be a secret dialogue. Dialogue is not created by works of art with little physicality and no secrecy, which attempt to have general appeal and be accessible to all sorts of people. Viewers are only required to understand them as silly information" (Lee, 2018, 273).

This feeling of freedom is not negated by acknowledging the importance of a work’s affordances. On the contrary, responsiveness to these relationally arising affordances is precisely what enables the experience of freedom. In other words, the notion of 'modes of listening' should not be understood as an attempt to maximize the clarity of some informational exchange. Art is often mysterious, polysemic, and ambiguous—it arouses the imagination. Put differently, secrecy is an essential part of many modes of listening. Yet to be capable of entering into such a 'secret dialogue' with the artwork requires certain prerequisites, and it is these prerequisites that the concept of 'modes of listening' seeks to name.

Hearing music differently

Modes of listening are deeply relational, culturally contingent, and arise nondually from unique histories of enacting and being enacted upon by an 'environment' that exists only conventionally. From such a view, it may seem obvious that different people will experience 'the same' phenomenon (such as a particular piece of music) in significantly different ways: people have different interests, bodies, cultural values, and histories of engagement with the world—all of which shape the kinds of worlds they enact. Dōgen, writing in the thirteenth century, described this vividly:

"Not all beings see mountains and rivers in the same way. Some see water as a jeweled ornament, some see water as wondrous blossoms, and hungry ghosts see water as raging fire or pus and blood. Dragons and fish see water as a palace or a pavilion. Some beings see water as seven treasures or a wish-granting jewel and others see water as a forest or a partition. Some see it as the Dharma nature of pure liberation, the true human body, or the form of the body and the essence of mind." (Dōgen in Heine, 2020)

Lama Shabkar, in his masterpiece Khading Shoklap, likewise sings that

“[appearances] have no other creator,

But appear according to how they are labeled and grasped

Through the habitual patterns and fixations of one’s conceptual thoughts.” (Kunsang 1986, 40)

Like Dōgen, Shabkar emphasizes that what some sentient beings perceive as water, others perceive as nectar, and that "what is light for some is darkness for others" (Kunsang 1986, 39). Both authors were likely drawing on the Yogācāra school’s metaphor of the four views of the same water (一水四見, issuishiken in Japanese). This image illustrates how a deity sees water as bejeweled, a hungry ghost sees it as pus, a fish as a palace, and a human as water. The "what" in the phrase "what is light for some is darkness for others" does not, as explained above, refer to material reality—to the same 'sound waves', for instance. There is no objective 'stuff' out there, such as sound waves, that is subsequently processed by separate mind-bodies. When Asaṅga’s disciple Asvabhāva, in his Commentary on the Compendium of the Great Vehicle, writes that hungry spirits see a stream of water as pus, we should not understand this to mean that there is some shared 'stuff' simply 'interpreted differently' by distinct beings:

"Due to the force of the ripening of their respective karmas, hungry spirits see a stream of water as things like pus. The same thing that is viewed by the hungry spirits appears to animals such as fish as their habitat, and they live there. Humans perceive it as sweet, clear, and refreshing water. They wash with it, drink it, and swim in it." (in Yakherds 2021, 281-282)

The Compendium of the Great Vehicle even takes the fact that different beings experience 'the same object' differently as evidence that there is, in fact, no external object:

"We assert that objects do not exist because hungry spirits, animals, humans, and gods each perceive them differently in accordance with their respective natures." (in Yakherds, 2021, 281)

As the passage from Asvabhāva illustrates, the traditional Buddhist explanation for how different modes of enactment arise is karma. Both Dōgen and Shabkar refer to the fact that our individual karmic propensities—our karma, quite simply—cause us to have different experiences. In Buddhist terms, karma is said to 'belong' to individual streams of consciousness: a single mental continuum (a 'person') acts, and the karmic consequences of those actions ripen within that same continuum. Because sentient beings possess distinct continua shaped by divergent past actions, their worlds of experience unfold in correspondingly different ways.

Limitations to difference

For composers and musicians, the classical karmic framework poses a challenge. While it provides a persuasive explanation for why beings experience different worlds, it is not an ideal starting point for artistic practice. As composers and performers, we are not primarily concerned with how radically distinct everyone’s experience might be. On the contrary, when performing for an audience, our aim is to communicate with many people at once. Ideally, we want everyone to take to heart what we play. We know very well, however, that because of people’s previous experiences, where one person hears melodies, another hears only 'textures'; where some hear rich chromatic harmony, others hear merely 'wrong notes'; where one hears a beautiful microtonal inflection, another hears a note that is out of tune; and where one hears happiness, another hears anguish (see Svensson 2023 for a small-scale empirical study investigating different people's reactions to the same piece of music). Our experiences of talking with others about how they perceive the same music sometimes seem to validate the individualism implied by the theory of karma. Dōgen even wrote that there is no limit to how phenomena can be actualized:

"If we are to inquire into the manner and style of the totality of phenomena, we should know that beyond their being visible as circularity or angularity, there is no limit to the other things the ocean or the mountains can be. We should bear in mind that there are many worlds everywhere." (Dōgen in Kasulis, 2018, 228)

The composer cannot take all these different responses into account when creating a piece. She cannot make something that has the same effect on every listener. To achieve such equivalence, the music would have to be individually adapted to each listener. For the experience to be truly equivalent, the same organization of sounds would not suffice: equality is not the same as equity. Musicians do not possess the skill to communicate equitably in the way that a Buddha or high-level bodhisattva can. As recounted in numerous sūtras, such beings alone have the power to communicate "to all beings in accord with their mentalities". In the seventh book of the Buddhāvataṃsaka, we hear Mañjuśrī, empowered by the Buddha, proclaim:

"all the Buddhas in the worlds in the ten directions know that the inclinations of sentient beings are not the same, and so they teach and train them according to their needs and capacities. The extent of this activity is equal to the realm of space of the cosmos." (Cleary 1993, 272)

But how is it, then, that Zeami could so confidently speak in Kakyō (花鏡, A Mirror of the Flower) of the master performer as one who has cultivated the ability to take on the audience’s perspective, free from the ego’s own view (我見, gaken)? Zeami writes that "[w]hen you exercise your riken no ken [離見の見, the seeing of detached perception], you are of one mind with your audience" (quoted in Odin 2001, 115). If the audience consists of a multitude of perspectives, how can the performer embody them all? As Yusa explains, "[r]iken no ken is the mental eye by which the actor knows what the audience sees of him and identifies his viewpoint with that of the audience" (1987, 335). Yet how could this be possible if the audience is composed of distinct subjective perspectives that each enact different worlds? This is precisely where Dōgen adds an important caveat:

"Although what is seen may differ drastically according to the one perceiving it, we should not be too hasty in accepting this as absolutely so. Are there really many variable ways of seeing any particular single object?" (in Heine, 2020)

Kasulis observes that for Dōgen, although the present moment is open to infinitely many meanings, there are also "an infinite number of meanings that do not fit the present occasion"; not every interpretation is viable. He illustrates this by comparing the infinite series of integers to the infinite number of decimals that exist between two of them: "Like the domain of real numbers between 2 and 3, the number of possible meanings can be infinite but nonetheless limited." (2018, 232) Variation, then, may be infinite—but it unfolds within boundaries.

The idea that variation in experiences of the 'same' phenomenon occurs within limits is also a key insight of the phenomenographic research tradition. In this field, the aim is to map the full range of qualitatively different ways in which individuals experience a given phenomenon. One might, for instance, play a piece of music and, through qualitative interviews, study the 'outcome space', that is, the total set of distinct ways in which that piece is actualized as a phenomenon (see Svensson 2023 for a small-scale example of such a study). Typically, phenomenographic research finds that this outcome space is limited. Marton and Booth (2000) note that while any phenomenon could in principle be experienced in an infinite number of ways, human beings will, paradoxically, experience it in only a limited number of qualitatively distinct ways, regardless of the phenomenon in question (135).

Marton and Booth thus echo Dōgen’s insight that infinite variation occurs within limitation. Their rationale, however, is different. For Marton and Booth, the limit arises from cognitive constraints: there are only so many aspects of a phenomenon that we can attend to or bring into focus. For Dōgen, by contrast, what defines the boundary of possible interpretations is grounded in praxis and context—in the intersubjective and cultural conditions that shape experience.

According to Kasulis’s interpretation of Dōgen, what makes a particular enaction of phenomena possible is a matter of context and occasion. Even though the possibilities may appear limitless, "we see and grasp only what reaches our eyes in our praxis"—"[w]e realize meaning through complete engagement with the present context" (Kasulis 2018, 230). Because the present context is intersubjectively and socially conditioned, Dōgen’s focus on praxis and contextual appropriateness leads us directly back to the notion of adequate modes of listening. Modes of listening are neither in the music nor in the listener; they arise relationally, grounded in intersubjective praxis. Yet accounting for intersubjective understanding and shared experience becomes difficult—and even unintuitive—within the karmic model, precisely because it retains an ontology of discrete mental streams.

Collective karma

To account for intersubjective experience and culture within the karmic model, there must be some way for different mental streams to affect one another. For this reason, the theory of karma developed to include the idea that phenomenal impressions are not caused solely by individual karma—that what appears to me is not merely the ripening of my own private karma—but can also be caused by other mental streams, provided these are 'suitably linked'. Vasubandhu explains:

"[t]here is mutual determination of impressions through reciprocal influence. [...] Mutual determination of impression occurs among all beings suitably linked by means of reciprocal influence of impressions. 'Mutual' means between one another. Accordingly the distinct impression arises in one mental stream from some distinct impression in another mental stream, not from a distinct external object" (in Siderits 2007, 168).

Siderits explains that two mental streams become "suitably linked" when "the prior histories of each stream [...] have led to certain similarities in present experiences" (Siderits 2007, 170). Hungry ghosts (preta) possess karma so radically different from my own that, thus far, no hungry ghost has managed to interact with me in this life. Animals, however, though their karma also differs, are similar enough for us to share a world and thus be suitably linked. The idea that distinct mental streams can share the same world—rather than existing in isolated, solipsistic bubbles—is grounded in this notion of collective karma. Generally shared karma accounts for the fact that human beings tend to relate to the same objects in comparable ways, and it explains how one mind can act upon and influence another. The greater the similarity between beings, the easier such influence becomes—hence the human mind’s inherently social constitution.

The Yogācāra classic Chéng Wéishì Lùn (成唯識論) describes how different minds are like separate lamps shining together to form a single beam of light that illuminates an intersubjectively shared object of perception. This metaphor explains "how the cognitions of the same things by different sentient beings come to 'mutually resemble' (Chi.: xiangsi 相似) one another" (Brewster 2018, 124). In reality, however, there is no truly 'shared object'—no underlying core or essence being differently interpreted. These variations, these interwoven beams of light, are all that exist; they are not variations of anything. As noted above, there is no objective 'stuff' out there—no sound waves awaiting interpretation by separate body-minds—yet the power of this "reciprocal influence of impressions" is so profound that we experience the world as if there were.

From Yogācāra to Huáyán

Siderits draws attention to a particularly interesting problem in Yogācāra philosophy concerning artifacts such as artworks. Artifacts seem to occupy a liminal position between impressions arising from the 'natural world' and those produced by other mental streams. As noted above, Vasubandhu maintains that there are two sources of impressions:

"In addition to the ripening of karmic seeds, impressions can also be caused in a mental stream by the occurrence of a distinct impression in another suitably linked mental stream." (Siderits 2007, 170)

The natural world is typically explained by the first of these two sources, whereas the experience of other people is explained by the second—for instance, when desire in one mental stream is linked "with an impression in a suitably linked distinct mental stream" (Siderits 2007, 172). This raises important questions about how artifacts should be classified, particularly in relation to art and music. They are neither other people nor the natural world. Siderits’s lucid formulation of the problem is worth quoting in full:

"An artifact like a pot is the result of a desire on the potter's part, so the impressions-only theorist will want to explain our experience of it not in terms of karmic seeds, but in terms of a desire in a distinct mental stream (namely the potter's). But our sensory experiene of the pot isn't confined to just those times when we are 'suitably linked' with that mental stream. We can continue to have a pot experience when the potter isn't around anymore, for instance when the potter has died. Now the hypothesis of karmic seeds was meant to explain how something in the remote past could be the cause of a present effect when everything is momentary. The idea was that the cause produced a seed, which produced another seed, etc., in an unbroken series, until conditions bring about the ripening of a seed to produce an impression. And this makes sense when the remote cause and the seed series and the impression all being to the same mental stream. But it isn't clear how the seed hypothesis could work in the case of our experience of artifacts. The seeds couldn't be in the potter's mental stream, since we can have pot-experiences after that stream has ceased. So did the potter's pot-making desire cause seeds in the mental streams of those who now see the pot? Suppose the pot I see now was made ten years ago. Then the potter's desire would have caused a seed in 'my' mental stream ten years ago, and that seed would have been replicated in an unbroken series up to the present, when I finally have the experiences that count as the ripening conditions (such as the experience we call walking into a ceramins gallery.)" (2007, 171-172).

Siderits’s point is that the Yogācāra theory of karmic seeds becomes extremely complex when artifacts are taken into account. The theory of multiple sensory worlds—each comprising a discrete 'mental stream' with its own storehouse consciousness that can influence others—appears to imply that an artist plants seeds in every mental stream that might one day encounter their work.

Moving away from seeds

While the Yogācāra theory of discrete mental streams and karmic seeds becomes exceedingly complex—and may strike a contemporary reader steeped in scientific materialism as a kind of theoretical fantasy—other perspectives have proposed models of intersubjective experience that retain the notion of discrete streams without invoking karmic seeds. All these non-Buddhist perspectives are similar to Yogācāra in that they preserve the ontology of discrete selves—selves that can interact with or attune themselves to other selves. They are theoretically simpler because they do not need to explain intersubjectivity through the postulation of seeds.

In more recent times, such models have emerged in sociology, social psychology, phenomenology, evolutionary theories, and developmental psychology. Rather than explaining intersubjectivity through the planting of seeds, we saw above how socio-cultural theorists tend to describe it as an inferential process in which communication—and especially language—constructs the shared social world. For others, such as Mead (1934), the foundation of intersubjectivity lies in the capacity to imagine oneself in the role of another, to see oneself from another’s perspective. For evolutionary theorists like Tomasello (2019), shared intentionality—the ability to cooperate with others by forming a collective agent, a we that acts with shared knowledge, moral values, and intentions—can be understood as an evolutionary adaptation unique to certain sentient beings, and particularly pronounced in humans.

For Heidegger, however, collective intentionality is not simply the product of perspective-taking or linguistically mediated coordination. It arises instead from a more direct, pre-conceptual attunement to others. Being-with (Mitsein) is a basic constituent of Dasein, a thought later developed in Di Paolo and De Jaegher’s "interactive brain hypothesis", which proposes that the brain "is primarily an organ of relational cognition" (Varela et al. 2016, xlix). Drawing on Schilbach’s formulation, they suggest that "the contents of mental states (of oneself or another) are experienced via quasi automatic attunement to others" (Schilbach et al. 2006, 727–728, quoted in Di Paolo and De Jaegher 2012, 2).

The preceding overview shows that the problem of intersubjectivity within an ontology of discrete selves has been approached from multiple directions—evolutionary, biological, cultural, and linguistic. Yogācāra can thus be seen as part of a long and diverse tradition of inquiry. Yet other perspectives have argued that only a more radical ontology can truly account for intersubjectivity. According to these approaches, intersubjectivity involves more than inferential processes through which communication and imagination allow one to take the perspectives of others, and more than individual agents attuning themselves to one another.

From the standpoint of dialogism, relational wholes and interactions are instead regarded as the "basic ontological primitives". This view is grounded in the recognition of "the interdependencies between self and others, and the fact that human beings have socially constituted minds" (Linell 2009, xxiv)—minds that exist only relationally, without own-being:

"relational complexes, whose relata cannot be regarded as preexisting entities (e.g., independent speakers, autonomous individual acts, etc.) but must be understood from within the relational interdependencies." (Linell 2009, 15)

By denying autonomous individuals ontological primacy and instead emphasizing interdependencies, dialogism shifts attention away from describing intersubjectivity as subjective happenings within discrete selves. Rather, the field of interdependence itself becomes the ontological ground. This move resonates closely with the Buddhist Huáyán perspective.

According to Kongyin Zhencheng (空印鎮澄), the Yogācāra doctrine of multiple sensory worlds is ultimately subsumed within the relational field of the dharmadhātu (see Brewster 2018). On this view, discrete karmic continuums are situated within a wider relational sphere in which every stream both influences and is influenced by every other, simultaneously containing and being contained by them. Within this Huáyán model, intersubjective experience no longer requires explanation in terms of myriad separate streams interacting through causal links. Instead, it is accounted for through their mutual interfusion and interpenetration. As Chūjin writes, if such interfusion did not obtain, "there could be no cognition" (in LaFleur 1973, 105).

From this perspective, intersubjective experiences of art do not arise because an artist plants karmic seeds in the mental streams of future audiences, but because these streams already mutually contain one another. This relationality can be described negatively as emptiness or, in positive terms, as the non-obstruction of phenomena—shì shì wú ài (事事無礙)—the conditioned origination of the dharmadhātu. The Yogācāra notion of separate mental streams—that the "same world" is "comprised of multiple and discrete sensory worlds" (Brewster 2018, 120)—is therefore only a conventional truth. Ultimately, there is only one mind: the dharmadhātu, which "contains and encompasses the totality of the universe" (Brewster 2018, 125).

This, however, does not mean that the world is a single, unified entity. The positive language of Huáyán should not be taken to imply a monistic holism. Here, the enactive perspective becomes valuable once again. Rejecting both substantialist individualism and monism, it describes the self as fundamentally relational: existing conventionally as a temporary center of self-organization, coupled with and enacted through interaction with the world.

In this sense, the enactive account can be seen as an experiential analogue to the Huáyán vision of interpenetration. While Huáyán expresses this interdependence cosmologically—as the dharmadhātu’s non-obstruction of phenomena—the enactive view frames it phenomenologically, as the dynamic co-arising of self and world in every moment of lived experience. Both perspectives converge on a middle way: neither discrete entities nor an undifferentiated whole, but an ongoing process of mutual becoming.

One mind with the audience

Just as self and world co-arise through mutual enactment, so too do the listener and the sounding world. In music, listening is not the perception of an independently existing sonic object, but a relational activity through which both 'music' and 'listener' emerge together. To speak of modes of listening, then, is not to 'impose' one’s own way of hearing onto others, nor to be normative in the exercise of power—as if asking others to hear as we do. On the contrary, to speak of modes of listening requires a certain freedom from ego: an ability to perceive from an intersubjective standpoint, from within the conditioned origination of the dharmadhātu and the non-obstruction of phenomena. It is, ultimately, an expression of care rather than control. Since we are not Mañjuśrī, our only means of achieving what Stockfelt calls "real communication" are the culturally conditioned modes of listening available to us.

Those who deny the existence of shared modes of listening are, paradoxically, the true egoists—despite their claim to humility in 'not wanting to impose' their way of hearing on others. Such individualists refuse to make any claim about how their music might be perceived, imagining this to be a gesture of freedom, yet what they relinquish is not control but care. In avoiding saying anything about how other people might perceive their art, they come across as creating music only for themselves—uninterested in communicating or helping others through their art. In response to the question of whether he thinks about his audience, the composer Michael Pisaro offers a characteristically modern and cynical reply: such thinking "only leads to market research and commercial music, ultimately" (Pisaro & Dougherty 2018). This attitude rests on a view of persons as isolated islands perceiving private realities. In truth, however, no one is self-established; we are all mutually dependent and co-emergent.

Xenakis, in the conversation with Feldman quoted above, takes a very different stance from Pisaro and regards the issue as a non-problem: we are all "made of the same stuff", he says, and therefore one need "never to think" about the audience’s perception. This 'same stuff' is relational emptiness—one mind. It is precisely because we are all made of the same interrelated 'stuff', which is nothing other than emptiness, that we can, as Zeami suggested, take on the perspective of the audience. Being of one mind with the audience means being attuned to the collective and intersubjective nature of music. Ford and Green, who reinterpreted Heidegger’s concept of "world" in terms of musical style, express a similar view:

"...musical experience is neither "inner", of the "soul" or "spirit", or absolutely individual. Rather the reverse, for pieces do not throw listeners into inwardness, but rather open them out to a nonconceptual world which, whilst registered individually, is also collective. So, rather than having individual control over music, we offer ourselves up to musical experience within the freedom of a collective style. This idea is in accord with Kant's grounding of aesthetic judgement in universal subjective validity, though, and this is most important, with "universal" substituted." (2015, 163)

According to Ford and Green (2015), being attuned to a "collective style" means being attuned to intersubjectivity itself. The thirteenth-century Confucian poet Yán Yǔ (嚴羽) placed this idea within a soteriological framework: for him, poetic enlightenment (wù 悟) was achieved through the assimilation of a collective style—the complete internalization of an orthodox tradition. The 'tradition' is what is intersubjective; to internalize it is to transcend the ego and thereby to master riken no ken. The awakened poet in Yán Yǔ’s theory attains a state in which "subjective self, medium of communication, and objective reality become one" (Lynn 2004, 216), and it is from this state that true artistry arises. To hear, as Zeami emphasized, from a perspective free of ego is to allow one’s work to communicate fully; and for Yán Yǔ, this ego-free perspective necessarily involves the assimilation of an intersubjective tradition.

Through Yán Yǔ’s emphasis on a shared tradition, we return to Stockfelt’s initial observation that stylistically informed, adequate listening is the prerequisite for 'real communication' between musicians and their audiences. It is important, however, not to interpret either Yán Yǔ’s orthodox tradition or Stockfelt’s modes of listening as forms of cultural conservatism. Their insights do not lead to a poetics that valorizes the stagnation of styles or the establishment of fixed rules for musical creation. Musicians and composers continually invent new styles and new modes of listening. What matters is that these emergent styles arise from being of one mind with the audience. When creation is informed by the insight that self and world are interpenetrating aspects of a single process, what is created cannot help but be intersubjectively grounded.

As many artists know, hearing without ego is a skill that must be cultivated. In this sense, the act of composition becomes an act of self-cultivation. To become a better composer is to become more deeply attuned to the intersubjectively and relationally constructed nature of reality—to cast off the ego and enter into emptiness. Composition thus becomes a site for exploring the union of self and world, a practice through which their relationality is revealed. As we develop as composers, we come to understand that self and world are interpenetrating aspects of a single process, and that our artworks arise from this total interpenetration.

Music in mono-cultures and transcultural music

In the traditional Confucian view of music, as articulated in the Record of Music (樂記), musical practice was understood not only as something enabled by intersubjectivity but also as an active force in cultivating it—a means of shaping harmonious social relations and shared values. Music was thus seen as creating intersubjectivity. Lǐ Zéhòu (2010) writes that music "caused the natural human emotions to become socialized" and describes how it possessed the power to govern and "directly shape and mold humanized emotions" (26). Within this tradition, music was essential to the preservation of interpersonal and societal harmony: as Park (2015) notes, we "learn to synchronize our emotions to others through music" (127).

Unlike in many Buddhist contexts, where music was often considered subordinate to language (see "Music and Buddhist Monastics"), Mencius emphasized that rhythmic movement and sound affect people more deeply than words. "[W]ithout realizing it," he wrote, "one’s feet begin to step in time to them and one’s hands dance according to their rhythms" (Park 2015, 126). Music thus exerts an immediate, attunemental influence on the human being—one so profound that “moral emotions cannot be stopped” (Park 2015, 127).

The Confucian insight into the intersubjective value of music finds striking echoes in contemporary anthropological and psychological theories. Summarizing current research, Henrich (2020) notes that by "moving in step with others", as in ritual practices involving music, "the neurological mechanisms used to represent our own actions and those used for others’ actions overlap in our brains". This overlap, he explains, "blurs the distinction between ourselves and others, leading us to perceive others as more like us and possibly even as extensions of ourselves" (76). Furthermore, synchronous patterns in ritual "cause all participants to feel similarly", since synchrony generates a wealth of mimicry cues, and "mimicry is one of the tools we use to help us infer other people’s thoughts and emotions", thereby creating what Henrich calls a "virtuous feedback loop" (76–77).

While such effects may occur through any synchronized action, the rhythmic coordination of feet, hands, and voices in music infuses this synchrony with an especially unifying mood, attuning all participants in a shared affective field. To illustrate this, Henrich quotes the anthropologist Lorna Marshall’s description of a communal dance ritual:

"...Whatever their relationship, whatever the state of their feelings, whether they like or dislike each other, whether they are on good terms or bad terms with each other, they become a unit, singing, clapping, moving together in an extraordinary unison of stamping feet and clapping hands, swept along by the music." (Marshall 1999, 90, quoted in Henrich 2020, 77-78)

The Confucian tradition therefore placed great importance on the proper establishment of music. When the state instituted appropriate ritual music, "all members of the state [would] be able to share its historical and cultural value" (Park 2015, 127).

The Confucian theory of music rests on a profound insight into intersubjectivity: our world is a shared world in which attunement can be more effective than words. Yet, from the theoretical standpoint developed in this text—which takes the paradigmatic listener to be the contemporary concert-goer rather than the ritual dancer—it attributes too much agency to the music itself. Transposed into the modern context of more 'passive' listening, the Confucian belief that certain musical properties will affect all who encounter them in the same way—producing identical moral and emotional outcomes—sits uneasily with our own concert experiences. From this perspective, it seems improbable that music could work as effortlessly as The Record of Music suggests, automatically synchronizing people to the same moral sentiments and values.

We may be willing to accept that music can attune a group to a shared feeling for the duration of a ritual dance, yet we hesitate to claim that such attunement extends to moral transformation. Henrich might argue that our skepticism simply reflects the biases of a modern, Western, individualistic worldview. Yet it is also true that, as Kasulis (2018) notes, in Confucian literature "the descriptive is inextricably linked to the prescriptive" (353): the is and ought are conflated, and the seemingly descriptive statements in the Record of Music are best read as normative, as articulations not of how things necessarily are, but of how the authors believed they ought to be.

Yet the idea that music exerts a similar effect on different listeners is not unique to the Confucian tradition. In the rich tradition of Indian aesthetic theory, we find a striking acceptance that people necessarily experience a work’s rasa—the 'taste' or attunement evoked by art—in comparable ways. As Pollock (2016) observes,

"for Indian aesthetics, there really is no disputing in matters of taste, not because each reader has his own in accordance with the relativist-skeptical stance of modernity, but because all readers have, ideally, the same." (Pollock 2016)

Because all audience members are assumed to respond identically to a rasa, classical Indian aesthetics never developed a theory of how such responses are learned. Yet aestheticians recognized that not everyone possesses the same capacity to appreciate art. This variation, however, was not attributed to differences in education or cultural background but to differences in innate predisposition. Viśvanātha (c. 1350) wrote that certain individuals, by virtue of "merit acquired in a former existence", are born with a "superabundance of sensitivity" that renders them especially receptive to the savoring of rasa.

For Abhinavagupta, it is indeed possible to become a more refined audience member, but this is not achieved by learning a set of conventions for interpreting art. Rather, it is a process of becoming a clearer mirror—one capable of reflecting the meanings already inherent in the work. Abhinavagupta writes that a person "whose heart is by nature like a spotless mirror" and whose mind is "no longer subject to the anger, confusion, craving, and so on typical of this phenomenal world" will manifest rasa "with absolute clarity" (Pollock 2016). By "polishing" the mirror of the heart, the spectator heightens their sympathetic responsiveness to art and deepens "their capacity to recognize rasa" (Pollock 2016, 208).

Both the classical Confucian theory of music and the Indian rasa theory are deterministic to some degree: the former in assuming that meaning resides in the music itself, the latter in presupposing a society so homogeneous that everyone would enact the same meaning by virtue of shared values and experiences. If we are so inclined, we might look back with a certain nostalgia on such mono-cultures, imagining what it must have been like to live in a world where the reception of art posed no problem—where art was believed to move people directly, and where there was no possibility of hearing 'the wrong rasa'.

In contemporary times, we inhabit a multicultural rather than a monocultural landscape, with increasingly few shared cultural reference points. In response, many modern musicians have sought to move beyond culturally specific listening practices by engaging with what seem to be more universal or elemental materials—such as the aesthetics of 'materiality' or just intonation, focusing on 'sound itself' rather than on complex, culturally inherited tonal tropes. Others have turned toward accessibility, as in the popularity of minimalist music, which through static sound worlds and slow processes of repetition ensures that the music is immediately here and captures the listener’s attention. These approaches may foster a more immediate attunement to the listening modes their works invite, in contrast to the learned and sometimes arbitrary stylistic conventions characteristic of particular musical traditions.

In the pedagogical context, John Paynter similarly argued for a kind of universal approach when, in the 1980s, he critiqued how music education in British public schools—often centered on Western classical music—alienated students whose home environments differed from the 'majority culture' on which such education was based. Paynter observed that a more genuinely democratic environment emerged only when music education drew inspiration from the avant-garde: engaging with noise, graphic scores, free improvisation, and non-stylistically determined experimentation. He called for composition to be grounded not in inherited styles but in the "common ground we all occupy in being able to hear sounds at all" (1982, 114). By cultivating a musical expression 'indicative of no culture', he believed we might approach something that speaks to what is shared by all humans.

It may seem reasonable to assume that music grounded in such universalist values would more effortlessly attune listeners than other, more culturally specific forms. Much of Xenakis’s music, for instance—with its focus on raw energetic processes and its rejection of traditional harmonic language—might appear to have a more immediate impact than the intricate harmonic idioms of late Romanticism. Yet such music is by no means automatically graspable, for all modes of listening are constructions contingent upon complex cultural and historical processes. The belief that by emancipating music from tradition one can create something truly cross-cultural and universal remains, ultimately, a modernist fantasy.

Even music that appears to operate with the most 'elemental' materials—noise, texture, pure tone—presupposes listeners already capable of engaging with such sounds as music. The ability to hear these sounds aesthetically, rather than as meaningless or even unpleasant noise, is itself a culturally conditioned skill. Stockfelt’s notion of adequate listening is therefore as relevant here as in more overtly stylistic traditions: each musical practice, no matter how minimal or 'universal' it claims to be, establishes its own conditions for meaningful engagement. In this sense, the modernist project of creating culture-free music unwittingly creates new cultural conventions—new grammars of attention, new expectations of purity, austerity, and immediacy. What presents itself as a universal language of sound is thus only another historically situated mode of listening.

Recognizing this does not diminish the value of seeking communication across cultural boundaries; rather, it clarifies what such communication entails. To speak of modes of listening is not to describe how beings construct their own private versions of reality, nor to prescribe how they should construct their experiences. It is, instead, to speak of how we create—and are created by—this phenomenal world together, continually.

As Dōgen writes, "Mountains, rivers, and earth are born at the same moment with each person" (2013, 129). The natural world of mountains and rivers is not an individual projection; it is intersubjectively shared. Yet Dōgen reminds us that each person participates in its creation, that these mountains and rivers depend on each person's being. They come into existence with each person. At the same time, he cautions that "when a person is born, this person's birth does not seem to be bringing forth additional mountains" (2013, 114). The implication is subtle: individuals neither create nor fail to create the mountains and rivers. Both are true.

This is the nature of intersubjective enaction—a collective process that lacks any fixed ontological ground, the kind of relationality that Buddhist thinkers call emptiness. Modes of listening, too, are born in this way: neither wholly individual nor wholly collective, but arising through the mutual conditioning of listeners, sounds, and worlds. To think about them is therefore not to seek a solution to a problem, but to recognize the nondual and relational character of being itself. In this light, as Xenakis suggested, it truly is a 'non-problem'.

If you see the world clearly—its mountains, rivers, and sounds—you also see into the minds that co-arise with it.

Wildflowers bloom at the old garrison,

The travelers echo in the empty woods.

Much rain in spring on the wood plank houses;

Daylight soon turns to darkness in the mountain town.

Cinnabar Stream connects with the old Guo borders;

White Feather reaches to Jing Peak.

If you see the fresh scenery of this western hill,

You'll know the minds of Huang and Qi.

(Wang Wei, trans. P. Rouzer, 2020, 69)


Tuesday, July 10, 2018

Like rain from the mountain
