Theoretical Framework 4: Video Game Sound and Diegesis

In my last article, I established the concept of the sonic game space: the space of sound that emerges through gameplay.

Because sound needs time and space to unfold and be perceived as coherent and meaningful, the sonic game space goes beyond the screen space and plays a significant role in immersing the player in the video game.

In this article, I’ll break down the concepts of diegetic, extra-diegetic/non-diegetic, and on-screen/off-screen sound from film theory.

This article forms a basis for understanding Norwegian video game researcher Kristine Jørgensen’s concept called “trans-diegetic sound.” It also works as a basis for discussing how much we can use diegesis to analyze sound in video games.

But first, let us take a look at diegesis.

Diegesis and Mimesis: Origin and Meaning

The terms diegesis and mimesis have their origins in ancient Greek culture. In his work The Republic, the philosopher Plato introduces the concepts of diēgēsis (diegesis) and mimēsis (mimesis).

Diegesis is derived from the Greek word meaning “to tell” or “to narrate,” and mimesis is derived from the Greek word meaning “to imitate” or “to copy.”

In this sense, diegesis is telling a story or narrative, while mimesis is imitating or re-enacting something.

Diegesis and mimesis were rediscovered in the 1950s and introduced as theoretical, analytical tools within film theory.

In diegesis, a narrator or character tells the story, usually in an expository or omniscient way. This is the storytelling method used in novels, films, and plays.

Mimesis is the representation of reality through imitation or re-creation. This method is used in documentary films and reality television, where real people and events are portrayed as they happened.

According to film theorist David Bordwell, diegesis was first used to define the narrative in a movie. Later, diegesis was used to describe the entire fictional world in a story or movie.

It is primarily this latter understanding of diegesis that video game researchers and ludo-musicologists have adopted.

Defining Diegesis in Video Games

As hinted above, diegesis has transformed into a theoretical concept that fits today’s media, especially film.

Although movies don’t offer the same level of user agency, i.e., the possibility to manipulate the fictional world portrayed, I find the film theoretical interpretation of the concept to be a good starting point for understanding diegesis in video games.

In particular, I’d like to draw attention to English professor Dino Franco Felluga’s definition of diegesis:

[Diegesis is a] narrative’s time-space continuum, to borrow a term from Star Trek. The diegesis of a narrative is its entire created world. Any narrative includes a diegesis, whether you are reading science fiction, fantasy, mimetic realism, or psychological realism. However, each kind of story will render that time-space continuum in different ways. The suspension of disbelief that we all perform before entering into a fictional world entails an acceptance of a story’s diegesis. The Star Trek franchise is fascinating for narratology because it has managed to create such a fully realized and complex diegetic universe that the narratives of all five t.v. shows (TNG, DS9, STV, Enterprise, the original Star Trek) and all the movies occur, indeed coexist, within the same diegetic time-space.


Felluga’s interpretation can be used to describe the diegesis in video games because it combines the central concepts of narrative, space, time, fiction, and transmediality with the recognition that the viewer must voluntarily accept the diegesis (the “suspension of disbelief”) to become immersed in a movie.

This harmonizes well with how players voluntarily accept the rules and fiction to become immersed in the video game.

A State of Flux. Switching between Diegesis and the Real World.

Film directors sometimes use diegesis to intentionally mess with our immersion in the film. This may lead to diegetic breaks, where the audience becomes aware they’re watching a movie.

In that sense, diegetic breaks are cognitive shifts in the viewer’s awareness and perception of the film.

Psychologist James Jerome Gibson discusses this shift in his book The Senses Considered as Perceptual Systems (1966):

[t]here is a curious paradox about a picture. It is neither a pure display on the one hand nor a pure deception on the other. The stimulus conveys information for both what it is physically and what it stands for.

J.J. Gibson (1966)

Gibson goes on to divide film into scene and surface.

  • Scene is the three-dimensional world on film, which the characters in the movie have access to (read: the diegesis).
  • Surface is the movie as a physical medium. It manifests as scratches on the film that show up on the silver screen, text (e.g., the title, credits, subtitles), transitions (dissolves, wipes, irises, fades, etc.), and other things that remind us that we’re watching a movie.

Today, I suggest we add digital compression artifacts (pixelated squares) to the list above.

I also want to argue that intertextuality can bring us out of the diegesis. When we see or hear a reference to another film or artwork, our minds go into an analytical mode, which takes us out of the immersion, and we might even get a small rush of dopamine because we understood a particular geeky reference.

The same is true for easter eggs in games and movie trailers.

I would argue that we should add good or bad CGI to the list because when we see a CGI effect, it can be so good or bad that it removes us from the scene and brings us to the surface.

A good example is the use of bullet time, i.e., manipulating the speed of the action seen on screen by “stopping time,” as seen in The Matrix (1999).

While bullet time was a novel thing in the first Matrix movie (though not in film in general, as it has been used in animation at least since the 1960s), it didn’t have the same effect in the second.

Instead, we recognized that the Wachowskis had reused the same effect, and that recognition took us out of the immersion.

Bullet time is also seen as an effect in video games, with Max Payne (2001) being the best example. Again, here it isn’t just a visual effect but an effect that actually impacts the fictional time in the game.

According to film theorist Joseph D. Anderson, visual equipment (e.g., the silver screen in a movie theater) can’t simultaneously be scene and surface.

Instead, our perception of scene and surface (diegesis and real world) will constantly be in flux, similar to how our perception changes between a rabbit and a duck in the famous illusion below:

Rabbit and Duck illusion (“Kaninchen und Ente”) from the 23 October 1892 issue of Fliegende Blätter. Public domain illustration.

In other words, we can’t simultaneously exist in the real world and the diegesis. When we become immersed in the diegesis on the screen through escapism, our minds voluntarily suspend the physical environment.

Similarly, when we enter the magic circle of gameplay, we voluntarily suspend the natural world in favor of the virtual.

However, as soon as there are glitches in the game, or if elements suck, we’re drawn back to a state of judgment and again become aware that we’re playing a video game.

Breaking the fourth wall: Examples of Intentional Manipulation of the Diegesis in Film

Manipulating the diegesis and audience perception by “breaking the fourth wall” was first seen in avant-garde cinema but is now common in mainstream movies.

The fourth wall is a theatrical term for the imaginary wall that separates the actors on stage from the audience, creating the illusion that the audience is peering into a different world.

Breaking the fourth wall refers to actors addressing the audience directly, acknowledging their presence, or inviting them to participate in the performance. It is often used in comedy to create a more interactive experience for the audience.

Example 1: Ferris Bueller’s Day Off

A good example is the teen comedy movie Ferris Bueller’s Day Off (1986), in which breaking the fourth wall is common.

In the clip above from the film’s end, Ferris (Matthew Broderick) directly addresses the audience and advocates the importance of seizing the day (carpe diem).

Example 2: Gremlins 2

Another example is from Gremlins 2: The New Batch (1990).

I remember watching this in the local drive-in theater with my parents as a kid, and suddenly it looked like the film had melted in the projector:

This took me out of the immersion and made me focus on the film medium. It turned out that the Gremlins were messing with the diegesis.

Besides visual effects and the characters directly addressing the audience, sound also significantly impacts the diegesis in movies and video games.

Diegetic and extra-diegetic sound

Diegetic sound and non-diegetic sound are concepts used in film theory to describe the relationship of sound to the world and story presented in the film.

Diegetic sound originates from within the narrative of the film, while non-diegetic sound does not come from the fictional world depicted in the film.

Another way to put it is that diegetic sound is sound that the characters in the movie can hear, while non-diegetic sound is sound that the characters in the film cannot hear.

The same concept pair is used in video game theory to describe sounds that originate within or outside the fictional world of the video game.


Some authors use “non-diegetic” while others use “extra-diegetic.” Non-diegetic suggests sounds that have nothing to do with a film’s or game’s diegesis. Extra-diegetic indicates an “extra layer” to the diegesis.

The two terms are often used interchangeably, so the choice comes down to the individual author’s preference.


“On-screen/off-screen” is a conceptual pair from film theory. It describes whether things and events in a film take place within the frame (on-screen, i.e., what we can see on the screen) or outside it (off-screen).

  • The term “off-screen” can refer to things or events that we have already seen during a film (and may have panned away from) or what we imagine is happening outside of the image, based on what we can see taking place “on-screen.”
  • Both diegetic and extra-diegetic sound are often used to give viewers information about what is happening off-screen.

Below, I’ve created a matrix that places sound in relation to the image and the diegesis, using the movie Casablanca (Warner Bros., 1942) as an example.

Diegetic and extra-diegetic sound in film and movies. On-screen and off-screen sound matrix.
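For readers who think in code, the two axes of the matrix can also be sketched as a small data structure. This is only an illustration under my own assumptions: the names SoundCue, Diegesis, Framing, and classify are hypothetical, and the three example cues are invented generic cases, not taken from Casablanca.

```python
from dataclasses import dataclass
from enum import Enum


class Diegesis(Enum):
    DIEGETIC = "diegetic"              # the characters can hear it
    EXTRA_DIEGETIC = "extra-diegetic"  # only the audience can hear it


class Framing(Enum):
    ON_SCREEN = "on-screen"    # the source is visible within the frame
    OFF_SCREEN = "off-screen"  # the source lies outside the frame


@dataclass(frozen=True)
class SoundCue:
    """A hypothetical description of one sound in a scene."""
    description: str
    heard_by_characters: bool
    source_in_frame: bool


def classify(cue: SoundCue) -> tuple[Diegesis, Framing]:
    """Place a cue in the matrix by answering the two questions separately."""
    diegesis = Diegesis.DIEGETIC if cue.heard_by_characters else Diegesis.EXTRA_DIEGETIC
    framing = Framing.ON_SCREEN if cue.source_in_frame else Framing.OFF_SCREEN
    return diegesis, framing


# Invented example cues, for illustration only.
cues = [
    SoundCue("a pianist playing in shot", True, True),
    SoundCue("a door slamming in the next room", True, False),
    SoundCue("an orchestral underscore", False, False),
]

for cue in cues:
    diegesis, framing = classify(cue)
    print(f"{cue.description}: {diegesis.value}, {framing.value}")
```

The point of the sketch is that the two questions are independent: whether the characters can hear a sound says nothing about whether its source is inside the frame.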

Here’s the clip for analysis:

Manipulating the diegesis through sound

Film directors also play with the diegetic levels through music.

Music has a unique ability to transcend different spatial levels in fictional worlds.

In the movie “The Truman Show” (1998), one could perhaps even speak of multiple diegetic levels – or a diegesis (The Truman Show) within the diegesis (the film’s fictional world):

In the clip, the music goes from being an off-screen extradiegetic underscore to becoming on-screen diegetic for viewers.

But at the same time, it still functions as an extra-diegetic underscore for the fictional viewers in the film’s diegesis.

It seems we first experience the music as an extra-diegetic underscore on an equal footing with the imaginary viewers in the film’s diegesis. Then the film cuts to a behind-the-scenes scene, which the fictional viewers cannot see.

At the same time, there is an extra level, as composer Philip Glass appears as a fictional character in the film, playing piano, and is also the one who composed the film’s underscore.

That is, the on-screen diegetic underscore in the film also functions as an off-screen extradiegetic underscore for the film’s fictional viewers in the diegesis.

And again, one could discuss whether it is still extradiegetic for us as viewers. That is, it may exist as both diegetic and extradiegetic music simultaneously for us as viewers.

Why applying film theoretical concepts to video games can be problematic.

As noted earlier, film theory uses “diegetic sound” and “non-diegetic sound” to describe sound that either originates from the film’s narrative world or comes from outside it.

Video game theory uses the same pair of concepts to describe sounds that originate within or outside the game’s fictional world.

As the concepts’ theoretical roots are firmly planted in narratology and film theory, transferring them to computer games is not entirely unproblematic.

The problem is primarily due to sound’s ambiguous function in games, which raises questions about its relationship to the game’s diegesis.

When the Russian battlecruiser captain in StarCraft exclaims, “Battlecruiser operational,” the statement is both part of the game’s fiction (where, for example, his accent helps give the character a specific personality) and an expression of the game’s system character (where the statement is gameplay-relevant information to the player that they now have another “piece” available).

The question is whether diegesis is a valuable concept for explaining the spatial integration and function of sound when that sound carries precisely this kind of gameplay-relevant information.
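One way to make the StarCraft problem concrete is a small sketch in which a sound can carry more than one role at once. This is my own illustration, not an established model: SoundRole, GameSound, and fits_binary_model are hypothetical names, and the two roles are a deliberate simplification of the fiction/system split described above.

```python
from dataclasses import dataclass
from enum import Flag, auto


class SoundRole(Flag):
    FICTION = auto()  # contributes to the fictional world (voice, accent, character)
    SYSTEM = auto()   # conveys gameplay-relevant information to the player


@dataclass(frozen=True)
class GameSound:
    """A hypothetical record of a game sound and the roles it plays."""
    line: str
    roles: SoundRole


def fits_binary_model(sound: GameSound) -> bool:
    """Return True only when the sound has exactly one of the two roles."""
    return sound.roles in (SoundRole.FICTION, SoundRole.SYSTEM)


# The battlecruiser line belongs to the fiction and informs the player at once.
battlecruiser = GameSound(
    "Battlecruiser operational",
    SoundRole.FICTION | SoundRole.SYSTEM,
)

print(fits_binary_model(battlecruiser))  # prints False
```

A sound that combines both roles does not fit a model with a single either/or label, which is the difficulty the film-theoretical pair runs into when applied to games.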

In the next article, I’ll discuss two approaches to sound in video games – one rooted in narratology and another in ludology – by Kristine Jørgensen.

Until next time, happy gaming!

Jan has played video games since the early 1980s. He loves getting immersed in video games as a way to take his mind off stuff when the outside world gets too scary. A lifelong gamer, he turned that interest into a job as a lecturer on game sound at the University of Copenhagen and has written several articles on video games for magazines.