Theoretical Framework 5: Video Game Sound as Fiction and Interface

In many computer games, the music changes when danger is imminent, even if the enemy is not yet visible. The enemy music warns the player, who can react and prepare their avatar for battle.

The music is extra-diegetic and not part of the avatar’s diegesis.

Nevertheless, the music influences the avatar’s continued role and survival in the game.

One could say that the concept of the trans-diegetic has a dual role: it expresses both sound’s spatial integration in the fiction of video games and its function as an interface and carrier of gameplay-relevant information.

This is a narratological, film-theoretical, and to some extent, ludological approach to analyzing sound’s function and spatial integration in the fictional worlds of video games.

It is ludological because the prefix “trans” signals that sound, while part of a fictional world, also functions as an interface and thus as a carrier of gameplay-relevant information.

In the last article, I examined the concepts of diegetic/extra-diegetic and on-screen/off-screen sounds in movies. I also hinted at possible shortcomings of applying these film theory terms to video games.

In this article, I assert that coupling diegesis and sound is viable but can’t stand alone if we want to adhere to both the narratological and ludological discourses.

I discuss Jørgensen’s concept of trans-diegetic sounds in video games and her later dismissal of it as she seeks a more ludologically inspired approach to sound analysis.

I argue that a hybrid approach is needed to bridge the narratological-ludological gap in video game sound analysis.

Dissolving the diegetic/extra-diegetic dichotomy

In Jørgensen’s article “On transdiegetic sounds in computer games,” she presents the idea of a “transdiegetic sound space” in video games:

“Modern computer games as well as narrative films depict fictional worlds. By this I mean that they present hypothetical spaces that are conceptually separate from our own world, and which are understood as spatially separate from real-world space. When playing a game or watching a film, we must accept this fictional space as the frame of reference for what happens within the game or film.”

Jørgensen 2007: 106

Video games depicting fictional worlds fit within the narratological discourse.

Jørgensen points out that, as with video games, defining spaces in a movie can also be problematic, as we have already seen in the example of The Truman Show.

In this regard, she refers to film researcher Edward Branigan’s “levels of narration” (text, fiction, story world, event/scene, action, speech, perception, and thought), which express a series of spaces from which the narrative information seems to come.

The levels are understood as a continuum ranging from extra-fictional information to a level anchored “deep” inside the fictional world, all the way into the characters’ thoughts.

Extra-fictional and extra-diegetic sounds

Branigan distinguishes between sounds that are extra-fictional and extra-diegetic.

Extra-diegetic sounds are somehow linked to a movie’s diegesis, but how the sounds appear in the film does not correspond to the way the characters in the movie should perceive them (such as when a single piece of music becomes a metaphor for the montage of an entire evening at a nightclub, where we must assume that the guests have heard more than just that one piece of music).

Extra-fictional sounds (such as music playing during the credits) exist outside the diegesis.

Jørgensen says sounds introduced in video game start menus can be considered extra-fictional. Extra-fictional sound is not part of the fictional world but instead part of the “frame” surrounding where the game takes place, much like credits in a movie.

However, I believe extra-fictional sounds can still occur in video games during play.

For example, when I play a PlayStation game against others online, and a friend sends me an invitation to a match, I receive a notification via the PlayStation control system’s inbox.

This appears as a sound and a small graphic icon in the top right corner of the screen. I would consider both the sound and the icon to be extra-fictional, as they may relate to the game and even be sent as an invitation through the game but are manifested as part of the PlayStation control system.

According to Jørgensen, extra-diegetic sounds often disturb video games’ extra-diegetic space, particularly voice-over speech.

She cites the real-time strategy game Warcraft III (2002) as an example, where a voice occasionally warns the player with the message, “Our forces are under attack.”

Although it is impossible to identify the specific character in the game world who says this sentence, the warning, according to Jørgensen, is connected to events in the game world.

I agree with Jørgensen that this type of warning in Warcraft III is closely connected to the game world.

The warnings sound as if they come from one of the player’s units on the screen, and if, for example, I choose to play the race of brutal orcs, the warnings sound as if they come from an orc.

Likewise, with the human race, it sounds like a human is speaking, etc.

It is as if an invisible messenger brings news from the front (or the hometown) that one is under attack and that, as a player and commander, one should act accordingly.

Extra-diegetic voice-over?

Jørgensen believes that there is an extra-diegetic voice-over. But is there?

A voice-over is often used in movies, like music, to guide viewers in a specific direction in interpreting what they see on the screen.

In documentary films, for example, voice-over is often used to provide viewers with additional information that cannot be inferred from the image.

In both cases, it is about creating a context for the material we can see within the film’s framework.

In Jørgensen’s example, however, there is information about something happening outside what we can see on the screen.

We are therefore served gameplay-relevant information from the game’s fictional world in “Orcish.”

I think it would be more correct to understand this type of sound as part of the game’s diegesis, but a part that is consistently placed off-screen.


Jørgensen also refers to Branigan’s interpretation of the narratological concept called “focalization” (Ibid:111). Focalization is an expression of the choice of perspective in a story and, thus, an expression of who sees, not who speaks.

For example, an author can let the reader experience the entire story through the senses and thoughts of the main character (called the focalizer) (internal focalized narrative).

The author can also describe the main character’s actions from the outside without giving insight into their thoughts (external focalized narrative). Finally, the author can choose not to follow any specific character but all the characters in the story, which requires an omniscient narrator (non-focalized narrative) (Jensen 2013).

According to Branigan and Jørgensen, focalization can implicitly give us information about the diegesis through the character’s awareness of their world.

When a character thinks something about the fictional world they are in, it suggests an understanding of this world, and it is through this awareness that we, as readers, gain access to the diegesis.

In movies, this is expressed, among other things, through the camera angle chosen by the director. The director can choose, for example, to film over an actor’s shoulder or from where the actor’s eyes are expected to be, or to give us insight into the character’s thoughts and dreams.

In StarCraft II, before the first mission, we find the main character and avatar, Jim Raynor, sitting on a barstool in something resembling a futuristic Western saloon.

The saloon serves as a menu where I can press different elements on the bar to get different information from the fictional universe the game is set in, start new missions, and more.

If I click on Raynor with the mouse, he will make comments from his thoughts and inner emotions. Although it is a vocal expression and not, for example, images, it is my clear perception that Raynor expresses his inner universe, which is why he seems aware of his world.

This becomes clear and further reinforced when he wryly remarks, “feels like I’m always being watched.” Here, he seems to have not only an awareness of his diegesis but also of the world the player is in.

With that comment, he almost seems to want to break out of the magic circle.

Who is watching whom?

Transdiegetic sounds in video games

So one problem is that sound in video games also functions as part of the game interfaces.

Another problem is that the video game character you control in the storyline is influenced by extra-diegetic sounds, for example, when you, as a player, react to the underscore changing to something more sinister or upbeat and choose to draw your weapon in the game.

To solve these problems, Jørgensen invents the concept of trans-diegetic sounds.

Transdiegetic sounds are:

  • Sounds that seem unnatural to their diegetic source, such as when different individual units in StarCraft sound entirely the same and say precisely the same phrases.
  • Extradiegetic sounds that have an impact on what happens in the game’s diegesis, such as enemy music.
  • Sounds that are part of the computer game’s interface, such as the sounds that originate from the game’s overlay interface.

Jørgensen points out that the transdiegetic space is not always easy to identify in video games:

“The transdiegetic should not be regarded as a clear-cut space that is always easy to identify in computer games, but rather as a property or a function of many diegetic and especially extradiegetic sounds found in computer games”


Based on this explanation, I understand that “the trans-diegetic” should be perceived as a sound space solely by virtue of its simultaneous function as a carrier of gameplay-relevant information to the player.

It can be understood as a connection between a narratological and film-theoretical understanding of diegesis as anchored in fiction and a ludological understanding of sound as part of the game’s interface.

Transdiegetic sound thus brings information from the fictional game space to the player. It is information about both the game’s rules and fiction.

Jørgensen divides trans-diegetic sounds into two categories she calls “internal” and “external”:

“External transdiegetic sounds are sounds that, strictly speaking, must be labelled extradiegetic, but seem to communicate to characters or address features internal to the diegesis. Internal transdiegetic sounds do the opposite: they have diegetic sources, but do not seem to address any other aspect of the game world. Instead, these sounds seem to communicate directly to the player who is situated in the real-world space. These sounds therefore seem to have some kind of self-reflexivity, where they seem to be conscious of their own fictional existence.”
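
The distinction in this quote can be read as a small decision rule with two questions: does the sound have a source inside the diegesis, and does it address the game world or the player? The sketch below is only an illustration of that reading, not something found in Jørgensen’s article. The names `Sound`, `SoundSpace`, and `classify` are hypothetical, and reducing each sound to two yes/no properties is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class SoundSpace(Enum):
    DIEGETIC = "diegetic"
    TRANSDIEGETIC_INTERNAL = "transdiegetic (internal)"
    TRANSDIEGETIC_EXTERNAL = "transdiegetic (external)"
    EXTRADIEGETIC = "extra-diegetic"


@dataclass(frozen=True)
class Sound:
    """Hypothetical, simplified description of a game sound."""
    name: str
    has_diegetic_source: bool  # is the source located inside the game world?
    addresses_diegesis: bool   # True: addresses the game world; False: addresses the player


def classify(sound: Sound) -> SoundSpace:
    """Apply the two-question reading of the internal/external distinction."""
    if sound.has_diegetic_source:
        if sound.addresses_diegesis:
            return SoundSpace.DIEGETIC
        # Diegetic source, but speaking directly to the player.
        return SoundSpace.TRANSDIEGETIC_INTERNAL
    if sound.addresses_diegesis:
        # No diegetic source, yet it bears on what happens in the diegesis.
        return SoundSpace.TRANSDIEGETIC_EXTERNAL
    return SoundSpace.EXTRADIEGETIC


# Examples taken from this article, encoded under the assumptions above.
enemy_music = Sound("enemy music", has_diegetic_source=False, addresses_diegesis=True)
raynor_line = Sound("Feels like I'm always being watched", True, False)
scv_welding = Sound("SCV welding", True, True)

print(classify(enemy_music).value)
print(classify(raynor_line).value)
print(classify(scv_welding).value)
```

Real sounds will often resist such a binary description, which is in line with Jørgensen’s point that the transdiegetic is a property or function rather than a clear-cut space.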


Inspired by Jørgensen, I have created what could be called “the diegetic continuum” with examples of my own, as well as an overview of whether they are spatial, functional, or both.

The dotted lines show the spatial relationship of the trans-diegetic categories.

Extrafictional sounds are included, in line with Jørgensen’s interpretation of the concept, as an extreme form of extra-diegetic space.

Extra-diegetic | Spatial | Underscore in, e.g., StarCraft II
Transdiegetic (external) | Spatial & Function | Enemy music
Transdiegetic (internal) | Spatial & Function | StarCraft II: “Feels like I’m always being watched.”
Diegetic |  | StarCraft II: “Feels like I’m always being watched.”
Table 1: Inspired by Jørgensen 2007:114.

The perception of the trans-diegetic space in video games

How can sound exist across multiple spaces and carry information that is both gameplay-relevant and fictional without creating problems for our understanding of a cohesive world? Jørgensen addresses this question by referring to Danish computer science researcher Anne Mette Thorhauge.

Thorhauge points to the English theorist Gregory Bateson’s concept of “metacommunication” as a possible explanation.

Metacommunication can be described as a kind of “communication about communication.” Bateson uses the concept in connection with his theory of fantasy and play. When we play, we establish a “frame for communication separate from the rest of the world” (Ibid:115).

Within this frame, specific actions have a certain status or meaning. We can understand the purpose of these actions because of our ability to reflect on what they communicate.

In other words, we can meta-communicate.

According to Thorhauge, metacommunication is, therefore, an expression of our ability to understand multiple “frames of reference” at the same time, which is what enables us to understand sound as both a part of the diegesis, extradiegetic, and interface at the same time (Ibid.).

Thus, successfully navigating the magic gameplay cycle requires the ability to meta-communicate.

Intertextual, hermeneutic, and transmedia diegesis

As the last point in this chapter, I would like to mention an extra level of meta-communication that seems to bridge several fictional worlds across time, space, and media.

Games like Warcraft and StarCraft are filled with intertextual references. I have already mentioned how, for example, the battlecruiser captain’s voice is an homage to Admiral Gloval from the science-fiction animation series Robotech.

A particularly entertaining phenomenon occurs if the player “annoys” a specific unit by repeatedly clicking on it with the mouse.

The unit becomes a focalizer, getting annoyed at being disturbed in its activities in the diegetic reality it is placed in. And often, the irritable outbursts consist of quotes from other famous cultural artifacts.

The unit called “Thor”, a kind of mechanical walking tank controlled by a human, is apparently piloted by Arnold Schwarzenegger.

The soldier who controls it speaks in the same characteristic way and also comes up with quotes from the governor’s film career, such as “I’m here, click me!” which paraphrases the line “I’m here! Kill me!” from the film Predator (1987).

The human race in Warcraft III, on the other hand, seems to be a big fan of Monty Python; an annoyed knight exclaims, for example, “I never say ‘Ni’!” as an intertextual reference to the movie Monty Python and the Holy Grail (1975).

I would describe these examples as diegetic since they don’t seem to have any other function than to be entertaining.

However, they give the different units a form of personality and depth. At the same time, they point out of the diegesis across time and space to other media, making them transmedia and intertextual.

Understanding and catching these intertextual references as a player requires knowledge of the source but also a hermeneutic interpretation of these sounds in a broader context from our position in the magic circle.

Intertextuality, in my view, adds yet another dimension to Bateson’s metacommunicative reference framework as described here.

The Shortcomings of Transdiegesis: Is It Time for a New Terminology?

Above, you can see some of the problems associated with using diegesis as a video game sound analysis tool.

It is because of these problems that Jørgensen later takes a more ludologically inspired approach.

In her article “Time for new terminology?” (2011), Jørgensen questions the usefulness of the concept of diegesis for analyzing the spatial integration of sound in video games.

According to Jørgensen, the traditional distinction between diegetic and non-diegetic sound does not consider that players can interact with the game world through gameplay.

She asserts that video game worlds fundamentally differ from traditional fictional worlds because they are designed to be played and based on rules. Therefore, she argues that sound in video games should be evaluated using different terms than those used to analyze film sound.

Instead, Jørgensen proposes a new terminology and model based on a gameplay perspective to analyze the spatial integration of sound in video games.

She focuses on sound as part of the game’s interface, arguing that when sound carries gameplay-relevant information, it should be seen as part of the game’s interface.

Jørgensen’s five interface categories for understanding sound as an interface in video games

Jørgensen’s terminology consists of five categories: metaphorical interface, overlay interface, integrated interface, emphasized interface, and iconic interface, which are integrated into the game world to varying degrees.

  • Metaphorical interface sounds are, as the only category, not an integrated part of the game world of computer games “[…]since they are not “naturally” produced by the game universe but have a more external relationship to the gameworld, even though they also have a metaphorical similarity […] to the atmosphere and the events in it” (Ibid:92).

    Jørgensen mentions the music that plays when encountering an enemy in video games as an example of this type of sound. Metaphorical interface sounds are part of the game’s gamespace.

  • Overlay interface sounds are integrated into the game world of computer games and correspond to the previously mentioned static interface.

    This category covers sounds from menus, maps, and action bars, typically generated in response to the player’s commands.

    The “operating system-like” sounds that play when I click on an icon in the lower right corner of StarCraft II’s action bar belong in this category.

  • Integrated interface sounds are typically associated with interface elements in the game world.

    In StarCraft II, for example, you can activate a so-called “stim pack,” a syringe with various chemicals that gives Space Marines a range of extra powers for a period. When you activate it, small lightning bolts appear over the troops’ heads.

    At the same time, there is a sound as if a hydraulic system is injecting chemicals into the soldier’s body. The graphic lightning bolts and the sound of the activated syringe both belong to the integrated interface category.

  • Emphasized interface sounds come from friendly NPCs (non-player characters) in the game world.

    An example is when you have freed civilians or troops held captive by the enemy in StarCraft II and they express their joy through speech.

    According to Jørgensen, these are sounds that could be characterized as diegetic in the traditional sense, as they are uttered by a character in the game world.

    However, according to Jørgensen, this is not the case: “[…]it is in fact a system-generated sound that has been stylized and fitted into the gameworld” (Ibid:93).

  • Iconic interface sounds are fully integrated into the game world and correspond to the previously mentioned dynamic interface.

    According to Jørgensen, sounds in this category within film theory would be defined as diegetic, as they are a natural part of the universe they are in.

    They can be generated by widely different sources in the game world and provide many kinds of information to the player.

    It can be, for example, the sound of an SCV that is busy welding and screwing during the construction of a building in StarCraft II.

Jørgensen’s categories should be considered a continuum, and multiple sound sources can exist simultaneously.

Below I have listed the different types of interfaces with specific game examples and combined them with gameworld, gamespace, dynamic interface, static interface, and diegetic sound.

This is based on information provided by Jørgensen in her presentation of the different types of interfaces.

The table is inspired by a similar table in Jørgensen’s work, which only includes the first two columns (interface & examples) (Ibid:92).

The table is an attempt to expand Jørgensen’s table for the sake of clarity and also to include the concepts on which her terminology is built. The categories should be seen as a continuum.

Interface | Example | Static or Dynamic | Gameworld or Gamespace? | Diegetic?
Metaphorical | Dragon Age: Origins: Enemy music |  |  | 
Overlay | StarCraft II: “Operating system-like” sounds when clicking an icon in the action bar | Static | Gameworld | 
Integrated | StarCraft II: The sound of stim packs injected into Space Marines |  |  | 
Emphasized | StarCraft II: The sound of happy NPCs who are liberated |  |  | 
Iconic | StarCraft II: Welding and drilling sounds when an SCV is building something | Dynamic |  | 
Table 2: Sound integration in video games as a type of interface.

In my view, Jørgensen’s categories require further explanation to be applicable in practice. Therefore, I have chosen to include some of the concepts on which her model is built.

Spatial integration of sound: Gamespace vs. gameworld according to Jørgensen

Jørgensen defines gamespace (a term she borrows from Juul) as “the conceptual space in which the game is played […] independent of any possible fictional universe used as a context for it. It is thus the arena on which gameplay takes place and includes all elements relevant for playing the game” (Ibid: 89).

According to Jørgensen, all elements related to gameplay can be included as part of the game’s gamespace.

This means that features such as voice-overs that signal new players entering the game, live chat over headsets in multiplayer games, add-on software, and underscore that signals danger can all be perceived as part of a video game’s gamespace.

Jørgensen defines gameworld as “the contained universe or environment designed for play in which actions and events occur” (Ibid.).

The gameworld of a video game consists of graphical and auditory elements that Juul calls fictive building blocks.

However, according to Jørgensen, these elements do not constitute a fictional world. I will return to this aspect later. In the current context, the distinction between gameworld and gamespace is central.

Both gameworld and gamespace are spatial metaphors for video game worlds. They are not functional aspects but spatial ones.

Dynamic and static user interfaces

To describe how functional aspects can be said to relate to spatial aspects, Jørgensen draws on an explanation by game designers Kevin Saunders and Jeannie Novak, who distinguish between a dynamic and a static user interface.

Jørgensen describes the dynamic interface as follows: “a dynamic interface supports the idea that all audio-visual aspects of a game should be seen as an interface because they all provide the player with some kind of information, and dynamic interfaces are therefore completely incorporated into the gameworld” (Ibid: 88).

A typical example of a dynamic interface is the equipment (armor, helmets, swords, etc.) that an opponent’s avatar wears in a multiplayer computer game, which can provide information about the opponent’s race, class, strength, etc.

Regarding the static interface, Jørgensen writes: “A static interface […] is an overlay interface that consists of external control elements such as health bar, map, pop-up menus, inventory, action bars and so on” (Ibid.).

Critical questions about Jørgensen’s terminology: What happened to fiction?

I have tried to provide examples that closely match those given by Jørgensen. This is because I find the categories difficult to work with in practice.

Jørgensen’s explanations and examples, as described above, do not provide much information to work from, and they require prior knowledge of concepts such as gameworld and gamespace to make sense.

These concepts are not incorporated into Jørgensen’s definition of the various categories, even though the categories are built on this dichotomy.

It is unclear whether Jørgensen includes a game’s underscore as part of the metaphorical interface when it functions only as accompaniment and creates an atmosphere for the player’s movements in the game or whether there must be a specific event that triggers a particular type of music, as is the case every time one encounters an enemy.

A game’s underscore can provide the player with information that there is no danger present when it is “just” accompanying the gameplay.

When I, as a player, do not hear an enemy’s leitmotif but the game’s atmospheric underscore, it can be information that there is time to use a healing potion, change armor or weapons, etc., without fearing an attack.

However, based on the example Jørgensen provides with enemy music from Dragon Age: Origins, it is my immediate perception that this category corresponds to what Collins defines as “interactive music,” or music that changes as a result of the player approaching an enemy (although this “choice” sometimes comes as a surprise to the player).

Therefore, it is not my immediate perception that the game’s underscore, and thus what Collins defines as “adaptive music,” is part of Jørgensen’s definition of the metaphorical interface category, which, in my view, is problematic.
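
The point that both kinds of music carry information can be made concrete with a small sketch. It is purely illustrative and does not describe how Dragon Age: Origins or any other game actually selects its music. The names `MusicCue` and `select_cue`, the distance-based trigger, and the `alert_radius` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicCue:
    """Hypothetical pairing of a music track with what it tells the player."""
    track: str
    implied_message: str


def select_cue(distance_to_nearest_enemy: float, alert_radius: float = 30.0) -> MusicCue:
    """Pick a cue from a single assumed game-state value (enemy distance)."""
    if distance_to_nearest_enemy <= alert_radius:
        # An event-triggered change: the enemy may not even be visible yet.
        return MusicCue("enemy leitmotif", "danger is near: prepare for battle")
    # The default branch is informative too: hearing it implies safety.
    return MusicCue("atmospheric underscore", "no danger: time to heal or change equipment")


print(select_cue(10.0))
print(select_cue(100.0))
```

In this sketch, the underscore is simply the other outcome of the same rule that triggers the enemy music, which is why it is hard to see it as carrying no gameplay-relevant information.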

A Hybrid Approach to Understanding Video Game Audio

As I share Juul’s perception that video games consist of both rules and fiction, it is in my view most appropriate to use a combined ludological, narratological (and film theoretical) approach to analyze the sound in fictional playing spaces.

Therefore, I have compiled Jørgensen’s two approaches below to provide an overview of the different narratological and ludological concepts.

The table is set up according to which spatial affiliations the different types of sounds can be said to have, as it is space that they have in common.

Inspired by the distinction between gamespace and gameworld, for the sake of clarity, I have chosen to introduce an overall division of the fictional playing space. This is also to avoid confusion with my concept of the sonic game space, which encompasses the entire sonic universe a player encounters through gameplay.

Instead of “gamespace,” I will use the term game frame, which can contain fictional and extra-fictional elements and functions as part of the interface.

It is, so to speak, the elements that metaphorically delimit the actual playing space that the player can reach into and act in.

I will keep the term gameworld because it is built on real rules and fictional building blocks.

I have also set up the different terminologies according to their epistemological origins, although one can argue that the various approaches are interrelated.

Sonic game space | Narratological discourse | Ludological discourse
Game frame | Extra-fictional space | Metaphorical interface
 | Extradiegetic space | 
 | Transdiegetic space (external) | 
Gameworld | Transdiegetic space (internal) | Overlay interface
 |  | Integrated interface
 | Diegetic space | Emphasized interface
 |  | Iconic interface
Table 3: A hybrid approach to understanding video game sound
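
For readers who want to apply Table 3 in an analysis, the mapping can be sketched as a small lookup structure. This is only an illustration: the names `HYBRID_TABLE` and `describe` are hypothetical, and the pairing of each narratological space with interface categories follows one reading of how the table’s rows align.

```python
# Narratological space -> (part of the sonic game space, candidate interface categories).
# The row alignment is an assumption based on one reading of Table 3.
HYBRID_TABLE = {
    "extra-fictional": ("game frame", ("metaphorical",)),
    "extradiegetic": ("game frame", ("metaphorical",)),
    "transdiegetic (external)": ("game frame", ("metaphorical",)),
    "transdiegetic (internal)": ("gameworld", ("overlay", "integrated")),
    "diegetic": ("gameworld", ("emphasized", "iconic")),
}


def describe(space: str) -> str:
    """Summarize where a sound sits in both discourses."""
    region, interfaces = HYBRID_TABLE[space]
    return f"{space} sound: {region}; interface candidates: {', '.join(interfaces)}"


print(describe("transdiegetic (external)"))
print(describe("diegetic"))
```

The lookup deliberately returns several candidate interface categories for some spaces, since the categories on both sides are meant to be read as a continuum rather than as a one-to-one translation.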


In this article, I presented a narratological and film-theoretical approach and then a ludological approach to sound analysis in video games based on Jørgensen’s theories.

I turned my attention to Jørgensen’s previous research and the concept of a transdiegetic sound space, which, by virtue of the concept of diegesis and the fiction inherent in it, is anchored in the game’s fiction while also functioning as a carrier of gameplay-relevant information.

When Jørgensen (2011) declares that video game worlds are fundamentally different from traditional fictional worlds, as video game worlds are first and foremost designed as game worlds based on rules and meant to be played in, she is only partially correct in my view.

There can be no doubt that the fictional playing spaces are constructed with the game in mind.

But conversely, there can be no doubt that these worlds are not only built on rules but also on fictional building blocks.

Taking the concept of diegesis as a starting point for analyzing the sound in video games is, therefore, not only legitimate but also necessary.

Sound in video games functions not only as a carrier of gameplay-relevant information but also as a carrier of information about the game’s fiction. The latter can again help provide hints about the game’s gameplay.

In that perspective, the concept of transdiegesis becomes interesting because its narratological starting point points out that sounds can be anchored in the fictional world while also being carriers of gameplay-relevant information.

I believe this is an essential point to include if you want to understand sound in video games.

However, the transdiegetic concept also has its limitations. This applies, for example, to specifying the sound’s spatial integration and function as an interface.

And there will be cases, such as menus, where describing the sound as an interface will provide greater precision in describing the sound’s spatial integration and function, despite the menu, for example, having a graphic connection to the game’s fiction.

Because of this, I turned my attention to Jørgensen’s later ludological approach, where she tossed away the notion that video game sounds are fictional or part of the narrative and regarded them as part of the game interface instead.

In the ludological approach, video game worlds were considered to be based solely on rules, so all sounds that carried gameplay-relevant information were described solely as part of the game’s interface.

I argued that this approach could not stand alone, as I believe video game worlds consist of fiction and rules. Sounds can carry multiple types of information across various kinds of spaces at the same time.

Thus I suggest we at least take a hybrid approach to video game sound analysis that includes both the ludological and narratological discourses, as exemplified in Table 3.

However, to thoroughly analyze video game sound, we must also consider intertextuality and be willing to draw on musicological concepts (musical notation is just one example) and other concepts from film theory.

In my next article, I’ll look at horror games and include some of these concepts for a comprehensive analysis.

Until next time, happy gaming!

Jan has played video games since the early 1980s. He loves getting immersed in video games as a way to take his mind off stuff when the outside world gets too scary. His lifelong interest in gaming led to a job as a lecturer on game sound at the University of Copenhagen and to several magazine articles on video games.
