

HAL Id: tel-01219693

https://hal.archives-ouvertes.fr/tel-01219693

Submitted on 27 Oct 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Theory and Practice of Physical Modeling for Musical Sound Transformations: An Instrumental Approach to Digital Audio Effects Based on the CORDIS-ANIMA System

Alexandros Kontogeorgakopoulos

To cite this version:

Alexandros Kontogeorgakopoulos. Theory and Practice of Physical Modeling for Musical Sound Transformations: An Instrumental Approach to Digital Audio Effects Based on the CORDIS-ANIMA System. Modeling and Simulation. Institut national polytechnique de Grenoble; Université d’Athènes, 2008. English. tel-01219693


INSTITUT POLYTECHNIQUE DE GRENOBLE

NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS

No. assigned by the library: |__|__|__|__|__|__|__|__|__|__|

INTERNATIONAL COTUTELLE THESIS

to obtain the degree of DOCTOR of IP Grenoble

and

of the University of Athens. Specialty: "Art, Sciences, Technologies"

prepared at the Informatique Création Artistique (ICA) laboratory, within the doctoral school "Ingénierie pour la Santé, la Cognition et l'Environnement" and the postgraduate studies programme of the

Department of Informatics and Telecommunications

presented and publicly defended by

Alexandros Kontogeorgakopoulos

on 9 October 2008

TITLE

Theory and Practice of Physical Modeling for Musical Sound Transformations:

An Instrumental Approach to Digital Audio Effects Based on the CORDIS-ANIMA System

THESIS DIRECTOR: Claude CADOZ

THESIS CO-DIRECTOR: Georgios KOUROUPETROGLOU

JURY

Jean-Claude RISSET, President

Julius SMITH III, Rapporteur

Giovanni DE POLI, Rapporteur

Nadine GUILLEMOT, Examiner

Sergios THEODORIDIS, Examiner


NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS

SCHOOL OF SCIENCES

DEPARTMENT OF INFORMATICS AND TELECOMMUNICATIONS, POSTGRADUATE STUDIES PROGRAMME

and

INSTITUT POLYTECHNIQUE DE GRENOBLE

ECOLE DOCTORALE INGENIERIE POUR LA SANTE, LA COGNITION ET L'ENVIRONNEMENT

JOINT DOCTORAL DISSERTATION

Theory and Practice of Physical Modeling for Musical Sound Transformations:

An Instrumental Approach to Digital Audio Effects Based on the CORDIS-ANIMA System

Alexandros G. Kontogeorgakopoulos

GRENOBLE AND ATHENS, OCTOBER 2008


DOCTORAL DISSERTATION

Theory and Practice of Physical Modeling for Musical Sound Transformations:

An Instrumental Approach to Digital Audio Effects Based on the CORDIS-ANIMA System

Alexandros G. Kontogeorgakopoulos

SUPERVISORS:

Claude Cadoz, Ingénieur de Recherche - HDR en Informatique
Georgios Kouroupetroglou, Assistant Professor, NKUA

THREE-MEMBER ADVISORY COMMITTEE:

Georgios Kouroupetroglou, Assistant Professor, NKUA
Sergios Theodoridis, Professor, NKUA
Anastasia Georgaki, Assistant Professor, NKUA

EXAMINATION COMMITTEE

Date of examination: 09/10/2008

Jean-Claude RISSET, Directeur de Recherche Emérite, CNRS Marseille
Sergios Theodoridis, Professor, NKUA
Georgios Kouroupetroglou, Assistant Professor, NKUA
Claude Cadoz, Ingénieur de Recherche HDR, Institut Polytechnique de Grenoble
Julius SMITH, Professor, Stanford University
Giovanni DE POLI, Professor, University of Padova
Nadine GUILLEMOT, Professor, Institut Polytechnique de Grenoble


Abstract

This thesis proposes the use of physical modeling as a means of designing musical sound transformations under the concept of instrumental interaction. We sought a physical modeling paradigm that straightforwardly permits the exploration of new sonic possibilities based on sound transformation techniques. Hence the present essay is clearly oriented towards, and devoted to, musical creation.

This work provides new results and new discussions in musical signal processing and in the design of digital audio effects. The novelty is the introduction of the instrumental gesture into audio signal processing algorithms. The same concept has been applied to the Frequency Modulation (FM) synthesis algorithm.

Concretely, the novel outcomes and contributions of this study are:

× Some theoretical aspects concerning the exploitation of physical modeling for musical sound transformations

× The proposition of a simple modular system architecture for the design of physical audio effect models, based on a common visual block-type programming environment

× The representation of the CORDIS-ANIMA (CA) system by a number of helpful mathematical formalisms

× The redesign/re-definition of several well-known digital audio effects using the CA system: filters, delay-based effects, distortions, amplitude modifiers. In particular, for filter synthesis by CA networks, an algorithm based on the Cauer realization of electrical circuits has been applied.

× The redesign/re-definition of FM synthesis using the CA system.

SUBJECT AREA: Computer Music

KEYWORDS: digital audio effects, physical modeling, musical sound transformation, CORDIS ANIMA, signal processing, instrumental gesture, control, sound processing, mass-interaction formalism


Abstract (in Greek)

This dissertation proposes the use of physical modeling techniques for the design of digital sound processing algorithms intended for musical creation. The basic idea on which it rests is the presentation of a new and innovative way of controlling sound processing algorithms, based on the interaction paradigm known as instrumental interaction. We sought a physical modeling formalism that directly permits the search for new timbres through sound processing techniques. The present research therefore has a clearly musical orientation.

The work provides new results and raises new questions concerning sound processing for musical creation and the design of digital audio effects. A novelty is the introduction of instrumental gestures for the control of sound processing algorithms. The same idea was also applied to the FM (frequency modulation) sound synthesis algorithm.

Concretely, the original results and contributions of this research are:

× Some theoretical conclusions concerning the use of physical modeling techniques for the design of digital sound processing algorithms intended for musical creation

× The proposal of a simple modular architecture for the design of physical models of digital audio effects, based on a common block-type graphical programming environment

× The representation of the CORDIS-ANIMA system and its comparison with other useful modeling formalisms

× The design and simulation of several well-known effects with the CORDIS-ANIMA system: filters, effects based on delay lines, distortions, as well as effects that affect the dynamics of the sound. A particularly important contribution is the use of the Cauer technique for the synthesis of CORDIS-ANIMA filters.

× The design and simulation of FM synthesis with the CORDIS-ANIMA system.

SUBJECT AREA: Computer Music

KEYWORDS: digital audio effects, physical modeling, musical sound transformation, CORDIS ANIMA, signal processing, instrumental gestures, control, sound processing, mass-interaction formalism


to Stelakis!


Acknowledgements

Those four years I spent at ACROE, at the University of Athens, at IRCAM, at McGill University and elsewhere during my “euro-libraries tour” have given me unforgettable and invaluable experiences.

Evidently, there are many people I would like to thank.

Claude Cadoz, who believed in me and inspired me enormously during this adventurous research.

Georgios Kouroupetroglou, who never stopped giving me valuable advice about everything.

Annie Luciani and especially Jean-Loup Florens, for the help and all the interesting conversations we had. Of course, I cannot forget the whole ACROE team, especially François Poyer, Julien Castet, Olivier Tache and Nicolas Castagné, for the nice moments we spent together...

Jean-Claude Risset, who transmitted to me his magical view of computer music.

Julius Smith, for his comments on my thesis and his incredible website.

Vincent Verfaille, for those interesting discussions we had about musical sound transformation.

The jury, who kindly accepted to read my PhD and honored me with their presence at my thesis defense.

In addition, I would like to thank my family: thanks mama, dad and Noulikiti. You were always there when I needed you.

Special thanks go to Olivia (of course!), Rory, Gareth, Haralambos and all my friends.

ATPS


Preface

The present dissertation is the final work of my research at the Institut National Polytechnique de Grenoble and at the Department of Informatics & Telecommunications of the National & Kapodistrian University of Athens. It serves as documentation of my PhD study, accomplished during the period 2004-2008 in order to obtain a joint doctorate from both establishments. Since it was the first collaboration of this kind between the National & Kapodistrian University of Athens and foreign establishments, a significant amount of energy, effort and time was spent to attain it.

The research was partly funded for two years by the MIRA program (Mobilité Internationale Rhône-Alpes - France). Within the context of this collaboration between the two institutes, a research program on a similar subject, the physical modeling of Greek traditional musical instruments, was funded by the ΑΝΤΑΓΩΝΙΣΤΙΚΟΤΗΤΑ (Competitiveness) programme (General Secretariat for Research and Technology, Ministry of Development - Hellenic Republic). However, the most important financial support was provided by my parents. I cannot find words to thank them.

As a PhD student, I had the opportunity to follow a large number of courses at IRCAM (Institut de Recherche et Coordination Acoustique et Musique) in Paris and to spend a period in the Sound Processing and Control Laboratory at McGill University. This last part was funded by the AIRES CULTURELLES program (Ministère de l’Enseignement Supérieur et de la Recherche - France).

I should underline that this PhD was a great challenge for me, since it addresses a question that had never been posed before. I hope that it will become an inspiration for significant applications and research areas in the computer music domain.

Enjoy it!


Outline

The dissertation is organized and structured as follows:

Chapter 1 presents some definitions of musical sound transformation and digital audio effects. The point of view leans more toward the design side. Among the several taxonomies found in the literature, a new one is proposed.

Chapters 2 and 3 give an overview of the basic digital audio effect algorithms and their control. They cover exclusively previous research.

Chapter 4 briefly introduces the hypothesis and the context of our research: musical sound transformation by means of physical modeling within the framework of instrumental interaction. Our approach is presented and explained under the prism of gesture and physics.

Chapter 5 focuses on the CORDIS-ANIMA system. CA models are studied and presented using several useful system representations such as CA Networks, Finite Difference representations, Kirchhoff representations, digital signal processing Block Diagrams, State Space internal descriptions, System Function input/output external descriptions and Wave Flow Diagrams.

Chapter 6 examines some issues relative to the general design of CA audio effects. The system architecture is presented.

Chapter 7 studies a list of basic digital audio effects and proposes their CA version. The presented algorithms are: elementary mathematical operations, filters, amplitude modification, echo, comb filter, flanger, pick-up point modulation and clipping/distortion.

Annex A describes a simple way to dynamically modify the physical characteristics of CA physical models. With this general methodology, called the transfer characteristics biasing method, the instrumental interaction is not violated.

Annex B presents some preliminary ideas and results concerning the redesign/re-definition of FM synthesis using the CA system.

Conclusions are… conclusions. In the same chapter, suggestions for future work are presented.


Contents

Abstract
Acknowledgements
Preface
Outline
Contents

1 Musical Sound Transformations
1.1 What is Musical Sound Transformation?
1.2 What is an Audio Effect?
1.3 Taxonomy of Audio Effects

2 An Overview of Digital Audio Effects Algorithms
2.1 Time-Domain Models
2.1.1 Simple Operations
2.1.2 Filters
2.1.3 Delay-based Effects
2.1.4 Nonlinear Processing
2.1.5 Time-Segment Processing
2.2 Time-Frequency Models
2.2.1 Phase Vocoder
2.2.2 Wavelet Transform
2.2.3 Other Time-Frequency Techniques
2.3 Parametric/Signal Models
2.3.1 Spectral Models
2.3.2 Source-Filter Models

3 Control of Digital Audio Effect Algorithms
3.1 Gestural Control
3.2 LFO Control
3.3 Control based on Automation
3.4 Algorithmic Control
3.5 Adaptive / Content-based Control

4 Physical Audio Effect Models
4.1 Digital Audio Effects and Physical Modeling: La raison d’être of this PhD Research
4.2 Digital Audio Effects and Instrumental Multisensory Interaction: La Nouveauté of this PhD Research

5 CORDIS-ANIMA System Analysis
5.1 CORDIS-ANIMA Network Representation
5.2 Finite Differences (FD) Representations and Finite Derivatives (FDe) Representation
5.2.1 Mass-Interaction Networks: System of FD and FDe Equations
5.2.2 Multichannel FD and FDe Representations
5.2.3 N-dimensional Finite Difference Representation
5.2.4 From CA Networks to Partial Differential Equations
5.3 State Space Representation
5.4 Kirchhoff Representation - Electrical Analogous Circuits
5.5 System Function Representation
5.5.1 General System Block Diagrams
5.5.2 One-Ports
5.5.3 Two-Ports
5.5.4 Modal Representation
5.6 Digital Signal Processing Block Diagram Representation
5.7 Wave Flow Representation: Interfacing the DWG with the CA

6 CORDIS-ANIMA Physical Audio Effects Model Design
6.1 System Architecture
6.2 Components and Modules
6.3 Modules Interconnection and Construction Rules
6.4 User Interface
6.5 Simulations/Simulator

7 CORDIS-ANIMA Physical Audio Effect Models
7.1 Elementary Signal Processing Operations
7.1.1 Unit Delay Element
7.1.2 Constant Multiplier
7.1.3 Adder and Subtracter
7.1.4 Memoryless Nonlinear Element
7.2 Basic Low-Order Time-Invariant Filters
7.2.1 FRO: Highpass
7.2.2 REF: Highpass/Lowpass
7.2.3 CEL: Bandpass/Lowpass/Highpass
7.3 Synthesis of High-Order Time-Invariant Filters
7.3.1 Cascade-Form and Parallel-Form CA Structures
7.3.2 String-Form CA Structures
7.4 Time-Variant Filters
7.4.1 Wah-Wah
7.4.2 Time-Variant Resonators: “Pressing-String” and “Sticking-String” Models
7.5 Amplitude Modifiers
7.5.1 Bend Amplitude Modification Model
7.5.2 Mute Amplitude Modification Model
7.5.3 Pluck Amplitude Modification Model
7.6 Delay-based Audio Effects
7.6.1 Delay Model
7.6.2 Comb Filter Models
7.6.3 Flanger Models
7.6.4 Spatialization and Pick-Up Point Modulation
7.7 Nonlinear Audio Effects
7.7.1 Nonlinearities without Memory: Waveshaping
7.7.2 Nonlinearities with Memory: Clipping

Annex A Transfer Characteristics Biasing Method
Annex B Frequency Modulation within the framework of Physical Modeling
B.1 Simple Frequency Modulation
B.2 Physical Frequency Modulation Model
B.3 Triangular Physical Frequency Modulation Model

Conclusions
Bibliography


Chapter 1

Musical Sound Transformations

“Making a good transformation is like writing a tune… There are no rules.”

Trevor Wishart

1.1 What is Musical Sound Transformation?

The idea of sound transformation refers to the process of transforming or modifying a sound into another one with a different quality. A more musically oriented definition describes sound transformations as “the processing of sound to highlight some attribute intended to become part of the musical discourse within a compositional strategy” (Glossary of the EARS web site [EARS]). In the present research, we do not use this term in the sense of sound morphing.

Clearly, in this thesis we are interested in the “how to” of formulating a sound transformation: the means, the methods, and the tools. Hence, our approach is always from the construction point of view. Figure 1.1 illustrates schematically where we focus our interest.

Figure 1.1 Our interest concerning sound transformations**

Innovative musicians, charismatic sound engineers and pioneering researchers in the domain of music, audio and acoustics have written the history of musical sound transformations. Mechanical, acoustical, electromechanical, electromagnetic, electronic and digital systems have been developed and used for this artistic* purpose. We will present several important events in historical order, to better grasp the concept and the nature of musical sound transformation.

The search for timbral development from one texture to another is evident in the history of 20th-century electronic and computer music. During the 1920s, Varèse had already started searching for new sound qualities, working with natural musical instruments only [Manning 2004]. The early works of Iannis Xenakis

**the term texture is employed in a very general way

*We study sound transformations always in a musical context.

(39)

are excellent examples of instrumental sound transformations. These transformations greatly influenced and motivated Trevor Wishart to start his investigations on audio effects [Landy 1991]. They were based on the mechanical manipulation of the sound propagation medium - all the possible parts of the vibrating system - and on the excitation mechanism.

The invention of the commercial gramophone record offered a conversion of time information into spatial information. However, this technique was not only used for storing sound information; soon after, composers began to experiment with the recording medium and with the process of sound recording and reproduction. Back in 1890, Fred Gaisberg would hold an opera singer by the arm and move her closer to or further away from the gramophone according to the dynamics of the piece [Moorefield 2005]. A manual and acoustic dynamic processing of the recorded sound was achieved by this simple technique. Darius Milhaud carried out several experiments on gramophones, investigating vocal transformations during the period 1922 to 1927 [Manning 2004]. An excellent and more contemporary example of musical creation based on the manipulation of the recording support is the Disc Jockey (DJ), who emerged in the late sixties.

The whole performance is focused on the direct manipulation of the records: playing in reverse, playing at different speeds, “scratching”, playing with many different types of heads and scratching the surface with a sharp tool are some techniques used by experimental DJs.

Optical recording has also provided an interesting support and encouraged similar musical experimentation. In 1932, Pfenninger modified sounds by altering their shapes on the optical soundtrack. The introduction of the magnetic tape recorder into the studios after the Second World War brought new promise for sound transformation. Once more, the creative process was based on the manipulation of the support. The enhanced possibilities of tape gave birth to Musique Concrète in 1948. Even though magnetic tape systems did not permit visual physical modification of the waveform patterns, their editing and rewriting capabilities were significantly important to musicians for musical expression.

Analog and digital technology offered a different type of sound treatment. It was neither the sound propagation medium nor the recording support that was manipulated, but a proper mathematical representation of the sound. Analog signal processing techniques have been used since the late 19th century, with the invention of the Telharmonium. Most of the widely known audio effects, like the phaser, wah-wah, distortion and chorus, were created with analog signal processing techniques and implemented with electronic circuits [Bode 1984]. Digital signal processing continued along the same idea and offered a more convenient and general framework for the conception, design and implementation of digital audio effects.

This research concerns musical sound transformations where the sound source (that is transformed) is separated from the sound processing system: the transformation and sound generation mechanisms are detached and are not parts of the same object. All the previous sound transformation paradigms fall into this category except for the transformations in the Xenakis example. According to Landy [Landy 1991], that type of musical transformation can be referred to as transformation through synthesis. In a similar manner, we call the other case, the one we are interested in, transformation through processing*. It is evident that the frontiers between sound synthesis and sound processing are ambiguous, and this separation is more meaningful from a musical perspective than from a functional or technical one.

Figure 1.2 (a) transformation through synthesis (b) transformation through processing

Wishart, whose contribution to sound transformations is invaluable, considers his entire computer-based sound manipulation procedures as musical instruments. Moreover, he regards his research as an indispensable part of his musical work [Wishart]. He says characteristically: “…In particular the goal of the process set in motion may not be known or even (with complex signals) easily predictable beforehand. In fact, as musicians, we do not need to “know” completely what we are doing (!!). The success of our effort will be judged by what we hear…” [Wishart 1994].

1.2 What is an Audio Effect?

We define an audio effect as the “How” part - the method - of a sound transformation through processing. This definition is more general than the common ones. For example, Verfaille et al. define the audio effect as:

“…we consider that the terms “audio effects”, “sound transformations” and “musical sound processing” all refer to the same process: applying signal processing techniques to sounds in order to modify how they will be perceived, or in other words, to transform a sound into another sound with a different quality” [Verfaille Guastavino Traube 2006].

It is obvious that this does not coincide completely with our definitions of audio effect and sound transformation. In the last definition, an ambiguity probably arises from the term signal processing. In general, signal processing concerns the techniques and methods for the manipulation of mathematical representations of signals. Even the terms signal and mathematical representation of signal are usually considered equivalent. Therefore, signal processing is just one approach to designing sound transformations, and a particular category of audio effects.

* Landy categorized sound transformations into three categories: i) electroacoustically ii) through synthesis iii) through resynthesis. In the present thesis we group categories i) and iii) under the term transformation through processing.


For example, it is uncommon to consider a spring reverb, a guitar, or even an electric network as a signal processing system. On the other hand, we may approach, represent and model it as a signal processing system. This distinction is not only theoretical. A helpful example is the acoustical instrument designer: the methods he applies to construct his instruments are not at all signal-processing-based. As this thesis concerns the design of audio effects more than their analysis, this distinction is essential.

When we deal with the processing of musical audio signals by digital means, we use the term digital audio effect (DAFX is a synonym for digital audio effects). One could say that, since we use information-processing systems for the musical transformation of sound, we directly employ signal-processing techniques. We should be prudent once more, as modern digital sound technology has certainly made sound more immaterial and more object-like. Its production is no longer bound to instruments and instrumentalists: it can be manipulated with tools acting on its representations*. On the other hand, this does not mean that we must necessarily employ a signal processing method or, much more importantly, a signal-thinking approach to modify a sound with a digital computer. Of course, we cannot deny that at a low level all these modifications are in fact digital signal processing procedures.

The definition of digital audio effects by Zoelzer is more general:

“Digital audio effects are boxes or software tools with input audio signals or sounds which are modified according to some sound control parameters and deliver output signals or sounds” [Zoelzer 2002].

We could rearticulate the last definition and describe digital audio effects as digital systems/algorithms that modify incoming audio signals according to the available control parameters. The control signifies all the possible methods available to the user for accessing the control parameters: graphical user interfaces, abstract algorithms, gestural interfaces, sound features, etc. In figure 1.3, we illustrate a digital audio effect according to this definition. In chapter 4, we will propose another approach to digital audio effects that differs from the depicted one.

Figure 1.3 Digital audio effect and its control (from [Zoelzer 2002])
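This view of a digital audio effect - an algorithm that maps an input signal to an output signal under some control parameters - can be sketched in a few lines of code. The tremolo below is a hypothetical illustration of ours, not an algorithm from this thesis; the function and parameter names are our own.

```python
import math

def tremolo(x, sr, rate_hz, depth):
    """A minimal digital audio effect in the sense above: the incoming
    signal x (a list of samples at rate sr) is modified according to two
    control parameters, rate_hz and depth, which a GUI slider, an LFO or
    a gestural interface could equally well drive."""
    return [s * (1.0 - depth + depth * math.sin(2 * math.pi * rate_hz * n / sr))
            for n, s in enumerate(x)]

# With depth = 0 the control leaves the signal untouched.
print(tremolo([1.0, 1.0, 1.0], 8000, 5.0, 0.0))  # → [1.0, 1.0, 1.0]
```

The point of the sketch is the signature, not the effect itself: the control enters the algorithm only through its parameters, exactly as in the figure above.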


Clearly, the field of musical sound transformation by digital means lies between science and art. It could be seen, strictly scientifically, as a branch of digital signal processing applied to sound signals. However, as we have already stated, the goal is musical and thus neither as obvious nor as clear as in the digital signal processing field or even in audio engineering. The mathematics and rigorous scientific approaches are often more important for the analysis of the algorithms - when this is possible - than for their design. A characteristic example is the design of reverberation algorithms, which have traditionally been invented through experimentation [Dattorro 1997b].

1.3 Taxonomy of Audio Effects

Apparently, it is extremely difficult to identify, classify and name existing digital audio effects. This is more evident in the current market, where there are no official standards and each manufacturer uses different parameter names and adds supplementary signal processing components to generate their own distinctive sound. Moreover, if we try to analyze the unpublished work of Wishart, we will soon get disappointed (impressed, in fact!), as there is an impressive amount of novel undocumented algorithms that are not widely (academically or commercially) known*.

Figure 1.4 Perceptual classifications of various effects. Bold-italic words: perceptual attributes, italic words: perceptual sub-attributes, other words: audio effects (from [Verfaille Guastavino Traube 2006])

*Fortunately all these procedures are available through the Composers Desktop Project [Endrich 1997][Composers Desktop Project]


This section will recall some of the classifications of audio effects and will propose a new one that is not necessarily linked with the concept of signal processing. Verfaille summarizes several classifications in [Verfaille Guastavino Traube 2006]. As he points out, these classifications are neither exhaustive nor mutually exclusive. Figure 1.4 illustrates his proposed perceptual classification of various audio effects.

From the designer’s point of view, maybe the most practical classification is the technological one proposed by Zoelzer [Zoelzer 2002] and illustrated in figure 1.5. We will use this type of classification in the next chapter for the presentation of the basic digital audio effect algorithms.

Figure 1.5 Classification of audio effects based on the underlying techniques [Zoelzer 2002]

A similar but more general classification is based on the input domain. Since a digital audio effect is a discrete-time system, it may be seen as an abstract mathematical operator that transforms the input sound sequence into another sequence. The input sequence is a coded representation of the sound signal. It could be a time-domain representation, a time-frequency representation, or a parametric representation.

Time-domain processing: Most digital audio effects are conceived and designed in the time domain. One of the most intuitive ways to create sound transformations is to cut the input streams, replay them and re-assemble them in different ways. All this may be done down to sample accuracy. Filters, delay functions and reverberation algorithms are other examples of sound transformations that may be realized in the time domain by elementary mathematical operators such as multipliers, adders and delay lines.
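As a concrete illustration of how far these elementary operators reach, the sketch below (our own, not a model from this thesis) builds a feedback comb filter - the core of echo and flanging effects - out of nothing but an adder, a constant multiplier and a delay line:

```python
def feedback_comb(x, delay, gain):
    """y[n] = x[n] + gain * y[n - delay]: one adder, one constant
    multiplier and one delay line, combined purely in the time domain."""
    y = []
    for n, sample in enumerate(x):
        fed_back = gain * y[n - delay] if n >= delay else 0.0
        y.append(sample + fed_back)
    return y

# Feeding in a unit impulse exposes the decaying echo pattern directly.
print(feedback_comb([1.0] + [0.0] * 9, delay=3, gain=0.5))
# → [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

Every `delay` samples the input reappears attenuated by `gain`, which is exactly the sample-accurate cut-and-replay behaviour described above.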

Time-Frequency-domain processing: Time-frequency processing permits working with a sound signal from both the frequency and the time viewpoint simultaneously. In 1946, Dennis Gabor published an article in which he explicitly defined a time-frequency representation of a signal [Arfib 1991]. Another interesting publication of the same period is the one of Koenig, Dunn and Lacy [Koenig Dunn and Lacy 1946]. Each point in this time-frequency representation corresponds both to a limited interval of time and to a limited interval of frequency. In general, with time-frequency methods we project the signal (time representation) onto a set of basis functions to determine their respective correlations, which give the transform coefficients. If these transforms are discrete, the time-frequency plane takes the form of a grid or a matrix of coefficients. In the fixed-resolution case, as in the Phase Vocoder, the bandwidths of the basis functions/analysis grains are the same for all frequencies and they have the same length. This gives constant time-frequency resolution. In the multi-resolution case, as in the Wavelet Transform, the analyzing grains are of different lengths and their bandwidth is not constant. In this scenario, the wider sub-bands give better time resolution and vice versa.
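The grid of coefficients described above can be made concrete with a few lines of code: each coefficient below is the correlation of one fixed-length frame with one complex-exponential basis function, i.e. the fixed-resolution (Phase-Vocoder-like) case, without windowing for simplicity. This is our own minimal sketch; function and variable names are assumptions.

```python
import cmath
import math

def tf_grid(x, frame_len, hop):
    """Fixed-resolution time-frequency grid: rows are analysis frames
    (time axis), columns are correlations with complex-exponential basis
    functions (frequency axis)."""
    grid = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        grid.append([sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                         for n, s in enumerate(frame))
                     for k in range(frame_len)])
    return grid

# A sinusoid sitting exactly on basis function k = 1 concentrates its
# energy in that column of the grid, in every frame.
x = [math.cos(2 * math.pi * n / 8) for n in range(16)]
grid = tf_grid(x, frame_len=8, hop=4)
```

Because every frame has the same length, every column has the same bandwidth: this is precisely the constant time-frequency resolution of the fixed-resolution case, as opposed to the variable grain lengths of the wavelet case.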

Parametric-domain processing: Parametric or signal-model processing of audio signals concerns algorithms based on sound synthesis models: the signal to be modified is first modeled, and then the model parameters are modified to achieve the desired transformation. All the valuable knowledge in sound synthesis can also be applied in sound transformation. Hence, as Risset states [Risset 1991], it would be of great interest to develop automatic analysis procedures that could, starting from a given sound, identify the parameter values of a given sound model that yield a more or less faithful imitation of the sound. This is a hard problem for certain synthesis models. The most useful and widely known sound synthesis models are additive synthesis, subtractive or source-filter synthesis, frequency modulation and waveshaping or nonlinear distortion.
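This analysis/transformation/resynthesis loop can be sketched for the simplest possible signal model, a single sinusoid a·cos(2πfn/sr). The deliberately crude parameter estimator below just picks the strongest DFT bin; all names are ours, for illustration only, and no claim is made that this is an adequate analysis procedure for real sounds.

```python
import cmath
import math

def fit_sinusoid(x, sr):
    """Estimate (amplitude, frequency) for the model a*cos(2*pi*f*n/sr)
    by locating the strongest DFT bin -- the 'identify the parameter
    values' step of parametric processing, in its crudest form."""
    N = len(x)
    spectrum = [sum(s * cmath.exp(-2j * math.pi * k * n / N)
                    for n, s in enumerate(x)) for k in range(1, N // 2)]
    k = max(range(len(spectrum)), key=lambda i: abs(spectrum[i])) + 1
    return 2 * abs(spectrum[k - 1]) / N, k * sr / N

def synthesize(a, f, sr, N):
    return [a * math.cos(2 * math.pi * f * n / sr) for n in range(N)]

# Analyze a 40 Hz tone, then resynthesize it one octave higher:
# the transformation happens purely in the parameter domain.
a, f = fit_sinusoid(synthesize(0.5, 40.0, 800, 200), 800)
octave_up = synthesize(a, 2 * f, 800, 64)
```

The sound is never edited directly: only the model parameters (a, f) are touched, which is what distinguishes parametric processing from the time-domain and time-frequency approaches above.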

Following the discussion in chapter 1.1, we propose a new classification, applicable to all types of sound processing techniques or audio effects, into three general categories [Kontogeorgakopoulos Cadoz 2007b]. This classification has a strong impact on the way one conceives, controls and implements audio effects. It is based on the following question:

What do you manipulate?

We distinguish the following basic cases: the energy flux that carries the musical signal (table 1.1), the recording medium that stores the musical signal (table 1.2) and the symbolic expression that represents the musical signal (table 1.3). This classification should not be confused with the technological one, as it covers more than technological aspects. Once again, it is not a mutually exclusive classification.

Table 1.1 Propagation Medium Processing

Propagation Medium Processing: A physical manipulation takes place on the medium in which the sound signal propagates. We could use the term channel processing as well; however, since the concept of a channel comes from information theory, it is directly related to the notion of language, which could cause confusion, so we preferred the first term.


Propagation Medium Processing does not necessarily concern mechanical or acoustical vibrating systems. An electrical network, an optical or an electromagnetic waveguide, for example, may constitute the propagation medium. What matters is that (a) the initial sound energy propagates in a physical medium and (b) the sound transformation is achieved physically through the manipulation of that medium. The Talk Box is a typical audio effect of this category: the sound is routed into the performer's mouth and modified by the vocal tract resonances. The reverberation chamber and the reverberation plate or spring are classified in this category as well.

Table 1.2 Recording Medium Processing

Recording Medium Processing: A physical manipulation takes place on the support on which the sound signal is recorded. Segmenting and rearranging the support, scratching it, and altering the shapes that the sound takes on it are several techniques for modifying the musical signal.

Recording medium processing concerns every possible medium on which the sound signal can be recorded and stored: the phonograph or vinyl record, magnetic tape, optical soundtrack, etc. The mastery of sound processing with such systems logically led to the creation of Musique Concrète and, more generally, to all electro-acoustic music. The two-turntable setup used nowadays by thousands of DJs around the world offers musical sound transformations by recording medium processing. Clearly, analog sound storage media have proven more appropriate for this type of processing.

Table 1.3 Information Processing

Information Processing: A symbolic manipulation takes place on the sound signal, which is considered as a codified message, generally expressed by a mathematical function. A set of mathematical operations transforms it, either in the continuous time domain (analog signal processing) or in the discrete time domain (digital signal processing).


Today, the term audio effect is often used only for effects that belong to this category. This is reasonable, since the majority of audio effects are computer-based. Computer simulation techniques have made it possible to redesign and implement Propagation Medium Processing and Recording Medium Processing audio effects on a computer.



Chapter 2

An Overview of Digital Audio Effect Algorithms

In this chapter we present a survey of digital audio effects. The subject is huge and the bibliography practically unlimited; we will briefly mention only the most common effects, with the necessary references.

In chapter 7, where we expose our own designs, the references are covered in more detail. We cite five compilations of digital audio effect algorithms that we found really helpful: [Zoelzer 1997][Zoelzer 2002][Orfanidis 1996][Verfaille 2003][Wishart 1994]. Zoelzer and Orfanidis present the algorithms from a technological perspective; Verfaille uses a perceptual categorization, and Wishart chooses a much more compositional approach without focusing on algorithmic details.

We will expose the algorithms according to the following simple classification: time-domain, time-frequency and parametric procedures. As we have already stated in chapter one, we try to follow the designer's point of view.

2.1 Time-Domain Models

2.1.1 Simple Operations

The very elementary signal operations, such as addition/subtraction, multiplication by a constant and signal routing, can hardly be considered audio effects. However, the basic operations of a mixing console are based on them [Roads 1996][Rumsey 1999]:

× Gain changing (redithering is employed in mixers when the sample resolution has to be shortened)

× Cross-fading

× Mixing

× Mute/Solo

× Input/Output selection

× Grouping facilities

× Auxiliary send/return

Good references for these operations, analyzed at a more technical level, are [Zoelzer 1997][Mitra 2001].

2.1.2 Filters

Every SISO (Single Input Single Output) digital system can be considered as a filter. The committee on Digital Signal Processing of the IEEE Group on Audio and Electroacoustics defined a filter as [Rabiner et al. 1972]:


A digital filter is a computational process or algorithm by which a digital signal or sequence of numbers (acting as input) is transformed into a second sequence of numbers termed the output digital signal.

Very often the term filter is employed exclusively for the family of linear time-invariant (LTI) systems. In this case, the general time-domain representation of the filter is given by a linear constant-coefficient finite difference equation.

The most standard filter classification is named according to the amplitude response: lowpass (LP), highpass (HP), bandpass (BP) and bandreject (BR). Figure 2.1 illustrates this classification [Dodge Jerse 1997]. Another classification distinguishes finite impulse response (FIR) filters, where the autoregressive term disappears, from infinite impulse response (IIR) filters, where the autoregressive term appears in the difference equation. There are many reference books concerning the fundamentals of filtering theory [Oppenheim Schafer Buck 1999][Proakis Manolakis 1996]. A more music-friendly reference is [Dodge Jerse 1997].

Figure 2.1 Classification of filters according to their amplitude response: (a) lowpass (b) highpass (c) bandpass (d) band reject

The simplest filters are the one-zero, the one-pole, the two-pole and the two-pole/two-zero filters [Smith 1995]. They will be presented in more detail in chapter 7. Typical useful parametric filter structures for first-order highpass and lowpass filters, and for second-order highpass, lowpass, bandpass and bandreject filters, can be found in [Zoelzer 2002].
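As an illustration of the simplest of these structures, here is a minimal Python sketch of a one-pole lowpass filter; the coefficient value is our own illustrative choice, not one taken from the cited references:

```python
def one_pole_lowpass(x, a=0.9):
    """One-pole IIR lowpass: y[n] = (1 - a)*x[n] + a*y[n-1].
    The single coefficient a in [0, 1) sets the cutoff; larger a
    means stronger smoothing (lower cutoff)."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * sample + a * prev
        y.append(prev)
    return y

# step response: the output rises smoothly toward the input value
step = one_pole_lowpass([1.0] * 50, a=0.9)
```

The recursive (autoregressive) term `a * prev` is what makes this an IIR filter; setting `a = 0` reduces it to a memoryless gain.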

Other very popular filters coming from the analog domain are the classical Sallen-Key filter, the State Variable Filter [Tomlinson 1991] and the Moog filter [Moog 1965]. Digital versions of the first two filters can be found in [Dutilleux 1998], and of the third in [Stilson 1996][Huovilainen 2004][Huovilainen Valimaki 2005][Fontana 2007]. The PhD thesis of Stilson is a good reference for all these vintage filters [Stilson 2006].

In computer music, filters are usually required to be adjustable. This means that the filters should be "usefully" and efficiently varied in real time. All of the previous filters that are suitable for computer music applications respect this requirement. A very classical time-variant filter effect is the wah-wah [Dutilleux Zoelzer 2002b][Loscos Aussenac 2005][Smith 2008].


2.1.3 Delay-based Effects

Delay-based effects are probably the most common effects. The main functional unit is the digital delay line [Smith 2005][Dutilleux Zoelzer 2002]. Fractional delay lines are used when delays of the input signal by non-integer multiples of the sampling interval are necessary. Interpolation functions are applied between samples in order to achieve smooth delay length changes [Laakso Valimaki Karjalainen Laine 1996]. The delay line can also vary in length with time.

Another way to delay a signal is by passing it through an allpass filter, i.e. a filter with unit amplitude response and arbitrary phase response [Mitra 2001]. A mathematical expression of the general transfer function of a finite-order, causal, unity-gain allpass filter can be found in [Smith 2005]. Second-order allpass sections are very useful filters due to their tunable characteristics.
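A fractional delay with linear interpolation, the simplest of the interpolation schemes mentioned above, can be sketched as follows (the example signal and the 2.5-sample delay are our own illustrative choices):

```python
def fractional_delay(x, delay):
    """Delay by a non-integer number of samples: read between two
    neighbouring samples with linear interpolation."""
    i = int(delay)                      # integer part of the delay
    frac = delay - i                    # fractional part
    y = []
    for n in range(len(x)):
        a = x[n - i] if n - i >= 0 else 0.0
        b = x[n - i - 1] if n - i - 1 >= 0 else 0.0
        y.append((1.0 - frac) * a + frac * b)
    return y

ramp = [float(n) for n in range(10)]
delayed = fractional_delay(ramp, 2.5)   # each sample arrives 2.5 samples late
```

Higher-order interpolation (Lagrange, allpass) gives smoother results; linear interpolation is the usual first approximation.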

Simple delay-based audio effects

Typical delay-based effects realized with time-invariant delay lines are doubling, echo, slap back, tapped delay and the IIR/FIR/Universal comb filter [Smith 2005][Dutilleux Zoelzer 2002d][Dattorro 1997]. All these effects use a simple single delay (or a tapped delay) with delay times ranging from 1 msec to several seconds, in a feed-forward or feedback configuration, mixed with the original input.
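The feedback configuration can be sketched as a simple IIR comb echo (delay length, feedback and mix values are illustrative assumptions):

```python
def echo(x, delay, feedback=0.5, mix=1.0):
    """Feedback (IIR) comb echo: the delayed signal is fed back into
    the delay line, producing a train of exponentially decaying echoes."""
    d = [0.0] * len(x)                  # output of the delay branch
    for n in range(delay, len(x)):
        d[n] = x[n - delay] + feedback * d[n - delay]
    return [x[n] + mix * d[n] for n in range(len(x))]

impulse = [1.0] + [0.0] * 15
out = echo(impulse, delay=4)            # echoes at n = 4, 8, 12, ...
```

With `feedback = 0` the structure degenerates to the feed-forward (FIR comb) case: a single echo mixed with the input.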

When the delay line is time-variant we obtain several other interesting effects. The simple vibrato effect is just a delay modulated by a low frequency oscillator (LFO). If we use the same delay, without the LFO and with variable read/write pointers, we get the Doppler effect [Smith 2005]. The flanger effect is widely used by guitarists and is similar to vibrato except that it uses the comb filter structure [Hartmann 1978][Disch Zolzer 1999][Dutilleux Zoelzer 2002d][Smith 2005][Huovilainen 2005]. The Leslie effect has been simulated in [Smith Serafin Abel Berners 2002][Disch Zolzer 1999]. An easy way to get the phaser effect is by using cascaded second-order allpass sections [Smith 1984] or second-order notch filters [Orfanidis 1996]. With the combination of a few delay lines modulated by random signals we get a chorus [Smith 2005][Dutilleux Zoelzer 2002d].
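The LFO-modulated delay underlying the vibrato can be sketched as follows; the base delay, modulation depth and rate are our own illustrative parameter choices:

```python
import math

def vibrato(x, sr, base_delay=0.005, depth=0.002, rate=5.0):
    """Vibrato sketch: a delay line whose length is modulated by a low
    frequency oscillator (LFO); fractional reads use linear interpolation."""
    y = []
    for n in range(len(x)):
        d = (base_delay + depth * math.sin(2 * math.pi * rate * n / sr)) * sr
        i, frac = int(d), d - int(d)
        a = x[n - i] if n - i >= 0 else 0.0
        b = x[n - i - 1] if n - i - 1 >= 0 else 0.0
        y.append((1.0 - frac) * a + frac * b)
    return y

sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
out = vibrato(tone, sr)                 # periodically pitch-modulated tone
```

Mixing the modulated branch with the dry input turns the same structure into a flanger; replacing the sinusoidal LFO with lowpass-filtered noise and using a few such branches gives a chorus.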

Spatial Effects

According to the pioneering work of Schroeder, by combining many of the simple delay-based systems mentioned before, we can simulate reverberation* effects [Schroeder Logan 1961][Schroeder 1962][Schroeder 1970]. Schroeder introduced the comb filters and the allpass comb filters as basic components for their simulation. Many other researchers continued in this direction and ameliorated his algorithms.

*Sound reverberation is a physical phenomenon occurring when sound waves propagate in an enclosed space. It is the result of the repeated reflections of radiated sound waves from the surfaces of the space. The direct sound and its echoes, which are delayed, attenuated and filtered replicas, give this effect that modifies the perception of the original sound by changing its loudness, its timbre and its spatial characteristics. For the study of the reverberation process many approaches have been followed, and deterministic and stochastic models have been proposed [Blesser 2001][Gardner 1998].

Moorer reconsidered Schroeder's reverberator and made some improvements [Moorer 1979]. He increased the number of comb filters from 4 to 6 to achieve longer reverberation times and inserted a one-pole lowpass filter into each comb feedback loop. Gardner based his reverberators on nested allpass filters [Gardner 1992][Gardner 1998]. A reverberator similar to Gardner's is Dattorro's reverberator [Dattorro 1997b].

Gerzon proposed feedback delay networks for reverberation [Gerzon 1971]. J. Stautner and M. Puckette suggested a similar structure for reverberation based on delay lines interconnected in a feedback loop by means of a matrix [Stautner Puckette 1982]. In the same direction, Jot developed a systematic design methodology [Jot Chaigne 1991][Jot 1992].

Smith proposed digital waveguide reverberators in 1985 [Smith 1985]. Recently those networks have been used for the simulation of spring reverberators [Abel Berners Costello Smith 2006]. The digital waveguide mesh and finite difference schemes are also widely used for artificial reverberation [Van Duyne Smith 1993][Savioja Backman Jarvinen Takala 1995][Bilbao 2007].

Good general references for reverberation algorithms are [Gardner 1998][Rocchesso 2002][Rocchesso 2003][Smith 2005][Zoelzer 1997].
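A toy reverberator in the Schroeder spirit, parallel comb filters followed by series allpass filters, might be sketched as below. The delay lengths and gains are illustrative choices of ours, not values from the cited designs:

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverb(x):
    """Four parallel combs (mutually prime delays) feeding two
    series allpass filters, after Schroeder's classic topology."""
    wet = [0.0] * len(x)
    for d, g in [(1051, 0.84), (1123, 0.83), (1289, 0.82), (1373, 0.81)]:
        for n, v in enumerate(comb(x, d, g)):
            wet[n] += 0.25 * v
    for d, g in [(223, 0.7), (83, 0.7)]:
        wet = allpass(wet, d, g)
    return wet

impulse = [1.0] + [0.0] * 7999
tail = reverb(impulse)                  # a decaying impulse response
```

Mutually prime comb delays spread the echo pattern so that the tail does not sound periodic; the allpass sections increase echo density without coloring the magnitude response.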

Acoustics have always been a part of the musical performance space and share a very long history with the auditory arts. A sound is decomposed by the human auditory system into the following perceptual components [Blesser 2001]: (a) the identity of the source, (b) its spatial location and (c) an image of the space. The image of the space contains the reverberation, which provides a sense of the size and the materials of the enclosing space, and human echolocation, which detects walls and objects at reasonable distances. The spatial location gives the position of the source, its acoustic field and its width. A special case concerns the movements of sound sources, which are detected as changes in direction and distance and by the Doppler effect.

A variety of techniques exist for the simulation of the spatial location of a sound. The most standard ones are panorama and the precedence effect for a stereo loudspeaker setup or headphones [Rocchesso 2002]. Audio engineers use both techniques all the time in the mixdown process. Distance rendering algorithms, based on the intensity of the incoming sound, the ratio of reverberated to direct sound and the modification of the high frequencies in the sound, are also common [Dodge Jerse 1997][Chowning 1971]. Other advanced techniques for 3D sound with headphones or with many loudspeakers can be found in [Rocchesso 2002][Rocchesso 2003].


2.1.4 Nonlinear Processing

The audio effects category with the name nonlinear processing includes all the time-based digital audio effect algorithms that cannot be considered linear*: dynamic processing, simulation of nonlinear amplifiers, distortion-type effects, and amplitude modulation algorithms. All these systems create frequency components that are not present in the input signal.

Digital signal processing generally studies linear time-invariant (LTI) systems. However, there are several methods for the analysis and modeling of systems with nonlinearities, and the domain is very broad. Two main categories are nonlinear systems with and without memory [Dutilleux Zoelzer 2002].

In the domain of digital audio effects, a standard modeling approach is to consider the nonlinear system as a black box. A method for estimating nonlinear transfer functions for nonlinear systems without memory is presented in [Moeller Gromowski Zoelzer 2002]. For systems with memory, the Volterra and Wiener theories can be used [Schattschneider Zoelzer 1999].

Another more common modeling approach is based on the “digitization” of analog nonlinear processing systems [Smith 2007]. In this case an “optimal**” numerical method has to be employed to express in discrete time the continuous-time mathematical equations that describe each of the analog signal processing circuit elements (resistors, inductors, capacitors, operational amplifiers, diodes, and transistors) [Yeh Abel Smith 2007b][Huovilainen 2004].

Probably the most important nonlinear processing units in a modern recording studio are the dynamic range controllers: compressors, expanders, limiters and noise gates. Up to this point of our survey, all the types of digital audio effects, such as filters, reverbs and flangers, were designed to make an obvious modification to the sound; dynamic range controllers are not. Often the effect becomes noticeable only when the original dynamic range of an input sound is compared to that of the modified version.

According to [Dutilleux Zoelzer 2002]: "Dynamics processing is performed by amplifying devices where the gain is automatically controlled by the level of the input signal". The system essentially performs an automatic level control. Its functional components are the level measurement, which can be a peak or an RMS measurement, the static curve, which is the relationship between the input level and the weighting level, and the gain factor smoothing that controls the time responses of the system [McNally 1984][Zoelzer 1997][Dutilleux Zoelzer 2002]. In the system's diagram, the lower path, consisting of all the previous processing blocks that derive the gain factor used to multiply the input signal (actually a delayed version of the input signal, in order to compensate for the delay of the lower path), is usually called the side-chain path.
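The three side-chain components just described can be sketched as a simple compressor. The detector here is a peak detector with release smoothing (rather than RMS), and the threshold, ratio and time constants are illustrative assumptions:

```python
import math

def compress(x, sr=48000, threshold_db=-20.0, ratio=4.0,
             attack=0.01, release=0.1):
    """Compressor sketch: level measurement, static curve,
    and one-pole gain-factor smoothing (the side-chain path)."""
    a_att = math.exp(-1.0 / (attack * sr))
    a_rel = math.exp(-1.0 / (release * sr))
    env, gain, out = 0.0, 1.0, []
    for s in x:
        # level measurement: peak detector with release smoothing
        env = max(abs(s), a_rel * env)
        level_db = 20.0 * math.log10(env + 1e-12)
        # static curve: above the threshold, 1 dB in -> 1/ratio dB out
        if level_db > threshold_db:
            gain_db = threshold_db + (level_db - threshold_db) / ratio - level_db
            target = 10.0 ** (gain_db / 20.0)
        else:
            target = 1.0
        # gain factor smoothing: attack when reducing, release when recovering
        coeff = a_att if target < gain else a_rel
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(s * gain)
    return out

loud = [0.9] * 4800                     # 100 ms of a loud constant signal
out = compress(loud)
```

A limiter is obtained in this structure by taking a very large ratio; an expander or gate inverts the static curve below the threshold instead.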

We give some simple definitions of these effects as they appear in [Orfanidis 1996]:

*the definition of a linear system can be found in any signal processing book (for example [Lathi 1998])

**some important criteria for the choice of the optimal numerical method are the accuracy, the stability, the aliasing and the complexity


× Compressors are used mainly to decrease the dynamic range of audio signals (so that it fits into the dynamic range of the playback system), for "ducking" background music and for "de-essing", i.e. eliminating excessive microphone sibilance.

× Expanders are used for increasing the dynamic range of signals (noise reduction, reducing the sustain time of an instrument).

× Limiters are extreme forms of compressors that prevent signals from exceeding certain maximum thresholds.

× Noise gates are extreme cases of expanders that infinitely attenuate weak signals (removing weak background noise).

Recent publications concerning the digital emulation of analog companding algorithms are [Peissig Haseborg 2004] [Schimmel 2003].

A discussion on the topic of simulation of valve amplifier circuits can be found in [Dutilleux Zoelzer 2002]. Triodes and pentodes can be modeled as memoryless functions. From measurements, we observe that they provide asymmetrical (triode) and symmetrical (pentode) soft clipping. Other more analytical references are [Schimmel 2003][Karjalainen et al. 2004][Keen 2000]. Valves are used in power amplifiers, in preamplifiers for microphones and in other effect devices such as the dynamic range controllers presented before.

Terms like overdrive, distortion, fuzz and buzz are used to describe similar effects based on distorting the waveform of audio signals. The easiest memoryless way to design distortion-type effects is by waveshaping [Schaefer 1970][Arfib 1979][LeBrun 1979][De Poli 1984][Fernandez-Cid Quiros 2001]. In chapter 7 we will present several transfer characteristics found in the relevant literature. A simulation of a distortion and an overdrive guitar pedal has been reported in [Yeh Abel Smith 2007a][Yeh Abel Smith 2007b].
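A minimal waveshaping sketch: the tanh transfer characteristic below is one common textbook choice of symmetrical soft clipper, not one of the specific characteristics presented in chapter 7:

```python
import math

def waveshape(x, drive=5.0):
    """Memoryless nonlinear distortion: a tanh transfer characteristic
    gives symmetrical soft clipping; `drive` controls the hardness."""
    norm = math.tanh(drive)             # rescale so that f(1) = 1
    return [math.tanh(drive * s) / norm for s in x]

sr = 8000
sine = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
clipped = waveshape(sine)               # squared-off, harmonically richer
```

Because the characteristic is odd-symmetric, a sinusoidal input acquires odd harmonics only; an asymmetrical curve (as in the triode model above) would add even harmonics as well.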

Amplitude and ring modulation can be used to alter periodically the amplitude of an audio signal [Oppenheim Willsky Young 1983][Dutilleux 1998][Dutilleux Zoelzer 2002c]. For a low-frequency carrier signal and an audio signal as a modulator, we obtain the tremolo effect: a cyclical variation of the input signal amplitude. Other amplitude-modulation-type effects using the single-sideband modulator are reported in [Disch Zolzer 1999][Wardle 1998].
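Both modulation effects reduce to a sample-by-sample multiplication, as in this sketch (carrier frequency, rate and depth are illustrative values of ours):

```python
import math

def ring_mod(x, sr, f_c):
    """Ring modulation: multiply by a sinusoidal carrier; the output
    contains the sum and difference frequencies of carrier and input."""
    return [s * math.sin(2 * math.pi * f_c * n / sr) for n, s in enumerate(x)]

def tremolo(x, sr, rate=4.0, depth=0.5):
    """Amplitude modulation with a low-frequency carrier: a cyclical
    variation of the input signal amplitude."""
    return [s * (1.0 - depth + depth * math.sin(2 * math.pi * rate * n / sr))
            for n, s in enumerate(x)]

sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
trem = tremolo(tone, sr)
ring = ring_mod(tone, sr, 100.0)
```

The difference between the two is the DC offset in the modulating gain: tremolo keeps the gain positive, while ring modulation lets it change sign, which suppresses the original frequency component.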

2.1.5 Time-Segment Processing

Time-segment processing of audio signals in general concerns algorithms that can be decomposed into three stages:

× An analysis stage, where the input signal is divided into segments of fixed or variable length.

× A processing stage, where simple time-domain algorithms are applied to these segments.

× A synthesis stage, where the processed segments are merged by an overlap-add procedure.


Even before the years of digital computing, these techniques were implemented by electromechanical devices. Dennis Gabor built one of the earliest electromechanical time/pitch-scaling devices in 1946 using optical sound recording [Gabor 1946][Roads 1991]. The first method for time or pitch scaling of an audio signal using a tape recorder appeared in 1954, on a modified tape recorder used for speech, not many years after the spread of the tape recorder at the end of World War II [Fairbanks, Everitt, Jaeger 1954][Laroche 1998]. Other similar electromechanical devices followed, like Springer's apparatus [Springer 1955] or the Phonogène [Poullin 1954]. In 1950 Pierre Schaeffer founded the Groupe de Recherches Musicales (GRM) in Paris, and with Pierre Henry he began the musical experiments based on the manipulation of concrete recorded sounds. Musique Concrète has made intensive use of the splicing of tiny elements of magnetic tape.

The time/pitch-scaling methods based on electromechanical devices inspired the time-domain techniques for time and pitch-scale modification of audio signals, which were transposed for the first time into the digital domain by Lee [Lee 1972][Laroche 1998]. Also, in 1978, when the GRM received its first computer, many processing techniques were transferred into the digital domain within the Studio 123 software [Geslin 2002].

General Granular Methods and Granulation Effects

Dennis Gabor proposed the idea of a granular representation of sound in 1947. Xenakis, in 1971, was inspired by Gabor to compose music in terms of grains of sound. His work motivated Roads and Truax, among others, to perform granular synthesis of sound using computers.

Granular synthesis constructs a sound by means of overlapping time-sequential acoustic elements. It is actually a family of techniques based on the manipulation of sonic grains [Roads 2001]. The control of the temporal distribution of the grains may be synchronous, where the grains are triggered at fairly regular times, or asynchronous. Also, the grains can be derived from natural sounds or from a sound synthesis model. Many audio signal processing methods may be grouped within the common paradigm of granular techniques, like the Short Time Fourier Transform, the Gabor Transform, the Wavelet Transform, Pitch-Synchronous Granular Synthesis, the FOF and the VOSIM method [Cavaliere & Piccialli 1997][De Poli & Piccialli 1991].

A very interesting possibility of the granular technique is the time granulation of sampled sounds: the granulation effects. In 1988 Barry Truax programmed several granular algorithms in real time [Truax 1988]. Trevor Wishart also designed, developed and used in his pieces various granular-type sound transformations. Many of his algorithms are presented, with examples extracted from his compositions, in his book Audible Design [Wishart 1994]. A short list of them: granular reordering, granular reversal, granular time-stretching/pitch-shifting, zigzagging, shredding, looping and iteration, progressive looping, multi-source brassage, chorusing, and all the effects based on wavesets, like waveset distortion, waveset interleaving, waveset substitution, etc.


Brassage or time shuffling, a term that covers several granular techniques, is based on the micro-splicing technique used widely, and for years, in Musique Concrète. In 1980 Bernard Parmegiani suggested at the Groupe de Recherches Musicales (GRM) that this technique could be carried out by computer. A simple brassage algorithm can be found in [Dutilleux, De Poli, Zolzer 2002].
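A basic brassage can be sketched as below: grains read from random positions in the source are spliced back to back, each shaped by an envelope to avoid clicks. Grain length, grain count and the sine envelope are our own simplifying choices (no overlap-add is performed here):

```python
import math, random

def brassage(x, grain_len=800, n_grains=40, seed=1):
    """Time shuffling sketch: grains are read from random positions in
    the source and spliced consecutively, each shaped by a sine
    envelope so that no clicks appear at the splice points."""
    random.seed(seed)
    env = [math.sin(math.pi * i / (grain_len - 1)) for i in range(grain_len)]
    out = []
    for _ in range(n_grains):
        start = random.randrange(len(x) - grain_len)
        out.extend(x[start + i] * env[i] for i in range(grain_len))
    return out

sr = 8000
src = [math.sin(2 * math.pi * 440 * n / sr) for n in range(2 * sr)]
shuffled = brassage(src)                # 4 seconds of shuffled grains
```

Constraining the random read positions to advance slowly through the source turns this shuffling into a crude time-stretching granulator.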

Overlap-Add Methods for Time Shifting / Time Stretching

There are many ways to perform time shifting and time stretching with the granular technique. For example, we can affect the duration of the signal without changing the pitch by cloning or omitting grains. In a similar way, we can shift the pitch and conserve the duration using the above technique while changing the sampling rate appropriately. These techniques were first explored in 1968 [Otis, Grossman and Cuomo 1968][Roads 1991], but they gave poor results. More sophisticated techniques, based once again on the time-segment processing paradigm, followed later. For simplicity we will call all these time-shift/time-stretch techniques overlap-add methods. The most famous are the SOLA and PSOLA methods.

The basic idea of the SOLA (Synchronized OverLap-Add) algorithm, originally proposed by Roucos and Wilgus [Roucos & Wilgus 1985], consists of the decomposition of the input signal into equal-length successive segments of relatively short duration (N = 10 msec to 40 msec) and then re-positioning them with a time shift. Care must be taken to avoid the discontinuities that appear at time instants where segments are joined together. In the case of SOLA we apply a fade-in and fade-out on the overlapping blocks, starting at the point where the two overlapped segments are maximally similar [Dutilleux, De Poli, Zolzer 2002]. Thus it is necessary to compute the similarity between the overlapping parts; the cross-correlation function is the most standard technique. For pitch-scale modification we combine this algorithm with resampling.
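The SOLA idea can be sketched as follows. Segment length, hop sizes and search range are our own illustrative choices, and the unnormalized cross-correlation used here is a simplification of the similarity measures discussed in the literature:

```python
import math

def sola(x, alpha, seg=400, Sa=200, search=40):
    """SOLA time-stretching sketch.  The m-th analysis segment
    x[m*Sa : m*Sa+seg] is re-positioned around m*Ss, Ss = round(alpha*Sa).
    A cross-correlation search picks the splice point of maximum
    similarity, and the overlap is merged with a linear crossfade."""
    Ss = int(round(alpha * Sa))         # synthesis hop
    out = list(x[:seg])
    m = 1
    while m * Sa + seg <= len(x):
        segment = x[m * Sa : m * Sa + seg]
        target = m * Ss
        # find the lag k of maximum similarity in the overlap region
        best_k, best_c = 0, None
        for k in range(-search, search + 1):
            start = target + k
            if start < 0 or start >= len(out):
                continue
            ov = min(len(out) - start, seg)
            c = sum(out[start + i] * segment[i] for i in range(ov))
            if best_c is None or c > best_c:
                best_k, best_c = k, c
        start = target + best_k
        ov = len(out) - start           # overlap length after alignment
        for i in range(ov):             # fade-out old / fade-in new
            f = i / ov
            out[start + i] = (1.0 - f) * out[start + i] + f * segment[i]
        out.extend(segment[ov:])
        m += 1
    return out

sr = 8000
x = [math.sin(2 * math.pi * 220 * n / sr) for n in range(4000)]
stretched = sola(x, alpha=1.25)         # roughly 25% longer, same pitch
```

Because the splice points are chosen where the waveforms are maximally similar, a periodic input is extended by whole periods and its pitch is preserved; combining the same procedure with resampling gives pitch-scale modification.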

The PSOLA (Pitch-Synchronous OverLap-Add) method [Moulines & Charpentier 1990] is a slight variation of the algorithm described above. It uses the pitch information from the incoming signal to avoid pitch discontinuities: the length of the analysis time-segments is adjusted according to the local value of the pitch. With the PSOLA algorithm we can preserve the formants. If, prior to the overlap-add operation, the short time segments are resampled, the formants will be modified accordingly. Thus we can change the formant position in the signal without affecting the pitch and the duration of the sound.

If we lower the sampling rate by a factor γ we raise the formants by the same factor [Laroche 1998].

A simple description of the algorithm can be found in [Dutilleux, De Poli, Zolzer 2002].

A similar approach to the PSOLA algorithm is Lent's algorithm [Lent 1988][Bristow-Johnson 1995].

Variations of the PSOLA algorithm can be found in [Peeters 1998].

Many other algorithms besides the widely used SOLA and PSOLA exist in the family of overlap-add methods for time and pitch alteration of an audio signal. All of them are based on the same idea and differ only in the choice of the segment durations, splicing and weighting windows. Some of them are the Synchronized Adaptive OverLap-Add (SAOLA) algorithm [Dorran, Lawlor, Coyle 2003a], the Peak Alignment OverLap-Add (PAOLA) [Dorran, Lawlor, Coyle 2003b], the Subband Analysis Synchronized OverLap-Add (SASOLA) [Tan & Lin 2000], the Waveform Similarity OverLap-Add (WSOLA) [Verhelst & Roelands 1993], the Variable Parameter Synchronized OverLap-Add (VSOLA) [Dorran & Lawlor 2003c] and the Transient and Voice Detecting Zero Crossing Synchronous OverLap-Add (TvdZeroXSola) [Pesce 2000].

2.2 Time-Frequency Models

2.2.1 Phase Vocoder

The vocoder first appeared as a voice coding technique, under the name channel vocoder, in 1939 with Dudley. Analog vocoders like those of Dudley, Moog and Bode are of this type. Flanagan and Golden first described the phase vocoder in 1966 [Flanagan & Golden 1966]. The signal was represented by a sum of sine waves modulated in frequency and amplitude. The basic objective is the separation of temporal from spectral information.

Two complementary viewpoints of the phase vocoder exist: the filterbank interpretation and the Fourier transform interpretation [Dolson 1986][Portnoff 1976][Allen & Rabiner 1977]. The Fourier transform interpretation is equivalent to the Gabor transform, although this approach was developed later. It can be seen as an STFT with a Gaussian window function [Arfib 1991][Arfib & Delprat 1993].

An important number of audio effects can be developed with the phase vocoder. Good references for those effects are [Arfib, Keiler, Zolzer 2002][Wishart 1994][Laroche & Dolson 1999][Smith 2007]. Below we name a few of them.

Filtering can be achieved by multiplying every frame by a filtering function in the frequency domain. Before this multiplication, we have to zero-pad the windowed input signal and the filtering function in the time domain to avoid the aliasing effects of the circular convolution. The time-scale modification algorithm, or time stretching, consists of providing a synthesis grid different from the analysis grid.

Three pitch-shifting algorithms based on resampling and time-stretching can be found in [Laroche 1998].

In the robotization effect we set zero phase values on every STFT frame before reconstruction. If the phases of the STFT take random values, we get a whisper-like effect. It is also interesting to randomly change the magnitude and keep the same phase. The denoising effect is a frequency-dependent dynamics controller: using a non-linear transfer function on the analysed sound, we modify the intensities of the input's frequency components. The phase is kept as it is, while the magnitude is processed to attenuate the noise. The mutation effect reconstructs a sound from the STFTs of two sounds.

Diagrammatic descriptions for many other digital audio effects like spectral shifting, spectral freezing, spectral shaking, spectral undulation, spectral interpolation, spectral blurring and spectral interleaving can be found in [Wishart 1994].
