Phonographic record sound extraction by image processing


Department of Computer Science, University of Fribourg, Switzerland

PHONOGRAPHIC RECORD SOUND EXTRACTION BY IMAGE PROCESSING

Thesis

Submitted to the Faculty of Science, University of Fribourg (Switzerland) to obtain the degree of Doctor Scientiarum Informaticarum

Sylvain STOTZER

from Büren an der Aare (BE) and Geneva

Thesis No. 1534

Imprimerie St-Paul, Fribourg

2006


the proposal of:

Prof. Béat Hirsbrunner, University of Fribourg, Chairman

Prof. Rolf Ingold, University of Fribourg, Thesis Supervisor

Prof. Martin Vetterli, Swiss Federal Institute of Technology Lausanne (EPFL)

Prof. Ottar Johnsen, University of Applied Sciences of Fribourg

Prof. Frédéric Bapst, University of Applied Sciences of Fribourg

Fribourg, September 26, 2006

The Thesis Supervisor: Prof. Rolf Ingold

The Dean: Prof. Titus Jenny


Acknowledgements

I would personally like to acknowledge all those who helped me in the development and completion of this research.

First, I would like to thank my thesis advisors, Professors Ottar Johnsen, Rolf Ingold and Frédéric Bapst, for their guidance in my studies and my work. They shared their expertise with me and offered fruitful discussions and valuable insights into the different topics required for this research. Their motivation and support were very helpful.

I want to thank Professor Martin Vetterli for agreeing to be my fourth committee member. I was very pleased that he agreed to read my work.

Many famous inventions and discoveries started with a crazy idea. I would like to acknowledge Stefano Cavaglieri and Pio Pelizzari, who had the idea of taking pictures of records to archive them and who shared this idea with us.

Many people took part in this research project. I am particularly grateful to Cédric Milan and Christoph Sudan for their involvement and their achievements in developing the hardware tools used in this work. Many thanks also to Thierry Fumey for sharing his expertise in photography with us. Finally, I would like to thank all the students, assistants and professors who worked on the VisualAudio project for semester work, diploma or bachelor theses.

My colleagues at both the Ecole d’ingénieurs et d’architectes and the University of Fribourg also offered interesting discussions and helped with practical matters, for which they all deserve thanks.

This research would not have been possible without the support of all my family.

I particularly thank my wife, Ula, for her patience, understanding and encouragement. I would never have been able to achieve this work without her love and unconditional support. And finally, I would like to thank my daughter Sarah, who let her father work during long hours and who learned very early to say “Dad’s gone to work”...


Abstract

The phonographic record was the only way to store sound until the introduction of magnetic tape in the early 1950s. There are therefore huge collections of phonographic records, for example in radio stations and national sound archives. Such archives include pressed discs, which were mass-produced by record companies for commercial distribution, as well as direct cut discs, which were obtained by direct recording, often have great cultural value and exist only as single copies. Many records are deteriorating with time. Worse, many are in such an advanced stage of decay that they would be destroyed by the movement of the stylus of even the best turntables. Thus we risk losing an important cultural heritage if no alternative playback system is developed.

In record players, a needle follows the groove and converts its position into an electrical signal corresponding to the sound. This means that the radial displacement of the groove contains the sound information. The sound information is therefore visible, and it can be extracted by image processing techniques. These observations led to the VisualAudio project, which proposes an optical system to extract the sound from phonographic records. The VisualAudio concept first photographs the record, in order to obtain a photographic copy of the record and of its sound. This photographic copy can be stored for long-term archiving. The film is then digitized using a specially designed rotating scanner, and the image is processed in order to extract the recorded sound. This system can thus play records without deteriorating them, and it is also able to play severely damaged records.

This work focuses on the image processing parts of the VisualAudio project. The image acquisition system is thoroughly studied to understand the image formation process as well as all the kinds of degradation that may affect the final sound quality. Based on this analysis, a groove model is proposed, in order to develop dedicated image extraction and signal correction methods. The whole system is then evaluated to point out the strengths and weaknesses of the VisualAudio sound extraction process.


Résumé

The phonographic record was the main medium used to record and preserve sound content until the introduction of magnetic tape in the early 1950s. Consequently, there are now large collections of phonographic records in national archives, radio stations and sound libraries. These archives hold pressed discs, produced industrially for commercial distribution, as well as direct cut discs, obtained by the live recording of conferences or radio broadcasts. Direct cut discs are thus unique copies, which often have great cultural value. But many discs deteriorate with time, and some are so degraded that they would disintegrate at the mere contact of a turntable needle. We thus risk losing an important part of our cultural heritage if we do not find a new playback system for these discs.

In turntables, a needle follows the path of the groove along the disc and converts the lateral displacement into an electrical signal corresponding to the sound. This means that the lateral displacement of the groove contains the sound information. The sound content is therefore visible, and it is possible to extract the sound by image processing techniques. These observations are the basis of the VisualAudio project, which proposes an optical method to extract the content of phonographic records.

The idea of the VisualAudio project is first to take a photograph of the disc, in order to keep a copy of it for long-term archiving. This film is then digitized using a rotating scanner specially designed for this project, and the digital image is analyzed in order to extract the sound content. VisualAudio thus makes it possible to extract the sound from discs without contact with their surface. This system is also able to extract the sound from severely damaged discs.

This thesis focuses on the image processing part of the VisualAudio project. The image acquisition system is first analyzed in detail, in order to understand the image formation process as well as all the types of degradation that can affect the quality of the image and of the extracted sound. Based on this analysis, we propose a groove model that is used to develop the sound extraction and signal correction methods. The system is then evaluated in order to understand the advantages and drawbacks of the VisualAudio playback and archiving system.


List of Figures

1.1 The VisualAudio concept
1.2 Lofargram
2.1 Phonautograph
2.2 Stereo recording
2.3 Records top views
2.4 Records profile views
2.5 RIAA pre and de-emphasis curves
2.6 Acetate record with shrinkage of the recording layer
3.1 Photographic D-log E curve
3.2 Camera design
3.3 Light reflection angle
3.4 Acquisition part of the VisualAudio scanner
3.5 Transmission system of the scanner
3.6 Scanner acquisition
4.1 Light reflection on the groove
4.2 Sample of a groove acquisition
4.3 Picture of record with scratch
4.4 Record with fungus
4.5 Samples of image degradations
4.6 Disc with shrinkage of the recording layer
4.7 Pepper fog
4.8 Noise standard deviation vs signal to noise ratio
4.9 The light reflected to the lens by an unmodulated groove
4.10 Groove cut views of the light reflection
4.11 Top view of the record and light source
4.12 Indirect lighting
4.13 Indirect light simulation
4.14 Indirect light on a photograph
4.15 Light section illuminating an unmodulated groove
4.16 Planes definition for the scanning
4.17 Camera shift along the Y-axis
4.18 Camera rotation in the XZ-plane
4.19 Aligned and misaligned camera
4.20 Sound signal with a misaligned camera
4.21 Film off-axis
4.22 Sampling with film off-axis
4.23 Basic luminance model
4.24 The groove model
4.25 Linear and non-linear part of the transition
5.1 The trace and groove width show important variations
5.2 Asymmetric traces
5.3 Ring images gradients
5.4 Pseudo-code of the groove extraction algorithm
5.5 Samples of groove intensity profiles
5.6 Line of acquisition
5.7 Coarse scale edge detection
5.8 Edge detection by local thresholding
5.9 LMS edge detection: point selection
5.10 LMS edge detection: point selection
5.11 Candidate selection
5.12 Ranges around the traces
5.13 Orientation of the record spiral on the acquired image
6.1 Acquisition performed with non-homogeneous illumination
6.2 Acquisition performed with homogeneous illumination
6.3 Spectra for wow correction
6.4 Corrupted pixel map
6.5 Small degradations map
6.6 Large degradations map
6.7 Two-pass trace extraction
6.8 Zoom on the acquisition of a shrunken record
6.9 Magnified view of a 40×40 pixels ring image acquisition
7.1 Frequency response of the digital differentiator filter
7.2 Analog and digital RIAA de-emphasis curves
7.3 Digital combined (derivative and RIAA) filter response
7.4 Multistage resampling
8.1 Measures of the STD between consecutive samples
8.2 Normalized difference between two consecutive acquisitions
8.3 Spectra of signals acquired with different lenses
8.4 STD₁ measured with different opening times
8.5 STD of unmodulated grooves with different opening times
8.6 Spectrum of acquisitions with varying opening times
8.7 Spectrum of 78 rpm acquisitions with different scanning frequencies
8.8 Spectrum of 33 rpm acquisitions with different scanning frequencies
8.9 STD of unmodulated grooves with different scanning frequencies
8.10 SNR₁ and THD₁ measured with different light levels
8.11 SNR_c measured on twelve groove circumvolutions
8.12 SNR_q measured over six consecutive groove circumvolutions
8.13 Quality of the top and bottom edges
8.14 Pepper fog on the inner and outer tracks
8.15 Spectrum of the inner and outer tracks of a 33 rpm
8.16 STD measured on the inner and outer tracks
8.17 Spectrum of the inner and outer tracks of a 78 rpm
8.18 Acquisition of a sweep with different edge detection methods
8.19 Spectra for different methods of signal correction
8.20 Shrunken record whose content was extracted with VisualAudio
8.21 Mouldy record whose content was extracted with VisualAudio
8.22 Local view of the acquisition of a record affected by mould


List of Tables

1.1 UNESCO Survey of Endangered Audiovisual Carriers
2.1 Audio recording history
2.2 Equalization chart for 78 rpm records
2.3 Equalization chart for 33 rpm records
2.4 Record characteristics
2.5 Groove geometry
2.6 Reading stylus properties
2.7 Wavelength on the record
2.8 Dynamic and frequency ranges
2.9 Signal to noise ratio performance
3.1 Groove width on the film
3.2 Films gamma and edge standard variation
3.3 Comments about the films
3.4 Light reflection angle
3.5 Image and audio sampling frequencies
3.6 Effective sound sampling frequencies
3.7 Integrated areas according to the radial position
4.1 Properties and resolution of the 420 mm lens
4.2 Scanning blur with DOF = 10 µm
4.3 Scanning blur with DOF = 30 µm
4.4 Sampling blur with a 10× magnification optics
4.5 Sampling blur with a 4× magnification optics
4.6 Maximum noise standard deviation allowed
4.7 Variation of the SNR and THD for camera rotations in the XY-plane
7.1 Effective audio sampling frequencies
7.2 Resampling to 44.1 kHz
7.3 Resampling to 48 kHz or 96 kHz
8.1 Description and references of the records used for the evaluation
8.2 Description of the various tracks used for the evaluation
8.3 Measurement domains for the SNR, THD and STD
8.4 Real magnification of the lenses
8.5 SNR and THD measurements for signals acquired with different lenses
8.6 Peak and harmonics of signals acquired with different lenses
8.7 SNR₁ measured on acquisitions performed using different scanning frequencies
8.8 STD₁ measured on the four edges of a groove
8.9 SNR_total for the outer and inner edges
8.10 STD₁ for the outer and inner edges
8.11 Harmonics for the outer and inner edges
8.12 SNR₁ measured on the inner and outer tracks
8.13 THD₁ measured on the inner and outer tracks
8.14 STD₁ measured on the inner and outer tracks
8.15 SNR with and without camera calibration
8.16 Edge detection methods used in the current evaluation test
8.17 SNR_total with different edge detection methods
8.18 SNR₁ with different edge detection methods
8.19 THD_total with different edge detection methods
8.20 STD₁ with different edge detection methods
8.21 Acquisition statistics using different edge detection methods
8.22 SNR_total with different methods of signal correction
8.23 SNR₁ with different methods of signal correction
8.24 THD_total with different methods of signal correction
8.25 Ratio of the recorded sound duration to the processing and scanning time
8.26 SNR₁ measured on acquisitions with different bit depth
8.27 Signal to noise ratio measured according to the NAB standard
8.28 Comparison of the signal to noise level


Chapter 1 Introduction

1.1 The VisualAudio project

Sound propagates through space as variations in air pressure. In the first phonographic sound recorders, a diaphragm therefore vibrated in response to these changes in air pressure. This diaphragm was connected to a cutting stylus, and the diaphragm’s movements were written as a modulated groove on a record. At playback, a needle followed the groove and transmitted its movements back to the diaphragm, which produced air pressure variations again.

Phonographic recording technology was introduced in the late nineteenth century and was widely used during the twentieth century. Sound archives therefore now own large collections of phonographic records, including direct cut discs, which were used to record speeches and conferences as well as to edit and mix sound recordings for radio stations, and pressed records, which were mass-produced for commercial distribution. Sound archives face numerous problems with phonographic records. To get an overview of these problems, the IASA (International Association of Sound and Audiovisual Archives) carried out a “Survey of Endangered Audiovisual Carriers” in 2003 at the request of UNESCO (United Nations Educational, Scientific and Cultural Organization). They asked 2093 institutions in 184 countries to classify their audio carriers in three categories. This survey, whose results are summarized in Table 1.1, is not intended to be an accurate, scientific piece of research, but it gives an interesting picture of the rates of decay of the various audio carriers [1]. The survey confirms the conclusions from practical experience and from past surveys: acetate direct cut discs are the audio recordings most at risk. This risk is compounded by the fact that the vast majority of acetates are unique recordings, and that substantial numbers of acetate discs are being lost each year because the final stage of the decay is unpredictable and catastrophic. There is thus an urgent need to transfer the content of all these records to a new audio carrier. Moreover, decaying records are no longer playable with usual turntables and require a contactless reader and an automated system to follow the groove in case of significant surface damage.

Digitizing records with high quality turntables is a satisfactory solution to preserve the content of many records that are still in good condition, but the transfer time varies considerably according to the record’s state of conservation. According to the IASA “Guidelines on the Production and Preservation of Digital Audio Objects”, the average transfer time for a 3-minute 78 rpm record is 45 minutes, including the time to find the correct settings for the equipment and to choose the reading stylus based on the recording analysis [2]. But while a complex transfer may easily take 20 hours, the mass saving of records in good condition by experienced people takes only around 10 minutes per side, according to the Swiss National Sound Archives. However, digitizing analog audio recordings (such as phonographic discs) is a controversial topic: the analog to digital transfer is lossy, and there is an ongoing debate about the minimal requirements (in terms of bit depth and sampling frequency) for a digitization to restore a perfect sound [3]. This debate leads some sound archivists to wait for newer, more powerful digitizing technologies, at the risk of having some of their records decay in the meantime. Another interesting conclusion of the UNESCO survey is that the most endangered carriers are not necessarily the oldest: as shown in Table 1.1, some of the digital media used in the last twenty years give more concern than the shellac records produced between 1890 and 1950 [1].

Audio carriers        Nb of institutions /   In good     Giving some   Obviously
                      Nb of items            condition   concerns      decaying
Cylinder recordings   20 / 43965             14.65 %     58.73 %       26.62 %
Shellac discs         41 / 614935            95.06 %      4.93 %        0.02 %
Direct cut discs      23 / 60332              2.84 %     35.03 %       62.13 %
Vinyl discs           55 / 1855120           88.43 %     11.56 %        0.01 %
Magnetic tapes        49 / 2161941           76.94 %     21.28 %        1.78 %
Recordable CDs        48 / 193062            86.94 %      9.95 %        3.11 %
Audio CDs             52 / 1128400           95.36 %      4.58 %        0.06 %
R-DAT digital tapes   29 / 198477            45.40 %     27.85 %       26.75 %

Table 1.1: Rate of decay classification for some audio carriers, based on the UNESCO “Survey of Endangered Audiovisual Carriers”. These results must be considered with caution, given the low response rate, the loosely defined categories, the variable number of items and some variations in the results compared with former surveys [1].

To bypass the analog/digital debate, Stefano Cavaglieri, who works at the Swiss National Sound Archives, wanted to keep an analog copy of a record. Since the groove modulations are visible on the record surface, he proposed to take a photograph of the disc, in order to keep an analog copy of the recorded sound. The problem was then to be able to play back the photograph to reproduce the sound. This idea gave birth to the VisualAudio concept, which is presented in Figure 1.1 and consists of three steps [4, 5, 6]:

1. An analog picture of each side of a disc is shot. The photographic film must have a high spatial resolution and be as large as possible (about 1:1), in order to catch the finest details of the groove. This process can be done quickly. The film is cheap, and can be stored for a long time (more than 100 years). That way, the sound information is preserved in case the original discs deteriorate.

2. When one wants to recover the sound, the film can be digitized using a specially designed rotating scanner.

3. The sound must then be extracted from the digital image. This requires image processing techniques to extract the radial displacement of the groove, which contains the sound. Additional processing is applied to detect cuts and to correct other defects. Digital signal processing must then be applied to the groove signal to extract the sound.
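The third step can be illustrated with a minimal sketch, assuming that the radial position of the groove has already been measured at uniformly spaced audio samples and assuming, as with magnetic pickups, that the audio signal is proportional to the groove velocity. The least-squares detrending, the first-difference differentiator and the name displacement_to_velocity are illustrative assumptions, not the filters developed in this work.

```python
import math


def displacement_to_velocity(radial_pos, sample_rate_hz):
    """Turn groove radial positions (one per audio sample) into a velocity signal."""
    n = len(radial_pos)
    if n < 2:
        return []
    # Remove the slow spiral drift with a least-squares straight line (assumption).
    mean_x = (n - 1) / 2.0
    mean_y = sum(radial_pos) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(radial_pos))
    slope = sxy / sxx
    detrended = [y - (mean_y + slope * (i - mean_x)) for i, y in enumerate(radial_pos)]
    # A first difference approximates the time derivative of the displacement.
    return [(detrended[i + 1] - detrended[i]) * sample_rate_hz for i in range(n - 1)]


# Hypothetical test signal: a 1 kHz tone riding on a linear spiral drift.
fs = 100_000
track = [50.0 - 0.01 * i + math.sin(2 * math.pi * 1000 * i / fs) for i in range(1000)]
velocity = displacement_to_velocity(track, fs)
```

The straight-line fit is only reasonable over a short stretch of groove; the units of the result are those of the position input per second.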

The VisualAudio process is contactless and can therefore be used on any record, even a broken or decaying one. But the photographic sound storage system also has other advantages. On an optical record reader, the groove position must be evaluated very accurately, requiring a high image resolution that can be reached only with a microscope lens. Unfortunately, the groove depth exceeds the microscope’s depth of field, and many discs are warped by even more than the groove depth. Using the intermediate photography step allows us to work with a larger depth of field while imaging the disc, while ensuring that the image to be digitized (the film) fits within the reduced depth of field of the microscope’s optics. Time is also a critical issue in such an archiving system: the challenge is to save a large number of records quickly, before their complete physical destruction. Taking a picture of the discs is a quick way to store a copy of the sound content in its current state of conservation. The sound extraction can then be done on demand, without time pressure.

Figure 1.1: The VisualAudio concept.

Thus the main advantages of the VisualAudio system compared to conventional turntables can be summarized as follows:

• The record is preserved with an analog copy (the film).

• The analog copy can be stored for long term archiving.

• No need to use an autofocus system as the photography depth of field is much larger than the microscope lens depth of field.

• The archiving time (time to take the photograph) is low and thus well adapted for mass archiving.

• The archiving time is considerably lowered for damaged records.

• The optical playback system is not intrusive and does not damage records.

• The optical playback system is able to read records that are currently considered as unreadable with conventional turntables.

Based on this concept, three diploma works have been carried out at the University of Applied Sciences of Fribourg [7, 8, 9]. During these works, the students took a few high resolution pictures of records and built a first scanner prototype to digitize these photographic films. They were able to extract some sound samples out of these images, and thus showed the feasibility of the VisualAudio concept.

Based on these preliminary works, four objectives were defined for the VisualAudio project in order to build a fully operational system:

1. Efficiently transform a collection of discs into a film archive.

2. Scan the films with high resolution.

3. Accurately extract the groove position out of the image.

4. Extract good quality sound from record pictures.

To reach the first two objectives, new hardware has been designed. A dedicated camera has been built to match the specific requirements for disc pictures, and a new scanner has been developed, based on the flaws observed in, and the knowledge acquired with, the previous prototype. These hardware devices are fully described in [10] and will be further presented in this work.

The last two objectives are the main topics of this thesis and are detailed in the following section.

1.2 Objectives of this thesis

The current work focuses on the image processing and sound extraction parts of the VisualAudio project, in order to get the best and most faithful sound quality. Extracting the sound from photographs of disc records is an original process which has not been studied up to now. Moreover, to the best of our knowledge, VisualAudio is the first project to use 2D image processing to extract sound from phonographic records. This thesis thus presents a new approach, which first requires a theoretical analysis. The first objective of this work is therefore to fully analyze the image acquisition system, which includes the photography and scanning processes. This analysis encompasses the following steps:

• Investigate the phonographic record technology and point out the groove characteristics.

• Analyze the resolution and the distortions produced by the picture taking stage, including the effect of the light source and the optics.

• Study the resolution and the properties of photographic films.

• Analyze the distortions produced at scanning. This includes the effect of the light source and the optics, as well as geometric distortions and mechanical perturbations.

This theoretical analysis points out the VisualAudio system’s limitations and establishes the necessary foundations to design the operational camera and scanner prototypes. It is also used to develop a groove model, which is the basis for the image processing algorithms. The resulting model must represent the evolution of the groove information from the physical disc to the extracted sound.

Image processing algorithms are then developed to extract the groove accurately and to correct the degradations, based on the groove model and on the theoretical analysis. Denoising and sound restoration are well known research domains, as presented in [11, 12, 13, 14], and several high quality denoising software and hardware products are commercially available [15, 16]. Therefore, this work does not focus on classical sound restoration methods, which consider the sound as a one-dimensional signal, but concentrates on the specific part of this sound extraction application, which is the two-dimensional information provided by the image. We try to reduce the effect of damage, noise and uncertainty using the contextual information available in the image.

The final objective is to perform practical experiments with the available camera and scanner, in order to validate the model. Mass testing must then be performed to validate the process on many different records showing different types of degradation. The final sound quality depends on the record’s state of conservation. The aim is therefore to reach good sound quality for records in good condition and to obtain speech that is intelligible enough for transcription. These experiments will then determine the achievable quality and the usability of the VisualAudio system.

1.3 Related works

The VisualAudio project proposes to make analog copies of old records by means of photography, in order to read these records without contact. This process thus basically consists of two things: storing sound on a film and extracting a signal out of an image. Several works have already been carried out in these two fields: Subsection 1.3.1 describes some existing applications which store sound on film, and Subsection 1.3.2 provides an overview of applications whose aim is to accurately extract a signal out of an image. Subsection 1.3.3 describes the contactless optical techniques that have been proposed up to now to read mechanically recorded sounds.

1.3.1 Sound on film

Efforts to store sound by means of photography date back a long time: in 1878, E. W. Blake published a paper on “A Method of Recording Articulate Sounds by Means of Photography”. He photographed the vibrations of a microphone diaphragm by means of a mirror which reflected a beam of light. In 1902, W. Duddell filed a patent application covering a method of variable area recording and reproducing, under the title of “An Improved Phonograph”. These were the beginnings of the sound-on-film technology which was later used for movie soundtracks. Two techniques were basically used for optical soundtracks: variable density and variable area. For variable density, a shutter was used to vary accurately the amount of light that reached and exposed a moving film. Variable area tracks were generated by exposing the film with a constant light source through a gate whose aperture was modulated by the audio signal. Variable area has been the standard movie format since the 1930s [17, 18, 19, 20].
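The variable area principle can be sketched in a few lines, assuming a simplified unilateral track in which each film row is a binary image line whose clear width follows the audio sample. The function names, the 64-pixel track width and the binary film model are illustrative assumptions; real soundtracks and their readers are considerably more elaborate.

```python
def encode_variable_area(samples, track_width=64):
    """Render each sample in [-1, 1] as one film row whose clear width follows the signal."""
    rows = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the recordable range
        clear = round((s + 1.0) / 2.0 * track_width)  # number of exposed pixels
        rows.append([1] * clear + [0] * (track_width - clear))
    return rows


def decode_variable_area(rows):
    """Recover the signal from the fraction of clear pixels in each row."""
    return [2.0 * sum(row) / len(row) - 1.0 for row in rows]


# Hypothetical samples chosen so that they map to whole pixel counts.
audio = [-1.0, -0.5, 0.0, 0.5, 1.0]
film = encode_variable_area(audio)
recovered = decode_variable_area(film)
```

In this toy model the amplitude resolution is limited to one pixel of track width, which gives an intuition for why a wider modulated area tolerates a less accurate reader.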

The width of the soundtrack’s recorded area is 1.9 mm and the film moves at 0.8 meter per second. For a quick comparison with a phonographic record in terms of resolution, the speed of the film soundtrack is about twice the speed of the outermost groove of a 78 rpm record, and the amplitude of the audio signal is twelve times larger than on a record. This means that a soundtrack reading system requires less accuracy than is needed for the VisualAudio project, and that soundtracks are also less affected by the noise produced by the graininess of the film.

Digitizing soundtracks relies on image processing techniques similar to those presented in this work: a sound signal is extracted out of an image containing modulated edges. For example, Technicolor proposes an image restoration technique for movie soundtracks, resulting in better audio reproduction quality than usual reading and sound restoration techniques [21].

It should be noted that the films used for recording variable area optical soundtracks have the same properties as the films chosen for the VisualAudio project: black and white orthochromatic films with a high contrast, for example Agfa Sound ST9 or Kodak EASTMAN EXR Sound Recording Film 2378.

1.3.2 Signal extraction by image processing

A 1D signal can basically be stored either in the grey level variation of a 2D image or in the modulation of a line over the image. Several applications need to extract such a signal out of an image. The next two paragraphs provide two examples: bar code reading, where the signal is extracted from the grey level variations of a 1D image, and Lofar line detection, where the signal is stored in the modulation of a line on a 2D image. These examples also show the importance of a priori knowledge of the acquired image and of the properties of the signal it contains.

1.3.2.1 Bar codes

Bar code reading is a typical application that needs to extract a signal stored in the varying grey levels of a 1D image. Bar codes are a finite sequence of parallel light and dark lines of variable width, where the information is encoded in the width of the dark stripes. The basic structure of bar codes contains start and stop patterns, as well as one or two check characters. The bar code reader must thus retrieve the width of the black lines in order to reconstruct the original code. Knowledge of the basic bar code structure is also a great help in reading and validating the extracted signal.

Standard decoding techniques are based on classical edge detectors, such as zero crossings of the second derivative. Using the detected edges, the width of each stripe is estimated separately to find the narrowest width, and the ratio of each stripe width to the narrowest one is then mapped into an integer sequence [22]. While standard supermarket scanners are equipped with mirrors and retrieve signals from multiple angles, handheld scanners are much more subject to image degradation: images may be affected by the ambient light or blurred, depending on the distance between the reader and the surface where the bar code appears. Thus there was a need for more powerful recognition algorithms, using more a priori knowledge of the signal to extract. Fortunately, bar codes are functions that have a very specific form: square signals with constant amplitude. Knowledge of this form is highly useful to elaborate more sophisticated techniques.
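The standard decoding scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of [22]: the 3-tap smoothing kernel, the gradient threshold and the function name decode_modules are assumptions chosen for this illustration, and the scanline is synthetic and noise free.

```python
import numpy as np

def decode_modules(scanline):
    """Map the stripe widths of a 1D scanline to integer multiples of the narrowest stripe."""
    s = np.asarray(scanline, dtype=float)
    # Light smoothing (assumed 3-tap kernel) so that each edge gives one clean zero crossing.
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    smooth = np.convolve(np.pad(s, 1, mode="edge"), kernel, mode="valid")
    d1 = np.diff(smooth)  # first derivative
    d2 = np.diff(d1)      # second derivative
    # An edge is a sign change of the second derivative where the gradient is strong.
    crossing = d2[:-1] * d2[1:] < 0
    strong = np.abs(d1[1:-1]) > 0.5 * np.abs(d1).max()  # assumed threshold
    edges = np.flatnonzero(crossing & strong) + 1.5     # edge positions in samples
    widths = np.diff(edges)                             # width of each stripe
    # Ratio of every stripe width to the narrowest one, rounded to an integer.
    return np.rint(widths / widths.min()).astype(int).tolist()

# Synthetic scanline (1 = light, 0 = dark) with stripes of 4, 8, 12, 4 and 4 samples.
scan = [1] * 6 + [0] * 4 + [1] * 8 + [0] * 12 + [1] * 4 + [0] * 4 + [1] * 6
modules = decode_modules(scan)
```

On a blurred or noisy scanline the zero crossings of neighboring edges start to interact, which is the weakness that motivates the more specific methods discussed next.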

Esedoglu estimated the unknown parameters of his deblurring kernel using global information contained in the observed signal: the square pulse form as well as the constant amplitude property of the bar code. His model is well suited for recovering bar codes from very blurred and noisy images [23]. Joseph et al. based their recognition algorithm on peak detection instead of edge detection. They stated that, for the specific case of blurred bar code images, the waveform peaks are a more reliable feature than edge information, due to their tolerance to convolution: under blur, the edges of close stripes interact, whereas the peaks are not affected [24].

1.3.2.2 Lofar lines detection

Lofar is an acronym for Low Frequency Analysis and Recording. The objective of a passive sonar system is to detect the presence of signal emitters in underwater acoustic fields. Such systems record acoustic signals using an array of hydrophones. These signals are beamformed and spectrum analyzed, resulting in an image of frequency power versus time, commonly referred to as a lofargram. Signal emitters are characterized by fluctuating curves in the lofargram, which are called spectral lines.

By extracting the spectral lines from a lofargram, it is possible to determine the acoustic source of the sound. This processing involves a tradeoff: lofargram images are very noisy, yet the fine structure of the lines must be detected to get a good localization. Di Martino et al. presented a three-step algorithm to detect Lofar lines [25]:

• Edge detection

• Region location

• Line tracing process

As the images are very noisy, the edge detection step is performed on highly ﬁltered images to ensure a good detection, at the expense of a poor localization. This allows detecting regions of interest, where an accurate line tracing process is further applied.


Figure 1.2: (a) Lofargram image containing two spectral lines and (b) extracted spectral lines [25].

The line tracing and gap bridging process is based on the low frequency particularity of the signals: the curvature presents only slow variations and continuous segments are close to each other. A lofargram and the extracted patterns are shown in Figure 1.2.

1.3.3 Phonographic optical playback systems

The first sound recording and reproducing methods consisted in mechanically engraving a modulated groove in a storage medium, which can be either a cylinder or a disc. However, mechanical playback may damage some old fragile carriers. Moreover, many mechanically recorded media are subject to physical degradation and are therefore no longer playable. Thus there is a real need for optical playback techniques. Such techniques can be classified in three categories: the methods involving an optical device to follow the groove, the 2D image processing methods, and the 3D techniques, which acquire a surface map of the recorded medium, where the depth information can also be used to extract the sound.

It should be noted that all of the projects mentioned below directly scan the recorded medium, extract the sound from the image and output a digital sound file. None of them provides an intermediary analog storage medium such as proposed in VisualAudio.

1.3.3.1 Optical stylus

In a first category of systems, a mechanical device is used to follow the groove, but with a much lower weight than a pickup stylus. This low mass stylus reflects light out of the groove to a measurement tool, and the sound is then extracted from the measured position of the light spot. Poliak used an optical fiber to follow the groove of old records [26]. He reached good quality results: for modern records, the background noise is of the same order of magnitude as with a classical system. But operator interventions are still needed to position the optical fiber on crackled records and to clean the dust that piles up on the optical stylus. Poliak's optical fiber system is currently in use at the Swiss National Sound Archives and the Radio Suisse Romande. Petrov also worked with an optomechanical method, using a laser interferometer system to read Edison cylinders [27]. Working at low rotation speed required substantial mechanical and anti-vibration devices, but resulted in significant noise reduction.

In a second category of projects, the optical stylus is contactless. The Syracuse University Library Radius Project developed an optical playback system for cylinder recordings using a light beam instead of a mechanical stylus: an interferometer gets a precise measurement of the Doppler frequency shift caused by the vertical modulation of the groove in the cylinder [28].

Optical turntables using a laser beam to follow the groove of phonographic records are also commercially available [29].

All of the above mentioned techniques still follow one circumvolution of the groove at a time, which is not a suitable technique for broken or severely damaged discs, where the groove has large discontinuities.

1.3.3.2 Image processing

The second optical method to read mechanically recorded sounds is to acquire a 2D high-resolution image of a medium, to process the image in order to measure the groove displacement and to extract the sound. One of the advantages of the image processing method is that an acquired image contains several circumvolutions of the groove, and thus it is possible to automate the correction of larger discontinuities of the groove. It should be noticed that such image processing methods are not applicable to the vertical cut grooves, as the depth information is (almost) absent on a 2D image.

Several personal works and student projects are described on the internet [30, 31, 32, 33]. They all used a desktop scanner to acquire the record images, which generates several problems:

• Due to the size of the scanner, the record must be digitized in several slices to get the whole record. Therefore the extracted sound from the diﬀerent slices must be later realigned and processed to remove the overlapping parts between the slices.

• The image must be warped to transform the circular slices into rectangular images. Another way to perform this step is to transform the X-Y coordinates of the extracted sound signal into the polar coordinates.

• The desktop scanners used have too low a resolution: 1600 to 2400 dpi. Good audio quality requires a resolution of about 1 µm, which corresponds to about 25000 dpi.
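The resolution figures in the last item follow from the definition of the inch (25.4 mm). The short Python sketch below makes the conversion explicit; the function names are chosen for this illustration only.

```python
MICRONS_PER_INCH = 25400.0  # 1 inch = 25.4 mm

def pixel_size_um(dpi):
    """Size of one scanned pixel in micrometers for a given resolution in dpi."""
    return MICRONS_PER_INCH / dpi

def dpi_for_pixel_size(size_um):
    """Resolution in dpi needed to obtain a given pixel size in micrometers."""
    return MICRONS_PER_INCH / size_um

coarse = pixel_size_um(1600)        # 15.875 um per pixel
fine = pixel_size_um(2400)          # about 10.6 um per pixel
required = dpi_for_pixel_size(1.0)  # 25400 dpi for 1 um pixels
```

A desktop scanner at 1600 to 2400 dpi therefore samples the record about 10 to 16 times more coarsely than the 1 µm target.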

The groove detection algorithms proposed in these works are pretty similar and can be summarized in the following steps:

• Acquisition of the record image slices using the desktop scanner.

• Image warping to transform the X-Y coordinates into polar coordinates.

• Groove location: the groove positions are given by the maximal intensity pixels at each sampling time.

• Track following: a groove is built from consecutive samples in the time direction that lie in a close neighborhood in the radial direction.

• Alignment of the tracks resulting from the different slices.

The final extracted sound quality of these projects is rather low, which is mainly due to the unsuitable hardware.
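The groove location and track following steps listed above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code of the cited projects: it assumes an image already warped to polar coordinates (rows are radial positions, columns are sampling times), a bright groove on a dark background, and a hypothetical search window max_jump.

```python
import numpy as np

def follow_track(polar, start_row, max_jump=2):
    """Follow one groove turn through a polar image (rows: radius, columns: time)."""
    n_rows, n_cols = polar.shape
    track = np.empty(n_cols, dtype=int)
    row = start_row
    for col in range(n_cols):
        # Search only in a close radial neighborhood of the previous sample.
        lo = max(row - max_jump, 0)
        hi = min(row + max_jump + 1, n_rows)
        # The groove position is the maximal intensity pixel in that window.
        row = lo + int(np.argmax(polar[lo:hi, col]))
        track[col] = row
    return track

# Synthetic polar image with two groove turns, 15 pixels apart in the radial direction.
cols = np.arange(100)
true_rows = 10 + np.rint(3 * np.sin(2 * np.pi * cols / 100)).astype(int)
image = np.zeros((40, 100))
image[true_rows, cols] = 1.0
image[true_rows + 15, cols] = 1.0
track = follow_track(image, start_row=10)
```

The restricted search window is what keeps the tracker on one turn of the groove; it also explains why such a simple scheme fails when a scratch or a crack interrupts the groove for more than a few pixels.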

PrestoSpace is a project whose objective is to provide technical solutions and integrated systems for digital preservation of all types of audiovisual collections [34]. An optical phonographic record player has been developed as part of the PrestoSpace project. The principle of this system is to illuminate the walls of the groove through a condenser that provides an angle-dependent lighting color. A CCD camera captures the resulting images, where the orientation of the groove is represented by color coding. The advantages of this system are that it requires only a low resolution (about 10 µm), and that it captures the groove orientation and therefore does not need any numerical differentiation to get the recorded sound. However, this system seems to be very sensitive to dust and groove deformation.

Fadeyev et al. used methods derived from their work on instrumentation for particle physics to recover audio data from discs and musical cylinders. They developed a 2D imaging system to extract sound out of the bottom of the groove from monaural phonographic records. They work with a rotating scanner and an auto focus system to stay in the limited depth of field required by the microscope video zoom. This system results in good quality sound extraction, and important efforts are now underway to decrease the processing time to around 10 minutes and to develop a final tool for mass digitizing [35].

1.3.3.3 3D method

Fadeyev et al. also introduced a 3D surface measuring technique, using confocal scanning microscopy [36]. This results in a surface map, which is used to detect the groove position as follows:

• Minima candidates are identiﬁed as data points which were not higher than the four nearest points.

• These candidates are removed, using the inherent data periodicity of the known groove structure.

• A parabolic fit is applied on the deepest point of the "valleys" of the map.

• The list of ﬁt minima is ﬁltered using the inherent data periodicity of the known groove structure.
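The candidate selection and the parabolic fit can be illustrated with the following Python sketch. It is a simplified reading of the steps above, not the implementation of [36]: the three-point fit across the groove, the assumption that the groove runs along the rows of the height map, and all function names are choices made for this illustration. The periodicity-based filtering is omitted.

```python
import numpy as np

def minima_candidates(z):
    """Interior points of the height map z that are not higher than their four nearest points."""
    c = z[1:-1, 1:-1]
    mask = ((c <= z[:-2, 1:-1]) & (c <= z[2:, 1:-1]) &
            (c <= z[1:-1, :-2]) & (c <= z[1:-1, 2:]))
    rows, cols = np.nonzero(mask)
    return rows + 1, cols + 1  # shift back to indices of the full map

def parabolic_offset(left, mid, right):
    """Vertex position of the parabola through (-1, left), (0, mid), (1, right)."""
    curvature = left - 2.0 * mid + right
    if curvature <= 0.0:  # flat or not a valley: keep the sample position
        return 0.0
    return 0.5 * (left - right) / curvature

def refine_minima(z, rows, cols):
    """Sub-pixel groove bottom positions, fitted across the groove (along the columns)."""
    return np.array([c + parabolic_offset(z[r, c - 1], z[r, c], z[r, c + 1])
                     for r, c in zip(rows, cols)])

# Synthetic surface map: one parabolic valley whose bottom lies at column 5.3.
x = np.arange(11, dtype=float)
z = np.tile((x - 5.3) ** 2, (5, 1))
rows, cols = minima_candidates(z)
positions = refine_minima(z, rows, cols)
```

On this synthetic valley the fit recovers the bottom position 5.3 although the nearest sample lies at column 5, which shows how the parabolic fit gives sub-pixel accuracy.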


The downside of this 3D technique is the long acquisition time, which may last up to one day for one side of a record. But it also presents several advantages: the resulting sound extraction quality is better than with the 2D systems, and it allows reading vertical cut discs and cylinders, which is not possible with a 2D technique. Further investigations in this domain are underway to reach better accuracy in the mechanical displacements and a lower processing time.

Lutz et al. enhanced this 3D technique and adapted it to the special case of dictation belts, whose grooves are relatively shallow [37].

1.4 Overview of this dissertation

This thesis is composed of nine chapters. Chapter 2 gives an overview of the phonographic recording technology and describes the disc and groove characteristics that are useful for the current work. A detailed analysis of the VisualAudio acquisition system is given in Chapter 3. This description focuses on the physical devices and on the sampling process. It encompasses the camera, the photographic film and the scanning stage. Chapter 4 then provides an in-depth analysis of the imaging chain in terms of resolution and image degradations. This analysis results in the definition of a groove model. Based on this model, Chapters 5 and 6 present the image processing algorithms, which perform the groove extraction and the signal corrections respectively. Once the groove has been correctly extracted from the image, the detected positions are transformed into a sound signal, as explained in Chapter 7. Chapter 8 is an evaluation of the extraction process and of the VisualAudio system. Finally, the conclusions drawn from our experiments are discussed in Chapter 9.


Chapter 2 Phonographic recording

This chapter describes the phonographic technology. Section 2.1 first presents a brief history. The different technological aspects of phonographic recording are then presented in the following sections. This description is not intended to be exhaustive; however, it points out the main features that are of interest for an optical sound extraction process. Interested readers are referred to [38] and [39] for more details about the phonographic technology.

2.1 History

The phonographic recording technology was progressively introduced in the nineteenth century. In 1857, Léon Scott created the first machine to record sound: the phonautograph (Figure 2.1). This device only made a visual image of the sound waves on a cylinder, but was not able to play or reproduce sounds. Thus the sound reproduction technology really started in 1877, when Edison recorded human voice for the first time on a tinfoil cylinder phonograph, and played it back using the same device. Berliner introduced the gramophone in 1887, and used discs instead of cylinders to store sounds. Discs were more resistant, more economical to produce and easier to store; thus they gradually overtook the cylinder as the dominant medium for sound recording.

Although the term phonograph denotes Edison's cylinder player, it is commonly used as a generic term for any early sound reproducing machine, and Emile Berliner's gramophone is then considered a type of phonograph. Therefore, we use the term "phonographic" to describe the disc recording technology in this work.

Phonographic records played a major role in the sound recording history, as they were the only means to store sound until the popularization of the magnetic tape in the late 1940s. Thus direct cut disc recording was the only means to archive radio broadcasting, conferences and speeches until the 1950s. The LP (Long Play) 33 rpm (rotations per minute) still remained the most popular audio medium until the commercial introduction of the compact disc in 1982. Table 2.1 briefly presents the most important dates and facts in the audio recording and phonographic history.


Figure 2.1: Léon Scott's phonautograph: a large horn captures the sound and transmits the vibrations to a needle, which engraves a modulated groove on a rotating cylinder.

1857  Léon Scott creates the phonautograph
1877  Edison makes the first recording on a tinfoil cylinder phonograph
1887  Berliner develops the gramophone and introduces the disc as a recording medium
1925  Electrical amplification is introduced for recording and playback
1925  First 33 rpm (rotations per minute) records
1929  Transcription on 33 rpm records for the use of radio stations
1929  Use of vinyl for record manufacturing
1931  Blumlein patents the stereo recording method
1935  Introduction of magnetic tape recording
1947  Commercialization of magnetic tape recording on a large scale
1948  Introduction of the microgroove and LP (Long Play) 33 rpm
1949  Introduction of the 45 rpm microgroove
1954  Standardization of the equalization process (RIAA)
1972  First digital recording
1982  Commercial introduction of the compact disc

Table 2.1: Important dates and facts in the phonographic and audio recording history [40, 41].

2.2 Mechanical recording

Mechanical sound recording techniques store the signal as the groove modulation over the surface of a carrier, which can be a cylinder, a disc or a strip.

The groove modulation of monophonic records is either vertical or lateral. Vertical recording, which is also called hill-and-dale, was used mainly for cylinders, but only by a few record manufacturers (Edison, Diamond Disc, Pathé...) [2]. It rapidly appeared that lateral recording presents less distortion and thus a better sound quality. Therefore the vertical recording has been abandoned and it will not be further described in the current document.

For stereophonic records, the two channels must be stored separately in the same groove using a combination of vertical and lateral recording. The quality difference between the lateral and vertical recording systems led to a mixed system called the 45/45 system: the groove has a triangular shape and each groove wall makes a 45° angle with the record surface. The left channel signal is stored in the displacement of the inner groove wall perpendicularly to its plane, while the right channel signal is stored in the outer wall displacement (Figure 2.2). Signals that are in phase in both channels produce lateral motion of the groove, and out of phase signals produce vertical motion. With the 45/45 system, mono and stereo cartridges and records are compatible, with one restriction: mono records can be played with a stereo playback cartridge, but it is not recommended to play stereophonic records with monophonic cartridges, as their resistance to the vertical groove motion may damage the stereo groove.

Figure 2.2: Stereo groove cut view: the left channel signal is stored in the displacement of the inner wall of the groove perpendicularly to its plane, while the right channel signal is stored in the outer wall displacement.

Figure 2.3: Top views of (a) a 78 rpm monophonic record and (b) a 33 rpm stereophonic microgroove record. While the monophonic groove has a constant width and is only radially modulated, both walls of the stereophonic groove are modulated, resulting in a groove width variation.

The inability of mono turntables to properly track the vertical component of the groove results in distortion of the lateral signal. Thus the vertical component of the low frequencies is attenuated to limit the vertical movement. Figure 2.3 shows top views of monophonic and stereophonic records.
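The relation between the two channels and the groove motion in the 45/45 system can be written as a rotation by 45°. The Python sketch below is a simplified geometric model under stated assumptions: wall displacements are treated as ideal and small, and the sign convention of the vertical axis is chosen arbitrarily for this illustration.

```python
import math

def groove_motion(left, right):
    """Lateral and vertical stylus displacement for given left/right wall displacements."""
    lateral = (left + right) / math.sqrt(2.0)   # sum of the channels
    vertical = (left - right) / math.sqrt(2.0)  # difference of the channels (assumed sign)
    return lateral, vertical

def channels(lateral, vertical):
    """Inverse transform: recover the left and right channels from the stylus motion."""
    left = (lateral + vertical) / math.sqrt(2.0)
    right = (lateral - vertical) / math.sqrt(2.0)
    return left, right

in_phase = groove_motion(1.0, 1.0)       # purely lateral motion
out_of_phase = groove_motion(1.0, -1.0)  # purely vertical motion
```

In this model a mono signal (identical channels) gives a purely lateral modulation, which is consistent with the compatibility between mono records and stereo cartridges described above.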

A few other engraving formats have been developed, but were not widely used, such as the quadraphonic sound format, which encoded two channels in the displacement of each groove wall [42].


2.3 Recording process

The first records were acoustical recordings, which means that the recording process converted the acoustic energy of the sound directly into the groove modulation. The first acoustical recording devices were quite simple, such as the phonautograph presented in Figure 2.1: they consisted of a light membrane carrying a stylus, which cut the record. A horn was used to concentrate the sound on the membrane. Thus the membrane vibrated under the influence of a sound and the vibration was transmitted to the stylus, which moved the groove perpendicularly to the stylus direction. The playback device worked the opposite way: the stylus movements were transmitted to the membrane, whose vibrations rendered the recorded sound. There is no amplification in such a process; therefore all the energy of the sound is contained in the groove modulation. This imposes a large groove amplitude, which limits the track duration on each side of the record.

Acoustic records had a limited frequency bandwidth due to the mechanical limitations of the recording process and to the shape of the horn. The shape of the horn also produced resonances that led to great variations in the frequency response of the system. Diﬀerent horn shapes have been developed in order to control the eﬀect on the recorded sound [43].

Electrical recording was introduced in 1925 after the invention of the microphone and the amplifier. Electrical recording devices use a transducer to convert the mechanical vibrations of the air into corresponding electrical signals. This signal is transmitted to a cutting stylus that transforms the electrical pulses into lateral stylus vibrations to modulate the groove. At playback, the pickup converts the lateral stylus displacements into electrical pulses. These pulses are finally transformed into sound waves by the loudspeakers. Electrical recording avoids many of the disturbances of acoustical recording, as most of the process is done on the electrical form of the vibrations and no longer on the mechanical form [44].

2.4 Disc manufacturing

There are mainly two kinds of records from a manufacturing point of view: direct cut and pressed records.

Direct cut discs were used for studio recording, to record events like speeches and conferences and to edit and mix sound recordings for radio stations. Thus each direct cut record was available as a single copy. The advantage of direct cut discs is that they can be played back directly after recording, without having to wait for an industrial reproduction of their content. Thus they are also called instantaneous records.

Mass-produced records are called pressed records. They are basically manufactured in five steps:

1. A direct recording is cut on a lacquer record, which is then metalized.

2. A metal negative master is made by electroforming.

3. A positive master is made from the negative one to obtain a mold.


4. This mold serves to generate pressing masters or ”stampers”.

5. Stampers are used to press the shellac or vinyl records.

The metal negative master could be directly used to stamp out positive records in plastic; but since it is the only existing copy of the original, it is used instead to produce the stampers [44, 45].

2.5 Record materials

Several kinds of materials and compounds have been used to manufacture records. While pressed records are manufactured with resistant materials to avoid surface wear and allow multiple playbacks, the instantaneous records are made of softer material, which is a compromise between ease of engraving and long playback life. The next three subsections briefly describe the three most used materials, which are shellac and vinyl for pressed records and acetate for direct cut discs.

2.5.1 Shellac

Shellac records were produced during the first half of the 20th century. Shellac is an organic material collected from the secretions of an Asian insect called Coccus Lacca. In spite of their name, shellac records contain only approximately 15% of shellac. They are in fact made of a compound of shellac, fillers, binder, lubricant, colorants, etc. These ingredients were added in order to reach the best audio quality requirements and to optimize the automated record manufacturing process. Although shellac itself is resistant to mould and fungus growth, the organic materials used in the compound are susceptible to fungus attack. Thus there is a very wide range of shellac record quality, and stored shellac records may behave in several manners as their chemical compositions differ [46, 47].

2.5.2 Acetate/lacquer

These records consist of a strong base, of either glass or aluminum, covered with a layer of cellulose nitrate lacquer. Softening agents, such as castor oil or camphor, are added to the lacquer to ensure better cutting properties, as the acetate records were mainly used for direct cut recording. The lacquer color was usually black, but one can find green, yellow or transparent records. Acetate records were mainly used between 1930 and 1955, but the BBC (British Broadcasting Corporation) used instantaneous discs as late as the 1970s [1].

Acetate records have a much finer graininess than shellac records, which considerably increases the sound quality at playback. Unfortunately, acetates are the most fragile records in comparison to shellac and vinyl: the different rates of expansion between the layers and the ongoing contraction of the lacquer create stress on the layers and shrinkage of the lacquer coating [2].


Figure 2.4: Proﬁle views of (a) a 78 rpm monophonic shellac record coarse groove and (b) a 33 rpm stereo microgroove vinyl record.

2.5.3 Vinyl

Since the 1940s, the raw material used for the manufacture of pressed records has been polyvinyl chloride (PVC). Like shellac records, vinyl records are made of a compound: PVC, stabilizers, colorant, fillers and additives. PVC has a finer graininess than shellac. Thus vinyl records produce less surface noise and the rotation speed of the records could be reduced. Vinyl is the most stable material used for record manufacturing: it is resistant to fungus and unaffected by high humidity levels.

2.6 Recording speed

Many diﬀerent recording speeds have been used at the beginning of the phonographic technology. This was not only due to the lack of standardization, but also to the mechanical drive motors that could not always be locked to a speciﬁc speed. Thus each manufacturer established its own standard, and recording speed ranged from 60 rpm (rotations per minute) to at least 100 rpm.

Record types may be roughly classified in two categories: coarse groove (78 rpm and a few 33 rpm) and fine groove or microgroove (16, 33 and 45 rpm), which are both shown in Figure 2.4. The most widely used recording speeds are described in the next subsections [48, 49].

2.6.1 78 rpm

Early speeds of rotation varied widely, but by 1910 they had gradually settled around the standard of 78 rpm. Several standards were used in this 78s family, depending for example on the motor gear ratio and power supply (50/60 Hz): 71.29 rpm, 76.59 rpm, 78.26 rpm, 78.8 rpm and 80 rpm. All these records were later called 78s or coarse groove, to distinguish them from the newer rotating speeds and finer grooves. A 30 cm 78 rpm record could hold up to 5 minutes of audio content per side.


2.6.2 33 rpm

The first 33 rpm records were manufactured in 1925. The need for longer audio content came about with the advent of early sound films. In the early days of film, the audio was recorded on a separate record, and a reel of film could run for about 11 minutes. Simply dropping the recording speed down to 32 rpm would enable the full-length film audio to be recorded non-stop on one side.

Ultimately, 33⅓ rpm was agreed on as the final modern standard, as it could be obtained by gear reduction with either a 50 Hz or a 60 Hz power supply. Although the effective speed is 33⅓ rpm, these records are usually referred to as 33 rpm.
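The gear reduction argument can be checked with exact fractions. The following Python sketch assumes synchronous motors running at 3600 rpm on a 60 Hz supply and at 3000 rpm on a 50 Hz supply; these motor speeds are an assumption made for this illustration and are not taken from the text.

```python
from fractions import Fraction

def gear_reduction(motor_rpm, platter_rpm):
    """Exact reduction ratio needed between the motor speed and the platter speed."""
    return Fraction(motor_rpm) / Fraction(platter_rpm)

PLATTER_33 = Fraction(100, 3)  # 33 1/3 rpm as an exact fraction

ratio_60hz = gear_reduction(3600, PLATTER_33)  # assumed 60 Hz motor speed
ratio_50hz = gear_reduction(3000, PLATTER_33)  # assumed 50 Hz motor speed
```

Under these assumptions both reductions are whole numbers (108:1 and 90:1), which illustrates why 33⅓ rpm is convenient for both mains frequencies.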

The use of vinyl for record manufacturing came in 1929. Working with a finer grain material allowed cutting finer grooves, which were called microgrooves. With the use of the microgroove and of variable groove spacing (where the spacing varies according to the signal amplitude), it was possible to record up to 25 minutes per side. These records are often called LP (Long Play), microgroove or fine groove records, as opposed to the earlier coarse groove records.

Thus the stereo, microgroove and 33 rpm technologies were all developed by the early 1930s, but the first LPs appeared only in 1948, under the commercial pressure of the tape, which was introduced in the late 1940s.

2.6.3 16 rpm

The 16 rpm disc appeared towards the 1950s. These records in fact rotate at 16⅔ rpm, which is half the rotational speed of 33 rpm records. This low speed was used to maximize continuous recording time: combined with a large record (40 cm diameter), it could yield up to about 30 minutes per side for coarse groove records or almost 60 minutes for microgroove records. Unfortunately, low speed means more surface noise and therefore less bandwidth (up to 3 kHz); thus these discs were mainly used for spoken word content: transcriptions, language courses and book readings for visually impaired people [50].

2.6.4 45 rpm

The 45 rpm record was introduced in 1949 as the result of a commercial policy: the goal was a high-quality fine groove single with 5½ minutes of playing time. Once all the other variables had been defined (with certain assumptions about the bandwidth and tolerable distortion), the rotation speed was easy to determine: 45 rpm. However, some people still claim that the choice of the 45 rpm speed is only due to the fact that 78 minus 33 equals 45!

Since the 45 rpm record surface is large enough, and in order to get a better sound quality, 45 rpm discs are usually recorded with a 3 dB gain in comparison to 33 rpm discs.
