• Aucun résultat trouvé

How to ask without speech? On quantifying zero-evidence speech

N/A
N/A
Protected

Academic year: 2022

Partager "How to ask without speech? On quantifying zero-evidence speech"

Copied!
7
0
0

Texte intégral

(1)

How to ask without speech?

On quantifying zero-evidence speech

“The Privacy ZEBRA”

Andreas Nautsch

Picture taken in

Heidelberg Zoo, 2020

2021-05-21

(2)

Context: VoicePrivacy 2020 Challenge

● Task: audio pseudonymisation ⇒ modify raw audio

○ Voice biometrics should fail

“Same person or different person?”

○ Speech recognition should work

“What was said?”

● Metric: Zero-Evidence Biometric Recognition Assessment

2

Natalia Tomashenko

Brij M.L.

Srivastava

Xin Wang

Emmanuel Vincent

Andreas Nautsch

Jose Patino

Massimiliano Todisco Paul-Gauthier

Noé Jean-François

Bonastre Junichi

Yamagishi

Nicholas Evans

(3)

Intuition: benefit to decision making?

● Motivation in forensic sciences

○ What is the benefit of evidence reporting to decision making?

○ How to validate?

● Empirical cross-entropy (ECE)

○ Less uncertainty with evidence than without?

● Strength-of-evidence: likelihood ratios

○ Which decision is more supported?

3 Figure based on wikimedia.org

Prosecutor P(voice | same)

Defendant P(voice | different) Decision maker: judge/jury

Prior uncertainty

(4)

Textbook: empirical cross-entropy (ECE)

● The ECE step-by-step

○ Prior entropy in making yes/no decision

○ Posterior entropy based on scores/evidence

○ !! Issue: no theoretical foundation for reference likelihoods

○ Remedy: cross-entropy, law of large numbers &

priors are external to the classifier

Ramos & Gonzalez-Rodriguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, in Proc. Odyssey, 2008 4

Ramos, Franco Pedroso, Lozano-Diez, Gonzalez-Rodriguez: Deconstructing Cross-Entropy for Probabilistic Binary Classifiers, Entropy 20(3), 2018

ϴ = { A: “same person” , B: “different person” }

P: reference probability space

Set of scores

Strength-of-evidence “the classifier”

score:

P(voice | same) / P(voice | different)

(5)

Disclosure: worst-case?

● Motivation: privacy for the individual; not for the average only

● Analogue from forensic sciences to privacy preservation

○ Prosecutor & defendant in a tug of war ⇒ i.e. strength-of-evidence

○ Decision maker: the adversary ⇒ i.e. what is the worst case?

● Categorical tags

Categorical scale of privacy disclosure (adapted from forensic sciences)

5 Willis et al.: Guideline for evaluative reporting in forensic science, European Network of Forensic Science Institutes (ENFSI), 2016

(6)

ZEBRA framework, an example

● VoicePrivacy 2020 Challenge

audio pseudonymisation

○ Task: speech recognition should work — voice biometrics should fail

○ Unprotected data: state-of-the-art voice biometrics

○ B1: DNN baseline

○ B2: signal processing baseline

6

Expected privacy disclosure (population)

Worst-case privacy disclosure (individual)

Categorical tag

https://gitlab.eurecom.fr/nautsch/zebra

(7)

The Privacy ZEBRA

Références

Documents relatifs

Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose

Part two deals with word level and lexical search: it presents work done (1) on the feasibility of word stress detection (primary stress, ertiniation of its magnitude,

The stress information was obtained by means of a model of rhythm perception, which aims at finding strong acoustic events inside the speech signal. A small but

It is thus interesting to test the influence of MPEG audio coding algorithms on speech recognition performance.. Moreover, MPEG audio coding supports a variable bitrate, which allows

d(Ψ src , Ψ tgt ) = hΨ src , Ψ tgt i (8) In the context of voice casting, the advantage of the multi-label scoring is double : first, the multi-label scoring can be used to

This paper presents a preliminary study on the classifi- cation of expressive speech for the elaboration of a voice casting system in the context of video games, and exploits

This thesis investigates practices in healthcare and how healthcare professionals adapt in relation to Automatic Speech Recognition (ASR) as the algorithm is

Participants of the task are asked to train a speech tok- enizer – LVCSR or subword – on a collection of public data (see section 2) in Vietnamese language.. Each participant must