Sunday, November 18, 2007

Max Headroom in the phonetics lab

This report has been circulating, in slightly less detailed form, on the BBC. It describes researchers 'reading off' speech signals in a coma patient. Sadly, but not unexpectedly, the BBC story is a trimmed-down version of the New Scientist story (already under 700 words), with a few quotes from other brain researchers thrown in for good measure, and, of course, filled with suggestions that the man in the coma is about to be turned into Max Headroom. This is not entirely true.

First of all, from the NS article, it looks like the Boston University Speech Lab is responsible for this research, but I can't find any reference to the project on their website. The description is of an invasive technique, implanting electrodes directly in poor MH's brain. (His real name seems to be Eric, but one would think they'd have changed it for the purposes of reporting the research anyway, so I'm going to stick with MH.) Sadly, their computer can't recognize 'N-N-New Coke' in the output of this little bundle of neurons, but it can apparently distinguish /u/, /o/, and /i/, and do so with 80% accuracy. Evidently they tell MH to 'think hard about saying /u/' and he complies in his comatose state. Until the details come out, I'll be a bit skeptical about what's really going on.
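Just to make that 80% figure concrete, here's a toy sketch (in Python, and nothing to do with the lab's actual method, whatever it is) of the kind of three-way classification task involved: a nearest-centroid rule over noisy two-dimensional features. The centroid numbers are rough textbook-style formant values, used purely for illustration.

```python
# Toy sketch: classify three vowels from noisy 2-D feature vectors
# (think F1/F2, or any two dimensions extracted from a neural signal)
# by picking the nearest stored centroid. Invented numbers throughout.
import random

random.seed(0)

# Hypothetical mean (F1, F2) values in Hz for /i/, /o/, /u/.
CENTROIDS = {"i": (270, 2290), "o": (430, 850), "u": (300, 870)}

def classify(sample):
    """Return the vowel whose centroid is nearest (Euclidean) to sample."""
    def sqdist(c):
        return (sample[0] - c[0]) ** 2 + (sample[1] - c[1]) ** 2
    return min(CENTROIDS, key=lambda v: sqdist(CENTROIDS[v]))

def noisy(vowel, sd=60):
    """Simulate a noisy measurement of a vowel's features."""
    f1, f2 = CENTROIDS[vowel]
    return (random.gauss(f1, sd), random.gauss(f2, sd))

# Accuracy over simulated trials; /o/ and /u/ sit close together,
# so this lands well short of perfect, roughly in the reported range.
trials = [(v, classify(noisy(v))) for v in "iou" for _ in range(100)]
accuracy = sum(truth == guess for truth, guess in trials) / len(trials)
```

The point of the toy is just that 80% on a three-way choice is well above the 33% chance baseline but a long way from clean decoding.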

Nevertheless, what is there on the BU speech lab website is interesting: first, brain imaging data for speech production, most notably the apparent location in the brain of certain articulatory signals; second, a computer model which they use to situate what they think these groups of neurons are doing. I won't evaluate either one, but if we believe their brain maps, and if we believe they've stuck the electrodes in the right place, then it seems like they are really reading off something like articulatory information.

This is somewhat interesting once you realize that the actual form that articulatory information takes is still up for debate. On the one hand, there's the fairly obvious theory that, when we speak, we just send instructions to the articulators. Of course, if sounds are stored in this articulatory format, then perception must involve something like the Motor Theory of speech perception: some (presumably built-in) hardware for matching up the speech sounds you hear with the gestures that produced them. That's how you get from what you hear back to the stored forms, which just tell you what to do.

On the other hand, a goal-based theory of production (subscription to Journal of Phonetics required) says something like the reverse. The information you need to send to the motor system is (mostly) a bunch of acoustic targets. You might also have articulatory targets, but the key thing is that you can just send 'I want a low F2' to the low-level system and it will work it out automatically, presumably, again, through some built-in mapping. So when we map out what parts of the brain are lighting up when we say /i/, etc., we should consistently see things corresponding to these more abstract features. If we don't, then we can't tell whether the goal-based theory is right.
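To make the contrast concrete, here's a toy sketch of the goal-based picture: the plan is just an acoustic target (here, an F2 value in Hz), and a low-level mapping picks whichever articulatory configuration comes closest to hitting it. The configurations and their predicted F2 values are invented for illustration.

```python
# Toy sketch of a goal-based plan: send an acoustic target downstream
# and let the low-level system choose the gesture. Invented numbers.
ARTICULATIONS = {
    "front-high": 2200,    # tongue fronted and raised -> high F2
    "central": 1500,       # neutral tongue position
    "back-rounded": 900,   # tongue backed, lips rounded -> low F2
}

def realize(target_f2):
    """Return the articulation whose predicted F2 is closest to the target."""
    return min(ARTICULATIONS, key=lambda a: abs(ARTICULATIONS[a] - target_f2))

gesture = realize(950)  # the plan 'I want a low F2'
```

On this picture the stored form only mentions the acoustic target; the gesture is computed on the fly, which is exactly why you'd expect the abstract acoustic features, not just gestures, to show up in the brain data.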

Of course, this is not so easy to tell for /i/, since the mapping between acoustics and articulation for the vowel space is fairly trivial, but with enough brain data we should in principle be able to tell: do the bits that light up for particular sounds in production seem to correlate with acoustics or with articulation? Clearly, the articulatory signals have to be there, but if we can't see the acoustics, we don't have any reason to believe in the goal-based theory. This is factoring out any methodological concerns, of course, which I take it are acute with brain imaging. But it would be interesting to look (and if anyone wants to hunt through the stuff on the BU Speech Lab website and try to find data bearing on this question, feel free).
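The comparison I have in mind could look something like this, with every number invented: take per-sound activation levels for one hypothetical unit and check whether they track an acoustic feature or an articulatory one better.

```python
# Toy version of the acoustics-vs-articulation comparison. Given one
# invented unit's activation level for each of five vowels, ask whether
# it correlates better with an acoustic feature (rough F2 values) or an
# articulatory one (lip rounding). All data here are made up.
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sounds = ["i", "e", "a", "o", "u"]
f2 = [2290, 2000, 1220, 850, 870]           # acoustic feature (Hz)
lip_rounding = [0, 0, 0, 1, 1]              # articulatory feature
activation = [0.1, 0.15, 0.2, 0.9, 0.85]    # invented unit: tracks rounding

r_acoustic = pearson(activation, f2)
r_articulatory = pearson(activation, lip_rounding)
```

For this fake unit the articulatory correlation wins; the interesting question is whether any real units come out the other way, which is what the goal-based theory needs.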


meagan louie said...

Ok, this gets into stuff I really don't know much about, but I'm all for exposing my lack of knowledge (well, theoretically I am...):

How does the first theory take into account the fact that a single person speaking English can produce [r] (pretending that's upside-down) in different ways? At least I think that I do both bunching and turning my tongue tip up to produce [r]...

Would there be a different articulatory command for each production of the same phoneme?

ewan said...

I don't know that much either, but yes, this would be equivalent to any kind of variation. That's easy for a theory that just says 'do this' because then you can say 'do this or this or this.' My superficial understanding of the goal-based theory is that it's a little trickier there. How do you account for people who only have one [r] or the other?

Presumably, these people must have a slightly more specific goal in mind, so that the motor system will decide, 'Ah, I need to sound like a bunched [r]!' But the difference between the two [r]'s is barely perceptible - so these people must also have a very keen ear to arrive at that conclusion.

On the other hand, a theory that sends specific gestures for speech motor control, rather than soft goals, would not allow underspecified gestures to be sent to motor control, so not only could you get variation (for people who had multiple gestural versions of /r/ available), but you could get someone who had stored only one or the other possible gesture as the articulation of /r/.

One problem with this theory, I suppose, would be that people seem to choose which version of a phoneme they'll use based on phonetic environment ('articulatory ease'). If people are choosing one [r] over another for articulatory ease, we have a modularity problem - but maybe that's okay.

meagan louie said...

Hmmm, I was kind of thinking that the goal-based one could just have a command like "I want an F3 that drops, do that however you want, bunching or tongue-up-turning, whatever." And then people could do one, or the other, or both.

I wonder if speech adaptation would provide any evidence as to which theory of production is better - for example, people that get braces/retainers sound weird when they first get them (er, at least I did), but then they adjust to having all that metal in their mouth, and then they sound normal again. So then this period of sounding weird would either be something like a person

i) learning how to produce the right acoustic targets with their newly modified oral tract, or

ii) reprogramming their articulatory commands to accommodate their modified oral tract

I'm too lazy to figure out if these would have different predictions, but if it actually is a case of i), something interesting is that the information gleaned from this learning seems easily lost. I used to be able to sound normal wearing my retainer. This was when I had to wear it all the time, but now that I don't, I sound weird again when I try to speak with it.