REPORT SUMMARY

"Review of Existing Techniques for Human Emotion Understanding and Applications in Human-Computer Interaction"

First, the problem of the lack of a unified theory that could appropriately describe the whole range of emotions is acknowledged. Classifying emotions into a few primary categories ('pure emotions') seems inherently limited. The approach of characterising emotions along a few continuous dimensions (e.g., negative vs positive and weak vs strong affect) is preferred for its generality and flexibility. Prediction of the actions that may follow emotional states may be at the core of the meaning of emotions, and is highly desirable for any system aiming to genuinely interact with humans.

A few acoustic features of the speech signal (F0, energy content, speech rate and spectral measures) have been identified that support classification of utterances into a relatively wide range of emotions (about 15) at rates of around 50%, comparable to those achieved by human listeners. A priority is to use more diverse speech material than has been used so far: greater ecological validity, and a variety of situations more representative of those encountered in real life, are required for the system to be efficient and robust.

Systems using dynamic images of facial expressions give the most promising results, in contrast to those using static images. However, the set of emotions they cover is very limited and needs to be extended.

Further information, such as the physical context or recognised words, could also be integrated to improve judgements of emotion, and could possibly be sought explicitly by questioning the human agent directly.

From the above, it can be concluded that developing artificial emotion detection systems ideally involves co-ordinated treatment of the following issues.

1 Signal analysis for speech

There is prima facie evidence that a wide range of speech features, mostly paralinguistic, have emotional significance.
Work is needed on techniques for extracting these features.
Techniques based on neural nets have been extensively used at this level, and could be used more to set parameters within classical algorithms.
There would probably be gains if the extraction process could exploit relevant linguistic information - phonetic or syntactic.
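The summary names F0, energy, speech rate and spectral measures as informative acoustic features. As a rough illustration of the extraction step, the sketch below computes two of them, frame energy and F0, using plain NumPy. The autocorrelation pitch estimator, the 40 ms frame and 10 ms hop, and the 60-400 Hz search range are assumptions chosen for simplicity, not methods taken from the work reviewed; a practical system would need proper voicing detection and more robust pitch tracking.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def autocorr_f0(frame: np.ndarray, sr: int, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude F0 estimate: the lag of the autocorrelation peak within [fmin, fmax] Hz.
    Returns 0.0 for a silent frame (treated as unvoiced)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..len-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    if ac[0] <= 0 or lag_max >= len(ac):
        return 0.0
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def prosodic_summary(x: np.ndarray, sr: int, frame_ms: float = 40, hop_ms: float = 10) -> dict:
    """Utterance-level statistics of frame energy and F0."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    energy = np.mean(frames ** 2, axis=1)  # mean-square energy per frame
    f0 = np.array([autocorr_f0(f, sr) for f in frames])
    voiced = f0[f0 > 0]  # keep only frames with an F0 estimate
    return {
        "f0_mean": float(voiced.mean()) if voiced.size else 0.0,
        "f0_range": float(voiced.max() - voiced.min()) if voiced.size else 0.0,
        "energy_mean": float(energy.mean()),
        "energy_std": float(energy.std()),
    }
```

Utterance-level statistics of this kind (means, ranges, variability) are the sort of input a classifier would then map onto emotion categories or dimensions.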

2 Signal analysis for faces

There is prima facie evidence that a range of facial gestures have emotional significance.

The static approaches which are best known in psychology do not transfer easily to machine vision in real applications.

Dynamic approaches have produced promising results, but their psychological basis is largely unexplored, and they have not been tested on a large scale.
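To make the static/dynamic distinction concrete, the sketch below derives simple dynamic descriptors (per-point speed, peak speed and net displacement) from facial landmark positions tracked over a sequence of frames. It assumes the landmarks have already been located by some tracker, which is not shown; the input layout (frames, points, 2) and the choice of features are illustrative assumptions. A static approach would see only a single frame of this array.

```python
import numpy as np


def dynamic_features(landmarks: np.ndarray, fps: float) -> dict:
    """Dynamic descriptors from tracked landmarks.

    landmarks: array of shape (n_frames, n_points, 2) holding (x, y) positions.
    """
    velocity = np.diff(landmarks, axis=0) * fps  # (n_frames - 1, n_points, 2), units per second
    speed = np.linalg.norm(velocity, axis=2)     # (n_frames - 1, n_points)
    return {
        "mean_speed": speed.mean(axis=0),        # average speed of each point
        "peak_speed": speed.max(axis=0),         # fastest movement of each point
        "peak_motion_step": int(np.argmax(speed.sum(axis=1))),  # step with most overall motion
        "net_displacement": np.linalg.norm(landmarks[-1] - landmarks[0], axis=1),
    }
```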

3 Effective representations for emotion

Describing emotion in an exclusive sense (i.e. cases of 'pure' emotion) is very different from describing emotion in an inclusive sense (i.e. emotionality as a pervasive feature of life); and conceptions suggested by the first task do not transfer easily to the second.
A range of techniques are potentially relevant to representing emotion in an inclusive sense, including continuous dimensions and schema-like logical structures.
Ideally a representation of emotion should not be purely descriptive: it should also concern itself with predicting and/or prescribing actions.
Ideally, representations of emotion should be capable of modification through experience, as developmental and cross-cultural evidence indicates human representations are.
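A minimal sketch of a continuous-dimension representation, assuming two axes corresponding to the negative/positive and weak/strong distinctions mentioned in the summary (named valence and activation here). The prototype coordinates, the neutral radius and the learning rate are hypothetical values for illustration. Category labels become regions of the space rather than the representation itself, weak states are not forced into a 'pure' category, and prototypes can shift with experience through a simple moving-average update.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectPoint:
    valence: float     # -1 (negative) .. +1 (positive)
    activation: float  # -1 (weak) .. +1 (strong)

    def distance(self, other: "AffectPoint") -> float:
        return math.hypot(self.valence - other.valence, self.activation - other.activation)


# Hypothetical prototype coordinates, for illustration only.
PROTOTYPES = {
    "content": AffectPoint(0.6, -0.4),
    "elated": AffectPoint(0.8, 0.7),
    "angry": AffectPoint(-0.7, 0.8),
    "sad": AffectPoint(-0.6, -0.5),
}


def nearest_label(state: AffectPoint, prototypes: dict) -> str:
    """Closest category label to a point in the space."""
    return min(prototypes, key=lambda name: state.distance(prototypes[name]))


def describe_state(state: AffectPoint, prototypes: dict, neutral_radius: float = 0.2) -> str:
    """Inclusive description: weak states are reported as near-neutral, not forced into a category."""
    if math.hypot(state.valence, state.activation) < neutral_radius:
        return "near-neutral"
    return nearest_label(state, prototypes)


def adapt_prototype(prototypes: dict, label: str, observed: AffectPoint, rate: float = 0.1) -> None:
    """Move one label's prototype toward a new labelled observation (exponential moving average)."""
    p = prototypes[label]
    prototypes[label] = AffectPoint(
        p.valence + rate * (observed.valence - p.valence),
        p.activation + rate * (observed.activation - p.activation),
    )
```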

4 Appropriate intervening variables

Human judgements of emotion may proceed via intervening variables - referring to features of speech, facial gestures, and / or speaker state - rather than proceeding directly from the signal.
The ability to describe these intervening variables in symbolic terms opens the way to explaining and reasoning about emotion-related judgements.
Allowing suitable intervening variables to emerge through experience is an instance of a recurring challenge for computational theories of learning.
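One way to picture intervening variables is a two-stage pipeline: numeric signal features are first mapped to symbolic descriptors (e.g. 'high_f0'), and judgements are then drawn from those descriptors, so that each judgement carries its own explanation. In the sketch below the speaker baseline, the 20% thresholds and the rules are all hypothetical. They are hand-written here, whereas the point above is that such variables should ideally emerge through learning.

```python
from typing import Dict, List, Set, Tuple


def to_intervening_variables(features: Dict[str, float], baseline: Dict[str, float]) -> Set[str]:
    """Map numeric features to symbolic descriptors relative to a speaker baseline.
    The +/-20% thresholds are illustrative assumptions."""
    labels = set()
    for name, value in features.items():
        ref = baseline[name]
        if value > 1.2 * ref:
            labels.add(f"high_{name}")
        elif value < 0.8 * ref:
            labels.add(f"low_{name}")
    return labels


# Hypothetical rules: (required intervening variables, judgement).
RULES: List[Tuple[Set[str], str]] = [
    ({"high_f0", "high_energy", "high_rate"}, "aroused"),
    ({"low_f0", "low_energy", "low_rate"}, "subdued"),
]


def judge(labels: Set[str]) -> List[Tuple[str, Set[str]]]:
    """Return each judgement whose conditions hold, paired with the variables that justify it."""
    return [(judgement, conds) for conds, judgement in RULES if conds <= labels]
```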

5 Acquiring emotion-related information from other sources

Contemporary word recognition techniques probably support the detection, in continuous speech, of words which have strong emotional loadings.
Information from behaviour and physical context is certainly relevant to emotional appraisal, and could be obtained in at least some contexts.
Active acquisition of information about emotionality is clearly a possibility to be considered - e.g. asking "are you bored with this task?".
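A sketch of spotting emotionally loaded words, assuming a word recogniser already supplies (word, confidence) hypotheses. The lexicon entries, their scores and the thresholds are invented for illustration; a real system would draw on a larger affective lexicon and on its recogniser's actual output format.

```python
from typing import Dict, List, Tuple

# Hypothetical affective lexicon: word -> (valence, strength of emotional loading).
LEXICON: Dict[str, Tuple[float, float]] = {
    "hate": (-0.9, 0.9),
    "boring": (-0.5, 0.6),
    "great": (0.8, 0.7),
    "fine": (0.3, 0.2),
}


def spot_loaded_words(
    hypotheses: List[Tuple[str, float]], min_conf: float = 0.6, min_strength: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (word, confidence-weighted valence) for confidently recognised, strongly loaded words."""
    hits = []
    for word, conf in hypotheses:
        entry = LEXICON.get(word.lower())
        if entry is not None and conf >= min_conf and entry[1] >= min_strength:
            hits.append((word.lower(), entry[0] * conf))
    return hits
```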

6 Integrating evidence

Numerical methods of integrating evidence can generate good identification rates under some circumstances.
In other circumstances it seems necessary to invoke logical techniques which examine possible explanations for observed effects, and discount them as evidence for X if explanation Y is known to apply - i.e. inferences are causal, abductive, and cancellable.
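The two modes of integration can be combined in a small sketch. Numerical integration is shown as naive-Bayes fusion in log-odds, which assumes the cues are conditionally independent; cancellability is approximated by dropping a cue whenever the context contains a known alternative explanation for it (a raised voice in a noisy room, for instance). The likelihood ratios, explanations and prior are hypothetical. This is a crude stand-in for genuine abductive reasoning, which would weigh competing explanations rather than simply discarding evidence.

```python
import math
from typing import Dict, Set

# Hypothetical likelihood ratios P(cue | angry) / P(cue | not angry); illustrative values.
LIKELIHOOD_RATIO: Dict[str, float] = {
    "raised_voice": 4.0,
    "frown": 3.0,
    "word_hate": 5.0,
}

# Hypothetical alternative explanations: cue -> context facts that explain it away.
EXPLAINED_BY: Dict[str, Set[str]] = {
    "raised_voice": {"noisy_room", "talking_to_distant_person"},
    "frown": {"bright_sunlight"},
}


def posterior_angry(cues: Set[str], context: Set[str], prior: float = 0.1) -> float:
    """Naive-Bayes fusion in log-odds, after cancelling cues that have a known alternative cause."""
    log_odds = math.log(prior / (1 - prior))
    for cue in cues:
        if EXPLAINED_BY.get(cue, set()) & context:
            continue  # cue is explained away, so it no longer counts as evidence
        log_odds += math.log(LIKELIHOOD_RATIO.get(cue, 1.0))
    return 1 / (1 + math.exp(-log_odds))
```

With cues raised_voice and frown and the default prior of 0.1, the posterior is 4/7 (about 0.57); adding noisy_room to the context explains the raised voice away and lowers it to 0.25.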

7 Emotion-oriented world representations

Cognitive theories highlight the connection between attributing an emotion and assessing how a person perceives the world in emotionally significant terms - as an assembly of obstacles, threats, boring or attractive things, etc.
Developing schemes which represent the world in emotion-oriented terms is a significant long-term task which may lend itself to subsymbolic techniques.
The task may be related to the well-known fact that the meanings of everyday terms have an affective dimension.