" Review of Existing Techniques for Human Emotion Understanding and Applications in Human-Computer Interaction "

First, the lack of a unified theory that adequately describes the whole range of emotions is acknowledged. Classifying emotions into a few primary categories ('pure emotions') seems inherently limited. Characterising emotions along a few continuous dimensions (e.g., negative vs positive and weak vs strong affect) is preferred for its generality and flexibility. Predicting the actions likely to follow emotional states may be at the core of the meaning of emotions, and is highly desirable for any system aiming to interact genuinely with humans.
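As a minimal sketch of the dimensional view described above, an affective state can be modelled as a point on two continuous axes rather than as a discrete label. The axis names, ranges, thresholds and region labels below are illustrative assumptions, not part of the review.

```python
from dataclasses import dataclass

@dataclass
class EmotionState:
    # Two continuous dimensions, as in the review's preferred approach.
    valence: float  # negative (-1.0) to positive (+1.0) affect
    arousal: float  # weak (0.0) to strong (1.0) affect

    def quadrant(self) -> str:
        """Map the continuous point to a coarse descriptive region.

        The labels are hypothetical; a real system would keep the
        continuous values and derive behaviour from them directly.
        """
        if self.arousal < 0.5:
            return "content" if self.valence >= 0 else "subdued"
        return "excited" if self.valence >= 0 else "distressed"

print(EmotionState(valence=-0.7, arousal=0.9).quadrant())  # distressed
```

Note that discrete categories can still be recovered from the continuous representation when needed, which is one reason the dimensional approach is considered more general.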

A few acoustic features of the speech signal (F0, energy content, speech rate and spectral measures) have been identified that can lead to good classification rates (around 50%, comparable to human performance) over a relatively wide range of emotions (about 15). Using more diverse speech material than that used so far is a priority; greater ecological validity and a variety of situations more representative of those encountered in real life are required for the efficiency and robustness of such systems.
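To make the idea concrete, the sketch below classifies an utterance from a handful of acoustic features of the kind listed above. The centroid values and the nearest-centroid rule are purely illustrative assumptions; the systems surveyed use their own (unspecified) feature sets and classifiers.

```python
import math

# Hypothetical per-emotion feature centroids:
# (mean F0 in Hz, RMS energy, speech rate in syllables/sec, spectral slope).
# A real system would normalise features first so F0 does not dominate
# the distance, and would cover a much wider emotion set.
CENTROIDS = {
    "anger":   (220.0, 0.80, 5.5, -6.0),
    "sadness": (140.0, 0.30, 3.0, -12.0),
    "neutral": (170.0, 0.50, 4.2, -9.0),
}

def classify(features):
    """Return the emotion whose centroid is nearest in feature space."""
    return min(CENTROIDS, key=lambda e: math.dist(features, CENTROIDS[e]))

print(classify((210.0, 0.75, 5.2, -7.0)))  # anger
```

The point of the sketch is only that a small set of prosodic and spectral measurements can carry enough information to separate several emotion classes, which is what the reported ~50% rates suggest.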

Systems using dynamic images of facial expressions give the most promising results, as opposed to those using static images. However, the set of emotions they handle is very limited and needs to be extended.

Some further information could also be integrated for a better judgement of emotions (such as physical context or recognised words), and could possibly be sought explicitly by directly questioning the human agent.

Following the above, it can be concluded that developing artificial emotion detection systems ideally involves co-ordinated treatment of the following issues:

1 Signal analysis for speech

2 Signal analysis for faces

3 Effective representations for emotion

4 Appropriate intervening variables

5 Acquiring emotion-related information from other sources

6 Integrating evidence

7 Emotion-oriented world representations
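Issue 6 (integrating evidence) can be illustrated with a minimal sketch that combines independent emotion estimates from speech and face analysis. The confidence-weighted average over a valence/arousal pair is an assumption chosen for simplicity, not a method prescribed by the review.

```python
def fuse(estimates):
    """Combine per-modality emotion estimates into one judgement.

    estimates: list of (valence, arousal, confidence) tuples, one per
    evidence source (e.g. speech analysis, face analysis, context).
    Returns the confidence-weighted mean (valence, arousal).
    """
    total = sum(conf for _, _, conf in estimates)
    valence = sum(v * c for v, _, c in estimates) / total
    arousal = sum(a * c for _, a, c in estimates) / total
    return valence, arousal

speech = (-0.6, 0.8, 0.7)  # hypothetical: speech suggests strong negative affect
face = (-0.2, 0.5, 0.3)    # hypothetical: face agrees weakly, lower confidence
print(fuse([speech, face]))  # roughly (-0.48, 0.71)
```

Working on a continuous dimensional representation makes this kind of fusion straightforward, since evidence from different sources lands in the same space; fusing discrete category labels would instead require resolving conflicts between incompatible label sets.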