Presentation adaptation for multimodal interface systems: Three essays on the effectiveness of user-centric content and modality adaptation

Heck, Melanie

URN:	urn:nbn:de:bsz:180-madoc-642882
Document type:	Dissertation
Year of publication:	2023
Place of publication:	Mannheim
University:	Universität Mannheim
Reviewer:	Becker, Christian
Publication language:	English
Institution:	Fakultät für Betriebswirtschaftslehre > Wirtschaftsinformatik II (Becker 2006-2021)
Subject:	004 Computer science
Keywords (English):	multimodal interfaces, adaptation, user context, context acquisition
Abstract:	The use of devices is becoming increasingly ubiquitous and the contexts of their users more and more dynamic. This often leads to situations where a given communication channel is impractical. Text-based communication is particularly inconvenient when the hands are already occupied with another task. Audio messages pose privacy risks and may disturb other people if used in public spaces. Multimodal interfaces therefore offer users the flexibility to choose between multiple interaction modalities. While the choice of a suitable input modality lies in the hands of the users, they may also require output in a different modality depending on their situation. To adapt the output of a system to a particular context, rules are needed that specify how information should be presented given the users’ situation and state. This thesis therefore tests three adaptation rules that, based on observations from cognitive science, have the potential to improve the interaction with an application by adapting the presented content or its modality. Under modality alignment, the output (audio versus visual) of a smart home display is matched with the user’s input (spoken versus manual) to the system. Experimental evaluations reveal that preferences for an input modality are initially too unstable to infer a clear preference for either interaction modality. Thus, the data show no clear relation between the users’ modality choice for the first interaction and their attitude towards output in different modalities. Under multimodal redundancy, information is presented in multiple modalities. An application of the rule in a video conference reveals that captions can significantly reduce confusion. However, the effect is limited to confusion resulting from language barriers, whereas contradictory auditory reports leave the participants in a state of confusion regardless of whether captions are available. We therefore suggest activating captions only when the user’s facial expression (captured by action units, expressions of positive or negative affect, and a reduced blink rate) implies that the captions effectively improve comprehension. Content filtering in movies puts into the spotlight the character that the users prefer, as inferred from the distribution of their gaze across elements in the previous scene. If preferences are predicted with machine learning classifiers, this has the potential to significantly improve the user’s involvement compared to scenes featuring elements that the user does not prefer. Focused attention is additionally higher compared to scenes in which multiple characters take a lead role.