Search

Many research topics in natural language processing (NLP), such as explanation generation, dialog modeling, or machine translation, require evaluation that goes beyond standard metrics like accuracy or F1 score toward a more human-centered approach. Therefore, understanding how to design user studies becomes increasingly important. However, few comprehensive resources exist on planning, conducting, and evaluating user studies for NLP, making it hard to get started for researchers without prior experience in the field of human evaluation. In this paper, we summarize the most important aspects of user studies and their design and evaluation, providing direct links to NLP tasks and NLP-specific challenges where appropriate. We (i) outline general study design, ethical considerations, and factors to consider for crowdsourcing, (ii) discuss the particularities of user studies in NLP, and provide starting points to select questionnaires, experimental designs, and evaluation methods that are tailored to the specific NLP tasks. Additionally, we offer examples with accompanying statistical evaluation code, to bridge the gap between theoretical guidelines and practical applications.

This paper describes and evaluates the use of a head-mounted display (HMD) for the teleoperation of a field robot. The HMD presents a pair of video streams to the operator (one to each eye) originating from a pair of stereo cameras located on the front of the robot, thus providing him/her with a sense of depth (stereopsis). A tracker on the HMD captures 3-DOF head orientation data which is then used for adjusting the camera orientation by moving the robot and/or the camera position accordingly, and rotating the displayed images to compensate for the operator's head rotation. This approach was implemented in a search and rescue robot (RAPOSA), and it was empirically validated in a series of short user studies. This evaluation involved four experiments covering two-dimensional perception, depth perception, scene perception, and performing a search and rescue task in a controlled scenario. The stereoscopic display and head tracking are shown to afford a number of performance benefits. However, one experiment also revealed that controlling robot orientation with yaw input from the head tracker negatively influenced task completion time. A possible explanation is a mismatch between the abilities of the robot and the human operator. This aside, the studies indicated that the use of an HMD to create a stereoscopic visualization of the camera feeds from a mobile robot enhanced the perception of cues in a static three-dimensional environment and also that such benefits transferred to simulated field scenarios in the form of enhanced task completion times.

Search Results

Refine search

Refine search

Actions for selected content:

2 results

How to do human evaluation: A brief introduction to user studies in NLP

Design and evaluation of a head-mounted display for immersive 3D teleoperation of field robots

Search Results

Refine search

Refine search

Actions for selected content:

Save Search

2 results

How to do human evaluation: A brief introduction to user studies in NLP

Design and evaluation of a head-mounted display for immersive 3D teleoperation of field robots