“What if the algorithm mimics our mistakes?” Constraints and effects of data annotation for AI

Special report: “Ethics of AI” field research

Constraints and effects of annotating AI training data

No 240, 2023/4, pages 111 to 144


Machine learning algorithms are trained on input data that has been manually annotated beforehand, which allows the models to identify significant elements in databases. Though laborious and largely invisible, this digital labor plays a crucial role in establishing a reference “truth” for AI, which in turn significantly influences algorithmic outcomes. Drawing on a case study of the creation of a tool for automatically anonymizing judicial decisions at the French Supreme Court, this article explores the modalities and impacts of data annotation. The fieldwork, grounded in ethnographic observations and interviews, underscores the diversity of skills that annotators deploy. The article shows how representational and moral systems influence this activity and thereby shape the functioning of AI.

AI
Annotation
Classification
Algorithms
Data
Work
