The automated description of audiovisual archives: NeuralTalk, a video captioning model applied to the archive of the Spanish Radio and Television Corporation

Objective: To assess the capability of a deep learning video captioning model to describe images automatically in a television archive.

Methodology: Our proof of concept tested an ad hoc video captioning model in three iterations between June 2016 and January 2017. In the first and second iterations the model was used to analyse a selection of content from the archives of the Spanish Radio and Television Corporation (RTVE), and the descriptions it generated were evaluated to determine the model’s success rate, i.e., how close it came to providing human-like image descriptions. In the third iteration, the model was first trained using deep learning techniques to optimise the results before the content was analysed.
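The evaluation step described above compares machine-generated captions with human-written descriptions. As a minimal illustrative sketch of the kind of comparison involved, the function below computes a simple word-overlap score between a generated caption and a reference description; the function names, example captions, and the scoring choice are assumptions for illustration, not the metric used in this study:

```python
# Illustrative sketch: unigram-precision overlap between a generated
# caption and a human reference description. This is NOT the metric
# used in the RTVE proof of concept; it only demonstrates the kind of
# caption-vs-reference comparison such an evaluation involves.

def tokenize(caption: str) -> list[str]:
    """Lowercase and split a caption into word tokens."""
    return caption.lower().split()

def unigram_precision(generated: str, reference: str) -> float:
    """Fraction of generated words that also appear in the reference."""
    gen = tokenize(generated)
    ref = set(tokenize(reference))
    if not gen:
        return 0.0
    return sum(1 for word in gen if word in ref) / len(gen)

# Hypothetical example: a model caption vs an archivist's description.
score = unigram_precision(
    "a man speaking in a television studio",
    "a presenter speaking to camera in a television studio",
)
print(f"{score:.2f}")  # 6 of the 7 generated words appear in the reference
```

Production caption evaluation would typically use established metrics such as BLEU, METEOR, or CIDEr rather than raw unigram precision, but the underlying idea is the same: scoring the overlap between the model's output and human descriptions.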

Results: The results indicate that the model has potential, although further development will be required to customise its use in television archives.