CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning

University of Alberta

Abstract

Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion remains a major challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, consisting of 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs.
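The retrieval idea described in the abstract can be illustrated with a minimal sketch. It assumes that a CLIP text encoder has already turned the query into one embedding and that a CLIP image encoder has turned sampled video frames into per-frame embeddings; the toy vectors below merely stand in for those. The function names and the choice to score each video by its best-matching frame are assumptions made for illustration, not necessarily the aggregation used in the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_videos(query_embedding, video_frame_embeddings):
    """Rank videos by the cosine similarity of their best-matching frame.

    query_embedding: array of shape (d,), assumed to come from a CLIP text encoder.
    video_frame_embeddings: dict mapping a video id to an array of shape
        (n_frames, d), assumed to come from a CLIP image encoder.
    Returns a list of (video_id, score) pairs, highest score first.
    """
    q = l2_normalize(np.asarray(query_embedding, dtype=np.float64))
    scores = {}
    for video_id, frames in video_frame_embeddings.items():
        f = l2_normalize(np.asarray(frames, dtype=np.float64))
        # Hypothetical aggregation: keep the highest frame-level similarity.
        scores[video_id] = float(np.max(f @ q))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-ins for real embeddings (3 dimensions instead of CLIP's hundreds).
query = np.array([1.0, 0.0, 0.0])
videos = {
    "video_a": np.array([[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]),
    "video_b": np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
}
ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(video_id, round(score, 3))
```

Because the ranking depends only on frame and query embeddings, no labels, training, or video metadata are involved, which matches the zero-shot, content-only setting the abstract describes.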

Sample Outputs

A horse in the air

Video identified by our approach with the bug query 'A horse in the air' from Red Dead Redemption 2.

A person stuck in a barrel

Video identified by our approach with the bug query 'A person stuck in a barrel' from The Elder Scrolls V: Skyrim.

A car in a vertical position

Video identified by our approach with the bug query 'A car in a vertical position' from Grand Theft Auto V.

A car stuck in a tree

Video identified by our approach with the bug query 'A car stuck in a tree' from Grand Theft Auto V.

Live Demo

If the Hugging Face demo below does not load properly, please visit this link.

BibTeX

@INPROCEEDINGS {9796271,
        author = {M. Taesiri and F. Macklon and C. Bezemer},
        booktitle = {2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)},
        title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
        year = {2022},
        pages = {270--281},
        abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
        keywords = {training;visualization;social networking (online);computer bugs;transfer learning;games;software},
        doi = {10.1145/3524842.3528438},
        url = {https://doi.ieeecomputersociety.org/10.1145/3524842.3528438},
        publisher = {IEEE Computer Society},
        address = {Los Alamitos, CA, USA},
        month = may
}