A horse in the air
Video identified by our approach with the bug query 'A horse in the air' from Red Dead Redemption 2.
A new benchmark dataset for evaluating large multimodal models.
Our project focuses on harnessing the power of large pre-trained models, known as foundation models, to enhance the search and analysis of gameplay videos. By utilizing the advanced capabilities of models such as CLIP, we effectively retrieve relevant gameplay videos based on their content, bypassing the need for external metadata. This approach provides a more efficient and structured method for navigating the abundant repositories of gameplay videos available online. With potential applications in game testing, bug analysis, and event detection, our work aspires to enrich the toolset available to developers. Furthermore, we introduce the GamePhysics dataset, consisting of 26,954 videos from 1,873 games, collected from the GamePhysics section on the Reddit website. Explore our website to learn more about our innovative project and its impact on advancing the gaming industry and enhancing developer resources.
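The core retrieval idea can be sketched in a few lines: embed each video's frames and the text query in CLIP's shared space, then rank videos by their best frame-to-query similarity. The function below is an illustrative sketch only, not our actual implementation; the toy embeddings stand in for the vectors that CLIP's image and text encoders would produce in practice.

```python
import numpy as np

def rank_videos(query_emb, video_frame_embs):
    """Rank videos by the best cosine similarity between the query
    embedding and any of that video's frame embeddings.

    query_emb: 1-D array (a text embedding).
    video_frame_embs: dict mapping video id -> 2-D array of
        per-frame embeddings (frames x dims).
    Returns video ids sorted from most to least relevant.
    """
    q = query_emb / np.linalg.norm(query_emb)
    scores = {}
    for vid, frames in video_frame_embs.items():
        # Normalize each frame embedding, then take the best match.
        f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
        scores[vid] = float((f @ q).max())
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: video "A" has a frame aligned with the query,
# video "B" does not, so "A" ranks first.
query = np.array([1.0, 0.0])
videos = {
    "A": np.array([[0.9, 0.1], [0.8, 0.2]]),
    "B": np.array([[0.1, 0.9]]),
}
print(rank_videos(query, videos))  # ['A', 'B']
```

Because CLIP is used zero-shot, no labeling or fine-tuning is needed; the only per-video cost is encoding its frames once, after which any English query can be scored against the stored embeddings.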
Video identified by our approach with the bug query 'A horse in the air' from Red Dead Redemption 2.
Video of 'A person stuck in a barrel' from The Elder Scrolls V: Skyrim.
Video identified by our approach with the bug query 'A car in a vertical position' from Grand Theft Auto V.
Video identified by our approach with the bug query of 'A car stuck in a tree' from Grand Theft Auto V.
If the Hugging Face demo below is not loading properly for you, please visit this link.
@inproceedings{9796271,
author = {M. Taesiri and F. Macklon and C. Bezemer},
booktitle = {2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)},
title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
year = {2022},
pages = {270--281},
abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share game-play videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
keywords = {training;visualization;social networking (online);computer bugs;transfer learning;games;software},
doi = {10.1145/3524842.3528438},
url = {https://doi.ieeecomputersociety.org/10.1145/3524842.3528438},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {may}
}
@article{TBA,
author = {M. Taesiri and F. Macklon and S. Habchi and C. Bezemer},
title = {Leveraging Contrastive Pretrained Models for Gameplay Video Retrieval Tasks},
year = {2023},
}