Cvgaming
Control in games via webcam, mediapipe and gradient boosting
Play video games using webcam and CV
1. A little bit about the idea of the project
There are many controllers for playing video games through the movement of the body and its individual parts. But there is no application capable of replacing these controllers with an ordinary webcam, which almost every PC user already has.
Given the interest in such projects from both players and developers, implementing this idea seemed very appealing to me. Let's move on to the analysis.
2. Project structure
The entire cycle of implementing camera-based game control can be divided into 3 stages:
- collecting user data;
- training ML models on the collected data;
- assigning keys to the keyboard and mouse emulator, and testing the trained models.

We will repeat this cycle for two games: the classic Mario platformer and the Punch a Bunch boxing simulator. We will control the first game with our palms, and the second with the whole body.
3. User data collection
First, we designate the poses that we want to recognize to trigger specific commands (more on them in paragraph 5). For Punch a Bunch we initialize 4 poses: basic stance, block, right hook, and left hook. For Mario we initialize 5 poses. For the right hand: rest and jump; for the left hand: rest, move forward, and move backward.
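The pose sets above can be sketched as plain label lists. The names here are illustrative, not taken verbatim from the project scripts:

```python
# Hypothetical pose labels for the two games (names are assumptions).
PUNCH_A_BUNCH_POSES = ["stand", "block", "right_hook", "left_hook"]

MARIO_POSES = {
    "right_hand": ["rest", "jump"],
    "left_hand": ["rest", "forward", "backward"],
}
```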
Now let's move on to the data collection itself. We iterate over the list of poses, collect data for each one for one minute, and record the results in a .csv file. Between poses the user gets 5 seconds to change position.
For convenience, I implemented a simple interface: it shows the time remaining for the current pose, the name of the current pose, and the path where the data is being recorded.
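The recording schedule (one minute per pose, 5 seconds of pause) and the flattening of landmarks into CSV rows can be sketched as two small helpers. In the real script these would be fed by mediapipe landmarks from a webcam loop; function and label names here are assumptions:

```python
POSES = ["stand", "block", "right_hook", "left_hook"]  # illustrative labels
RECORD_SECONDS = 60   # one minute of data per pose
PAUSE_SECONDS = 5     # time for the user to change pose

def current_pose(elapsed, poses=POSES):
    """Return the pose being recorded at `elapsed` seconds, or None during a pause
    or after the session has finished."""
    slot = RECORD_SECONDS + PAUSE_SECONDS
    idx = int(elapsed // slot)
    if idx >= len(poses):
        return None                      # all poses recorded
    if elapsed % slot >= RECORD_SECONDS:
        return None                      # 5-second pause between poses
    return poses[idx]

def landmarks_to_row(landmarks, label):
    """Flatten (x, y, z) landmark tuples into one flat CSV row ending with the label."""
    row = [coord for point in landmarks for coord in point]
    row.append(label)
    return row
```

In the capture loop, each frame's landmarks would be passed through `landmarks_to_row` and appended to the .csv file with `csv.writer` while `current_pose` is not `None`.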

4. Training models
I use gradient boosting and random forest models (available in scikit-learn). Linear models and fully-connected neural networks could also be used.
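A minimal training sketch, assuming scikit-learn (note: gradient boosting and random forest classifiers live in scikit-learn, not keras). `X` would be the flattened landmark rows from the .csv file and `y` the pose labels:

```python
from sklearn.ensemble import GradientBoostingClassifier

def train_pose_model(X, y):
    """Fit a gradient-boosting classifier on flattened landmark rows.

    A RandomForestClassifier could be dropped in here interchangeably.
    """
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X, y)
    return model
```

Loading the data is a one-liner with pandas: `df = pd.read_csv(path)`, then split the label column off as `y` and the rest as `X`.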
5. Using trained models
To interact with the game, you first need to assign to each pose a set of actions that will be performed while the pose is held. To do this, I created a dictionary in the format pose → request. A request consists of nested lists, each corresponding to a specific action. One nested action consists of three elements: the device being emulated (keyboard or mouse), the action to perform, and the action's parameter (in our case, the button to hold or release). All of this can be seen in the script.
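The bind dictionary described above might look like this. Pose names, action verbs, and key choices are illustrative assumptions, not the project's exact bindings:

```python
# pose -> request; each nested list is [device, action, parameter].
BINDS = {
    "jump":     [["keyboard", "press",   "space"]],
    "forward":  [["keyboard", "hold",    "d"]],
    "backward": [["keyboard", "hold",    "a"]],
    "rest":     [["keyboard", "release", "d"],
                 ["keyboard", "release", "a"]],
}
```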
I also wrote a function that decodes these requests; it consists almost entirely of if-else branches (it is also in the script).
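A sketch of such a decoder. In the real application the `emulate` callback would wrap an input-emulation library such as pynput; injecting it here keeps the sketch self-contained (the structure, not the project's exact code):

```python
def decode_request(request, emulate):
    """Walk a request (list of [device, action, param] triples) and execute
    each recognized triple via emulate(device, action, param)."""
    for device, action, param in request:
        if device == "keyboard":
            if action in ("press", "hold", "release"):
                emulate("keyboard", action, param)
        elif device == "mouse":
            if action in ("click", "press", "release"):
                emulate("mouse", action, param)
        # unknown devices/actions are silently skipped
```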
Next, we take the existing code for obtaining and outputting predictions and plug in the trained model, the bind dictionary, and the request-decoding function. Everything is ready!
Here is an example of the work:

6. Summing up the results
In my opinion, this project has great potential. I plan to continue developing it. If you find it interesting and want to join, I would be glad to have your support.
