AirsimDRL
Autonomous UAV Navigation without Collision using Visual Information in Airsim
Install / Use
Deep Reinforcement Learning for Airsim Environment
Quadrotor Self-Flight using Depth image
NOTE
This is a capstone project for an undergraduate course. It worked when I tried it, but there was a lot of trial and error. I'm sorry that I didn't consider reproducibility (e.g. fixing a random seed).
Check 1 min madness
Environment
Link to download executable
NOTE: These executables can be run only on Windows OS.
How To Use
Execute the environment first. Once you can see the rendered simulation, run the script you want to try (e.g. python td3_per.py).
Description
Unreal Engine 4
- Original environment
- Vertical column
- Horizontal column
- Window
- Vertical curved wall
- Environment with a different order of obstacles
- Window
- Horizontal column
- Vertical curved wall
- Vertical column
- Environment with different types of obstacles
- Horizontal curved wall
- Reversed ㄷ shape
- ㄷ shape
- Diagonal column
Parameter
- Timescale: 0.5 (Unit time for each step)
- Clockspeed: 1.0 (Default)
- Goals: [7, 17, 27.5, 45, 57]
- Start position: (0, 0, 1.2)
Reset
Respawn at the start position, then take off and hover.
This takes about 1 sec.
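The reset described above can be sketched with the standard AirSim Python client API (reset, enableApiControl, takeoffAsync, hoverAsync); the function name `reset_drone` and the exact call order are assumptions, not the repo's actual code.

```python
def reset_drone(client):
    """Respawn at the start position, then take off and hover (~1 sec)."""
    client.reset()                  # respawn at the start position
    client.enableApiControl(True)   # API control is lost on reset
    client.armDisarm(True)
    client.takeoffAsync().join()    # blocking take-off
    client.hoverAsync().join()      # hold position until the first step
```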
Step
Given an action as 3 real values, the step calls moveByVelocity() for 0.5 sec.
To hide the delay caused by network inference, the simulation is paused after the 0.5 sec have elapsed.
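One step might look like the following sketch, assuming the AirSim Python API calls moveByVelocityAsync and simPause; pausing after the move keeps network-inference time out of the simulated clock.

```python
TIMESCALE = 0.5  # unit time for each step, in seconds

def step_drone(client, vx, vy, vz):
    """Fly at the commanded velocity for one unit time, then freeze the sim."""
    client.simPause(False)                                    # resume physics
    client.moveByVelocityAsync(vx, vy, vz, TIMESCALE).join()  # fly for 0.5 sec
    client.simPause(True)                                     # pause while the agent thinks
```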
Done
If a collision occurs (including landing), the drone is dead. It is also dead if its x coordinate drops below -0.5. If it reaches the final goal, the episode is done.
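The termination rule above can be written as a small pure function (a sketch; the names are illustrative):

```python
def episode_status(has_collided, x, reached_final_goal):
    """Return (done, dead). Any collision (including landing) or x < -0.5
    kills the drone; reaching the final goal ends the episode alive."""
    dead = has_collided or x < -0.5
    done = dead or reached_final_goal
    return done, dead
```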
State
- Depth images from front camera (144 * 256 or 72 * 128)
- (Optional) Linear velocity of quadrotor (x, y, z)
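AirSim returns depth images as a flat float buffer; here is a sketch of turning that into a normalized (H, W) array for the network. The 20 m clip value is an assumption, not taken from the repo.

```python
import numpy as np

def decode_depth(image_data_float, height, width, clip=20.0):
    """Reshape AirSim's flat depth buffer and clip far values so the
    network sees a bounded [0, 1] range."""
    img = np.asarray(image_data_float, dtype=np.float32).reshape(height, width)
    return np.minimum(img, clip) / clip
```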
Action
- Discrete action space (action size = 7)
  Using interpret_action(), choose +/-1 along one of the x, y, z axes, or hover.
- Continuous action space (action size = 3)
  3 real values, one per axis. I set the scale to 1.5 and added a +0.5 bonus along the y axis.
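A plausible interpret_action() for the 7-way discrete space is sketched below; the actual mapping and scale in the repo may differ (SCALE = 1.0 is an assumption).

```python
SCALE = 1.0  # m/s per discrete action (assumed)

def interpret_action(action):
    """Map a discrete action index to a (vx, vy, vz) velocity command."""
    offsets = [
        (0.0, 0.0, 0.0),                          # 0: hover
        ( SCALE, 0.0, 0.0), (-SCALE, 0.0, 0.0),   # 1-2: +/- x
        (0.0,  SCALE, 0.0), (0.0, -SCALE, 0.0),   # 3-4: +/- y
        (0.0, 0.0,  SCALE), (0.0, 0.0, -SCALE),   # 5-6: +/- z
    ]
    return offsets[action]
```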
Reward
- Dead: -2.0
- Goal: 2.0 * (1 + level / # of total levels)
- Too slow (speed < 0.2): -0.05
- Otherwise: 0.1 * linear velocity along y axis
(i.e. the faster the quadrotor flies forward, the more reward it gets; the faster it flies backward, the larger the penalty.)
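The shaping above can be sketched as a pure function; treating the overall speed and the y-velocity as separate inputs is an illustrative choice, not the repo's exact signature.

```python
def compute_reward(dead, goal, level, num_levels, speed, vy):
    """Reward shaping: death and goals dominate; otherwise reward forward
    (+y) velocity and penalize crawling or flying backward."""
    if dead:
        return -2.0
    if goal:
        return 2.0 * (1 + level / num_levels)
    if speed < 0.2:
        return -0.05
    return 0.1 * vy
```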
Agent
- Recurrent DQN
- Recurrent A2C
- Recurrent DDPG
- Recurrent DDPG + PER
- Recurrent TD3 + PER (BEST)
Result
<img src="/save_graph/result_Best Record.png" height="200"> <img src="/save_graph/result_Get Goal Prob..png" height="200">