PedestrianCrossingIntention

Prediction of whether pedestrians intend to cross the street, using Graph Neural Networks and the coordinates of their skeletons, previously extracted with OpenPose from the JAAD dataset.

Install / Use

/learn @abel-gr/PedestrianCrossingIntention

Pedestrian Crossing Intention

Prediction of whether pedestrians intend to cross the street, using Graph Neural Networks and the coordinates of their skeletons. We built our own dataset of crossing and not-crossing scenarios with the CARLA simulator.

Spatial-Temporal Graph Convolutional Network

I first designed a purely spatial model using the GCNConv graph convolutional operator, with an input size of 3: each node has three features, the 3D coordinates of its joint. However, the results improved only slightly, and training it for longer led to severe overfitting.
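PyTorch Geometric's GCNConv implements the Kipf and Welling propagation rule X' = D̂^(-1/2) Â D̂^(-1/2) X Θ, where Â = A + I and D̂ is the degree matrix of Â. It also adds a learnable bias. The plain-Python sketch below applies one such layer to the 3-feature joint vectors, as a hedged illustration of that rule. The function name `gcn_layer`, the toy three-joint chain and the identity weights are hypothetical. The bias is omitted, and this is not the project's code.

```python
import math


def gcn_layer(features, edges, weight):
    """One graph convolution following the GCN rule D^-1/2 (A + I) D^-1/2 X W.

    features: one feature list per node (here the 3 coordinates of a joint)
    edges:    undirected (i, j) pairs, e.g. bones between joints
    weight:   in_features x out_features matrix (bias omitted in this sketch)
    """
    n = len(features)
    # Adjacency matrix with self-loops: A + I
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    # Symmetric normalisation so high-degree joints do not dominate
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate each joint's neighbourhood: norm @ X
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(in_dim)] for i in range(n)]
    # Project to the output size: agg @ W
    return [[sum(agg[i][f] * weight[f][o] for f in range(in_dim))
             for o in range(out_dim)] for i in range(n)]


# Hypothetical 3-joint chain (e.g. hip - knee - ankle) with 3D coordinates
joints = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
bones = [(0, 1), (1, 2)]
identity = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
print(gcn_layer(joints, bones, identity))
```

Each joint's new features are a degree-weighted average of its own coordinates and those of its neighbours. A purely spatial model sees only this per-frame mixing and has no information about motion over time.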

To obtain good results, I designed a spatial-temporal model instead. Given a sequence of n frames and their n skeletons, the model predicts whether the pedestrian will be crossing in a future frame. In other words, it uses the pedestrian's movements and trajectory over n frames, grouped with a sliding window, to predict whether he or she will cross the street in the near future.
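The README does not say how far ahead the target frame lies or how the windows are labelled. The following is therefore only a sketch of the sliding-window grouping under stated assumptions. `make_windows` and `horizon` are hypothetical names, and each window of n skeletons is labelled with the crossing state `horizon` frames after its last frame. For scale, the CARLA model uses n = 5 and the JAAD model n = 87.

```python
def make_windows(frames, labels, n, horizon=1):
    """Group consecutive skeletons into overlapping windows of n frames.

    Each window is paired with the label `horizon` frames after its last
    frame (hypothetical labelling scheme, for illustration only).
    """
    samples = []
    for start in range(len(frames) - n - horizon + 1):
        window = frames[start:start + n]            # n consecutive skeletons
        target = labels[start + n - 1 + horizon]    # label in a future frame
        samples.append((window, target))
    return samples


# Ten dummy frames; the pedestrian starts crossing at frame 7
frames = [f"skeleton_{t}" for t in range(10)]
labels = [0] * 7 + [1] * 3          # 0 = not crossing, 1 = crossing
samples = make_windows(frames, labels, n=5, horizon=1)
print(len(samples))                 # 5
print([t for _, t in samples])      # [0, 0, 1, 1, 1]
```

Windows overlap by n - 1 frames, so a single video yields many training samples. Sequences shorter than n + horizon frames yield none.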

The layers of the model are detailed in the following table:

| Layer | Input shape | Output shape |
|:------------------------------|:-------------------|:--------------------|
| Recurrent part: | - | - |
| --> GConvGRU | [-1, 26, 3] | [-1, 26, 3] |
| --> Dropout (0.3) | [-1, 26, 3] | [-1, 26, 3] |
| --> ReLU | [-1, 26, 3] | [-1, 26, 3] |
| End of recurrent part | - | - |
| Reshape | [-1, 26, 3] | [-1, 78] |
| Linear | [-1, 78] | [-1, 39] |
| Dropout (0.3) | [-1, 39] | [-1, 39] |
| ReLU | [-1, 39] | [-1, 39] |
| Linear | [-1, 39] | [-1, 19] |
| Dropout (0.3) | [-1, 19] | [-1, 19] |
| ReLU | [-1, 19] | [-1, 19] |
| Linear | [-1, 19] | [-1, 2] |
| Softmax | [-1, 2] | [-1, 2] |

For each frame, the network input contains 26*3 = 78 elements: CARLA skeletons have 26 joints, and we feed in the 3D coordinates of each one.
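The Reshape step and the three Linear layers in the table can be checked with some simple arithmetic. The plain-Python sketch below does two things. It flattens a 26 × 3 node-feature matrix (the GConvGRU output keeps 3 features per joint) into the 78-element vector the first Linear layer receives. It then counts the parameters of the fully connected head. The helper names are hypothetical, and the count assumes PyTorch-style Linear layers with a bias term. Dropout, ReLU and Softmax add no parameters, and the GConvGRU parameters are not included.

```python
def flatten_skeleton(joints):
    """Reshape a [26, 3] list of per-joint features into a flat 78-element vector."""
    # Row-major order: x0, y0, z0, x1, y1, z1, ...
    return [value for joint in joints for value in joint]


# Feature sizes of the fully connected head, taken from the layer table
HEAD_SIZES = [26 * 3, 39, 19, 2]


def linear_param_counts(sizes):
    """Weights plus biases of each Linear layer: in * out + out."""
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]


hidden = [[float(j), float(j) + 0.1, float(j) + 0.2] for j in range(26)]  # dummy values
flat = flatten_skeleton(hidden)
print(len(flat))                              # 78
print(linear_param_counts(HEAD_SIZES))        # [3081, 760, 40]
print(sum(linear_param_counts(HEAD_SIZES)))   # 3881
```

The head roughly halves the width at each step (78, 39, 19) before the final 2-way layer. This keeps it small relative to the input.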

Results

CARLA dataset

Final results of classifying our CARLA dataset of pedestrian skeletons with the Spatial-Temporal Graph Convolutional Network, using 5 frames for the temporal dimension.

Metrics

| Metric | Train | Validation | Test |
|:------------------|:------:|:------:|:------:|
| Accuracy | 0.9171 | 0.8194 | 0.8834 |
| Balanced accuracy | 0.9177 | 0.8183 | 0.8857 |
| Precision | 0.9334 | 0.8103 | 0.9207 |
| Recall | 0.9078 | 0.8014 | 0.8638 |
| F1-score | 0.9204 | 0.8058 | 0.8913 |
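For reference, the metrics in the table can be derived from a binary confusion matrix. The sketch below uses hypothetical helper names and assumes that "crossing" is the positive class, which the README does not state. Balanced accuracy is computed as the mean of the per-class recalls. As a sanity check, F1 recomputed as the harmonic mean of the reported precision and recall reproduces the F1 row to four decimals.

```python
def binary_metrics(tp, fp, tn, fn):
    """Classification metrics from a binary confusion matrix.

    Assumes 'crossing' is the positive class (not stated in the README).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall of the crossing class
    specificity = tn / (tn + fp)     # recall of the not-crossing class
    return {
        "accuracy": accuracy,
        # Mean of the per-class recalls; less sensitive to class imbalance
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the table for each split
REPORTED = [("train", 0.9334, 0.9078), ("val", 0.8103, 0.8014), ("test", 0.9207, 0.8638)]
for split, p, r in REPORTED:
    print(split, round(f1_from(p, r), 4))
# train 0.9204
# val 0.8058
# test 0.8913
```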

[Videos: train and test sequences]

JAAD dataset

Final results of classifying the JAAD dataset of pedestrian skeletons with the Spatial-Temporal Graph Convolutional Network, using 87 frames for the temporal dimension.

[Videos: train and test sequences]
