Synthetic Crowds Dataset


Simulation is a powerful tool for easily generating annotated data, a highly desirable feature in domains where learning models need to be trained. Machine learning and deep learning approaches are extremely data-hungry, and real-world data are often insufficient to satisfy this requirement. Despite the initial skepticism of part of the scientific community, the potential of simulation has been largely confirmed in many application areas, and recent developments in rendering and virtualization engines have shown a good ability to represent complex scenes. This includes environmental factors, such as weather conditions and surface reflectance, as well as human-related events, such as human actions and behaviors.

We present a human crowd simulator and its associated validation pipeline. We show how the simulator can generate annotated data suitable for vision-based tasks, such as crowd counting, detection, and tracking, as well as behavior-oriented tasks, such as trajectory prediction and anomaly detection.

Simulator and Dataset

Most real-world datasets are used for specific tasks, since they provide limited ground truth. Leveraging our simulator, we can provide ground truth that is compatible with multiple tasks, both on the behavioral side, such as trajectory prediction and anomaly detection, and on the appearance side, such as crowd counting, human pose estimation, people detection, and segmentation.
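To illustrate how a single simulated annotation can serve several tasks at once, here is a minimal Python sketch. The record layout and every name in it (PersonAnnotation, FrameAnnotation, track_id, ground_xy, is_anomalous, trajectories) are hypothetical and chosen only for illustration; they are not the dataset's actual format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class PersonAnnotation:
    """Hypothetical per-person record; not the dataset's actual schema."""
    track_id: int                             # stable identity across frames (tracking)
    bbox: Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max in pixels (detection)
    ground_xy: Tuple[float, float]            # ground-plane position (trajectory prediction)
    is_anomalous: bool = False                # behavior label (anomaly detection)


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record holding every visible person."""
    frame_index: int
    people: List[PersonAnnotation] = field(default_factory=list)

    def crowd_count(self) -> int:
        # Crowd counting ground truth is simply the number of annotated people.
        return len(self.people)


def trajectories(frames: Iterable[FrameAnnotation]) -> Dict[int, List[Tuple[float, float]]]:
    """Group ground-plane positions by track_id, ordered by frame index."""
    tracks: Dict[int, List[Tuple[float, float]]] = {}
    for frame in sorted(frames, key=lambda f: f.frame_index):
        for person in frame.people:
            tracks.setdefault(person.track_id, []).append(person.ground_xy)
    return tracks
```

Under these assumptions, the same records yield counts, bounding boxes, anomaly labels, and per-person trajectories without any extra annotation pass.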

Our crowd simulator focuses on meeting the requirements for both behavioral fidelity and visual fidelity.

On the behavioral fidelity side, a crowd simulator has to manage crowd movements from both macro and micro perspectives. Macro crowd behavior consists of reproducing patterns that are typical of the crowd as a whole, such as the emergent behavior of people walking in the same direction or forming lines, or the crowd following social rules, such as walking along a pathway. The macro rules can change depending on cultural factors, such as the perception of personal boundaries, which can differ across continents.

Micro crowd behavior, on the other hand, focuses on the individual and deals with the avoidance of obstacles and other people in the crowd; this involves the personal sphere (e.g., shyness, aggressiveness) and is driven by the current circumstances (e.g., being in a hurry to catch a bus).
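As a rough illustration of micro-level behavior, the following Python sketch combines attraction toward a goal with repulsion from nearby people. It is a generic repulsion-based steering rule, not the simulator's actual model; the function steer and its parameters personal_radius (standing in for the personal sphere) and urgency (standing in for circumstances such as being in a hurry) are assumptions made for this example.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def steer(position: Vec, goal: Vec, neighbours: List[Vec],
          personal_radius: float = 1.0, urgency: float = 1.0) -> Vec:
    """Return a desired velocity: goal attraction plus neighbour repulsion.

    Hypothetical illustration only; not the simulator's actual rule.
    """
    # Attraction: unit vector toward the goal, scaled by urgency.
    gx, gy = goal[0] - position[0], goal[1] - position[1]
    goal_dist = math.hypot(gx, gy)
    if goal_dist > 0:
        vx, vy = urgency * gx / goal_dist, urgency * gy / goal_dist
    else:
        vx, vy = 0.0, 0.0

    # Repulsion: each neighbour inside the personal radius pushes the
    # agent away, more strongly the closer the neighbour is.
    for nx, ny in neighbours:
        dx, dy = position[0] - nx, position[1] - ny
        d = math.hypot(dx, dy)
        if 0 < d < personal_radius:
            push = (personal_radius - d) / personal_radius
            vx += push * dx / d
            vy += push * dy / d
    return vx, vy
```

In this sketch, a larger personal_radius makes an agent keep more distance from others, while a larger urgency makes goal attraction dominate over avoidance.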

As for visual fidelity, representing a crowd of humans means dealing with appearance and motion features that closely match what a human observer would see in real life, and modeling the environment around the crowd, including weather and light changes throughout the day.

Ideally, synthetic data should be as photo-realistic and close to the real world as possible. In the case of RGB data, that means working with the fine detail of light and object shaping. Reaching the level of accuracy required for synthetic videos and images to be indistinguishable from their real counterparts for a human observer is very costly, and only top-tier video games can afford that level of detail. Video games have been used to gather useful data. However, they usually lack sufficient behavioral fidelity, with pedestrians and vehicles in the scenes following predefined paths with little behavioral realism.


You can find the paper here.

Please cite:




You can download the dataset here.