The UrbanStreet dataset used in the paper can be downloaded here [188M]. It contains 18 stereo sequences of pedestrians, captured by a stereo rig mounted on a car driving through the streets of Philadelphia during rush hour. The image resolution is 516x1024.

Ground truth is provided as pedestrian segmentation masks for the left view. In each video sequence, all pedestrians taller than 100 pixels are labelled every 4 frames (0.6 seconds). The video below shows sample ground-truth labels.

References:
Two Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions
Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang, and Jianbo Shi
In ECCV 2012
paper | poster
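The labelling schedule described above (one labelled frame every 4 frames, spanning 0.6 seconds, for pedestrians taller than 100 pixels) can be sketched in a few lines of Python. This is only an illustration: the function names, the assumption that labelling starts at frame 0, and the bounding-row representation of a mask are hypothetical and are not taken from the dataset's own tools or file layout.

```python
# Values stated in the dataset description.
FRAMES_PER_LABEL = 4       # one labelled frame every 4 frames
SECONDS_PER_LABEL = 0.6    # those 4 frames span 0.6 seconds
MIN_HEIGHT_PX = 100        # only pedestrians taller than this are labelled

# Derived: 0.15 s per frame, i.e. roughly 6.7 frames per second.
SECONDS_PER_FRAME = SECONDS_PER_LABEL / FRAMES_PER_LABEL


def labelled_frames(num_frames, first_labelled=0):
    """Return (frame_index, seconds_since_frame_0) for each labelled frame.

    Assumption: labelling starts at `first_labelled` (0 by default); the
    dataset description does not say which frame is labelled first.
    """
    return [
        (i, round(i * SECONDS_PER_FRAME, 2))
        for i in range(first_labelled, num_frames, FRAMES_PER_LABEL)
    ]


def is_tall_enough(top_row, bottom_row):
    """Check whether a mask spanning the given rows (inclusive) exceeds
    the 100-pixel height threshold. The row-based interface is hypothetical."""
    height = bottom_row - top_row + 1
    return height > MIN_HEIGHT_PX


if __name__ == "__main__":
    for index, seconds in labelled_frames(10):
        print(f"frame {index}: t = {seconds:.2f} s")
    print(is_tall_enough(200, 320))
```

A helper like `labelled_frames` can be handy when pairing tracker output with ground truth, since only every fourth frame has masks to evaluate against.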

Related datasets