CARLA-BSP (Binary Single Pedestrian) is a pedestrian crossing/non-crossing dataset created as a part of the ARCANE project.

@misc{wielgosz2023carlabsp,
      title={{CARLA-BSP}: a simulated dataset with pedestrians}, 
      author={Maciej Wielgosz and Antonio M. López and Muhammad Naveed Riaz},
      month={May},
      year={2023},
      eprint={2305.00204},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

The dataset contains almost 400 videos (396 in total), each 900 frames long at 30 FPS, i.e., 30 seconds. Videos have a resolution of 1600×600 pixels. The pedestrian is not necessarily visible in every frame: sometimes an object obstructs the camera's view and the pedestrian is not visible at all, even though the corresponding skeleton points are correct. Semantic labels corresponding to the video frames are available as animated PNG (*.apng) files.
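Frames where the view is obstructed can be approximated from the two visibility flags described in the column table: a row where frame.pedestrian.pose.in_frame is True but frame.pedestrian.pose.in_segmentation is False has joints projected into the image while no pedestrian pixels are labelled. The Python sketch below flags such rows. It assumes the flags are stored as the strings True/False (as in the example values); the helper names (parse_bool, likely_occluded, occluded_frames, load_rows) are hypothetical, and the rule is a heuristic rather than an official occlusion label.

```python
import csv
from typing import Dict, Iterable, Iterator, List, Tuple

# Column names as listed in the data description table.
IN_FRAME = "frame.pedestrian.pose.in_frame"
IN_SEG = "frame.pedestrian.pose.in_segmentation"


def parse_bool(value: str) -> bool:
    """Interpret the 'True'/'False' strings stored in the CSV (assumed encoding)."""
    return value.strip().lower() == "true"


def likely_occluded(row: Dict[str, str]) -> bool:
    """Heuristic: some joint projects into the image, yet no pedestrian pixels
    appear in the semantic segmentation, so something probably blocks the view."""
    return parse_bool(row[IN_FRAME]) and not parse_bool(row[IN_SEG])


def occluded_frames(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Yield (clip id, frame index) pairs flagged by likely_occluded()."""
    for row in rows:
        if likely_occluded(row):
            yield row["id"], int(row["frame.idx"])


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read data.csv into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, list(occluded_frames(load_rows("data.csv"))) lists candidate frames to inspect in the corresponding videos.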

Download

The corresponding CSV file contains only a subset of frames. For each video, only the frames in which at least one skeleton joint lies inside the frame boundary or the pedestrian is visible in the semantic segmentation were kept in the data.csv file, to keep it relatively small. Therefore, it is crucial to check which video frame a particular data row corresponds to. This information can be found in the frame.idx column.
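Because data.csv skips frames, rows should be matched to video frames through frame.idx rather than by row position. A minimal Python sketch, assuming rows come from csv.DictReader over data.csv. Grouping by camera.recording (the video path column) and the helper names frames_per_clip, frame_to_seconds and coverage are illustrative choices, and the timestamp assumes a constant 30 FPS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

FPS = 30            # frame rate stated in the dataset description
CLIP_LENGTH = 900   # frames per video, also from the description


def frames_per_clip(rows: Iterable[Dict[str, str]]) -> Dict[str, List[int]]:
    """Group rows by video file and collect the frame indices they describe."""
    frames: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        frames[row["camera.recording"]].append(int(row["frame.idx"]))
    return {video: sorted(idx) for video, idx in frames.items()}


def frame_to_seconds(frame_idx: int, fps: int = FPS) -> float:
    """Timestamp of a frame inside its video, assuming a constant frame rate."""
    return frame_idx / fps


def coverage(frames: List[int], clip_length: int = CLIP_LENGTH) -> float:
    """Fraction of a video's frames that have a row in data.csv."""
    return len(set(frames)) / clip_length
```

For example, frames_per_clip(csv.DictReader(open("data.csv", newline=""))) returns, per .mp4 file, the frame indices that have annotations; all other frames of that video have no row.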

Age	Gender	# of videos	# of frames	# of crossing frames
adult	female	88	49951	17820
adult	male	109	60775	22924
child	female	100	55755	23038
child	male	99	58846	28082
Total		396	225327	91864

Basic dataset stats
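The statistics above can in principle be recomputed from data.csv. The sketch below assumes that "# of frames" counts the rows kept in data.csv (which the totals suggest, see the note after the code), that each clip features a single pedestrian, and that booleans are stored as True/False; dataset_stats and totals are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (pedestrian.age, pedestrian.gender)


def parse_bool(value: str) -> bool:
    """Interpret the 'True'/'False' strings stored in the CSV (assumed encoding)."""
    return value.strip().lower() == "true"


def dataset_stats(rows: Iterable[Dict[str, str]]) -> Dict[Key, Dict[str, int]]:
    """Per (age, gender): distinct videos, retained frames, crossing frames."""
    clips = defaultdict(set)
    frames = defaultdict(int)
    crossing = defaultdict(int)
    for row in rows:
        key = (row["pedestrian.age"], row["pedestrian.gender"])
        clips[key].add(row["id"])
        frames[key] += 1
        if parse_bool(row["frame.pedestrian.is_crossing"]):
            crossing[key] += 1
    return {
        key: {"videos": len(clips[key]), "frames": frames[key], "crossing": crossing[key]}
        for key in frames
    }


def totals(stats: Dict[Key, Dict[str, int]]) -> Dict[str, int]:
    """Sum every column over all groups (valid because each clip is in one group)."""
    return {f: sum(s[f] for s in stats.values()) for f in ("videos", "frames", "crossing")}
```

On the published totals, crossing frames make up 91864 / 225327 ≈ 40.8% of the retained frames, and 225327 is well below 396 × 900 = 356400, consistent with data.csv holding only a subset of frames.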

The videos are captured using randomly spawned pedestrians that are supposed to go through a nearby trajectory waypoint. In general, there are two possible scenarios:

a pedestrian immediately starts crossing the street,
a pedestrian walks in parallel to the road.
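The two scenarios are not stored as a clip-level label; only the per-frame frame.pedestrian.is_crossing flag is available. A rough clip-level view can be derived from the first frame labelled as crossing, as in this sketch. The mapping of "no crossing frame" to the parallel-walk scenario is an assumption, the helper names are hypothetical, and clips with stuck or street-spawned pedestrians may not fit either case.

```python
from typing import Dict, Iterable, Optional


def parse_bool(value: str) -> bool:
    """Interpret the 'True'/'False' strings stored in the CSV (assumed encoding)."""
    return value.strip().lower() == "true"


def crossing_onsets(rows: Iterable[Dict[str, str]]) -> Dict[str, Optional[int]]:
    """Map each clip id to the first frame.idx labelled as crossing, or None."""
    onsets: Dict[str, Optional[int]] = {}
    for row in rows:
        clip = row["id"]
        onsets.setdefault(clip, None)
        if parse_bool(row["frame.pedestrian.is_crossing"]):
            idx = int(row["frame.idx"])
            current = onsets[clip]
            # Rows are not assumed to be sorted, so keep the minimum index seen.
            onsets[clip] = idx if current is None else min(current, idx)
    return onsets


def scenario(onset: Optional[int]) -> str:
    """Coarse clip label derived from the crossing onset (hypothetical naming)."""
    return "parallel" if onset is None else "crossing"
```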

Sometimes the pedestrians get stuck, e.g., when they reach the pavement. Occasionally, they are spawned in the middle of the street instead of on a sidewalk. There may also be rendering artifacts, missing (black) video frames, wrongly coded colors, etc. This version of the dataset was not cleaned to remove such instances, so you may encounter such videos in the set.
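Stuck pedestrians can be screened for with the frame.pedestrian.velocity column. The sketch below flags clips in which most retained frames have a very low speed. The thresholds (0.1, presumably m/s as in CARLA, and 90% of frames) are arbitrary illustrative values, not dataset conventions, and stuck_clips, speed and parse_vector are hypothetical names.

```python
import json
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_vector(value: str) -> List[float]:
    """Parse a list-valued cell such as '[-0.0328, 0.0709, 0.0000]'."""
    # Normalise typographic quotes and minus signs in case the text was copied
    # from a rendered page instead of read from data.csv.
    cleaned = value.strip().strip("'\u2018\u2019").replace("\u2212", "-")
    return json.loads(cleaned)


def speed(row: Dict[str, str]) -> float:
    """Magnitude of the pedestrian velocity vector in this frame."""
    vx, vy, vz = parse_vector(row["frame.pedestrian.velocity"])
    return math.sqrt(vx * vx + vy * vy + vz * vz)


def stuck_clips(rows: Iterable[Dict[str, str]],
                max_speed: float = 0.1,
                min_fraction: float = 0.9) -> List[str]:
    """Clip ids where at least min_fraction of rows move slower than max_speed."""
    slow: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for row in rows:
        total[row["id"]] += 1
        if speed(row) < max_speed:
            slow[row["id"]] += 1
    return sorted(clip for clip in total if slow[clip] / total[clip] >= min_fraction)
```

Flagged clips are candidates for manual review rather than automatic removal, since a pedestrian may legitimately stand still for a while.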

Column name	Description	Example value
id	Scene identifier	094b9fe1-babe-48d5-bd17-3ba3185690c5-0
world.map	Map used to generate the clip	/Game/Carla/Maps/Town04
camera.idx	Camera index for data in this row; each clip can potentially have multiple cameras (currently, only a single one is used)	0
camera.width	Width of the captured frame	1600
camera.height	Height of the captured frame	600
camera.recording	Path to the captured recording, relative to the dataset file	clips/094b9fe1-babe-48d5-bd17-3ba3185690c5-0-0.mp4
camera.semantic_segmentation	Path to the captured semantic segmentation, relative to the dataset file	clips/094b9fe1-babe-48d5-bd17-3ba3185690c5-0-1.apng
camera.transform	[x, y, z, pitch, yaw, roll] as extracted from carla.Transform	‘[−5.6229, 316.1439, 1.1189, −9.4047, −65.5020, 0.0000]’
pedestrian.idx	Pedestrian index for data in this row; each clip can potentially feature multiple pedestrians (currently, only a single one is used)	0
pedestrian.model	Blueprint name of the spawned walker	walker.pedestrian.0001
pedestrian.age	Spawned walker age (used to retrieve the blueprint name)	adult
pedestrian.gender	Spawned walker gender (used to retrieve the blueprint name)	female
pedestrian.spawn_point	[x, y, z, pitch, yaw, roll] as extracted from carla.Transform	‘[0.3317, 308.9236, 0.4300, 0.0000, 0.0000, 0.0000]’
frame.idx	Clip frame index for data in this row; each clip has multiple frames; frame indices match frames in the recording file	0
world.frame	The frame number as retrieved from the simulation; may not be continuous as some frames could have been skipped in recording (e.g., due to timeouts)	9
frame.pedestrian.transform	[x, y, z, pitch, yaw, roll] as extracted from carla.Transform depicting pedestrian world position in this frame	‘[0.3306, 308.9260, 0.9515, 0.0000, 114.8754, 0.0000]’
frame.pedestrian.velocity	[x, y, z] as extracted from carla.Vector3D depicting pedestrian velocity vector in this frame	‘[−0.0328, 0.0709, 0.0000]’
frame.pedestrian.pose.in_frame	Is any joint of the pedestrian visible in the frame?	True
frame.pedestrian.pose.in_segmentation	Is any part of the pedestrian visible in semantic segmentation?	True
frame.pedestrian.pose.world	List of 26 joint transforms in the world coordinates in the format of [x, y, z, pitch, yaw, roll]. Please see the CARLA Skeleton page for details.	‘[[ 0.3284, 308.9307, 0.0315, 0.0000, 24.8751, 89.9962 ], [ 0.3262, 308.9266, 1.0834, −0.3833, 22.5587, 91.0798 ], [ 0.3304, 308.9183, 1.1901, 0.0551, 21.5080, 91.7672 ], [ 0.3409, 308.8913, 1.3531, 0.4753, 19.9551, 91.1696 ], [ 0.3754, 308.9134, 1.5020, −3.6991, 8.6236, 95.4971 ], [ 0.4833, 308.9297, 1.4949, −77.1929, 22.7891, −80.1452 ], [ 0.5425, 308.9556, 1.2332, −73.6844, 89.9181, −139.3493 ], [ 0.5427, 309.0256, 0.9941, −2.6864, −53.1221, 167.7393 ], [ 0.3419, 308.8837, 1.5485, 0.8279, −157.0359, 77.1548 ], [ 0.3317, 308.9111, 1.6377, 2.2554, −154.0686, 74.4913 ], [ 0.3249, 309.0070, 1.7104, −2.2605, 25.9578, 96.4855 ], [ 0.2658, 308.9782, 1.7130, −2.2605, 25.9578, 96.4855 ], [ 0.2979, 308.8853, 1.5013, −1.5655, −152.7219, −93.7213 ], [ 0.2008, 308.8352, 1.4983, 76.9049, 16.1328, 105.7521 ], [ 0.1371, 308.8182, 1.2369, 72.4123, −65.7691, 30.6981 ], [ 0.1064, 308.8882, 0.9997, 2.2517, 80.1024, −25.2115 ], [ 0.2587, 308.8813, 0.9966, 5.2301, 40.1852, −97.5098 ], [ 0.2449, 308.9321, 0.5368, 2.7539, 40.1287, 98.4941 ], [ 0.2776, 308.8831, 0.1057, 0.4839, 43.3462, −159.9944 ], [ 0.2029, 308.9626, 0.0708, −4.2481, 134.8538, 90.5368 ], [ 0.1711, 308.9946, 0.0556, −0.7624, −135.1572, −166.2830 ], [ 0.4048, 308.9420, 0.9955, −4.0937, 11.0556, 85.5929 ], [ 0.3875, 308.9619, 0.5335, −0.7111, 10.8526, −77.3653 ], [ 0.4259, 308.8781, 0.1082, 1.4688, 6.0679, −156.9226 ], [ 0.4157, 308.9846, 0.0676, −5.1784, 94.5061, 91.1217 ], [ 0.4125, 309.0294, 0.0517, −0.8878, −175.3521, 12.7913 ]]’
frame.pedestrian.pose.component	List of 26 joint transforms in the format of [x, y, z, pitch, yaw, roll] relative to the actor’s pivot. Please see the CARLA Skeleton page for details.	‘[[ 0.0000, 0.0000, −0.9200, 0.0000, −90.0002, 89.9962 ], [ −0.0028, 0.0037, 0.1319, −0.3833, −92.3167, 91.0798 ], [ −0.0121, 0.0034, 0.2386, 0.0551, −93.3674, 91.7672 ], [ −0.0410, 0.0052, 0.4016, 0.4753, −94.9203, 91.1696 ], [ −0.0355, −0.0354, 0.5505, −3.6991, −106.2519, 95.4971 ], [ −0.0660, −0.1401, 0.5434, −77.1929, −92.0862, −80.1453 ], [ −0.0675, −0.2048, 0.2817, −73.6844, −24.9572, −139.3493 ], [ −0.0041, −0.2343, 0.0426, −2.6864, −167.9974, 167.7393 ], [ −0.0484, 0.0075, 0.5970, 0.8279, 88.0888, 77.1548 ], [ −0.0191, 0.0052, 0.6862, 2.2555, 91.0560, 74.4913 ], [ 0.0707, −0.0289, 0.7589, −2.2605, −88.9175, 96.4855 ], [ 0.0695, 0.0368, 0.7615, −2.2605, −88.9175, 96.4855 ], [ −0.0284, 0.0467, 0.5498, −1.5655, 92.4027, −93.7213 ], [ −0.0330, 0.1559, 0.5468, 76.9049, −98.7427, 105.7521 ], [ −0.0216, 0.2208, 0.2854, 72.4123, 179.3556, 30.6981 ], [ 0.0548, 0.2193, 0.0482, 2.2517, −34.7730, −25.2115 ], [ −0.0155, 0.0840, 0.0451, 5.2301, −74.6902, −97.5098 ], [ 0.0364, 0.0752, −0.4147, 2.7539, −74.7467, 98.4941 ], [ −0.0219, 0.0661, −0.8458, 0.4839, −71.5292, −159.9944 ], [ 0.0817, 0.1005, −0.8807, −4.2481, 19.9784, 90.5368 ], [ 0.1241, 0.1158, −0.8959, −0.7624, 109.9674, −166.2830 ], [ −0.0219, −0.0741, 0.0440, −4.0937, −103.8198, 85.5929 ], [ 0.0034, −0.0667, −0.4180, −0.7111, −104.0229, −77.3653 ], [ −0.0888, −0.0664, −0.8433, 1.4688, −108.8075, −156.9226 ], [ 0.0122, −0.1020, −0.8839, −5.1784, −20.3693, 91.1217 ], [ 0.0542, −0.1178, −0.8998, −0.8878, 69.7725, 12.7913 ]]’
frame.pedestrian.pose.relative	List of 26 joint transforms in the format of [x, y, z, pitch, yaw, roll] relative to the transform of the previous joint in the kinematic tree. Please see the CARLA Skeleton page for details.	‘[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 89.9962 ], [ −0.0037, −1.0519, −0.0027, −2.3165, 0.3835, 1.0681 ], [ 0.0000, −0.1065, −0.0113, −1.0588, −0.4185, 0.6883 ], [ 0.0000, −0.1621, −0.0340, −1.5650, −0.3723, −0.5853 ], [ 0.0412, −0.1486, 0.0060, −11.2187, 4.4801, 3.5672 ], [ 0.1093, 0.0000, 0.0000, 8.3887, 73.7163, −157.8598 ], [ 0.2696, 0.0051, 0.0000, −15.8555, −3.6001, 6.7385 ], [ 0.2491, 0.0000, 0.0000, 79.2443, −163.8402, 14.0588 ], [ 0.0000, −0.1952, −0.0115, −3.0347, −178.7574, 168.2823 ], [ 0.0000, −0.0935, −0.0087, 3.2087, −0.7351, −2.7639 ], [ −0.0329, −0.0952, −0.0661, −0.0268, −179.9978, 170.9779 ], [ 0.0329, −0.0952, −0.0661, −0.0268, −179.9978, 170.9779 ], [ −0.0412, −0.1486, 0.0060, −7.2962, 178.7470, −2.3412 ], [ 0.1093, 0.0000, 0.0000, −6.1137, 104.4920, 25.6830 ], [ −0.2696, −0.0051, 0.0000, −19.5905, −5.2835, 5.7515 ], [ −0.2491, 0.0000, 0.0013, 77.0075, 160.9265, −34.7577 ], [ −0.0791, 0.0876, −0.0143, 17.4371, −6.2079, 169.7054 ], [ −0.0198, −0.4622, 0.0127, 0.3794, −2.4476, −164.0003 ], [ −0.0273, 0.4342, 0.0056, 3.5172, 1.7688, 101.4749 ], [ −0.0001, −0.1145, −0.0045, −15.7630, −91.5994, −89.3997 ], [ 0.0461, 0.0118, 0.0000, 89.7643, 73.1561, −178.8763 ], [ 0.0791, 0.0876, −0.0143, −11.3999, 4.0120, −6.3385 ], [ 0.0198, 0.4622, −0.0127, 0.0574, −3.3882, −162.9684 ], [ 0.0273, −0.4342, −0.0056, 5.1449, 1.0837, −79.4769 ], [ 0.0001, −0.1145, −0.0045, −17.8534, −88.5057, −90.9408 ], [ 0.0461, 0.0118, 0.0000, 89.7659, −105.1966, −177.2250 ]]’
frame.pedestrian.is_crossing	Is the pedestrian considered to be crossing the street in this frame?	False
frame.camera.pose	List of 26 joint positions as projected to 2D from the current camera perspective; values are in pixels but may be outside the frame or NaNs. Please see the CARLA Skeleton page for details.	‘[[ 1013.4298, 264.5229 ], [ 1017.1473, 170.7271 ], [ 1017.3837, 161.0205 ], [ 1017.1865, 146.1832 ], [ 1021.5769, 132.5240 ], [ 1030.2338, 133.2859 ], [ 1034.8846, 157.1492 ], [ 1038.1151, 178.8281 ], [ 1017.5922, 128.2966 ], [ 1018.8466, 119.9257 ], [ 1024.4340, 112.6378 ], [ 1018.3950, 112.4053 ], [ 1014.3293, 132.5611 ], [ 1004.3577, 132.8564 ], [ 997.7958, 156.7637 ], [ 998.7059, 178.3826 ], [ 1009.3145, 178.5858 ], [ 1009.5127, 220.1185 ], [ 1007.4017, 257.8045 ], [ 1006.5015, 261.9123 ], [ 1006.0143, 263.6995 ], [ 1023.3165, 178.6749 ], [ 1021.3909, 220.2290 ], [ 1017.5359, 256.9405 ], [ 1022.8544, 261.4956 ], [ 1025.1930, 263.3291 ]]’

Description of the data contained in the dataset. Numbers were rounded to four decimal places.
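The list-valued columns (transforms, velocities and poses) are stored as bracketed text. The sketch below parses them with the standard json module and re-derives an "any joint inside the image" check from frame.camera.pose, camera.width and camera.height. It assumes the cells use JSON-compatible brackets and commas; the handling of lowercase nan, the half-open pixel bounds, and all helper names (parse_list, as_transform, joints_in_frame, any_joint_in_frame) are illustrative assumptions, so the result may differ at the boundary from the stored frame.pedestrian.pose.in_frame flag.

```python
import json
import math
import re
from typing import Dict, List, Sequence

TRANSFORM_FIELDS = ("x", "y", "z", "pitch", "yaw", "roll")


def parse_list(value: str):
    """Parse a list-valued CSV cell; NaN entries become float('nan')."""
    # Drop typographic quotes/minus signs (seen when copying from a web page)
    # and normalise nan spellings to the NaN token that json.loads accepts.
    cleaned = value.strip().strip("'\u2018\u2019").replace("\u2212", "-")
    cleaned = re.sub(r"\bnan\b", "NaN", cleaned, flags=re.IGNORECASE)
    return json.loads(cleaned)


def as_transform(values: Sequence[float]) -> Dict[str, float]:
    """Label a [x, y, z, pitch, yaw, roll] list with field names."""
    return dict(zip(TRANSFORM_FIELDS, values))


def joints_in_frame(pose2d: Sequence[Sequence[float]],
                    width: int, height: int) -> List[bool]:
    """Per projected joint: finite and inside [0, width) x [0, height)?"""
    flags = []
    for x, y in pose2d:
        finite = not (math.isnan(x) or math.isnan(y))
        flags.append(finite and 0 <= x < width and 0 <= y < height)
    return flags


def any_joint_in_frame(row: Dict[str, str]) -> bool:
    """Recompute an in-frame flag from the 2D pose and the camera size."""
    pose2d = parse_list(row["frame.camera.pose"])
    return any(joints_in_frame(pose2d, int(row["camera.width"]), int(row["camera.height"])))
```

For example, as_transform(parse_list(row["camera.transform"]))["yaw"] reads the camera yaw of a row, and comparing any_joint_in_frame(row) with the stored flag is a quick consistency check.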