CARLA-BSP

Example frame from the dataset

CARLA-BSP (Binary Single Pedestrian) is a pedestrian crossing/non-crossing dataset created as a part of the ARCANE project.

@misc{wielgosz2023carlabsp,
      title={{CARLA-BSP}: a simulated dataset with pedestrians}, 
      author={Maciej Wielgosz and Antonio M. López and Muhammad Naveed Riaz},
      month={May},
      year={2023},
      eprint={2305.00204},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

The dataset contains almost 400 videos, each 900 frames long at 30 FPS, with a resolution of 1600×600. The pedestrian may not be visible in every frame: an object may occasionally obstruct the camera view and hide the pedestrian completely, even though the corresponding skeleton points are still correct. Semantic labels corresponding to the video frames are available as *.apng files.

The corresponding CSV file covers only a subset of the frames. For each video, only the frames in which at least one skeleton coordinate lies inside the frame boundary, or in which the pedestrian is visible in the semantic segmentation, were kept in data.csv to keep the file relatively small. It is therefore crucial to check which video frame a particular data row corresponds to; this information is in the frame.idx column.
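Because rows exist only for some frames, it can help to build a per-clip index of the frame numbers that are actually present. The following is a minimal sketch using only the standard library. The three-row `SAMPLE_CSV` excerpt and the clip id `clip-a` are made up for illustration; the column names `id`, `frame.idx` and `frame.pedestrian.is_crossing` are the ones documented for data.csv, and the 30 FPS rate is the one stated above.

```python
import csv
import io
from collections import defaultdict

FPS = 30  # frame rate stated for the recordings

# Hypothetical three-row excerpt; the real data.csv has many more columns.
SAMPLE_CSV = """id,frame.idx,frame.pedestrian.is_crossing
clip-a,0,False
clip-a,1,False
clip-a,5,True
"""


def frames_by_clip(csv_file):
    """Map each clip id to the sorted list of frame indices present in the CSV."""
    frames = defaultdict(list)
    for row in csv.DictReader(csv_file):
        frames[row["id"]].append(int(row["frame.idx"]))
    return {clip: sorted(idxs) for clip, idxs in frames.items()}


def frame_to_seconds(frame_idx, fps=FPS):
    """Convert a clip frame index to a timestamp in the recording."""
    return frame_idx / fps


clips = frames_by_clip(io.StringIO(SAMPLE_CSV))
for clip, idxs in clips.items():
    # Frames up to the last listed index that have no row in the CSV.
    missing = sorted(set(range(idxs[-1] + 1)) - set(idxs))
    print(clip, "has rows for frames", idxs, "and no rows for", missing)
```

To read the real file, pass an open file handle (for example `open("data.csv", newline="")`) instead of the `io.StringIO` wrapper.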

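| Age | Gender | # of videos | # of frames | # of crossing frames |
|---|---|---|---|---|
| adult | female | 88 | 49951 | 17820 |
| adult | male | 109 | 60775 | 22924 |
| child | female | 100 | 55755 | 23038 |
| child | male | 99 | 58846 | 28082 |
| Total | | 396 | 225327 | 91864 |

Basic dataset stats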
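For a binary crossing/non-crossing task, the share of crossing frames in each group is a useful first check of class balance. The sketch below only re-uses the numbers from the table above; the `STATS` dictionary and the `crossing_ratio` helper are illustrative names, not part of the dataset tooling.

```python
# (videos, frames, crossing frames) per group, copied from the stats table.
STATS = {
    ("adult", "female"): (88, 49951, 17820),
    ("adult", "male"): (109, 60775, 22924),
    ("child", "female"): (100, 55755, 23038),
    ("child", "male"): (99, 58846, 28082),
}


def crossing_ratio(frames, crossing):
    """Fraction of listed frames labelled as crossing."""
    return crossing / frames


# Column-wise sums; they should match the "Total" row of the table.
totals = tuple(sum(column) for column in zip(*STATS.values()))

for (age, gender), (videos, frames, crossing) in STATS.items():
    print(f"{age:5} {gender:6} {crossing_ratio(frames, crossing):.3f}")
print(f"{'total':12} {crossing_ratio(totals[1], totals[2]):.3f}")
```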

The videos are captured using randomly spawned pedestrians that are supposed to go through a nearby trajectory waypoint. In general, there are two possible scenarios:

  1. a pedestrian immediately starts crossing the street,
  2. a pedestrian walks in parallel to the road.

Sometimes the pedestrians get stuck, e.g., when they reach the pavement. Occasionally, they are spawned in the middle of the street instead of on a sidewalk. Some videos also contain rendering artifacts, missing (black) frames, incorrectly encoded colors, and similar defects. This version of the dataset has not been cleaned to remove such cases, so you may encounter them in the set.
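Since the dataset is not cleaned, users may want to flag suspicious clips themselves. One possible heuristic for stuck pedestrians, sketched below, looks for long runs of near-zero speed in the `frame.pedestrian.velocity` column. The function name `stuck_runs`, the speed threshold and the minimum run length are all assumptions chosen for illustration and would need tuning; the sketch also treats consecutive rows as consecutive frames, which does not hold where the CSV skips frames.

```python
import math


def stuck_runs(velocities, speed_threshold=0.05, min_frames=30):
    """Return (start, end) row ranges, end exclusive, where the speed stays
    below speed_threshold for at least min_frames consecutive rows."""
    runs = []
    start = None
    for i, (vx, vy, vz) in enumerate(velocities):
        slow = math.sqrt(vx * vx + vy * vy + vz * vz) < speed_threshold
        if slow and start is None:
            start = i  # a slow run begins here
        elif not slow and start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    # Close a run that lasts until the final row.
    if start is not None and len(velocities) - start >= min_frames:
        runs.append((start, len(velocities)))
    return runs


# Hypothetical velocity track: walking, standing still for 40 rows, walking.
velocities = [[1.0, 0.0, 0.0]] * 5 + [[0.0, 0.0, 0.0]] * 40 + [[0.5, 0.5, 0.0]] * 5
print(stuck_runs(velocities))
```

A pedestrian standing still is not necessarily an error (it may simply be waiting), so flagged clips are candidates for manual review rather than automatic removal.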

| Column name | Description | Example value |
|---|---|---|
| id | Scene identifier | 094b9fe1-babe-48d5-bd17-3ba3185690c5-0 |
| world.map | Map used to generate the clip | /Game/Carla/Maps/Town04 |
| camera.idx | Camera index for data in this row; each clip can potentially have multiple cameras (currently, only a single one is used) | 0 |
| camera.width | Width of the captured frame | 1600 |
| camera.height | Height of the captured frame | 600 |
| camera.recording | Path to the captured recording, relative to dataset file | clips/094b9fe1-babe-48d5-bd17-3ba3185690c5-0-0.mp4 |
| camera.semantic_segmentation | Path to the captured semantic segmentation, relative to dataset file | clips/094b9fe1-babe-48d5-bd17-3ba3185690c5-0-1.apng |
| camera.transform | [x, y, z, pitch, yaw, roll] as extracted from carla.Transform | '[-5.6229, 316.1439, 1.1189, -9.4047, -65.5020, 0.0000]' |
| pedestrian.idx | Pedestrian index for data in this row; each clip can potentially feature multiple pedestrians (currently, only a single one is used) | 0 |
| pedestrian.model | Blueprint name of the spawned walker | walker.pedestrian.0001 |
| pedestrian.age | Spawned walker age (used to retrieve the blueprint name) | adult |
| pedestrian.gender | Spawned walker gender (used to retrieve the blueprint name) | female |
| pedestrian.spawn_point | [x, y, z, pitch, yaw, roll] as extracted from carla.Transform | '[0.3317, 308.9236, 0.4300, 0.0000, 0.0000, 0.0000]' |
| frame.idx | Clip frame index for data in this row; each clip has multiple frames; frame indices match frames in the recording file | 0 |
| world.frame | The frame number as retrieved from the simulation; may not be continuous as some frames could have been skipped in recording (e.g., due to timeouts) | 9 |
| frame.pedestrian.transform | [x, y, z, pitch, yaw, roll] as extracted from carla.Transform depicting pedestrian world position in this frame | '[0.3306, 308.9260, 0.9515, 0.0000, 114.8754, 0.0000]' |
| frame.pedestrian.velocity | [x, y, z] as extracted from carla.Vector3D depicting pedestrian velocity vector in this frame | '[-0.0328, 0.0709, 0.0000]' |
| frame.pedestrian.pose.in_frame | Is any joint of the pedestrian visible in the frame? | True |
| frame.pedestrian.pose.in_segmentation | Is any part of the pedestrian visible in semantic segmentation? | True |
| frame.pedestrian.pose.world | List of 26 joint transforms in the world coordinates in the format of [x, y, z, pitch, yaw, roll]. Please see the CARLA Skeleton page for details. | listed after the table |
| frame.pedestrian.pose.component | List of 26 joint transforms in the format of [x, y, z, pitch, yaw, roll] relative to the actor's pivot. Please see the CARLA Skeleton page for details. | listed after the table |
| frame.pedestrian.pose.relative | List of 26 joint transforms in the format of [x, y, z, pitch, yaw, roll] relative to the transform of the previous joint in the kinematic tree. Please see the CARLA Skeleton page for details. | listed after the table |
| frame.pedestrian.is_crossing | Is the pedestrian considered to be crossing the street in this frame? | False |
| frame.camera.pose | List of 26 joint positions as projected to 2D from the current camera perspective; values are in pixels but may be outside the frame or NaNs. Please see the CARLA Skeleton page for details. | listed after the table |

Example value of frame.pedestrian.pose.world:

    '[[ 0.3284, 308.9307, 0.0315, 0.0000, 24.8751, 89.9962 ],
    [ 0.3262, 308.9266, 1.0834, -0.3833, 22.5587, 91.0798 ],
    [ 0.3304, 308.9183, 1.1901, 0.0551, 21.5080, 91.7672 ],
    [ 0.3409, 308.8913, 1.3531, 0.4753, 19.9551, 91.1696 ],
    [ 0.3754, 308.9134, 1.5020, -3.6991, 8.6236, 95.4971 ],
    [ 0.4833, 308.9297, 1.4949, -77.1929, 22.7891, -80.1452 ],
    [ 0.5425, 308.9556, 1.2332, -73.6844, 89.9181, -139.3493 ],
    [ 0.5427, 309.0256, 0.9941, -2.6864, -53.1221, 167.7393 ],
    [ 0.3419, 308.8837, 1.5485, 0.8279, -157.0359, 77.1548 ],
    [ 0.3317, 308.9111, 1.6377, 2.2554, -154.0686, 74.4913 ],
    [ 0.3249, 309.0070, 1.7104, -2.2605, 25.9578, 96.4855 ],
    [ 0.2658, 308.9782, 1.7130, -2.2605, 25.9578, 96.4855 ],
    [ 0.2979, 308.8853, 1.5013, -1.5655, -152.7219, -93.7213 ],
    [ 0.2008, 308.8352, 1.4983, 76.9049, 16.1328, 105.7521 ],
    [ 0.1371, 308.8182, 1.2369, 72.4123, -65.7691, 30.6981 ],
    [ 0.1064, 308.8882, 0.9997, 2.2517, 80.1024, -25.2115 ],
    [ 0.2587, 308.8813, 0.9966, 5.2301, 40.1852, -97.5098 ],
    [ 0.2449, 308.9321, 0.5368, 2.7539, 40.1287, 98.4941 ],
    [ 0.2776, 308.8831, 0.1057, 0.4839, 43.3462, -159.9944 ],
    [ 0.2029, 308.9626, 0.0708, -4.2481, 134.8538, 90.5368 ],
    [ 0.1711, 308.9946, 0.0556, -0.7624, -135.1572, -166.2830 ],
    [ 0.4048, 308.9420, 0.9955, -4.0937, 11.0556, 85.5929 ],
    [ 0.3875, 308.9619, 0.5335, -0.7111, 10.8526, -77.3653 ],
    [ 0.4259, 308.8781, 0.1082, 1.4688, 6.0679, -156.9226 ],
    [ 0.4157, 308.9846, 0.0676, -5.1784, 94.5061, 91.1217 ],
    [ 0.4125, 309.0294, 0.0517, -0.8878, -175.3521, 12.7913 ]]'

Example value of frame.pedestrian.pose.component:

    '[[ 0.0000, 0.0000, -0.9200, 0.0000, -90.0002, 89.9962 ],
    [ -0.0028, 0.0037, 0.1319, -0.3833, -92.3167, 91.0798 ],
    [ -0.0121, 0.0034, 0.2386, 0.0551, -93.3674, 91.7672 ],
    [ -0.0410, 0.0052, 0.4016, 0.4753, -94.9203, 91.1696 ],
    [ -0.0355, -0.0354, 0.5505, -3.6991, -106.2519, 95.4971 ],
    [ -0.0660, -0.1401, 0.5434, -77.1929, -92.0862, -80.1453 ],
    [ -0.0675, -0.2048, 0.2817, -73.6844, -24.9572, -139.3493 ],
    [ -0.0041, -0.2343, 0.0426, -2.6864, -167.9974, 167.7393 ],
    [ -0.0484, 0.0075, 0.5970, 0.8279, 88.0888, 77.1548 ],
    [ -0.0191, 0.0052, 0.6862, 2.2555, 91.0560, 74.4913 ],
    [ 0.0707, -0.0289, 0.7589, -2.2605, -88.9175, 96.4855 ],
    [ 0.0695, 0.0368, 0.7615, -2.2605, -88.9175, 96.4855 ],
    [ -0.0284, 0.0467, 0.5498, -1.5655, 92.4027, -93.7213 ],
    [ -0.0330, 0.1559, 0.5468, 76.9049, -98.7427, 105.7521 ],
    [ -0.0216, 0.2208, 0.2854, 72.4123, 179.3556, 30.6981 ],
    [ 0.0548, 0.2193, 0.0482, 2.2517, -34.7730, -25.2115 ],
    [ -0.0155, 0.0840, 0.0451, 5.2301, -74.6902, -97.5098 ],
    [ 0.0364, 0.0752, -0.4147, 2.7539, -74.7467, 98.4941 ],
    [ -0.0219, 0.0661, -0.8458, 0.4839, -71.5292, -159.9944 ],
    [ 0.0817, 0.1005, -0.8807, -4.2481, 19.9784, 90.5368 ],
    [ 0.1241, 0.1158, -0.8959, -0.7624, 109.9674, -166.2830 ],
    [ -0.0219, -0.0741, 0.0440, -4.0937, -103.8198, 85.5929 ],
    [ 0.0034, -0.0667, -0.4180, -0.7111, -104.0229, -77.3653 ],
    [ -0.0888, -0.0664, -0.8433, 1.4688, -108.8075, -156.9226 ],
    [ 0.0122, -0.1020, -0.8839, -5.1784, -20.3693, 91.1217 ],
    [ 0.0542, -0.1178, -0.8998, -0.8878, 69.7725, 12.7913 ]]'

Example value of frame.pedestrian.pose.relative:

    '[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 89.9962 ],
    [ -0.0037, -1.0519, -0.0027, -2.3165, 0.3835, 1.0681 ],
    [ 0.0000, -0.1065, -0.0113, -1.0588, -0.4185, 0.6883 ],
    [ 0.0000, -0.1621, -0.0340, -1.5650, -0.3723, -0.5853 ],
    [ 0.0412, -0.1486, 0.0060, -11.2187, 4.4801, 3.5672 ],
    [ 0.1093, 0.0000, 0.0000, 8.3887, 73.7163, -157.8598 ],
    [ 0.2696, 0.0051, 0.0000, -15.8555, -3.6001, 6.7385 ],
    [ 0.2491, 0.0000, 0.0000, 79.2443, -163.8402, 14.0588 ],
    [ 0.0000, -0.1952, -0.0115, -3.0347, -178.7574, 168.2823 ],
    [ 0.0000, -0.0935, -0.0087, 3.2087, -0.7351, -2.7639 ],
    [ -0.0329, -0.0952, -0.0661, -0.0268, -179.9978, 170.9779 ],
    [ 0.0329, -0.0952, -0.0661, -0.0268, -179.9978, 170.9779 ],
    [ -0.0412, -0.1486, 0.0060, -7.2962, 178.7470, -2.3412 ],
    [ 0.1093, 0.0000, 0.0000, -6.1137, 104.4920, 25.6830 ],
    [ -0.2696, -0.0051, 0.0000, -19.5905, -5.2835, 5.7515 ],
    [ -0.2491, 0.0000, 0.0013, 77.0075, 160.9265, -34.7577 ],
    [ -0.0791, 0.0876, -0.0143, 17.4371, -6.2079, 169.7054 ],
    [ -0.0198, -0.4622, 0.0127, 0.3794, -2.4476, -164.0003 ],
    [ -0.0273, 0.4342, 0.0056, 3.5172, 1.7688, 101.4749 ],
    [ -0.0001, -0.1145, -0.0045, -15.7630, -91.5994, -89.3997 ],
    [ 0.0461, 0.0118, 0.0000, 89.7643, 73.1561, -178.8763 ],
    [ 0.0791, 0.0876, -0.0143, -11.3999, 4.0120, -6.3385 ],
    [ 0.0198, 0.4622, -0.0127, 0.0574, -3.3882, -162.9684 ],
    [ 0.0273, -0.4342, -0.0056, 5.1449, 1.0837, -79.4769 ],
    [ 0.0001, -0.1145, -0.0045, -17.8534, -88.5057, -90.9408 ],
    [ 0.0461, 0.0118, 0.0000, 89.7659, -105.1966, -177.2250 ]]'

Example value of frame.camera.pose:

    '[[ 1013.4298, 264.5229 ], [ 1017.1473, 170.7271 ],
    [ 1017.3837, 161.0205 ], [ 1017.1865, 146.1832 ],
    [ 1021.5769, 132.5240 ], [ 1030.2338, 133.2859 ],
    [ 1034.8846, 157.1492 ], [ 1038.1151, 178.8281 ],
    [ 1017.5922, 128.2966 ], [ 1018.8466, 119.9257 ],
    [ 1024.4340, 112.6378 ], [ 1018.3950, 112.4053 ],
    [ 1014.3293, 132.5611 ], [ 1004.3577, 132.8564 ],
    [ 997.7958, 156.7637 ], [ 998.7059, 178.3826 ],
    [ 1009.3145, 178.5858 ], [ 1009.5127, 220.1185 ],
    [ 1007.4017, 257.8045 ], [ 1006.5015, 261.9123 ],
    [ 1006.0143, 263.6995 ], [ 1023.3165, 178.6749 ],
    [ 1021.3909, 220.2290 ], [ 1017.5359, 256.9405 ],
    [ 1022.8544, 261.4956 ], [ 1025.1930, 263.3291 ]]'

Description of the data contained in the dataset. Numbers were rounded to four decimal places.
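
The transform and pose columns hold bracketed lists stored as text, so they have to be parsed before use. The sketch below shows one way to do that and to recompute, per joint, whether a projected 2D position falls inside the frame. It assumes that the cells use an ASCII minus sign and that missing values appear as `nan` or `NaN`; `parse_array` and `joints_in_frame` are illustrative helpers, and the check mirrors the description of frame.pedestrian.pose.in_frame rather than the authors' actual implementation. The second and third joints in the sample row are made up to exercise the out-of-frame and NaN cases.

```python
import json
import math
import re


def parse_array(cell):
    """Parse a bracketed numeric list stored as text in a CSV cell."""
    text = cell.strip().strip("'\"")
    # Assumption: missing values are written as nan/NaN; json expects "NaN".
    text = re.sub(r"\bnan\b", "NaN", text, flags=re.IGNORECASE)
    return json.loads(text)


def joints_in_frame(pose_2d, width, height):
    """Return one boolean per joint: True if its pixel position is inside the frame."""
    return [
        not (math.isnan(x) or math.isnan(y)) and 0 <= x < width and 0 <= y < height
        for x, y in pose_2d
    ]


# Hypothetical row: first joint taken from the example value, the others invented.
row = {
    "camera.width": "1600",
    "camera.height": "600",
    "frame.camera.pose": "[[ 1013.4298, 264.5229 ], [ 1700.0, 170.7271 ], [ nan, nan ]]",
}

pose = parse_array(row["frame.camera.pose"])
visible = joints_in_frame(pose, int(row["camera.width"]), int(row["camera.height"]))
print(visible, any(visible))
```

The same `parse_array` call also works for the six-element transform cells such as camera.transform and for the 26×6 pose lists.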