Audio/video muxing and demuxing, encoding/decoding, and visual perception (YOLO object detection + ByteTrack multi-object tracking) pipeline
Audio/video demuxing (MP4, RTSP), resampling, ...