The TrafficPerceiver framework integrates text instructions and visual inputs within a multimodal large language model to perform both traffic scene understanding and target-oriented segmentation.