We introduce the Berkeley Function Calling Leaderboard (BFCL), the first comprehensive and executable function-call evaluation dedicated to assessing the ability of Large Language Models (LLMs) to invoke functions.
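BFCL's own evaluation harness is not described here. As a hedged illustration of what an executable function-call check can look like, this sketch parses a model's output as a single Python call expression, runs it against a small registry of allowed functions, and compares the return value with an expected result. The `get_weather` tool, `REGISTRY`, `execute_call`, and `check_call` are hypothetical names introduced only for this example.

```python
import ast
from typing import Any, Callable


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical tool; a stub standing in for a real weather API."""
    return {"city": city, "unit": unit, "temp": 21}


# Hypothetical registry of the only functions the model may call.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_call(call_src: str) -> Any:
    """Parse one call such as 'get_weather("Berkeley", unit="celsius")' and run it.

    Arguments must be literals (checked by ast.literal_eval), so the model's
    text is never handed to eval().
    """
    node = ast.parse(call_src, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a single call to a named function")
    func = REGISTRY.get(node.func.id)
    if func is None:
        raise ValueError(f"unknown function: {node.func.id}")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("**kwargs unpacking is not supported")
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return func(*args, **kwargs)


def check_call(model_output: str, expected: Any) -> bool:
    """Score one model output: True only if the call runs and returns `expected`."""
    try:
        return execute_call(model_output) == expected
    except Exception:
        # Syntax errors, unknown functions, bad arguments, or runtime errors
        # all count as a failed call.
        return False
```

Restricting arguments to literals through `ast.literal_eval` means the model output is parsed but never passed to `eval`, so a malformed call, a call to an unregistered function, or an attempt such as `__import__("os").getcwd()` simply scores as a failure.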
- Importing modules and calling top-level functions from them
- Passing multiple positional and keyword arguments
- Receiving return values, including nested lists and dicts
- Getting Python exceptions across ...
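The list does not say which host language or bridge exposes these operations. As an assumption, the following Python sketch shows the Python-side counterpart of each item using only the standard library: `importlib.import_module` for the import, `*args`/`**kwargs` forwarding for positional and keyword arguments, nested return values passed through unchanged, and exceptions caught and packed into plain data. The `call_function` helper and its `{"ok": ..., "value"/"error": ...}` envelope are hypothetical.

```python
import importlib
import traceback
from typing import Any


def call_function(module_name: str, func_name: str, /, *args: Any, **kwargs: Any) -> dict:
    """Hypothetical helper: import a module by name, call one of its top-level
    functions, and wrap the outcome in a plain dict.

    The '/' makes module_name and func_name positional-only, so they cannot
    collide with keyword arguments meant for the target function.
    """
    try:
        module = importlib.import_module(module_name)
        func = getattr(module, func_name)
        # Nested lists and dicts in the result are returned unchanged.
        result = func(*args, **kwargs)
        return {"ok": True, "value": result}
    except Exception as exc:
        # Pack the exception into plain data instead of letting it propagate.
        return {
            "ok": False,
            "error": {
                "type": type(exc).__name__,
                "message": str(exc),
                "traceback": traceback.format_exc(),
            },
        }
```

For example, `call_function("json", "loads", '{"a": [1, {"b": 2}]}')` returns `{"ok": True, "value": {"a": [1, {"b": 2}]}}`, while `call_function("math", "sqrt", -1)` returns an envelope whose `error.type` is `"ValueError"`.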