Evaluating Fine-Grained Performance in Custom LLMs
How do you know your model is 'good'? Moving beyond loss curves to semantic evaluation frameworks and 'LLM-as-a-Judge'.