Can AI Give Reliable Career Advice? Evaluating LLMs using CareerNet
AI-powered career coaching tools are becoming increasingly popular because they extend personalized career guidance to many people who might not otherwise have access to it. Yet an essential question remains: how effective are they at delivering high-quality, reliable career advice?
Renaissance Philanthropy, with funding from the Schultz Family Foundation and support from The Learning Agency, is tackling that question with the launch of CareerNet, a project that includes the release of three benchmark datasets designed to evaluate AI-generated career guidance. CareerNet marks an important milestone in AI career guidance because, while Large Language Models (LLMs) demonstrate decent capability, there is substantial room for improvement. Our recent research shows that, if LLMs offering career advice were graded the way teachers grade students, most would get a C – not terrible, but not nearly good enough.
What is CareerNet?
CareerNet is the first initiative of its kind, developing high-quality benchmark datasets across three high-growth career domains: allied health, computer science, and reskilling pathways. The datasets draw from CareerVillage, an online community that has crowdsourced thousands of career-related questions and responses spanning a wide range of occupations and learner needs.
Each of the three datasets was reviewed and rated by career navigation experts for completeness, coherence, and correctness. In addition to these ratings, the datasets include scenario and goal labels – metadata designed to help AI systems better understand the user’s intent, such as whether a learner wants information about education requirements, job responsibilities, or career progression.
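To make the structure concrete, here is a minimal sketch of what a labeled record like this might look like. The field names, label vocabulary, and 1–5 rating scale are illustrative assumptions, not the dataset’s actual schema – consult the GitHub documentation for the real format.

```python
# Hypothetical illustration of a CareerNet-style record: a crowdsourced
# question annotated with scenario/goal labels and expert rubric ratings.
# All field names and values below are assumed for illustration.

record = {
    "question": "What classes should I take to become a radiology technician?",
    "domain": "allied health",
    "scenario": "exploring a career path",   # assumed scenario label
    "goal": "education requirements",        # vs. job responsibilities, progression
    "expert_ratings": {"completeness": 4, "coherence": 5, "correctness": 5},
}

# Goal labels let a system route or tailor responses by user intent,
# e.g. selecting only questions about education requirements.
education_questions = [r for r in [record] if r["goal"] == "education requirements"]
print(len(education_questions))
```

Intent labels like these are what allow an AI system to answer the question the learner actually asked, rather than producing generic career advice.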
CareerNet serves a dual purpose. First, it enables the training of LLMs to provide more relevant and trustworthy career information. Second, it provides a rigorous benchmark for assessing the outputs of LLMs.
By releasing these resources openly, Renaissance Philanthropy aims to strengthen the infrastructure guiding AI development in career navigation applications. Open, high-quality benchmarks help ensure that AI systems evolve in ways that expand access to reliable guidance and serve the public good.
The CareerNet datasets are available for free on GitHub, where researchers and developers can download the data and access detailed documentation about their development.
How LLMs Perform
The release of CareerNet enables a critical step in scaling AI-powered career guidance: measuring how well large language models actually deliver career guidance in practice. Partnering with The Learning Agency, we sampled 300 career questions from the dataset and generated responses from three LLMs: OpenAI’s GPT-5.2, Google’s Gemini 2.5 Flash, and Meta’s Llama 4 Maverick.
We evaluated responses across two dimensions: completeness (did the answer cover the relevant information?) and coherence (was it well-written and easy to follow?). Across models, the results were decent: the LLMs generally provided complete and coherent answers. GPT-5.2 and Gemini 2.5 Flash produced the most complete responses, scoring “fully complete” on roughly 77% of questions. Still, 23-30% of responses across models fell short of the top score. On coherence, Llama 4 Maverick ranked first.
These findings suggest that while these LLMs are not perfect, they can provide some useful career guidance. They are capable of giving coherent advice, though they sometimes miss details. For platforms delivering career guidance, this points to a practical approach: automated scoring systems, like those used in this study, could act as quality filters, flagging responses that fall short before they reach users.
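A quality filter of this kind could be as simple as thresholding rubric scores before a response is shown. The sketch below assumes scores on a 1–5 scale and hypothetical threshold values; in practice the scores would come from a trained scoring model or expert raters, and thresholds would be tuned against the benchmark.

```python
# Minimal sketch of an automated quality filter for LLM career-advice
# responses. Score fields and thresholds are illustrative assumptions.

def passes_quality_filter(scores, min_completeness=4, min_coherence=4):
    """Return True if a response meets minimum rubric scores (1-5 scale assumed)."""
    return (scores["completeness"] >= min_completeness
            and scores["coherence"] >= min_coherence)

# Flag responses that fall short before they reach users.
responses = [
    {"id": 1, "scores": {"completeness": 5, "coherence": 5}},
    {"id": 2, "scores": {"completeness": 3, "coherence": 5}},
]
flagged = [r["id"] for r in responses if not passes_quality_filter(r["scores"])]
print(flagged)  # ids of responses needing review or regeneration
```

Flagged responses could be regenerated, routed to a human coach, or shown with a caveat – the filter decides only which answers need a second look.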
For those interested in the full technical methodology and detailed results, a report is available here.
Looking Ahead
CareerNet provides a clear, data-driven benchmark for evaluating how well AI models can support career navigation. Our findings demonstrate that while current Large Language Models are capable of delivering some useful and coherent guidance, there remains much room to improve both accuracy and coverage.
Looking ahead, CareerNet can help developers identify strengths and weaknesses in different models, guiding iterative improvements in AI-driven career tools. By combining transparent, high-quality datasets with automated evaluation methods, we can ensure that AI systems are not only effective but also equitable and reliable for learners across diverse backgrounds.
This work reflects Renaissance Philanthropy’s broader mission: building the tools and benchmarks that steer AI development toward addressing society’s needs in beneficial ways. By investing in open datasets and rigorous evaluation, Renaissance demonstrates how focused philanthropic support can create lasting, positive societal impact. Initiatives like CareerNet show that, with the right infrastructure and guidance, AI has the potential to serve as a public good, helping every learner access reliable career guidance in an increasingly complex labor market.