AI Experience Impact Score (AXIS): A new North Star metric for AI-powered customer support

Kenji Hayward

Head of Support @ Front

14 January 2025

AI is handling more and more customer support queries, but visibility into AI’s service quality hasn’t kept up. Enter the AI Experience Impact Score: a new way to pinpoint where AI-led customer experience can be improved.

As customer support leaders, we’re all navigating the exciting but complex world of AI-powered support. Looking past the hype, we’re faced with the reality that 61% of customers think AI advancements make it more important for companies to be trustworthy. While AI promises to revolutionize how we serve customers, it also brings a critical challenge to uphold trust: How do we measure and ensure the quality of these AI interactions?

That’s why I’m excited to introduce the AI Experience Impact Score (AXIS), a new metric I designed specifically for evaluating AI-supported customer interactions. AXIS offers something support leaders have needed all this time: a practical way to measure and improve our AI-led support quality.

Here, I’ll show how AXIS is calculated, evaluated, and implemented. For more detail, I’ve also written an in-depth white paper covering the scoring rubric, the customizable formula, and a free checklist of next steps once you know your score.

Why we need a new way to measure AI-powered customer service

Traditional support metrics like CSAT and First Response Time (FRT) serve us well for human interactions, but AI-supported conversations need to be measured separately while the technology matures; our existing metrics just aren’t quite there yet.

AI-led customer interactions have three common friction points:

  1. Difficulty understanding customer queries

  2. Excessive back-and-forth exchanges

  3. Choppy handoffs between AI and human agents

Unfortunately, these friction points are common rather than rare, which means customer satisfaction is at stake.

The three-part framework of AXIS

AXIS breaks down AI support quality into three key components:

Resolution Accuracy (RA)

This measures how well your AI-powered support understands and resolves customer issues. It’s not just about getting the right answer — it’s about getting it right the first time, without unnecessary steps or clarifications.

Interaction Effort (IE)

We all know that even a correct answer can feel unsatisfying if it takes too much work to get there. This component measures how easily customers interact with your AI-powered support, checking for the number of exchanges needed and whether customers have to repeat themselves. 

Handoff Smoothness (HS)

There should always be a clear path to a human agent in customer support, regardless of whether or not an escalation is needed. This component measures how seamlessly that transition happens. Does the customer have to repeat anything? Does your human agent have all the context they need? Gartner reports that smoother transitions result in higher CSAT, more positive referrals, and better customer retention.

Getting started with AXIS

AXIS is an easy-to-interpret metric on a scale of 1 to 5. Each component is scored on a 1-5 scale with clear criteria for each level, and the default formula is a simple average of the three components:

AXIS = (RA + IE + HS) / 3

To calculate AXIS, upload a customer conversation (taking care to exclude sensitive customer data) and the AXIS scoring rubric to a generative AI tool like ChatGPT or Claude. The tool will score each component, summarize what went well and what didn’t within the AXIS framework, and assign a final score.
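If you’d rather script this step than paste conversations into a chat window, here’s a minimal sketch using Anthropic’s Python SDK. The file names, model alias, and prompt wording are my own assumptions for illustration; the official rubric and prompt live in the white paper.

```python
# A minimal sketch of scripting the scoring step instead of using the chat UI.
import anthropic

# Load the AXIS rubric and a transcript already scrubbed of sensitive data
# (illustrative file names).
with open("axis_rubric.txt") as f:
    rubric = f.read()
with open("conversation_scrubbed.txt") as f:
    conversation = f.read()

prompt = (
    "Score this AI-supported customer conversation using the AXIS rubric "
    "below. Rate Resolution Accuracy, Interaction Effort, and Handoff "
    "Smoothness from 1 to 5, summarize what went well and what didn't, "
    "then give the final AXIS score.\n\n"
    f"Rubric:\n{rubric}\n\nConversation:\n{conversation}"
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model alias; use your own
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)  # component scores, summary, and final AXIS
```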

What’s a good AXIS score?

  • 4 to 5 is excellent: the interaction was accurate, low effort, and, if a handoff occurred, smoothly transitioned to human assistance

  • 3 to 3.9 is fair: the interaction had some inaccuracies, extra back-and-forth, or a slightly rough handoff

  • 1 to 2.9 is poor: run a root cause analysis to find where improvements can be made
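For reference, here’s a minimal Python sketch of the default equal-weight calculation and these bands. The function names are mine, and the rule of treating Handoff Smoothness as a neutral 5 when no handoff occurs comes from the FAQ at the end of this post.

```python
def axis_score(ra: float, ie: float, hs: float | None = None) -> float:
    """Average the three 1-5 component scores into a single AXIS score."""
    if hs is None:   # no AI-to-human handoff occurred
        hs = 5.0     # treat Handoff Smoothness as neutral (see FAQ)
    return round((ra + ie + hs) / 3, 1)

def axis_band(score: float) -> str:
    """Map an AXIS score to the bands described above."""
    if score >= 4.0:
        return "excellent"
    if score >= 3.0:
        return "fair"
    return "poor"

score = axis_score(ra=4, ie=3, hs=5)
print(score, axis_band(score))  # 4.0 excellent
```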

Download the detailed scoring rubric for the AXIS framework and generative AI prompt from the white paper.

What AXIS means for your business

Good AXIS scores correlate with what we all care about: happy customers and efficient operations. Teams using AXIS can experience:

  • Clearer priorities for AI optimization

  • Better alignment between AI and human support teams

  • Data-driven decisions about AI implementation

  • Deeper insights into AI-powered customer experiences

Quick-start implementation tips

  • Stick to one generative AI tool for calculating AXIS, for consistency. We’ve tried AXIS on the two major generative AI platforms, ChatGPT and Claude. Both yield similar results (scores typically differ by about 0.2). ChatGPT shows the calculation as a formula for better visualization, whereas Claude tends to give more constructive feedback.

  • Establish a standardized naming convention for labeling your customer interactions (for example, 2025-01-14-conv-0042). That way, if you’re evaluating more than one at a time, it’s easier to match each AXIS score to the right conversation.

  • Calibrate the tool over time. The more you use AXIS and feed corrections back to the generative AI tool, the more accurate its evaluations become. For example, if your support workflow requires customers to confirm information for security reasons across a minimum of three exchanges, tell the tool to only mark Interaction Effort down at four or more exchanges, as in the sketch below.
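One way to encode that kind of workflow rule is a calibration note appended to the rubric prompt. This is a hypothetical example; adjust the wording and file name to your own workflow.

```python
# Hypothetical calibration note appended to the rubric prompt so that
# required security-verification exchanges aren't penalized as extra effort.
with open("axis_rubric.txt") as f:  # illustrative file name
    rubric = f.read()

CALIBRATION_NOTE = (
    "Our workflow requires at least three exchanges to verify the "
    "customer's identity. Only mark Interaction Effort down when a "
    "conversation reaches four or more exchanges."
)

prompt = f"{rubric}\n\nScoring adjustment:\n{CALIBRATION_NOTE}"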

Making AXIS work for your team

AXIS is customizable to your team function, customer experience goals, and business needs. Leading a technical support team? You might weigh Resolution Accuracy more heavily. Running a high-volume chat? Interaction Effort might be your priority.
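If you score programmatically, custom weighting could look like the sketch below. These weights are purely illustrative; the white paper’s weighted formula and example use cases are the reference.

```python
def weighted_axis(ra: float, ie: float, hs: float,
                  w_ra: float = 0.5, w_ie: float = 0.3, w_hs: float = 0.2) -> float:
    """Weighted average of the 1-5 component scores; weights must sum to 1."""
    assert abs(w_ra + w_ie + w_hs - 1.0) < 1e-9, "weights must sum to 1"
    return round(ra * w_ra + ie * w_ie + hs * w_hs, 1)

# A technical support team weighting Resolution Accuracy more heavily:
print(weighted_axis(ra=5, ie=3, hs=4))  # 4.2
```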

Start small by scoring a sample of your AI interactions weekly. Look for patterns in your lowest-scoring components — these are your biggest opportunities for improvement. Share these insights with your support operations team to guide troubleshooting or optimization efforts.
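As a concrete starting point, here’s a minimal sketch of drawing a weekly random sample to score manually. The conversation IDs and the 3% rate are illustrative; 2-5% is the range suggested below.

```python
import math
import random

# Illustrative conversation IDs following a standardized naming convention.
conversation_ids = [f"2025-01-14-conv-{i:04d}" for i in range(1, 1201)]

SAMPLE_RATE = 0.03  # within the suggested 2-5% range
k = math.ceil(len(conversation_ids) * SAMPLE_RATE)
weekly_sample = random.sample(conversation_ids, k)
print(len(weekly_sample), weekly_sample[:3])  # 36 conversations to score
```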

A note on using AXIS at scale: as we’re in the early stages of testing, the process is still fairly manual, which limits the number of customer interactions you can evaluate. Aiming to score 2-5% of all conversations is a reasonable starting point. Some visibility is better than none, but bear in mind the inherently limited view that informs your overall assessment.

Got feedback on AXIS? We’re happy to hear it! Continue the conversation over at Front Community.

Ready to understand your AI-powered support on a deeper level?

We’re only just getting started with AXIS and are excited to see how it’s used in the world. Get more details about AXIS in the full white paper that covers:

  • Detailed scoring rubrics for each component

  • Custom weighted formula for example use cases

  • Ready-to-use generative AI prompts for instant evaluation

  • A checklist for next steps after knowing your score

Watch me give a tutorial on calculating AXIS with an example customer interaction in my Top-Tier Support newsletter!


Frequently asked questions

How do I ensure the confidentiality of customer information when I upload customer conversations to a generative AI tool?

Take precautions and exclude any sensitive customer data. We recommend you refer to your company’s data security and privacy policies. 

Can I calculate more than one conversation at a time?

Yes, you can upload multiple conversations along with the AXIS scoring rubric at one time. Just make sure you clearly label each interaction so you can match each AXIS score to the right conversation.

How do I know AXIS is calculated correctly by gen AI?

Always review the output to make sure the assessment is accurate before drawing final conclusions.

Is there a way to calculate AXIS at scale?

You can randomly sample a subset of your customer conversations and use their average score as your estimate. Since scoring is a manual process, 2-5% of conversations is a reasonable sample size.

Will Front be building AXIS into the product?

It’s not on the roadmap currently, but you’re welcome to add it to our product ideas and upvote!

What if there is no handoff required between AI and human agent? How is Handoff Smoothness (HS) scored?

Handoff Smoothness (HS) would be treated as a “neutral” value of 5, assuming AI has handled the interaction completely on its own.

Can AXIS be tailored to my current customer support workflow?

Yes, you can customize AXIS by weighting the three components differently to emphasize certain aspects more than others. See p. 6-7 of the white paper for the formula and examples.

Which generative AI tool does AXIS work best with? 

We’ve tried AXIS on the two major generative AI platforms, ChatGPT and Claude. Both yield similar results (scores typically differ by about 0.2). ChatGPT shows the calculation as a formula for better visualization, while Claude tends to give more constructive feedback. We recommend sticking with one tool to keep your AXIS evaluations consistent.
