OracyAI: an EdTech venture building an AI-powered oracy assessment


Andreea Zaman

Context & Challenge

Oracy Group is building OracyAI to give schools a credible, scalable way to measure spoken language skills, not just teach them. The platform uses AI to assess pupils across four strands: Physical, Linguistic, Cognitive, and Social and Emotional. The ambition is assessment-grade measurement: results that schools, parents, and regulators can trust in the same way they trust a reading age or a writing score. That is a considerably higher bar than most EdTech AI tools attempt to clear.

The challenge was that the underlying measurement infrastructure did not yet exist. Spoken language is a complex, multidimensional construct, and automating its assessment introduces well-documented risks: construct under-representation (measuring only what is easy to detect rather than what matters), differential validity across ages and cultural backgrounds, and fairness concerns that, if ignored at the design stage, become very expensive to correct later. With the platform entering active build and early school pilots on the horizon, Oracy Group needed the measurement architecture to be right from the start, not retrofitted.

Approach

  • Defined the constructs. Working from the Voice 21 four-strand framework, I specified what each strand means as a measurable construct, separating dimensions that can be reliably automated from those that, given current technology, still require human judgment.

  • Designed the scoring model. For the Physical Strand, the most technically developed of the four, I designed a five-point developmental scale (Emerging to Accomplished) with age-differentiated thresholds across three bands (ages 8–11, 12–14, 15–18), grounded in developmental research and documented in a technical brief.

  • Built fairness constraints into the framework. I identified and excluded dimensions with documented cultural or demographic bias, most notably eye contact, before they could become embedded in the product.

  • Led the ASR provider evaluation. I defined the measurement requirements that any automatic speech recognition (ASR) provider had to meet, making word-level confidence scores for the Clarity dimension a non-negotiable criterion. That requirement drove the selection of AssemblyAI Universal-2 over competing providers.

  • Designed assessment task specifications. I set prompt formats (spontaneous opinion or memory recall, not read-aloud) and stimulus requirements by age band, grounded in best practice for eliciting valid spoken language samples.

  • Produced developer-ready documentation. I translated the scoring model and measurement logic into structured technical briefs for the backend developer, ensuring the implementation matched the psychometric intent.
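To make the scoring logic above more concrete, here is a minimal Python sketch of how an age-banded, five-point scale could consume word-level ASR confidence scores for the Clarity dimension. It is illustrative only: the cut-off values, the use of mean word confidence as the Clarity signal, and the function names (`band_for_age`, `clarity_level`) are hypothetical assumptions, not OracyAI's actual implementation. It also assumes the per-word confidences have already been extracted from the ASR transcript. Levels are numbered 1 to 5 because only the endpoint labels (Emerging and Accomplished) are named here.

```python
from bisect import bisect_right
from statistics import mean

# The three age bands come from the brief; the cut-offs are PLACEHOLDERS
# invented for illustration, not calibrated OracyAI thresholds.
# Each tuple holds the four boundaries that separate levels 1-5.
AGE_BAND_CUTOFFS: dict[tuple[int, int], tuple[float, ...]] = {
    (8, 11): (0.60, 0.70, 0.80, 0.90),
    (12, 14): (0.65, 0.75, 0.85, 0.92),
    (15, 18): (0.70, 0.80, 0.88, 0.94),
}


def band_for_age(age: int) -> tuple[int, int]:
    """Return the (min_age, max_age) band that contains this pupil's age."""
    for low, high in AGE_BAND_CUTOFFS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age {age} is outside the supported bands (8-18)")


def clarity_level(word_confidences: list[float], age: int) -> int:
    """Map word-level ASR confidences to a level from 1 (Emerging) to 5 (Accomplished).

    Hypothetical proxy: uses the mean confidence across all recognised words.
    """
    if not word_confidences:
        raise ValueError("no words recognised; the sample cannot be scored")
    score = mean(word_confidences)
    cutoffs = AGE_BAND_CUTOFFS[band_for_age(age)]
    # bisect_right counts how many cut-offs the score meets or exceeds (0-4),
    # so adding 1 gives a level on the 1-5 scale.
    return bisect_right(cutoffs, score) + 1


if __name__ == "__main__":
    sample = [0.95, 0.91, 0.88]  # mean is roughly 0.913
    print(clarity_level(sample, age=10))  # level 5 under the 8-11 cut-offs
    print(clarity_level(sample, age=16))  # level 4 under the 15-18 cut-offs
```

With these placeholder cut-offs, the same sample scores level 5 for a 10-year-old but level 4 for a 16-year-old, which is the purpose of age-differentiated thresholds. Raw ASR confidence is also affected by recording quality and accent coverage, so any real Clarity signal of this kind would likely need validation across pupil groups before use.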

Outcomes

  • A complete, implementation-ready scoring model for the Physical Strand now exists where none did before, with defined constructs, age-differentiated thresholds, and a clear rationale for every decision.

  • The platform's ASR selection is grounded in measurement criteria rather than cost or brand recognition, reducing the risk of building on a provider that cannot support valid scoring at scale.

  • Fairness and bias considerations are embedded in the framework at the design stage rather than treated as an afterthought. This is a meaningful differentiator for a product seeking the trust of schools and parents in culturally diverse markets.

  • The development team has a replicable methodology for the remaining three strands (Linguistic, Cognitive, Social and Emotional), reducing ambiguity in the roadmap and enabling faster iteration.

  • Oracy Group can now articulate its measurement approach precisely to schools, regulators, and potential research partners, supporting the kind of credibility conversations that a purely technical product team would struggle to have.

Call to Action

If you are building or procuring an AI tool that assesses human skills and you need someone who can make the measurement stand up to scrutiny, get in touch.