What Speaking English Actually Means (And Why AI Misses Half of It)

The problem with how ESL is often discussed is that "speaking English" is treated as a single skill that students either have or don't have. The reality is that "speaking English" is a bundle of six related competences, and being good at one of them does not mean being good at the others. Students with excellent grammar can be helpless at register switching. Students with fluent production can collapse on repair. AI tutors handle one of the six well and leave the rest mostly untouched.

This post unpacks what the bundle actually contains and why understanding it is the difference between an ESL programme that builds real-world competence and one that builds a narrow slice and calls it fluency.

YapYapGo is a classroom speaking practice tool for ESL teachers, designed around interaction where the full bundle of competences gets exercised. This post sets out exactly what we mean by "speaking English".

The six components

Drawing on Canale and Swain's communicative competence framework (1980) and the extensions by Bachman and Palmer (1996), "speaking English" decomposes into:

1. Language production

The ability to produce well-formed English at speed. Grammar, vocabulary, pronunciation, fluency of delivery. This is what most ESL textbooks foreground and what AI tutors mostly target.

It is the most measurable component and the easiest to drill. It is also only one of six.

2. Comprehension

The ability to understand spoken English at speed, including variations of accent, pace, and register. We covered some of this accent and register variety in the accents post and the slang and idioms post.

AI tutors deliver a narrow band of comprehension input. Real classrooms deliver broad bands automatically. The breadth is what real-world comprehension requires.

3. Repair

The ability to recover when communication breaks down. Asking for clarification, rephrasing, offering examples, acknowledging confusion, signalling that you're not following. We covered this in detail in the predictability trap post.

This is the central skill of conversational competence and one of the weakest areas of AI tutor practice. AI tutors smooth over breakdowns; real partners require students to repair them.

4. Register

The ability to match the formality and style of your language to the context. Knowing when to say "would you mind" versus "can you" versus "gimme". Knowing when to use idioms and when to drop them. Knowing how academic, casual, and professional English differ from one another.

AI tutors default to a narrow neutral register. Real classrooms expose students to multiple registers and require them to switch between them.

5. Nonverbal cues

The visual, prosodic, and gestural channels that carry meaning alongside the words. Eye contact, facial expression, pitch contour, pause patterns, head movement. Estimates vary, but by some accounts something like 30-40% of communicative meaning in face-to-face interaction is carried in these channels.

AI tutors handle approximately zero of this. The voice-only ones miss everything visual; the video-avatar ones have schematic facial expressions that don't carry real meaning. Real classrooms deliver the full nonverbal channel for free.

6. Intercultural sensitivity

The awareness that communication norms vary across cultures. Politeness norms, turn-taking conventions, topic appropriateness, conversational distance, the meaning of silence. A student who is grammatically perfect but interculturally tone-deaf will misfire constantly in international communication.

AI tutors are not culturally calibrated. Real classrooms with international students naturally expose students to intercultural variation. The diversity of the class itself is the training ground.

How the bundle interacts in real use

The six components are not separable in real conversation. Every utterance involves several of them simultaneously. A student asking a question in a job interview is:

  • Producing language (component 1)
  • Comprehending the interviewer's previous turn (component 2)
  • Repairing the moment they realise they misunderstood the original question (component 3)
  • Matching register to the professional context (component 4)
  • Maintaining appropriate eye contact and tone (component 5)
  • Calibrating the directness of the question to the interviewer's cultural background (component 6)

A student who has only practised production can sound grammatically fine and still get the intercultural side wrong. The interviewer hears the grammar but reacts to the cultural misfire, and that reaction can cost the student the job.

This is why "I can speak English" is not a sufficient description of communicative competence. The student probably means component 1. That does not tell you whether they have components 2 to 6.

Why AI tutors miss most of the bundle

The reasons are now familiar:

  • Production: AI handles this well. Genuine value.
  • Comprehension: Narrow band of accents and registers. Covered in the accents post.
  • Repair: Smoothed over by AI tutors. Covered in the predictability post.
  • Register: AI defaults to neutral. Covered in the slang and idioms post.
  • Nonverbal: Voice-only AI tutors miss everything visual. Schematic avatars carry no real meaning.
  • Intercultural: AI isn't trained on intercultural negotiation as a competence.

Net result: AI tutors deliver 1 out of 6. Maybe 1.5 if you give generous half credit for their narrow band of comprehension input. They are, in essence, production tutors. They are not speaking tutors in the full sense the literature defines.
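For readers who like to see the arithmetic, the tally above can be written out as a small sketch. The scores are this post's rough judgement calls rather than measurements, and the names (`AI_TUTOR_COVERAGE`, `coverage_score`, `gaps`) are made up for illustration.

```python
# Rough, illustrative scores per component: 1.0 = handled well,
# 0.5 = partly handled, 0.0 = mostly untouched.
# These are judgement calls from this post, not measured values.
AI_TUTOR_COVERAGE = {
    "production": 1.0,
    "comprehension": 0.5,  # the "generous" half credit for narrow-band input
    "repair": 0.0,
    "register": 0.0,
    "nonverbal": 0.0,
    "intercultural": 0.0,
}


def coverage_score(coverage):
    """Add up the per-component scores (maximum is the number of components)."""
    return sum(coverage.values())


def gaps(coverage, threshold=1.0):
    """List the components scoring below the threshold, in their original order."""
    return [name for name, score in coverage.items() if score < threshold]


if __name__ == "__main__":
    total = coverage_score(AI_TUTOR_COVERAGE)
    print(f"Covered: {total} out of {len(AI_TUTOR_COVERAGE)}")
    print("Still needs classroom work:", ", ".join(gaps(AI_TUTOR_COVERAGE)))
```

The same dictionary shape could be filled in for a planned lesson sequence to check which of the six components it leaves out.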

What classroom programmes need to do

If the goal is genuine communicative competence, the lesson design has to exercise all six components:

  • Production drilling: appropriate, can be partly outsourced to AI/apps.
  • Broad comprehension input: real classroom interaction with varied partners and audio-visual materials.
  • Repair practice: pair work that runs long enough that breakdowns happen and have to be repaired.
  • Register switching: activities that span informal, neutral, and formal registers. Discussion mode for informal; debate or IELTS mode for formal.
  • Nonverbal channel exposure: face-to-face pair work.
  • Intercultural negotiation: mixed-background classes, explicit reflection on conversational norms.

The structural format that exercises all six is parallel pair work between real students, varied across sessions, with structured tasks. The Team Maker, Topic Generator, and Classroom Timer make this efficient to run. The team-sport framing is the principle; the bundle of six is what makes the framing concrete.
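To make "varied across sessions" concrete, here is a minimal sketch of one way to rotate partners so that every student meets every other student once before any pairing repeats. It uses the standard round-robin "circle method" and is only an illustration: it is not a description of how the Team Maker works, and the function name `rotate_pairs` and the sample names are invented for the example.

```python
def rotate_pairs(students):
    """Return one list of pairs per session using the round-robin circle method.

    Every student is paired with every other student exactly once.
    With an odd class size, a None placeholder is added; whoever is paired
    with None in a session sits out or joins another pair as a trio.
    """
    roster = list(students)
    if len(roster) % 2 == 1:
        roster.append(None)

    n = len(roster)
    sessions = []
    for _ in range(n - 1):
        # Pair the first seat with the last, the second with the second-to-last, etc.
        pairs = [(roster[i], roster[n - 1 - i]) for i in range(n // 2)]
        sessions.append(pairs)
        # Keep the first seat fixed and rotate everyone else by one place.
        roster = [roster[0], roster[-1]] + roster[1:-1]
    return sessions


if __name__ == "__main__":
    for number, pairs in enumerate(rotate_pairs(["Ana", "Ben", "Chen", "Dee"]), start=1):
        print(f"Session {number}: {pairs}")
```

A class of four gets three sessions with no repeated partner; a class of 30 gets 29. In practice a teacher would probably shuffle the roster first and stop after however many sessions the term allows.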

The bottom line

"Speaking English" means six related competences, not one. AI tutors handle production well and miss most of the other five. ESL programmes that rely on AI as the main speaking input produce students who are good at one slice and helpless at the rest. Real classroom interaction is what exercises the full bundle. That is why students keep asking for real partners, why the team-sport framing matters, and why no app or AI tutor will replace classroom speaking practice any time soon.


Sources:
  • Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics.
  • Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford University Press.
  • Byram, M. (1997). Teaching and Assessing Intercultural Communicative Competence. Multilingual Matters.