The challenge with picture description activities is that most teachers use them at their most basic level: "Look at the picture and say what you see." That version is indeed basic - and not particularly useful beyond A2 level, since describing visible objects doesn't produce extended, complex language.
But pictures can do much more than prompt description. They can trigger narrative, generate debate, spark speculation, create information gaps, and provide scaffolding for abstract discussion. The difference between a picture activity that produces two sentences and one that produces ten minutes of genuine discussion is almost entirely the design of the task around the picture.
YapYapGo is a classroom speaking practice tool for ESL and EFL teachers. Picture activities work well as triggers before structured pair discussion in YapYapGo's modes. Here are 12 formats that move picture description well beyond "what do you see."
Why pictures work in speaking class
They provide shared content without shared language. Two students looking at the same picture have the same content but may describe it completely differently. This creates genuine information exchange - each student notices different things and interprets them differently.
They scaffold lower levels. A picture of a busy marketplace gives an A2 student vocabulary cues (food, stalls, people) without requiring background knowledge of the topic.
They remove blank-page anxiety. Students who freeze when asked to "discuss" a topic can immediately describe what they see, which gives them content to work with.
They support exam preparation. Cambridge FCE Part 2 is built around comparing photographs, and pictures can also be used to practise the IELTS Part 2 long turn. Both are specific, high-stakes skills that require dedicated practice.
Format 1: What story does this tell? (B1-C1)
Show a photo of a situation with implied context: two people in a conversation that clearly isn't going well, a person looking at a letter with a mixed expression, an empty playground. Students speculate in pairs: "What happened just before this photo was taken? What happens next? What is each person thinking?"
The speculation format forces hypothetical language (might, could, may have, must have) and extended inference rather than simple description.
Questions to push deeper: "What does the photo tell us about society?" "If this photo appeared in a newspaper, what would the headline be?"
Format 2: The compare and contrast pair (all levels)
Each student in a pair has a different picture on a related theme. Without showing each other their photos, they describe and compare: "In my picture, I can see... whereas in yours you mentioned... Both pictures seem to show... but mine is more..."
The information gap creates genuine communicative purpose. The comparison language it practises is also central to Cambridge FCE Part 2, where candidates compare two photographs, making it directly exam-relevant.
Levelling: At A2, use pictures with simple, concrete differences. At B2-C1, use pictures that require nuanced interpretation and abstract comparison ("These pictures both explore the theme of isolation, but in very different ways").
Format 3: The controversial image (B1-B2)
Show an image that could be interpreted in multiple ways or that raises a genuine question: a child using a smartphone, an empty factory floor, a luxury resort next to a fishing village. Students discuss: "What do you think about this? Is this image showing something positive or negative? What questions does it raise?"
The ambiguity drives discussion because there's no single correct answer. Students who agree initially are challenged to find the alternative reading.
Format 4: Caption competition (B1-B2)
Show an image. In pairs, students write the best possible caption (one sentence). Share captions with the class. Vote on the best one.
The caption task forces precision - one sentence that conveys the essential meaning of the image in an interesting way. It practises summary language, connotation awareness, and tone. The competitive voting element creates genuine investment.
Tool tip: After a picture activity warm-up, YapYapGo provides the structured follow-up discussion. The picture generates the topic context; YapYapGo provides levelled discussion questions on that theme for extended pair practice. A classroom countdown timer keeps each picture activity round to a consistent length.
Format 5: The rank and justify (B1-C1)
Give students five pictures on a theme (five images of "success," five environments, five faces showing different emotions). They rank them individually from most to least [interesting / positive / representative of the theme] and explain their ranking to their partner.
Ranking forces opinion formation. Comparing rankings forces justification and counter-argument. The sequence from individual ranking to partner comparison to class share produces three rounds of increasingly complex language production on the same visual content.
Format 6: The prediction sequence (A2-B2)
Show the first picture in a sequence (a person arriving somewhere, a scene that implies something is about to happen). Students predict: "What happens next?" Then show the next picture. Were they right? What do they predict now?
Prediction generates future forms and conditional language naturally. The reveal element keeps engagement high. Works particularly well with advertisement stills, narrative photo series, or images of processes.
Format 7: The empathy task (B1-C1)
Show a picture of people in a specific situation - waiting at a job centre, celebrating a promotion, navigating a crowd. Students choose one person and speak in that person's voice: "I am standing here because... I'm feeling... I'm thinking about..."
Speaking from another person's perspective generates different language from description or commentary. It practises reported thought, emotional vocabulary, and narrative voice. It also develops empathetic perspective-taking, which is a sophisticated communicative skill.
Format 8: The selective describe (A2-B1)
Both students look at the same busy picture. Student A secretly chooses three objects/people in the picture to describe. Student B tries to identify which three they're describing based on the description.
The guessing element creates genuine information exchange. The selective element requires specific, precise description rather than general observation.
IELTS and Cambridge exam-specific practice
For exam preparation, the picture description format needs to match the exam format precisely:
IELTS Part 2 (2 minutes): Individual long turn from a cue card, not picture description. But pictures can be used in practice to trigger the same kind of extended personal narrative: "Look at this image of someone learning a language. Tell your partner about a time you experienced something similar."
Cambridge FCE Part 2: Two photographs, one minute to describe and compare, brief response from partner. Practise with: compare and contrast language, speculation vocabulary ("It looks as if...", "They seem to be..."), evaluative language ("In the first photo... whereas the second shows..."), and a direct answer to the examiner's question.
For more on Cambridge FCE preparation specifically, see our post on Cambridge B2 First speaking practice. A random student picker is useful for selecting which pair presents. An activity timer labelled with the picture format keeps each round consistent.
Sources:
- Ur, P. (1981). Discussions That Work. Cambridge University Press. - Visual prompts as triggers for genuine communicative tasks.
- Nation, I.S.P. & Newton, J. (2009). Teaching ESL/EFL Listening and Speaking. Routledge. - Picture activities and their role in scaffolding speaking for lower levels.
- Cambridge Assessment English. B2 First Speaking Test Format. - Official guidance on Part 2 picture description task requirements.
