Feature · Media analysis

The AI watches your Reels. Reads your photos.

Every Instagram post gets a multimodal analysis the moment it goes live — audio transcribed, on-screen text extracted, products identified, scene described. Replies are grounded in what your post is actually showing.

What it extracts

Six layers of context, per post.

Captions tell the AI what you wanted to say. Media analysis tells it what your audience actually saw. The combination makes replies specific in a way captions alone can't.

M:01

Audio transcription

Every Reel's spoken audio is transcribed and added to that post's context. The AI quotes you when it matters and avoids contradicting what you said on camera.

Reels & video
M:02

On-screen text

Captions burned into Reels, price overlays, sticker text, hand-written notes — all extracted automatically and treated as part of the post's truth.

OCR layer
M:03

Visual description

What's actually in frame: the product, the room, the look, the food, the location. Replies stop guessing from captions and start describing what followers can see.

Vision model
M:04

Brand + product detection

Identifies products and brands in shot. Useful for affiliate creators, multi-product carousels, and unboxing videos where the caption can't list everything.

Catalog-aware
M:05

Content type tag

Tutorial, testimonial, product demo, lifestyle, behind-the-scenes — every post gets categorised so you can run different rules per post type.

Routing signal
M:06

Auto-analyse on upload

Flip on the automation toggle and every new post analyses itself the moment it appears. By the time the first comment lands, the AI already knows what your post is about.

Hands-off
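The six layers above can be pictured as one record attached to each post. This is an illustrative sketch only — the field names are assumptions for explanation, not ReplyMagic's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MediaAnalysis:
    """Hypothetical per-post context record; one field per layer (M:01-M:06)."""
    transcript: str                # M:01 — spoken audio, transcribed
    on_screen_text: list[str]      # M:02 — OCR'd overlays, stickers, burned-in captions
    scene_description: str         # M:03 — what's actually in frame
    detected_products: list[str]   # M:04 — brands and products identified in shot
    content_type: str              # M:05 — e.g. "tutorial", "product demo"
    auto_analysed: bool = True     # M:06 — filled in on upload when the toggle is on

# Example: the shoe Reel from the scenario further down the page.
analysis = MediaAnalysis(
    transcript="I'm wearing the New Balance 530 in ivory",
    on_screen_text=["spring fit check"],
    scene_description="white leather sneakers, mirror selfie, bedroom",
    detected_products=["New Balance 530"],
    content_type="lifestyle",
)
```

Each layer is independent — a photo with no audio simply leaves the transcript empty while the visual layers still populate.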

The reply pipeline

Where media analysis plugs in.

1. Post arrives. Your Reel, photo, or carousel syncs into ReplyMagic within 15 minutes of posting (or instantly if you trigger a manual refresh).

2. Analysis runs. Gemini transcribes audio, OCRs on-screen text, describes the visual scene, identifies products and brands, and tags the content type.

3. Context stitches together. Your manual per-post context (price, link, offer rules) is combined with the analysis output to form a single source of truth for that post.

4. Comment lands. When a follower asks a question, the AI pulls the stitched context, applies your voice profile, and drafts a reply in their language.

5. Auto-send or queue. Trusted categories ship. Edge cases wait for you.
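The five steps can be sketched as a simple chain. This is a minimal illustration of the flow, not ReplyMagic's real API — every function name and the merge-order assumption (manual context overriding analysis on conflicts) are ours:

```python
def analyse(post: dict) -> dict:
    """Step 2: stand-in for the multimodal analysis pass."""
    return {
        "transcript": post.get("audio", ""),
        "on_screen_text": post.get("overlays", []),
        "content_type": "product demo",
    }

def stitch_context(analysis: dict, manual: dict) -> dict:
    """Step 3: combine layers; here manual per-post context wins on conflicts."""
    return {**analysis, **manual}

def draft_reply(comment: str, context: dict) -> str:
    """Step 4: trivial placeholder for the voice-profiled draft."""
    product = context.get("product", "it")
    return f"Thanks for asking! That's the {product}."

def should_auto_send(category: str, trusted: set[str]) -> bool:
    """Step 5: trusted categories ship; everything else queues for review."""
    return category in trusted

# Step 1: a synced post, then the rest of the chain.
post = {"audio": "I'm wearing the New Balance 530 in ivory", "overlays": ["$99"]}
context = stitch_context(analyse(post), {"product": "New Balance 530"})
reply = draft_reply("what shoes?", context)
```

The key design point the sketch makes concrete is step 3: analysis output and your manual context merge into one dictionary, so the drafting step never has to know which source a fact came from.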

A real scenario

A Reel about shoes. Without media analysis.

You post a 22-second Reel of an outfit. The caption reads: "the only pair I'll wear all spring 🌸". No product names. No links.

By morning there are 47 comments asking "what shoes?" — and a keyword bot can't answer, because the caption never named them.

With media analysis, ReplyMagic has already identified the white leather sneakers, transcribed your line "I'm wearing the New Balance 530 in ivory," and pulled the affiliate link you saved against the post. Every reply names the shoe, links to your storefront, and sounds like you wrote it at 11pm.

FAQ

Common questions about media analysis.

Why does this matter for replies? +
Captions are unreliable. People comment about what they see, not what you wrote. If a follower asks 'what shoes are those?' on a Reel, ReplyMagic has already identified them visually — so the reply names the actual product instead of dodging.
How accurate is the transcription? +
We use Google's Gemini multimodal models. Transcription quality is comparable to YouTube auto-captions for clear English — and the system handles 100+ languages. For ambiguous audio you can edit the extracted context manually before the AI starts replying.
Does it work on photos and carousels too? +
Yes. Photos get visual descriptions and product detection. Carousels get analysed slide by slide so the per-post context covers every frame.
Do I need to enable analysis manually? +
No. The 'AI asset analysis' automation in your dashboard auto-runs analysis on every new post and flips AI replies on once it finishes. You can also run analysis manually on any older post.
What about privacy? +
Media is fetched from your Instagram account, processed by our analysis pipeline, and the resulting context is stored against your account. We don't use your media to train models. See our AI disclosure for full detail.

See what the AI sees in your posts.