#1 Open-Source Image Model

Create Images with Z Image

Z Image is Alibaba's 6-billion-parameter open-source image generator, built on the Single-Stream Diffusion Transformer (S3-DiT) architecture. It is the highest-ranked open-source model on the Artificial Analysis leaderboard, with bilingual text rendering and sub-second inference from its Turbo variant on enterprise GPUs.

About Z Image

What is Z Image?

Z Image (Zao Xiang) is an efficient 6-billion-parameter image generation foundation model developed by Alibaba's Tongyi MAI team. Its Single-Stream Diffusion Transformer (S3-DiT) architecture concatenates text tokens, visual semantic tokens, and image VAE tokens into one unified input stream, which improves parameter efficiency. The Z-Image-Turbo variant achieves sub-second inference on enterprise GPUs with only 8 steps, and the model ranks 8th overall on the Artificial Analysis Text-to-Image Leaderboard, the highest of any open-source model.
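To make the single-stream idea concrete, here is a minimal sketch in plain Python. It only illustrates the concatenation step described above: three token sequences are joined into one stream, and the span of each segment is recorded. The function name, the `Segment` type, the token counts, and the 4-dimensional toy embeddings are all assumptions for illustration, not the model's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    start: int  # inclusive index into the unified stream
    end: int    # exclusive index into the unified stream


def build_single_stream(text_tokens, semantic_tokens, vae_tokens):
    """Join three token sequences into one stream and record where each one sits."""
    stream = []
    segments = []
    parts = (
        ("text", text_tokens),
        ("visual_semantic", semantic_tokens),
        ("image_vae", vae_tokens),
    )
    for name, tokens in parts:
        start = len(stream)
        stream.extend(tokens)
        segments.append(Segment(name, start, len(stream)))
    return stream, segments


# Toy embeddings: every token is a 4-dimensional vector (size chosen arbitrarily).
DIM = 4
text = [[0.1] * DIM for _ in range(3)]
semantic = [[0.2] * DIM for _ in range(2)]
vae = [[0.3] * DIM for _ in range(6)]

stream, segments = build_single_stream(text, semantic, vae)
print(len(stream))  # 11
for seg in segments:
    print(seg.name, seg.start, seg.end)
```

In a single-stream design, one transformer stack would then process the whole sequence, rather than separate stacks handling each modality.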

Cost: 5 credits per generation
Generation time: ~8 seconds
Aspect ratios: 5
Output format: PNG
Key Features

Why Choose Z Image?

Ultra-Efficient Architecture

With 6B parameters and the S3-DiT architecture, Z Image is 4x faster than Flux.1 and runs on consumer GPUs with as little as 16GB of VRAM. The Turbo variant reaches sub-second inference on enterprise hardware.
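As a rough sanity check on the 16GB figure, the weights of a 6-billion-parameter model can be sized with simple arithmetic. The sketch below assumes the weights are stored at a fixed number of bytes per parameter and ignores everything else that occupies VRAM (text encoder, VAE, activations), so treat it as a lower bound rather than a measured requirement.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 6_000_000_000  # 6B parameters, as stated for Z Image

for label, nbytes in (("fp32", 4), ("bf16/fp16", 2), ("int8", 1)):
    print(f"{label}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
# fp32: 22.4 GiB
# bf16/fp16: 11.2 GiB
# int8: 5.6 GiB
```

Under the 16-bit assumption, the weights take about 11.2 GiB, which leaves some headroom on a 16GB card.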

Bilingual Text Rendering

Best-in-class text rendering in both English and Chinese, with a word error rate of 0.072 compared with 0.143 for Flux.2 Dev, and ahead of other competitors.
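For readers unfamiliar with the metric, word error rate is commonly defined as the word-level edit distance between the intended text and the rendered text, divided by the number of words in the intended text. The sketch below implements that common definition. The source does not say exactly how the benchmark computes its score, so this is an illustration of the general idea, and the sample strings are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the words of ref seen so far
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One misspelled word out of five gives a WER of 0.2.
print(word_error_rate("GRAND OPENING SALE THIS WEEKEND",
                      "GRAND OPENNING SALE THIS WEEKEND"))  # 0.2
```

On this definition, a score of 0.072 would correspond to roughly 7 word-level errors per 100 intended words.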

#1 Open-Source Model

Ranked 8th overall on the Artificial Analysis Text-to-Image Leaderboard, the highest position of any open-source model.

Semantic Understanding

Uses structured reasoning chains to apply logic and common sense, so Z Image goes beyond the literal wording of a prompt and draws on world knowledge.

Rich Aesthetic Quality

Vibrant colors, fine textures, and photorealistic detail across diverse styles — portraits, landscapes, architecture, creative art, sci-fi, and more.

Apache 2.0 Open Source

Fully open-source under the Apache 2.0 license, with 10K+ GitHub stars. Supported by ComfyUI and DiffSynth, with 400+ community finetunes on Hugging Face.

Technical Specifications

Max Resolution: 1024px
Aspect Ratios: 1:1, 4:3, 3:4, 16:9, 9:16
Generation Speed: ~8s
Output Formats: PNG
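The specifications give a maximum resolution and a set of aspect ratios but no pixel dimensions. The sketch below shows one plausible way to derive them, assuming the 1024px limit applies to the longer side and that each side is rounded to a multiple of 16. Both assumptions are ours, so the sizes the platform actually returns may differ.

```python
MAX_SIDE = 1024  # from the specifications; assumed to apply to the longer side
MULTIPLE = 16    # assumed pixel alignment, not stated in the source


def dimensions(ratio: str, max_side: int = MAX_SIDE, multiple: int = MULTIPLE):
    """Return (width, height) for a 'W:H' ratio with the longer side at max_side."""
    w, h = (int(part) for part in ratio.split(":"))
    if w >= h:
        width = max_side
        height = round(max_side * h / w / multiple) * multiple
    else:
        height = max_side
        width = round(max_side * w / h / multiple) * multiple
    return width, height


for ratio in ("1:1", "4:3", "3:4", "16:9", "9:16"):
    print(ratio, dimensions(ratio))
# 1:1 (1024, 1024)
# 4:3 (1024, 768)
# 3:4 (768, 1024)
# 16:9 (1024, 576)
# 9:16 (576, 1024)
```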

Single pricing tier for this model.

How to Use

Create with Z Image in 3 Steps

1. Enter Your Prompt

Describe the image you want. Z Image excels at detailed descriptions of style, composition, lighting, and subject — in both English and Chinese.

2. Select Aspect Ratio

Choose from 5 aspect ratios: square (1:1), landscape (4:3, 16:9), or portrait (3:4, 9:16) to match your content format.

3. Generate & Download

Get your AI-generated image in about 8 seconds. Download in high-quality PNG format ready for immediate use.
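The three steps above can be sketched as a small validation helper that a script might run before submitting a generation request. This page does not document an API, so the field names (`prompt`, `aspect_ratio`, `output_format`) and the `build_request` function are hypothetical. Only the five ratios and the PNG output come from the page itself.

```python
# The five ratios listed on this page.
SUPPORTED_RATIOS = ("1:1", "4:3", "3:4", "16:9", "9:16")


def build_request(prompt: str, aspect_ratio: str = "1:1") -> dict:
    """Validate the inputs and return a request payload (hypothetical field names)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(
            f"unsupported aspect ratio {aspect_ratio!r}; "
            f"choose one of {', '.join(SUPPORTED_RATIOS)}"
        )
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "output_format": "png",  # the only output format listed
    }


request = build_request(
    "A red paper lantern at dusk, soft rim lighting, shallow depth of field",
    aspect_ratio="16:9",
)
print(request["aspect_ratio"])  # 16:9
```

Checking the ratio locally avoids spending a request on an input the service would reject.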

Use Cases

What Can You Create with Z Image?

🎨 Photorealistic Portraits & Scenes
📱 Bilingual Poster & Graphic Design
🖼️ Creative Art & Illustrations
💼 Social Media & Marketing Content

Pricing

Z Image Pricing

5 credits per image generation. Top-ranking open-source quality at an affordable price.
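For budgeting, the flat rate makes the arithmetic simple. The sketch below computes the cost of a batch and the number of images a given credit balance covers. The function names and the example figures are illustrative only.

```python
CREDITS_PER_IMAGE = 5  # flat rate stated on this page


def cost_in_credits(num_images: int) -> int:
    """Total credits needed to generate num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * CREDITS_PER_IMAGE


def images_affordable(credit_balance: int) -> int:
    """How many whole images a credit balance covers."""
    if credit_balance < 0:
        raise ValueError("credit_balance must be non-negative")
    return credit_balance // CREDITS_PER_IMAGE


print(cost_in_credits(40))     # 200
print(images_affordable(103))  # 20
```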

Ready to Create with Z Image?

Generate stunning AI images with Alibaba's #1 open-source model. Sub-second inference, bilingual text rendering, and exceptional quality.

Start Creating with Z Image

Z Image FAQ

Common questions about Z Image by Alibaba

What is Z Image?

Z Image (Zao Xiang) is a 6-billion-parameter open-source image generation model by Alibaba's Tongyi MAI team. Built on the S3-DiT architecture, it ranks 8th overall on the Artificial Analysis leaderboard, the highest of any open-source image model, and offers sub-second inference and bilingual text rendering.

How much does Z Image cost?

Z Image costs 5 credits per image generation. The model itself is open-source under the Apache 2.0 license, and we provide optimized inference on our platform for fast, reliable generation.

What makes Z Image stand out?

Z Image is the #1 open-source image model on independent leaderboards. Key strengths include sub-second inference speed, best-in-class bilingual text rendering (English and Chinese), and a highly efficient 6B-parameter architecture that delivers quality rivaling much larger models.

Can Z Image render text in images?

Yes. Z Image has the best bilingual text rendering of any image model. It natively renders both English and Chinese text within images with a word error rate of just 0.072, far ahead of competitors such as Flux.2 Dev (0.143).

How does Z Image compare to other models?

Z Image offers the best balance of quality, speed, and efficiency. It is 4x faster than Flux.1, runs on 16GB consumer GPUs, and ranks higher than all open-source alternatives. For photorealism, try Grok Imagine. For 4K resolution, choose Nano Banana Pro.

How fast is Z Image?

Z Image generates images in approximately 8 seconds on our platform. The Z-Image-Turbo variant achieves sub-second inference on enterprise GPUs with only 8 steps, making it one of the most efficient image models available.
