Build a Ghibli-Style AI Video Generator with n8n + ComfyUI + Azure TTS

A complete guide to building a cinematic AI video workflow using n8n, FFmpeg, ComfyUI, Gemini, and Azure TTS—with a downloadable ready-to-use template.

Reading time: 4 minutes
Word count: ~1,636
Author: eimoon.com

Environment Setup Guide: Installing FFmpeg & Configuring APIs

This post walks you through building a full AI-powered video generation pipeline using n8n, ComfyUI, FFmpeg, Google Gemini, and Azure TTS. The final result is a fully automated workflow that transforms a short story into an animated video in the style of Studio Ghibli—complete with generated voiceover and seamless YouTube publishing.

👉 Download the ready-to-use n8n workflow here: Get it on Gumroad

To run this workflow correctly, you need to apply a few key configurations to your n8n deployment environment. Because the workflow relies on external services, please follow the steps below.

1. Install FFmpeg for n8n (via a Custom Docker Image)

The official n8n Docker image does not include FFmpeg by default. Therefore, we need to build a custom image to add it. This is a prerequisite for the Execute Command node to successfully run ffmpeg commands.
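The merge step that the Execute Command node performs boils down to a single FFmpeg invocation. Here is a rough Python sketch of building such a command; the file names and flag choices are illustrative assumptions, not the workflow's exact command:

```python
import subprocess

# Hypothetical file names; in the real workflow the Execute Command node
# substitutes paths produced by earlier nodes.
video, audio, output = "scene_01.mp4", "scene_01.mp3", "scene_01_merged.mp4"

cmd = [
    "ffmpeg", "-y",
    "-i", video,      # generated video clip
    "-i", audio,      # TTS voiceover track
    "-c:v", "copy",   # keep the video stream as-is (no re-encode)
    "-c:a", "aac",    # encode audio to AAC for MP4 compatibility
    "-shortest",      # stop at the shorter of the two streams
    output,
]
# subprocess.run(cmd, check=True)  # uncomment on a machine with FFmpeg installed
print(" ".join(cmd))
```

`-c:v copy` avoids a slow re-encode of the video stream, and `-shortest` prevents a long silent tail when video and audio durations differ slightly.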

Step 1: Create a Dockerfile

In the same directory where your docker-compose.yml file is located, create a new file named Dockerfile and add the following content:

# Use a specific n8n version as the base image for stability
FROM n8nio/n8n:1.99.0

# Switch to the root user to get permissions to install software
USER root

# Use the apk package manager to install ffmpeg. The --no-cache option reduces the final image size
RUN apk add --no-cache ffmpeg

# Switch back to the non-privileged node user, which is recommended for running n8n securely
USER node

Description: This file defines how to install the FFmpeg package on top of a standard n8n image.

💡 If you’d prefer to skip setup and test a working pipeline first, you can download the full n8n workflow here.

Step 2: Modify the docker-compose.yml File

Next, edit your docker-compose.yml file to use the Dockerfile we just created to build the image, instead of pulling it directly from Docker Hub.

Modify the n8n service under services by adding the build directive:

services:
  n8n:
    # Use the build directive to build a custom image
    build:
      context: .  # the build context is the current directory, where the Dockerfile lives
      dockerfile: Dockerfile
    # container_name and image are optional but help with management
    container_name: n8n-custom-ffmpeg
    image: my-n8n-with-ffmpeg:latest
    restart: always
    ports:
      - "5678:5678"
    env_file:
      - .env
    volumes:
      - n8n_data:/home/node/.n8n
    # ... other configurations remain the same ...

volumes:
  n8n_data:

Description: When you run docker-compose up --build, Docker Compose will first build a new image named my-n8n-with-ffmpeg:latest based on the Dockerfile, and then start the n8n service using this new image that includes FFmpeg.

2. Configure the ComfyUI Server Address

Your n8n workflow needs to know the ComfyUI server address to send requests to it. Using environment variables is the recommended way to configure this, as it is more flexible and secure.

  1. Confirm docker-compose.yml Configuration: Ensure the n8n service section in your docker-compose.yml file includes the line env_file: - .env.

  2. Create or Edit the .env File: In the same directory as your docker-compose.yml, create a file named .env (if it doesn’t already exist) and add the following content, replacing the URL with your actual address:

# .env file

# Replace this URL with the access address of your ComfyUI instance
COMFYUI_BASE_URL_1=http://your-comfyui-server-ip:8188

# ... other environment variables ...

How it works: When n8n starts, it automatically loads all variables from the .env file, so the expression {{ $env.COMFYUI_BASE_URL_1 }} in the workflow can read this address.

Note for Local ComfyUI Users: If you are running ComfyUI on the same machine as your Docker-based n8n instance, you cannot use localhost or 127.0.0.1 directly, because inside the container those addresses point at the container itself. Instead, you may need to use the special Docker address host.docker.internal so that the n8n container can reach the ComfyUI service running on the host machine. For example: COMFYUI_BASE_URL_1=http://host.docker.internal:8188.
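Outside of n8n, you can sanity-check the same lookup with a few lines of Python. This sketch mirrors what the `$env` expression does and builds the URL of ComfyUI's /prompt endpoint (the endpoint ComfyUI exposes for queueing an API-format workflow); the fallback address is an assumption for illustration:

```python
import os
import json

# Mirror of {{ $env.COMFYUI_BASE_URL_1 }}: read the variable from the
# environment, falling back to the Docker host alias for illustration.
base_url = os.environ.get("COMFYUI_BASE_URL_1", "http://host.docker.internal:8188")

# ComfyUI queues a workflow via POST /prompt with the API-format JSON inside
# a "prompt" key. The empty dict here stands in for your exported workflow.
endpoint = f"{base_url}/prompt"
payload = {"prompt": {}}

print(endpoint)
print(json.dumps(payload))
```

If this URL is unreachable from inside the n8n container, the workflow's HTTP Request nodes will fail before any generation starts, so it is worth verifying first.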

3. Configure API Credentials

This workflow requires calls to several external service APIs. Please go to the Credentials section in your n8n instance to create and configure credentials for the following services.

Set up Google Gemini API Key

  • Purpose: Used for story generation, scene breakdown, and creating video prompts.
  • Action:
    1. In n8n, choose to create a new credential.
    2. Search for and select “Google Gemini (PaLM) API”.
    3. Enter your API key as prompted.
  • Reference Document: https://docs.n8n.io/integrations/builtin/credentials/google/

Set up YouTube API Credentials

  • Purpose: Used to automatically upload the final composite video to your YouTube channel.
  • Action:
    1. In n8n, choose to create a new credential.
    2. Search for and select “YouTube OAuth2 API”.
    3. This requires you to create a project in the Google Cloud Console, enable the YouTube Data API v3, and obtain an OAuth 2.0 Client ID and Secret.
  • Reference Document: https://docs.n8n.io/integrations/builtin/credentials/google/

Set up Azure TTS API Credentials

  • Purpose: Used to convert text narration and dialogue into high-quality speech.
  • Action:
    1. In n8n, choose to create a new credential.
    2. Search for and select “Microsoft Azure Speech API”.
    3. You will need to create a “Speech service” resource in the Azure portal to get the required Subscription Key and Region.
  • Reference Document: https://docs.n8n.io/integrations/builtin/credentials/microsoft-azure-speech/

🚀 Ready to Try It Instantly?

You can skip the setup and get started right away with a pre-built, battle-tested n8n workflow—including nodes for scene generation, voiceover synthesis, video merging, and upload logic.

👉 Download the full project on Gumroad

How to Use Your Own ComfyUI Workflow

The Set Workflow Payload node is the “blueprint” for video generation. You can replace the JSON content inside it with any workflow you create in ComfyUI. Follow these steps:

Step 1: Prepare and Export Your Workflow in ComfyUI

  1. Design and Debug: First, ensure your workflow runs correctly without errors in the ComfyUI interface. This is the most important step and will prevent many future issues.
  2. Export API Format: In the ComfyUI sidebar menu, click the “Save (API Format)” button. This will download a .json file containing the API call format for your workflow.
  3. Copy the Content: Open the downloaded .json file with a text editor (like VS Code or Notepad) and copy its entire content.
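If the exported file is large, a few lines of Python can list every node's class_type so you can quickly locate the CLIPTextEncode node(s) before pasting. The two-node workflow below is a made-up miniature; a real export has many more entries:

```python
import json

# Hypothetical miniature of an exported API-format workflow.
exported = json.loads("""
{
  "3": {"class_type": "KSampler", "inputs": {"positive": ["6", 0]}},
  "6": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat", "clip": ["38", 0]}}
}
""")

# Print each node id with its class_type to spot the CLIPTextEncode node(s).
for node_id, node in exported.items():
    print(node_id, node["class_type"])
```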

Step 2: Replace the Content in the n8n Node

  1. Find the Node: In n8n, open the Set Workflow Payload node.
  2. Clear and Paste: Delete all the existing JSON code in its JSON Output field, and then paste in the new code you just copied from the file.

Step 3: Reconnect the Dynamic Prompt (The Most Crucial Step!)

When you paste your own workflow, the original connection for the dynamic prompt will be broken. You need to reconnect it manually.

  1. Find Your “Positive Prompt” Node: In the new JSON code you just pasted, find the CLIPTextEncode node that handles the Positive Prompt. Its class_type should be CLIPTextEncode.
    • Hint: It usually connects to the positive input of a KSampler node.
  2. Modify the text Field: Inside this node, find the text field within the inputs object. It will likely be a fixed, hardcoded prompt, for example, "text": "a beautiful landscape".
  3. Replace it with an n8n Expression: Replace this hardcoded text exactly with the following n8n expression: {{ $json.output.video_prompt }}

Before:

"6": {
  "inputs": {
    "text": "a cat sitting on a bench, best quality", // <-- This is the fixed text you wrote in ComfyUI
    "clip": [ "38", 0 ]
  },
  "class_type": "CLIPTextEncode"
},

After:

"6": {
  "inputs": {
    "text": "{{ $json.output.video_prompt }}", // <-- Change it to this expression
    "clip": [ "38", 0 ] // Make sure the clip connection remains unchanged
  },
  "class_type": "CLIPTextEncode"
},

Please note that your node numbers (e.g., "6" or "38") may be different depending on your own workflow structure. The key is to find the correct CLIPTextEncode node and modify its text input.
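Instead of hunting for the right node by hand, you can patch the exported JSON with a short script. This sketch follows the KSampler's positive link to find the positive-prompt node and swaps in the n8n expression; the node ids ("3", "6", "7", "38") are illustrative and will differ in your export:

```python
import json

# Illustrative fragment of an exported API-format workflow.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"positive": ["6", 0], "negative": ["7", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat sitting on a bench, best quality",
                     "clip": ["38", 0]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["38", 0]}},
}

# Follow the KSampler's "positive" link to the positive-prompt node and
# replace its hardcoded text with the dynamic n8n expression. The negative
# prompt node ("7") is left untouched.
for node in workflow.values():
    if node["class_type"] == "KSampler":
        positive_id = node["inputs"]["positive"][0]
        workflow[positive_id]["inputs"]["text"] = "{{ $json.output.video_prompt }}"

print(json.dumps(workflow, indent=2))
```

Following the link instead of matching on class_type alone is what keeps the negative prompt (also a CLIPTextEncode node) from being overwritten by mistake.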

After completing these three steps, your n8n workflow will now use your own ComfyUI design to generate videos.

4. Debugging & Optimization Tips

Handling Slow ComfyUI Generation

On machines with weaker GPUs, ComfyUI video generation can be very time-consuming, which slows down debugging of the subsequent steps (video/audio merging, uploading, etc.). Here are two ways to speed up the debugging process:

  • Method 1: Use a Fixed Test Video (disabled by default). The workflow pre-configures two nodes that download landscape and portrait test videos. You can disable the ComfyUI generation flow and enable one of these download nodes to quickly get a video sample for testing.

  • Method 2: Generate Only the First Scene This is an excellent trade-off if you want to test the complete end-to-end flow (from AI generation to final upload) but don’t want to wait for all scenes to be generated.

    1. Find the Debug a data entry node: This is a Code node that is disabled by default.
    2. Activate this node: Right-click the node and select “Activate”.

    How it works: This node’s function is very simple: it takes the array of scenes generated by the AI and keeps only the first one. As a result, all subsequent processes (video generation, audio generation, merging, etc.) will only run once for this single scene. This drastically reduces the testing time for the entire workflow, allowing you to quickly verify the overall logic. Remember to disable this node again when you’re done debugging to generate the full video.
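The node's logic is a one-liner: slice the scene array down to its first element. Sketched in Python (the scene contents are made up for illustration; in an actual n8n Code node the equivalent JavaScript would be `return items.slice(0, 1);`):

```python
# Array of scenes as produced by the AI; contents are made up for illustration.
scenes = [
    {"scene": 1, "dialogue": "Once upon a time..."},
    {"scene": 2, "dialogue": "The journey begins..."},
    {"scene": 3, "dialogue": "At last, home."},
]

# Keep only the first scene so every downstream node runs exactly once.
debug_scenes = scenes[:1]
print(len(debug_scenes))
```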

Time Calibration

In this workflow, the system prompt of the Extract Scenes & Dialogue node asks for dialogue text whose reading time is approximately 8 seconds. In your ComfyUI workflow settings, the generated video duration should therefore match this (for example, by setting an appropriate number of frames) to avoid mismatches between video and audio length during the final merge.
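The frame-count arithmetic is simple: frames = frame rate × target duration. The 16 fps value below is an assumption for illustration; use whatever frame rate your ComfyUI video model actually outputs:

```python
# Rough frame-count calculation for matching video length to ~8 s of speech.
fps = 16            # assumed frame rate of the video model
target_seconds = 8  # reading time requested in the system prompt
frames = fps * target_seconds
print(frames)  # 128
```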


After completing all the above configurations, restart your n8n container (docker-compose up -d --build), and your environment will be ready.
