AI Image Generation Workflow

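Combine Claude's control capabilities with the generation quality of GPT-Image 2.0 / Google AI to build an end-to-end image generation and application workflow that is both cost-effective and high quality.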

Core Concepts

  • Engine and Driver Separation: Use Claude as the control front end (the driver) and call the GPT-Image 2.0 API or Google AI as the generation back end (the engine), so that each tool does what it is best at.
  • Cost-Effectiveness: Paying per image through the OpenAI API costs far less than subscribing to ChatGPT Plus for image generation.
  • Multimodal Integration: Image generation is not the endpoint; it serves as “fuel” for other skills. Generated images can be seamlessly integrated into applications like exam papers, presentations, game assets, and AI teaching assistants.
  • Style Consistency: By generating “character reference sheets” or defining “visual standards,” AI can maintain consistent character appearances and styles across serial creations (e.g., comics, branded presentations).
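The cost-effectiveness point can be checked with simple arithmetic. The sketch below uses the figure of roughly NT$0.3 per image quoted in this episode; the subscription price in the usage example is a hypothetical number for illustration only, and the function names are made up for this example.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

# Figure quoted in this episode: roughly NT$0.3 per generated image.
COST_PER_IMAGE_TWD = Decimal("0.3")


def images_for_budget(budget_twd: Decimal,
                      cost_per_image: Decimal = COST_PER_IMAGE_TWD) -> int:
    """Number of whole images a budget covers at a flat per-image cost."""
    ratio = budget_twd / cost_per_image
    return int(ratio.to_integral_value(rounding=ROUND_FLOOR))


def break_even_images(subscription_twd: Decimal,
                      cost_per_image: Decimal = COST_PER_IMAGE_TWD) -> int:
    """Images per month at which pay-per-image spending reaches the subscription price."""
    ratio = subscription_twd / cost_per_image
    return int(ratio.to_integral_value(rounding=ROUND_CEILING))


if __name__ == "__main__":
    # A budget of NT$1 at NT$0.3 per image.
    print(images_for_budget(Decimal("1")))
    # Hypothetical monthly subscription price in NT$, for illustration only.
    print(break_even_images(Decimal("600")))
```

Decimal arithmetic is used so that amounts such as 0.3 are represented exactly. Below the break-even count, paying per image is the cheaper option under these assumptions.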

Integrated Workflow

  1. Install Image Generation Engine:
    • In Claude Code, use the quick-start bundle to install the “Draw” image generation skill, then add your OpenAI API key (add credit to the account first and turn off auto-recharge).
    • This integrates the GPT-Image 2.0 engine into Claude.
  2. Basic Image Generation and Modification:
    • In Claude, give an instruction such as “draw a…” to invoke the GPT-Image 2.0 engine and generate an image.
    • For generated images, follow up with requests like “change… to…” for simple modifications.
    • For precise modifications, download the image, re-upload it, use a brush to select the area, and describe the desired changes.
  3. Educational Application Integration:
    • Exam Papers/Worksheets: Ask AI to generate illustrations that match each question's scenario, then produce a Word document with the images already embedded.
    • Presentations: Generate visually consistent presentation backgrounds or content illustrations, then overlay text using tools (e.g., Nano Banana Pro) to balance aesthetics with editability.
  4. Advanced Visualization:
    • Infographics: In Google Gemini or NotebookLM, use the “Thinking” model to convert complex text into infographics with Traditional Chinese text.
    • Website Beautification: Give Claude Design a screenshot or the URL of an existing web page. It analyzes the page and generates an improved visual design; export the resulting instructions to Claude Code to implement them.
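To make the "engine" side of the workflow concrete, here is a minimal sketch of what a direct call to the OpenAI Images API might look like. It is not the internals of the “Draw” skill, which the source does not describe. The helper names, the style text, and the model identifier in the usage example are assumptions; the source does not give the API identifier for GPT-Image 2.0, so use whichever image model your account exposes. The request is built but not sent, so nothing is billed when the script runs.

```python
import json
import urllib.request

# Public OpenAI Images API endpoint for text-to-image generation.
IMAGES_URL = "https://api.openai.com/v1/images/generations"


def build_prompt(style_guide: str, subject: str) -> str:
    """Prefix every subject with the same style guide to keep a series consistent."""
    return f"{style_guide.strip()}\n\nSubject: {subject.strip()}"


def build_image_request(prompt: str, api_key: str, model: str,
                        size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) the HTTP request for one generated image."""
    payload = {"model": model, "prompt": prompt, "size": size, "n": 1}
    return urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical style guide standing in for a character reference sheet.
    style = "Flat pastel illustration, thick outlines, same character in every image."
    prompt = build_prompt(style, "a student measuring the height of a tree")
    # The model identifier is an assumption; substitute the one you actually use.
    request = build_image_request(prompt, api_key="sk-placeholder", model="gpt-image-1")
    print(request.full_url)
    print(json.loads(request.data)["prompt"])
    # To actually generate an image, send it with urllib.request.urlopen(request).
```

Reusing one style guide across every prompt is a text-only approximation of the "character reference sheet" idea; passing an actual reference image would need a different endpoint and is not shown here.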

Best Quotes

“Image generation is not the end; it is fuel for other skills.”
“It is cheap and it works well: on average, one image costs about NT$0.3, so three images come to about NT$1.”
“From now on, all your skills can include images.”

Teaching Entry Points

This topic suits learners who are already comfortable with basic AI chat. Start with “how to have AI add illustrations to your documents” and demonstrate plain image generation commands. Then move to the core: show learners how to obtain an OpenAI API key and connect it to Claude, so they see the “low cost, high quality” advantage firsthand. Finally, use practical cases such as teaching presentations and game cards so that learners practice applying image generation in their own professional fields.

Common Pitfalls

  • API Key Security Issues: Forgetting to turn off OpenAI's “auto-recharge” feature, or exposing the API key in front-end code, so that a stolen key can keep running up charges.
  • Incorrect Model Selection: Generating Chinese infographics in Google Gemini without the “Thinking” model, which can produce garbled or incorrect text.
  • Inability to Maintain Style Consistency: Creating a series of images without first generating a “character reference sheet” and using it as the reference, so characters or styles differ from image to image.
  • Over-reliance on Perfect Prompts: Spending too much time fine-tuning a single prompt, when combining several generations with post-editing is often more efficient.
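For the API key pitfall, one common safeguard is to read the key from an environment variable instead of writing it into source or front-end code. The sketch below assumes the key is stored in `OPENAI_API_KEY`, the variable name the official OpenAI SDKs read by default; `load_api_key`, `mask_key`, and `MissingApiKeyError` are hypothetical names for this example.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when no API key is configured in the environment."""


def load_api_key(env=None, var_name: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment so it never appears in source code."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise MissingApiKeyError(
            f"Set {var_name} in the environment; do not hard-code it "
            "in source files or front-end code."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    try:
        print("Using key:", mask_key(load_api_key()))
    except MissingApiKeyError as error:
        print(error)
```

This only keeps the key out of code and logs. Turning off auto-recharge, as described above, is still what limits the damage if a key does leak.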

Key Concepts

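  • Engine and driver separation
  • Cost-effectiveness
  • Multimodal integration
  • Style consistency
  • AI image generation
  • OpenAI API
  • Low cost, high quality
  • Teaching presentations
  • Game cards
  • API key security
  • Model selection
  • Consistent style
  • Prompt optimization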