LLM
- 生成式AI不能做到什麼與訓練資料的影響
- :star: A grounded take on agentic coding for production environments
- LLM 學習筆記 - 從 LLM 輸入問題,按下 Enter 後會發生什麼事? :: 2025 iThome 鐵人賽
- LLM 學習筆記 :: 2023 iThome 鐵人賽
- microsoft/generative-ai-for-beginners: 21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
- GitHub - microsoft/ai-agents-for-beginners: 10 Lessons to Get Started Building AI Agents
- RAG 和 Prompt 原理超簡單解說!想知道 AI 怎麼找答案看這篇
- GPT best practices - OpenAI API
- 對 AI 的粗淺認識與個人見解 | Thinking Today
- ChatGPT 跟 new Bing一些使用技巧與參考範例/ Twitter
- 嘗試透過 ChatGPT API 打造了一個「靈感夥伴」 - Pin 起來!
- Google Bard 開放支援 AI 中文問答,6種製作中文試算表與報告案例
- ChatGPT 與 Quiz Wizard 幫老師家長 AI 生成選擇題、抽認卡教學
- ChatGPT 提示語說明書:通用三層結構與 9 個技巧提高 AI 生產力
- 如何問出更好的問題?AI世代師生都必須面對的挑戰 | Gamma
- 快速輸入 ChatGPT 常用提示語,減少重複打字,附實戰範例下載
- 一般人最好上手且有效的 ChatGPT 提問法,以生成英文練習題為例
- 我与ChatGPT结对编程的体验 · BMPI
- 玩具烏托邦: ChatGPT: 正確與錯誤的使用示範
- 大語言模型 LLM 應用開發 投影片 - ihower { blogging }
- [ChatGPT] 使用 ChatGPT 建立一個中英單字翻譯器,同時提供音標與例句 | EPH 的程式日記
- 大砲打小鳥之 ChatGPT 翻譯鳳梨酥 LINE Bot-黑暗執行緒
- 使用 ChatGPT 學習 Git 版本控管 | The Will Will Web
- 介紹好用工具:CodeGPT (使用 GPT 自動化產生 Git 的 Commit Log 訊息) | The Will Will Web
- .NET Walker: 使用LINE Bot搭配OpenAI API建立出新一代的AI機器人
- How To Use Midjourney for Web Design
- 替你的應用程式加上智慧! 談 LLM 的應用程式開發 — 安德魯的部落格
- 生成式 AI 年會大禮包
- Perplexity AI 搜尋服務的開源替代品 — Perplexica - MyApollo
- 提示工程指南 | Prompt Engineering Guide
- Python 新手的 AI 之旅:從零開始打造屬於你的 AI / LLM 應用 :: 2024 iThome 鐵人賽
- 學習用生成式 AI 寫程式 — Coursera 課程心得 • 好豪筆記
- 無需程式碼!用 tldraw computer 畫張流程圖輕鬆打造 AI 自動化系統
- 好文推薦 — My LLM codegen workflow atm - MyApollo
- NotebookLM 教學,以知名財經 Podcast 為例(持續更新資料集)
- 陶哲軒對使用 AI 的看法 – Gea-Suan Lin's BLOG
- 當每個人都能做出 App 時,你要怎麼脫穎而出? | Alex Hsu 徐小翔
- AI 時代寫程式,你是在學習還是在偷懶? | 高見龍
- The Best Way to Use AI for Learning | by 詹雨安 Alan Chan | Heptabase | Sep, 2025 | Medium
- 周报 #102 - 我是如何使用 AI 的
- AI 编程真的靠谱吗? | 赵化冰的博客 | Zhaohuabing Blog
- OpenClaw 龍蝦佐本地 AI 模型:從評估到放棄?-黑暗執行緒
- AI 當道,為什麼我還是推薦寫部落格? | 是 Ray 不是 Array
- BBC 6 Minute English:提升英文聽力的好夥伴 - Code and Me
- 400小時Cursor經驗分享:AI輔助開發的終極指南 | 技術視野洞察 - Dennis的專業視角
- Shipping at Inference-Speed | Peter Steinberger
- Workflow - 2025/12
- Just Talk To It - the no-bs Way of Agentic Engineering | Peter Steinberger
- Workflow - 2025/10
- 2025 年度回顧:慢下來,才能更快 | Ernest Chiang
- 討論與AI的關係
- 过年了,聊聊AI和人文 - 铁蕾的个人博客
- Tactiq 線上會議轉錄工具:AI摘要、逐字稿生成、重點標示、螢幕截圖 - George的私房筆記
- 閒聊 - AI 讓 StackOverflow 熱度爆跌,技術部落格也要涼了嗎?-黑暗執行緒
- 【生成式AI導論 2024】 - YouTube
- 從 ClawdBot 到 OpenClaw:三個月「小龍蝦」重度玩家實測心得與省錢攻略
- 從 Agent 到 Agentic AI | 弦而時習之


Things I personally think AI still can't do (2025)
- 從 MCE Error 到 IOMMU: 追查 Kernel panic 一年的真相 - Zen's Blog
- 在白馬的那一週 - 🍃 leafwind.tw
- An LLM-generated article that imitates a human author's style can never replace the human-written original: it amounts to deception, because the imitation is computed rather than grounded in any actual experience
推特AI 取暖會
- :star: 「第一屆 AI 取暖會」講義文字稿 - AI agent 原理、應用與展望與挑戰
- Topics covered in「第二屆推特AI 取暖會」
  - After thinking models appeared
    - Plain Sonnet is usually enough; don't reach for Opus without a reason
    - Just set effort to medium
  - The problems context engineering has to deal with
    - attention drift, context rot
  - The agent's four superpowers
    - shell
    - file system
    - scripting
    - subagents
  - Make a wish → supervise → accept the result (許願 → 監工 → 驗收)
- Agentic Engineering 不傳之秘 / Jeremy Lu / https://x.com/thecat88tw - YouTube
- 「第二屆 AI 取暖會」講義文字稿
ihower
- 愛好 AI Engineer 電子報 🚀 新型態代理人 OpenClaw 正夯,電子報改版 #35 – ihower { blogging }
- 愛好 AI 工程 Blog: blog.aihao.tw/ (整個站都是AI生成)
- :star:ihower-agents-202412 - ihower-agents-202412.pdf
- edd - ihower-edd-202409.pdf
- 實戰 AI Agents 應用開發: TTFT 和 Prompt Caching – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 AI 應用開發的常見錯誤 #22 – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 就是有深度 DeepSeek R1 和 OpenAI Deep Research #23 – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 恩尼格瑪評估 #24 – ihower { blogging }
- AI 大神免費教你生活用 AI,入門實例解析互動技巧、工具使用、檔案處理,帶你快速掌握LLM應用!!OpenAI 共同創辦人、特斯拉人工智慧總監 Andrej Karpathy - YouTube
- Leaderboard
- Tiktokenizer
- Don't fully trust it — output may contain hallucinated content
- model
  - Expensive pre-training
    - Characteristic: no up-to-date data; knowledge stops at a cutoff point
  - General model
    - Writing poems, resumes, emails
    - Knowledge-based queries
    - Asking for travel advice
  - Personality-shaping post-training
  - Adding reinforcement learning ⇒ thinking model
    - Use when you need a higher-quality answer
- tool use
  - internet search
    - When to use:
      - Replacing a Google search
      - Information that is likely recent
      - Information that changes over time
  - deep research
    - multiple internet searches + thinking model
  - file upload
    - When to use: reading documents or books together with the model
  - python interpreter
- Use cases
  - flash cards
    - Give it a passage of text and ask the LLM to generate flash cards
  - Claude Artifacts Showcase | Share Your AI Creations
    - Give it a passage of text and ask the LLM to generate a diagram for visualization
  - cursor
    - composer
- Input/output modalities
  - Text
  - Audio
    - Advanced voice mode, i.e. true audio inside the model
  - Video
  - Images
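The flash-card use case above can be sketched against any OpenAI-compatible chat endpoint. A minimal sketch, assuming a local server at http://localhost:8000/v1 and a simple "Q:/A:" output format (the endpoint, model name, and prompt wording are all illustrative, not from the source):

```python
import json
import urllib.request

def parse_cards(text: str) -> list[dict]:
    """Parse alternating 'Q: ...' / 'A: ...' lines into flash-card dicts."""
    cards, question = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question:
            cards.append({"question": question, "answer": line[2:].strip()})
            question = None
    return cards

def make_flash_cards(passage: str,
                     base_url: str = "http://localhost:8000/v1",
                     model: str = "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8") -> list[dict]:
    # Ask the model for a machine-parseable format instead of free prose.
    prompt = ("Turn the following passage into flash cards. "
              "Output each card as a 'Q: ...' line followed by an 'A: ...' line.\n\n"
              + passage)
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(base_url + "/chat/completions", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return parse_cards(reply)
```

The parsing half is deliberately separated from the network call, so the output format can be tested without a running model server.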
李宏毅
- 解剖小龍蝦 — 以 OpenClaw 為例介紹 AI Agent 的運作原理 - YouTube
- https://notebooklm.google.com/notebook/4299a7e4-8b2a-4c0d-88d9-d29b5922f1e0
Prompt
Hands-on cases that worked well
- First case where the whole process went through agent mode
  - Keycloak + LDAP
  - Vaultwarden
- NotebookLM
  - Understanding Parkinson's disease
Agentic AI
- Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF), Anchored by New Project Contributions Including Model Context Protocol (MCP), goose and AGENTS.md - Agentic AI Foundation (AAIF)
- AFK-surf/open-agent: Open-source alternative to Claude Agent SDK, ChatGPT Agents, and Manus.
- Build agentic AI with open source - YouTube - Redhat
- LlamaStack
- :star:30-30: Junior AI Application Engineer 的學習指南 - 透過實作 AI 學習工具人 - iT 邦幫忙
- 从 0 到 1 复刻一个 Claude Code 这样的 Agent | plantegg
- crewai
- agentic workflows
- What are Agentic Workflows? | IBM
- Andrew Ng – The Rise of Agentic Workflows in AI - YouTube
- Harness engineering: leveraging Codex in an agent-first world | OpenAI
- Under the hood: Security architecture of GitHub Agentic Workflows - The GitHub Blog
- How They Work | GitHub Agentic Workflows
- 從 Claude Code 轉移到 OpenCode:一個開發者的真實體驗 - HackMD
- NeMo Agent Toolkit | NVIDIA Developer
- Agentic AI Conference Sessions | NVIDIA GTC 2026
LangChain/LangGraph
- 30-3: [知識] LangChain 的好朋友之 LangGraph - 可以做到 Agentic AI 的關鍵 - iT 邦幫忙
- 使用 LangChain & LangGraph 打造自己的 AI 助理
- LangChain Framework 讓 LLM 快速落地的好幫手
- Deep Agents overview - Docs by LangChain
Deepagent
Deepagent CLI
curl -LsSf https://raw.githubusercontent.com/langchain-ai/deepagents/refs/heads/main/libs/cli/scripts/install.sh | bash
There is an OpenAI-compatible API at http://10.184.28.123:8000; follow Configuration - Compatible APIs - Docs by LangChain.
The model name can be checked with:
curl http://10.184.28.123:8000/v1/models \
-H "Content-Type: application/json" \
| python3 -m json.tool
~/.deepagents/config.toml
[models]
recent = "openai:nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
[models.providers.openai]
base_url = "http://10.184.28.123:8000/v1"
api_key_env = "EXAMPLE_API_KEY"
models = [
"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
]
Claude Agent SDK
OpenAI Agent SDK
NemoClaw
- Safer AI Agents & Assistants with OpenClaw | NVIDIA NemoClaw
- NVIDIA/NemoClaw: NVIDIA plugin for secure installation of OpenClaw
- NVIDIA NemoClaw — NVIDIA NemoClaw Developer Guide
/opt/nemoclaw-blueprint/policies/openclaw-sandbox.yaml
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Default policy for the OpenClaw sandbox.
# Principle: deny by default, allow only what's needed for core functionality.
# Dynamic updates (network_policies, inference) can be applied post-creation
# via `openshell policy set`. Static fields are effectively creation-locked.
#
# Policy tiers (future):
# default — this file. Minimum for onboard + basic agent operation.
# relaxed — adds third-party model providers, broader web access.
#
# To add endpoints: update this file and re-run `nemoclaw onboard`
# or apply dynamically via `openshell policy set`.
version: 1
filesystem_policy:
  include_workdir: true
  read_only:
    - /usr
    - /lib
    - /proc
    - /dev/urandom
    - /app
    - /etc
    - /var/log
    - /sandbox/.openclaw    # Immutable gateway config — prevents agent
                            # from tampering with auth tokens or CORS.
                            # Writable state (agents, plugins) lives in
                            # /sandbox/.openclaw-data via symlinks.
                            # Ref: https://github.com/NVIDIA/NemoClaw/issues/514
  read_write:
    - /sandbox
    - /tmp
    - /dev/null
    - /sandbox/.openclaw-data  # Writable agent/plugin state (symlinked from .openclaw)
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  claude_code:
    name: claude_code
    endpoints:
      - host: api.anthropic.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
      - host: statsig.anthropic.com
        port: 443
        rules:
          - allow: { method: "*", path: "/**" }
      - host: sentry.io
        port: 443
        rules:
          - allow: { method: "*", path: "/**" }
    binaries:
      - { path: /usr/local/bin/claude }
  nvidia:
    name: nvidia
    endpoints:
      - host: integrate.api.nvidia.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
      - host: inference-api.nvidia.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
    binaries:
      - { path: /usr/local/bin/claude }
      - { path: /usr/local/bin/openclaw }
  github:
    name: github
    endpoints:
      - host: github.com
        port: 443
        access: full
      - host: api.github.com
        port: 443
        access: full
    binaries:
      - { path: /usr/bin/gh }
      - { path: /usr/bin/git }

  # ── OpenClaw "phone home" ────────────────────────────────────────────
  # Minimum viable set for OpenClaw to authenticate, discover plugins,
  # and reach ClawHub. Binary-restricted to openclaw only.
  # Docs access is read-only (GET). ClawHub and openclaw.ai are
  # restricted to GET+POST (auth flows, plugin discovery).
  clawhub:
    name: clawhub
    endpoints:
      - host: clawhub.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }
  openclaw_api:
    name: openclaw_api
    endpoints:
      - host: openclaw.ai
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }
  openclaw_docs:
    name: openclaw_docs
    endpoints:
      - host: docs.openclaw.ai
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }

  # npm registry — needed for `openclaw plugins install` and `npm install`
  npm_registry:
    name: npm_registry
    endpoints:
      - host: registry.npmjs.org
        port: 443
        access: full
    binaries:
      - { path: /usr/local/bin/openclaw }
      - { path: /usr/local/bin/npm }

  # ── Messaging — pre-allowed for OpenClaw agent notifications ────
  # Restricted to node processes to prevent arbitrary data exfiltration
  # via curl, wget, python, etc. (See: #272)
  telegram:
    name: telegram
    endpoints:
      - host: api.telegram.org
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/bot*/**" }
          - allow: { method: POST, path: "/bot*/**" }
    binaries:
      - { path: /usr/local/bin/node }
  discord:
    name: discord
    endpoints:
      - host: discord.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
      - host: gateway.discord.gg
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
      - host: cdn.discordapp.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
    binaries:
      - { path: /usr/local/bin/node }
OpenShell
MCP
- 從開發者角度理解 Model Context Protocol (MCP) - MyApollo
- Explains what MCP is
- MCP 是什麼?可以吃嗎?
- Cloudflare推出新MCP伺服器,僅兩介面簡化全套Cloudflare API存取 | iThome
機器學習
Claude Code
- Anthropic
- Day 17 - AI 工具實戰:Claude Code 從零到上手 - iT 邦幫忙
- Claude Code 發佈 Command Line 的新工具 | Darrell TW
- Experience and takeaways from using Claude Code
  - Use case 1: getting familiar with a code repository
  - Use case 2: setting up a test environment and fleshing out the test suite
- 一个半月高强度 Claude Code 使用后感受 | OneV's Den
- [實作筆記] AI Agent 實作 Web Search API:從設計到部署的完整記錄 | Marsen's Blog
- Claude Code 從 0 到 1:完整實戰指南 | Cash Wu Geek
- Claude Code 完整實戰手冊
- 我與 Claude Code 協作時怎麼寫 prompt ? (2025 下半) · YWC 科技筆記
- Claude Code 創辦人 Boris 親授分享的 13 條核心技巧(補充個人簡短評註) | Kenmingの鮮思維
- Claude Code Skills:讓 AI 變身專業工匠 | 高見龍
- 想用 Claude Code 開發?這篇帶你從入門到進階 | 是 Ray 不是 Array
- 用 superpowers 與 spec-kit 打造 AI 輔助開發流程 - 實戰心得與效率觀察 | 忍者工坊
- 全程用 Claude Code 搓了一个 macOS 原生应用:SkillDeck - crossoverJie's Blog
- 從 Claude Code 轉移到 OpenCode:一個開發者的真實體驗 - HackMD
- Claude Cowork Dispatch 與 Remote Control - 用手機遠端控制你的 AI Agent | 是 Ray 不是 Array
- CLAUDE.md 與 Rules for AI 撰寫技巧:讓 AI 更懂你的專案 | 是 Ray 不是 Array
- Agent Skills 需要多少內容? | 弦而時習之
應用
Computer Use
Browser Use
Cursor
- Cursor Docs
- Cursor Enterprise – Kickoff Guide
- Cursor AI Tutorial for Beginners [2025 Edition] - YouTube
- Best practices for coding with agents · Cursor
- agent harness
- Instructions: The system prompt and rules that guide agent behavior
- Tools: File editing, codebase search, terminal execution, and more
- User messages: Your prompts and follow-ups that direct the work
- planning before coding
- Using Plan Mode
- Research your codebase to find relevant files
- Ask clarifying questions about your requirements
- Create a detailed implementation plan with file paths and code references
- Wait for your approval before building
- Not every task needs a detailed plan. For quick changes or tasks you've done many times before, jumping straight to the agent is fine.
- Let the agent find context
- You don't need to manually tag every file in your prompt.
- Keep it simple: if you know the exact file, tag it. If not, the agent will find it.
- When to start a new conversation
- You're moving to a different task or feature
- The agent seems confused or keeps making the same mistakes
- You've finished one logical unit of work
- Long conversations can cause the agent to lose focus.
- Reference past work
- Cursor provides two main ways to customize agent behavior
- Rules for static context that applies to every conversation
- Think of them as always-on context that the agent sees at the start of every conversation.
- Create rules as markdown files in .cursor/rules/:
- Start simple. Add rules only when you notice the agent making the same mistake repeatedly. Don't over-optimize before you understand your patterns.
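A hypothetical example of such a rule file (file name and contents invented for illustration, following the `.cursor/rules/` convention described above):

```markdown
<!-- .cursor/rules/testing.md -->
When writing or modifying tests:

- Use the existing patterns in tests/ as the reference style.
- Prefer real fixtures over mocks unless the test needs network isolation.
- Run the affected test file before declaring the task done.
```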
- Skills for dynamic capabilities the agent can use when relevant
- extend what agents can do.
- Skills package domain-specific knowledge, workflows, and scripts that agents can invoke when relevant.
- Skills are defined in SKILL.md files
- Custom commands: Reusable workflows triggered with / in the agent input
- Hooks: Scripts that run before or after agent actions
- Domain knowledge: Instructions for specific tasks the agent can pull in on demand
- Unlike Rules which are always included, Skills are loaded dynamically when the agent decides they're relevant. This keeps your context window clean while giving the agent access to specialized capabilities.
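A sketch of what a SKILL.md might look like, based on the description above; treat the exact frontmatter fields and layout as illustrative rather than the canonical format:

```markdown
---
name: release-notes
description: Draft release notes from merged PRs when the user asks for a changelog
---

# Release notes skill

1. Run `git log --merges --oneline <last-tag>..HEAD` to collect merged PRs.
2. Group the entries into Features / Fixes / Chores.
3. Write the draft to CHANGELOG.md, following the file's existing heading style.
```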
- Beyond coding, you can connect the agent to other tools you use daily. MCP (Model Context Protocol) lets the agent read Slack messages, investigate Datadog logs, debug errors from Sentry, query databases, and more.
- The agent can process images directly from your prompts.
- Common agent patterns
- Test-driven development
- The agent can write code, run tests, and iterate automatically
- Codebase understanding
- When onboarding to a new codebase, use the agent for learning and exploration. Ask the same questions you would ask a teammate
- Git workflows
- Reviewing code
- Running agents in parallel
- We've found that having multiple models attempt the same problem and picking the best result significantly improves the final output, especially for harder tasks.
- Cursor automatically creates and manages git worktrees for parallel agents. Each agent runs in its own worktree with isolated files and changes, so agents can edit, build, and test code without stepping on each other.
- A powerful pattern is running the same prompt across multiple models simultaneously.
- Delegating to cloud agents
- Debug Mode for tricky bugs
- Instead of guessing at fixes, Debug Mode:
- Generates multiple hypotheses about what could be wrong
- Instruments your code with logging statements
- Asks you to reproduce the bug while collecting runtime data
- Analyzes actual behavior to pinpoint the root cause
- Makes targeted fixes based on evidence
- This works best for:
- Bugs you can reproduce but can't figure out
- Race conditions and timing issues
- Performance problems and memory leaks
- Regressions where something used to work
- effective ways of working with agents
- write specific prompts
- :warning: "add tests for auth.ts"
- :+1: "Write a test case for auth.ts covering the logout edge case, using the patterns in tests/ and avoiding mocks."
- Start simple
- review carefully
- provide verifiable goals
- treat agents as capable collaborators
- Reviewing Code with Cursor | Cursor Docs
agent / Cmd+K / Tab
cursor rules / cursor commands / MCP
Gemini
Inference engine
vLLM
- https://github.com/NVIDIA/NemoClaw/issues/315#issuecomment-4090919603
- Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit
nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--max-model-len 32768 \
--max-num-seqs 1 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--kv-cache-dtype fp8 \
--host 0.0.0.0
nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--max-model-len 32768 \
--max-num-seqs 1 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--kv-cache-dtype fp8 \
--host 0.0.0.0
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
--max-num-seqs 8 \
--tensor-parallel-size 1 \
--max-model-len 262144 \
--port 8000 \
--trust-remote-code \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--enable-auto-tool-choice \
--host 0.0.0.0
validation — sample response from GET /v1/models:
{
    "object": "list",
    "data": [
        {
            "id": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
            "object": "model",
            "created": 1774539878,
            "owned_by": "vllm",
            "root": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
            "parent": null,
            "max_model_len": 32768,
            "permission": [
                {
                    "id": "modelperm-bd89d55791b3a278",
                    "object": "model_permission",
                    "created": 1774539878,
                    "allow_create_engine": false,
                    "allow_sampling": true,
                    "allow_logprobs": true,
                    "allow_search_indices": false,
                    "allow_view": true,
                    "allow_fine_tuning": false,
                    "organization": "*",
                    "group": null,
                    "is_blocking": false
                }
            ]
        }
    ]
}
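Instead of eyeballing the JSON above, the same check can be scripted. A small sketch that pulls the served model id and context length out of a /v1/models response, shown here against a trimmed copy of the sample payload:

```python
def served_models(payload: dict) -> list[tuple[str, int]]:
    """Extract (model id, max context length) pairs from a /v1/models response."""
    return [(m["id"], m["max_model_len"]) for m in payload["data"]]

# Against a live server you would fetch the payload first, e.g.:
#   import json, urllib.request
#   with urllib.request.urlopen("http://localhost:8000/v1/models") as r:
#       payload = json.load(r)
sample = {
    "object": "list",
    "data": [{"id": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
              "object": "model", "max_model_len": 32768}],
}
print(served_models(sample))  # [('nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8', 32768)]
```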
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
"messages":[{"role": "user", "content": "Write a haiku about GPUs"}],
"chat_template_kwargs": {"enable_thinking": true}
}' | python3 -m json.tool
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
"messages":[{"role": "user", "content": "explain the message from vllm log: Engine 000: Avg prompt throughput: 2.3 tokens/s, Avg generation throughput: 49.4 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%"}],
"chat_template_kwargs": {"enable_thinking": true}
}' | python3 -m json.tool
services:
  vllm:
    image: vllm/vllm-openai:latest
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    ipc: host
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      --model nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
      --enable-auto-tool-choice
      --tool-call-parser qwen3_coder
      --reasoning-parser nemotron_v3
      --max-model-len 32768
      --max-num-seqs 1
      --trust-remote-code
      --gpu-memory-utilization 0.85
      --kv-cache-dtype fp8
      --host 0.0.0.0

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
    depends_on:
      - vllm

volumes:
  open-webui-data: