LLM
- 生成式AI不能做到什麼與訓練資料的影響
- :star: A grounded take on agentic coding for production environments
- LLM 學習筆記 - 從 LLM 輸入問題,按下 Enter 後會發生什麼事? :: 2025 iThome 鐵人賽
- LLM 學習筆記 :: 2023 iThome 鐵人賽
- microsoft/generative-ai-for-beginners: 21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
- GitHub - microsoft/ai-agents-for-beginners: 10 Lessons to Get Started Building AI Agents
- RAG 和 Prompt 原理超簡單解說!想知道 AI 怎麼找答案看這篇
- GPT best practices - OpenAI API
- 對 AI 的粗淺認識與個人見解 | Thinking Today
- ChatGPT 跟 new Bing一些使用技巧與參考範例/ Twitter
- 嘗試透過 ChatGPT API 打造了一個「靈感夥伴」 - Pin 起來!
- Google Bard 開放支援 AI 中文問答,6種製作中文試算表與報告案例
- ChatGPT 與 Quiz Wizard 幫老師家長 AI 生成選擇題、抽認卡教學
- ChatGPT 提示語說明書:通用三層結構與 9 個技巧提高 AI 生產力
- 如何問出更好的問題?AI世代師生都必須面對的挑戰 | Gamma
- 快速輸入 ChatGPT 常用提示語,減少重複打字,附實戰範例下載
- 一般人最好上手且有效的 ChatGPT 提問法,以生成英文練習題為例
- 我与ChatGPT结对编程的体验 · BMPI
- 玩具烏托邦: ChatGPT: 正確與錯誤的使用示範
- 大語言模型 LLM 應用開發 投影片 - ihower { blogging }
- [ChatGPT] 使用 ChatGPT 建立一個中英單字翻譯器,同時提供音標與例句 | EPH 的程式日記
- 大砲打小鳥之 ChatGPT 翻譯鳳梨酥 LINE Bot-黑暗執行緒
- 使用 ChatGPT 學習 Git 版本控管 | The Will Will Web
- 介紹好用工具:CodeGPT (使用 GPT 自動化產生 Git 的 Commit Log 訊息) | The Will Will Web
- .NET Walker: 使用LINE Bot搭配OpenAI API建立出新一代的AI機器人
- How To Use Midjourney for Web Design
- 替你的應用程式加上智慧! 談 LLM 的應用程式開發 — 安德魯的部落格
- 生成式 AI 年會大禮包
- Perplexity AI 搜尋服務的開源替代品 — Perplexica - MyApollo
- 提示工程指南 | Prompt Engineering Guide
- Python 新手的 AI 之旅:從零開始打造屬於你的 AI / LLM 應用 :: 2024 iThome 鐵人賽
- 學習用生成式 AI 寫程式 — Coursera 課程心得 • 好豪筆記
- 無需程式碼!用 tldraw computer 畫張流程圖輕鬆打造 AI 自動化系統
- 好文推薦 — My LLM codegen workflow atm - MyApollo
- NotebookLM 教學,以知名財經 Podcast 為例(持續更新資料集)
- 陶哲軒對使用 AI 的看法 – Gea-Suan Lin's BLOG
- 當每個人都能做出 App 時,你要怎麼脫穎而出? | Alex Hsu 徐小翔
- AI 時代寫程式,你是在學習還是在偷懶? | 高見龍
- The Best Way to Use AI for Learning | by 詹雨安 Alan Chan | Heptabase | Sep, 2025 | Medium
- 周报 #102 - 我是如何使用 AI 的
- AI 编程真的靠谱吗? | 赵化冰的博客 | Zhaohuabing Blog
- OpenClaw 龍蝦佐本地 AI 模型:從評估到放棄?-黑暗執行緒
- AI 當道,為什麼我還是推薦寫部落格? | 是 Ray 不是 Array
- BBC 6 Minute English:提升英文聽力的好夥伴 - Code and Me
- 400小時Cursor經驗分享:AI輔助開發的終極指南 | 技術視野洞察 - Dennis的專業視角
- Shipping at Inference-Speed | Peter Steinberger
- Workflow - 2025/12
- Just Talk To It - the no-bs Way of Agentic Engineering | Peter Steinberger
- Workflow - 2025/10
- 2025 年度回顧:慢下來,才能更快 | Ernest Chiang
- 討論與AI的關係
- 过年了,聊聊AI和人文 - 铁蕾的个人博客
- Tactiq 線上會議轉錄工具:AI摘要、逐字稿生成、重點標示、螢幕截圖 - George的私房筆記
- 閒聊 - AI 讓 StackOverflow 熱度爆跌,技術部落格也要涼了嗎?-黑暗執行緒
- 【生成式AI導論 2024】 - YouTube
- 從 ClawdBot 到 OpenClaw:三個月「小龍蝦」重度玩家實測心得與省錢攻略
- 從 Agent 到 Agentic AI | 弦而時習之


Things I personally think AI still can't do (2025)
- 從 MCE Error 到 IOMMU: 追查 Kernel panic 一年的真相 - Zen's Blog
- 在白馬的那一週 - 🍃 leafwind.tw
- An LLM-generated article that imitates a human author's style can never replace the human-written original: it amounts to deception, because the imitation is computed rather than grounded in any actual experience
推特AI 取暖會
- :star: 「第一屆 AI 取暖會」講義文字稿 - AI agent 原理、應用與展望與挑戰
- Topics covered in「第二屆推特AI 取暖會」
  - After thinking models appeared
    - Plain Sonnet is usually enough; don't reach for Opus without a reason
    - Just set effort to medium
  - The problems context engineering has to deal with
    - attention drift, context rot
  - The agent's four superpowers
    - shell
    - file system
    - scripting
    - subagents
  - Make a wish → supervise → accept the result (許願 → 監工 → 驗收)
- Agentic Engineering 不傳之秘 / Jeremy Lu / https://x.com/thecat88tw - YouTube
- 「第二屆 AI 取暖會」講義文字稿
ihower
- 愛好 AI Engineer 電子報 🚀 新型態代理人 OpenClaw 正夯,電子報改版 #35 – ihower { blogging }
- 愛好 AI 工程 Blog: blog.aihao.tw/ (整個站都是AI生成)
- :star:ihower-agents-202412 - ihower-agents-202412.pdf
- edd - ihower-edd-202409.pdf
- 實戰 AI Agents 應用開發: TTFT 和 Prompt Caching – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 AI 應用開發的常見錯誤 #22 – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 就是有深度 DeepSeek R1 和 OpenAI Deep Research #23 – ihower { blogging }
- 愛好 AI Engineer 電子報 🚀 恩尼格瑪評估 #24 – ihower { blogging }
- AI 大神免費教你生活用 AI,入門實例解析互動技巧、工具使用、檔案處理,帶你快速掌握LLM應用!!OpenAI 共同創辦人、特斯拉人工智慧總監 Andrej Karpathy - YouTube
- Leaderboard
- Tiktokenizer
- Don't fully trust it — output may contain hallucinated content
- model
  - Expensive pre-training
    - Characteristic: no up-to-date data; knowledge stops at a cutoff point
  - General model
    - Writing poems, resumes, emails
    - Knowledge-based queries
    - Asking for travel advice
  - Personality-shaping post-training
  - Adding reinforcement learning ⇒ thinking model
    - Use when you need a higher-quality answer
- tool use
  - internet search
    - When to use:
      - Replacing a Google search
      - Information that is likely recent
      - Information that changes over time
  - deep research
    - multiple internet searches + thinking model
  - file upload
    - When to use: reading documents or books together with the model
  - python interpreter
- Use cases
  - flash cards
    - Give it a passage of text and ask the LLM to generate flash cards
  - Claude Artifacts Showcase | Share Your AI Creations
    - Give it a passage of text and ask the LLM to generate a diagram for visualization
  - cursor
    - composer
- Input/output modalities
  - Text
  - Audio
    - Advanced voice mode, i.e. true audio inside the model
  - Video
  - Images
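The flash-card use case above can be sketched against any OpenAI-compatible chat endpoint. A minimal sketch, assuming a local server at http://localhost:8000/v1 and a simple "Q:/A:" output format (the endpoint, model name, and prompt wording are all illustrative, not from the source):

```python
import json
import urllib.request

def parse_cards(text: str) -> list[dict]:
    """Parse alternating 'Q: ...' / 'A: ...' lines into flash-card dicts."""
    cards, question = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question:
            cards.append({"question": question, "answer": line[2:].strip()})
            question = None
    return cards

def make_flash_cards(passage: str,
                     base_url: str = "http://localhost:8000/v1",
                     model: str = "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8") -> list[dict]:
    # Ask the model for a machine-parseable format instead of free prose.
    prompt = ("Turn the following passage into flash cards. "
              "Output each card as a 'Q: ...' line followed by an 'A: ...' line.\n\n"
              + passage)
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(base_url + "/chat/completions", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return parse_cards(reply)
```

The parsing half is deliberately separated from the network call, so the output format can be tested without a running model server.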
李宏毅
- 解剖小龍蝦 — 以 OpenClaw 為例介紹 AI Agent 的運作原理 - YouTube
- https://notebooklm.google.com/notebook/4299a7e4-8b2a-4c0d-88d9-d29b5922f1e0
Prompt
Hands-on cases that worked well
- First case where the whole process went through agent mode
  - Keycloak + LDAP
  - Vaultwarden
- NotebookLM
  - Understanding Parkinson's disease
Agentic AI
- Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF), Anchored by New Project Contributions Including Model Context Protocol (MCP), goose and AGENTS.md - Agentic AI Foundation (AAIF)
- AFK-surf/open-agent: Open-source alternative to Claude Agent SDK, ChatGPT Agents, and Manus.
- Build agentic AI with open source - YouTube - Redhat
- LlamaStack
- :star:30-30: Junior AI Application Engineer 的學習指南 - 透過實作 AI 學習工具人 - iT 邦幫忙
- 从 0 到 1 复刻一个 Claude Code 这样的 Agent | plantegg
- crewai
- agentic workflows
- What are Agentic Workflows? | IBM
- Andrew Ng – The Rise of Agentic Workflows in AI - YouTube
- Harness engineering: leveraging Codex in an agent-first world | OpenAI
- Under the hood: Security architecture of GitHub Agentic Workflows - The GitHub Blog
- How They Work | GitHub Agentic Workflows
- 從 Claude Code 轉移到 OpenCode:一個開發者的真實體驗 - HackMD
- NeMo Agent Toolkit | NVIDIA Developer
- Agentic AI Conference Sessions | NVIDIA GTC 2026
LangChain/LangGraph
- 30-3: [知識] LangChain 的好朋友之 LangGraph - 可以做到 Agentic AI 的關鍵 - iT 邦幫忙
- 使用 LangChain & LangGraph 打造自己的 AI 助理
- LangChain Framework 讓 LLM 快速落地的好幫手
- Deep Agents overview - Docs by LangChain
Deepagent
Deepagent CLI
curl -LsSf https://raw.githubusercontent.com/langchain-ai/deepagents/refs/heads/main/libs/cli/scripts/install.sh | bash
There is an OpenAI-compatible API at http://10.184.28.123:8000; follow Configuration - Compatible APIs - Docs by LangChain.
The model name can be checked with:
curl http://10.184.28.123:8000/v1/models \
-H "Content-Type: application/json" \
| python3 -m json.tool
~/.deepagents/config.toml
[models]
recent = "openai:nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
[models.providers.openai]
base_url = "http://10.184.28.123:8000/v1"
api_key_env = "EXAMPLE_API_KEY"
models = [
"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
]
Claude Agent SDK
OpenAI Agent SDK
NemoClaw
- Safer AI Agents & Assistants with OpenClaw | NVIDIA NemoClaw
- NVIDIA/NemoClaw: NVIDIA plugin for secure installation of OpenClaw
- NVIDIA NemoClaw — NVIDIA NemoClaw Developer Guide
/opt/nemoclaw-blueprint/policies/openclaw-sandbox.yaml
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Default policy for the OpenClaw sandbox.
# Principle: deny by default, allow only what's needed for core functionality.
# Dynamic updates (network_policies, inference) can be applied post-creation
# via `openshell policy set`. Static fields are effectively creation-locked.
#
# Policy tiers (future):
# default — this file. Minimum for onboard + basic agent operation.
# relaxed — adds third-party model providers, broader web access.
#
# To add endpoints: update this file and re-run `nemoclaw onboard`
# or apply dynamically via `openshell policy set`.
version: 1
filesystem_policy:
  include_workdir: true
  read_only:
    - /usr
    - /lib
    - /proc
    - /dev/urandom
    - /app
    - /etc
    - /var/log
    - /sandbox/.openclaw    # Immutable gateway config — prevents agent
                            # from tampering with auth tokens or CORS.
                            # Writable state (agents, plugins) lives in
                            # /sandbox/.openclaw-data via symlinks.
                            # Ref: https://github.com/NVIDIA/NemoClaw/issues/514
  read_write:
    - /sandbox
    - /tmp
    - /dev/null
    - /sandbox/.openclaw-data  # Writable agent/plugin state (symlinked from .openclaw)
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  claude_code:
    name: claude_code
    endpoints:
      - host: api.anthropic.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
      - host: statsig.anthropic.com
        port: 443
        rules:
          - allow: { method: "*", path: "/**" }
      - host: sentry.io
        port: 443
        rules:
          - allow: { method: "*", path: "/**" }
    binaries:
      - { path: /usr/local/bin/claude }
  nvidia:
    name: nvidia
    endpoints:
      - host: integrate.api.nvidia.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
      - host: inference-api.nvidia.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: "*", path: "/**" }
    binaries:
      - { path: /usr/local/bin/claude }
      - { path: /usr/local/bin/openclaw }
  github:
    name: github
    endpoints:
      - host: github.com
        port: 443
        access: full
      - host: api.github.com
        port: 443
        access: full
    binaries:
      - { path: /usr/bin/gh }
      - { path: /usr/bin/git }

  # ── OpenClaw "phone home" ────────────────────────────────────────────
  # Minimum viable set for OpenClaw to authenticate, discover plugins,
  # and reach ClawHub. Binary-restricted to openclaw only.
  # Docs access is read-only (GET). ClawHub and openclaw.ai are
  # restricted to GET+POST (auth flows, plugin discovery).
  clawhub:
    name: clawhub
    endpoints:
      - host: clawhub.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }
  openclaw_api:
    name: openclaw_api
    endpoints:
      - host: openclaw.ai
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }
  openclaw_docs:
    name: openclaw_docs
    endpoints:
      - host: docs.openclaw.ai
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
    binaries:
      - { path: /usr/local/bin/openclaw }

  # npm registry — needed for `openclaw plugins install` and `npm install`
  npm_registry:
    name: npm_registry
    endpoints:
      - host: registry.npmjs.org
        port: 443
        access: full
    binaries:
      - { path: /usr/local/bin/openclaw }
      - { path: /usr/local/bin/npm }

  # ── Messaging — pre-allowed for OpenClaw agent notifications ────
  # Restricted to node processes to prevent arbitrary data exfiltration
  # via curl, wget, python, etc. (See: #272)
  telegram:
    name: telegram
    endpoints:
      - host: api.telegram.org
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/bot*/**" }
          - allow: { method: POST, path: "/bot*/**" }
    binaries:
      - { path: /usr/local/bin/node }
  discord:
    name: discord
    endpoints:
      - host: discord.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
      - host: gateway.discord.gg
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
          - allow: { method: POST, path: "/**" }
      - host: cdn.discordapp.com
        port: 443
        protocol: rest
        enforcement: enforce
        tls: terminate
        rules:
          - allow: { method: GET, path: "/**" }
    binaries:
      - { path: /usr/local/bin/node }
OpenShell
MCP
- 從開發者角度理解 Model Context Protocol (MCP) - MyApollo
- Explains what MCP is
- MCP 是什麼?可以吃嗎?
- Cloudflare推出新MCP伺服器,僅兩介面簡化全套Cloudflare API存取 | iThome
機器學習
Claude Code
- Anthropic
- Day 17 - AI 工具實戰:Claude Code 從零到上手 - iT 邦幫忙
- Claude Code 發佈 Command Line 的新工具 | Darrell TW
- Experience and takeaways from using Claude Code
  - Use case 1: getting familiar with a code repository
  - Use case 2: setting up a test environment and fleshing out the test suite
- 一个半月高强度 Claude Code 使用后感受 | OneV's Den
- [實作筆記] AI Agent 實作 Web Search API:從設計到部署的完整記錄 | Marsen's Blog
- Claude Code 從 0 到 1:完整實戰指南 | Cash Wu Geek
- Claude Code 完整實戰手冊
- 我與 Claude Code 協作時怎麼寫 prompt ? (2025 下半) · YWC 科技筆記
- Claude Code 創辦人 Boris 親授分享的 13 條核心技巧(補充個人簡短評註) | Kenmingの鮮思維
- Claude Code Skills:讓 AI 變身專業工匠 | 高見龍
- 想用 Claude Code 開發?這篇帶你從入門到進階 | 是 Ray 不是 Array
- 用 superpowers 與 spec-kit 打造 AI 輔助開發流程 - 實戰心得與效率觀察 | 忍者工坊
- 全程用 Claude Code 搓了一个 macOS 原生应用:SkillDeck - crossoverJie's Blog
- 從 Claude Code 轉移到 OpenCode:一個開發者的真實體驗 - HackMD
- Claude Cowork Dispatch 與 Remote Control - 用手機遠端控制你的 AI Agent | 是 Ray 不是 Array
- CLAUDE.md 與 Rules for AI 撰寫技巧:讓 AI 更懂你的專案 | 是 Ray 不是 Array
- Agent Skills 需要多少內容? | 弦而時習之
應用
Computer Use
Browser Use
Cursor
- Cursor Docs
- Cursor Enterprise – Kickoff Guide
- Cursor AI Tutorial for Beginners [2025 Edition] - YouTube
- Best practices for coding with agents · Cursor
- agent harness
- Instructions: The system prompt and rules that guide agent behavior
- Tools: File editing, codebase search, terminal execution, and more
- User messages: Your prompts and follow-ups that direct the work
- planning before coding
- Using Plan Mode
- Research your codebase to find relevant files
- Ask clarifying questions about your requirements
- Create a detailed implementation plan with file paths and code references
- Wait for your approval before building
- Not every task needs a detailed plan. For quick changes or tasks you've done many times before, jumping straight to the agent is fine.
- Let the agent find context
- You don't need to manually tag every file in your prompt.
- Keep it simple: if you know the exact file, tag it. If not, the agent will find it.
- When to start a new conversation
- You're moving to a different task or feature
- The agent seems confused or keeps making the same mistakes
- You've finished one logical unit of work
- Long conversations can cause the agent to lose focus.
- Reference past work
- Cursor provides two main ways to customize agent behavior
- Rules for static context that applies to every conversation
- Think of them as always-on context that the agent sees at the start of every conversation.
- Create rules as markdown files in .cursor/rules/:
- Start simple. Add rules only when you notice the agent making the same mistake repeatedly. Don't over-optimize before you understand your patterns.
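A hypothetical example of such a rule file (file name and contents invented for illustration, following the `.cursor/rules/` convention described above):

```markdown
<!-- .cursor/rules/testing.md -->
When writing or modifying tests:

- Use the existing patterns in tests/ as the reference style.
- Prefer real fixtures over mocks unless the test needs network isolation.
- Run the affected test file before declaring the task done.
```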
- Skills for dynamic capabilities the agent can use when relevant
- extend what agents can do.
- Skills package domain-specific knowledge, workflows, and scripts that agents can invoke when relevant.
- Skills are defined in SKILL.md files
- Custom commands: Reusable workflows triggered with / in the agent input
- Hooks: Scripts that run before or after agent actions
- Domain knowledge: Instructions for specific tasks the agent can pull in on demand
- Unlike Rules which are always included, Skills are loaded dynamically when the agent decides they're relevant. This keeps your context window clean while giving the agent access to specialized capabilities.
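A sketch of what a SKILL.md might look like, based on the description above; treat the exact frontmatter fields and layout as illustrative rather than the canonical format:

```markdown
---
name: release-notes
description: Draft release notes from merged PRs when the user asks for a changelog
---

# Release notes skill

1. Run `git log --merges --oneline <last-tag>..HEAD` to collect merged PRs.
2. Group the entries into Features / Fixes / Chores.
3. Write the draft to CHANGELOG.md, following the file's existing heading style.
```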
- Beyond coding, you can connect the agent to other tools you use daily. MCP (Model Context Protocol) lets the agent read Slack messages, investigate Datadog logs, debug errors from Sentry, query databases, and more.
- The agent can process images directly from your prompts.
- Common agent patterns
- Test-driven development
- The agent can write code, run tests, and iterate automatically
- Codebase understanding
- When onboarding to a new codebase, use the agent for learning and exploration. Ask the same questions you would ask a teammate
- Git workflows
- Reviewing code
- Running agents in parallel
- We've found that having multiple models attempt the same problem and picking the best result significantly improves the final output, especially for harder tasks.
- Cursor automatically creates and manages git worktrees for parallel agents. Each agent runs in its own worktree with isolated files and changes, so agents can edit, build, and test code without stepping on each other.
- A powerful pattern is running the same prompt across multiple models simultaneously.
- Delegating to cloud agents
- Debug Mode for tricky bugs
- Instead of guessing at fixes, Debug Mode:
- Generates multiple hypotheses about what could be wrong
- Instruments your code with logging statements
- Asks you to reproduce the bug while collecting runtime data
- Analyzes actual behavior to pinpoint the root cause
- Makes targeted fixes based on evidence
- This works best for:
- Bugs you can reproduce but can't figure out
- Race conditions and timing issues
- Performance problems and memory leaks
- Regressions where something used to work
- effective ways of working with agents
- write specific prompts
- :warning: "add tests for auth.ts"
- :+1: "Write a test case for auth.ts covering the logout edge case, using the patterns in tests/ and avoiding mocks."
- Start simple
- review carefully
- provide verifiable goals
- treat agents as capable collaborators
- Reviewing Code with Cursor | Cursor Docs
agent / Cmd+K / Tab
cursor rules / cursor commands / MCP
Gemini
Inference engine
vLLM
- https://github.com/NVIDIA/NemoClaw/issues/315#issuecomment-4090919603
- Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit
nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--max-model-len 32768 \
--max-num-seqs 1 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--kv-cache-dtype fp8 \
--host 0.0.0.0
nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--max-model-len 32768 \
--max-num-seqs 1 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--kv-cache-dtype fp8 \
--host 0.0.0.0
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
docker run -d --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
--max-num-seqs 8 \
--tensor-parallel-size 1 \
--max-model-len 262144 \
--port 8000 \
--trust-remote-code \
--tool-call-parser qwen3_coder \
--reasoning-parser nemotron_v3 \
--enable-auto-tool-choice \
--host 0.0.0.0
validation — sample response from GET /v1/models:
{
    "object": "list",
    "data": [
        {
            "id": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
            "object": "model",
            "created": 1774539878,
            "owned_by": "vllm",
            "root": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
            "parent": null,
            "max_model_len": 32768,
            "permission": [
                {
                    "id": "modelperm-bd89d55791b3a278",
                    "object": "model_permission",
                    "created": 1774539878,
                    "allow_create_engine": false,
                    "allow_sampling": true,
                    "allow_logprobs": true,
                    "allow_search_indices": false,
                    "allow_view": true,
                    "allow_fine_tuning": false,
                    "organization": "*",
                    "group": null,
                    "is_blocking": false
                }
            ]
        }
    ]
}
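Instead of eyeballing the JSON above, the same check can be scripted. A small sketch that pulls the served model id and context length out of a /v1/models response, shown here against a trimmed copy of the sample payload:

```python
def served_models(payload: dict) -> list[tuple[str, int]]:
    """Extract (model id, max context length) pairs from a /v1/models response."""
    return [(m["id"], m["max_model_len"]) for m in payload["data"]]

# Against a live server you would fetch the payload first, e.g.:
#   import json, urllib.request
#   with urllib.request.urlopen("http://localhost:8000/v1/models") as r:
#       payload = json.load(r)
sample = {
    "object": "list",
    "data": [{"id": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
              "object": "model", "max_model_len": 32768}],
}
print(served_models(sample))  # [('nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8', 32768)]
```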
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
"messages":[{"role": "user", "content": "Write a haiku about GPUs"}],
"chat_template_kwargs": {"enable_thinking": true}
}' | python3 -m json.tool
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8",
"messages":[{"role": "user", "content": "explain the message from vllm log: Engine 000: Avg prompt throughput: 2.3 tokens/s, Avg generation throughput: 49.4 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%"}],
"chat_template_kwargs": {"enable_thinking": true}
}' | python3 -m json.tool
services:
  vllm:
    image: vllm/vllm-openai:latest
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    ipc: host
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      --model nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
      --enable-auto-tool-choice
      --tool-call-parser qwen3_coder
      --reasoning-parser nemotron_v3
      --max-model-len 32768
      --max-num-seqs 1
      --trust-remote-code
      --gpu-memory-utilization 0.85
      --kv-cache-dtype fp8
      --host 0.0.0.0

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
    depends_on:
      - vllm

volumes:
  open-webui-data: