V2EX › All replies by gcod

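April 25

Replied to a topic created by KaiWuBOSS › Local LLM › I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

@KaiWuBOSS As expected, 4B doesn't work either...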

PS C:\Windows\system32> irm https://raw.githubusercontent.com/val1813/kaiwu/main/install.ps1 | iex
Kaiwu Installer
===============

Detected: windows/amd64
Fetching latest release...
Latest version: v0.1.6
Downloading https://github.com/val1813/kaiwu/releases/download/v0.1.6/kaiwu-windows-amd64.zip...

Kaiwu installed successfully!

Kaiwu v0.1.6

Get started:
kaiwu run Qwen3-30B-A3B

Note: restart your terminal for PATH changes to take effect.
PS C:\Windows\system32> kaiwu run Qwen3-4B

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.6 · llama.cpp b8864
by llmbbs.ai · Local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1660 Ti (SM75, 6144 MB VRAM, 288 GB/s)
RAM: 31 GB DDR4
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-4B (dense, 4B)
Quant: q5-k-m (2.8 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-4B-Q5_K_M.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3; falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM; lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive startup failures; cannot run even at the minimum context (4K)

NVIDIA GeForce GTX 1660 Ti: 6144 MB VRAM
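Model Qwen3-4B: ~2867 MB
KV cache (4K, q4_0): ~112 MB
Estimated total required: ~4003 MB

Suggestions:
1. Run kaiwu run qwen3-4b --reset to re-probe the parameters
2. The model is small but still hits OOM; this may be a parameter configuration problem, please upgrade to the latest version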

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for optimal parameters

PS C:\Windows\system32> kaiwu run qwen3-4b --reset

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.6 · llama.cpp b8864
by llmbbs.ai · Local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1660 Ti (SM75, 6144 MB VRAM, 288 GB/s)
RAM: 31 GB DDR4
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-4B (dense, 4B)
Quant: q5-k-m (2.8 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-4B-Q5_K_M.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3; falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Cache cleared; re-probing
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM; lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive startup failures; cannot run even at the minimum context (4K)

NVIDIA GeForce GTX 1660 Ti: 6144 MB VRAM
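Model Qwen3-4B: ~2867 MB
KV cache (4K, q4_0): ~112 MB
Estimated total required: ~4003 MB

Suggestions:
1. Run kaiwu run qwen3-4b --reset to re-probe the parameters
2. The model is small but still hits OOM; this may be a parameter configuration problem, please upgrade to the latest version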

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for optimal parameters
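
The figures in the error report above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers kaiwu printed; the function names are illustrative, and reading the leftover amount as a reserve for compute buffers and runtime overhead is an assumption, not something the tool documents.

```python
# Figures copied from the kaiwu error report for Qwen3-4B (all in MB).
VRAM_TOTAL_MB = 6144
MODEL_MB = 2867
KV_CACHE_MB = 112
ESTIMATED_TOTAL_MB = 4003


def implied_overhead_mb(estimated_total: int, model: int, kv_cache: int) -> int:
    """Part of the estimate that is neither model weights nor KV cache."""
    return estimated_total - model - kv_cache


def headroom_mb(vram_total: int, estimated_total: int) -> int:
    """VRAM that would remain free if the estimate were accurate."""
    return vram_total - estimated_total


overhead = implied_overhead_mb(ESTIMATED_TOTAL_MB, MODEL_MB, KV_CACHE_MB)
headroom = headroom_mb(VRAM_TOTAL_MB, ESTIMATED_TOTAL_MB)

print(f"implied overhead: {overhead} MB")
print(f"headroom on a 6144 MB card: {headroom} MB")
```

If the estimate is accurate, about 2 GB of the card would still be free, which is consistent with the tool's own hint that the OOM may come from the launch parameters rather than from the model size.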

April 25

Replied to a topic created by KaiWuBOSS › Local LLM › I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

PS C:\Windows\system32> kaiwu run Qwen3-1.7B --ctx-size 2048

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.4 · llama.cpp b8864
by llmbbs.ai · Local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1660 Ti (SM75, 6144 MB VRAM, 0 GB/s)
RAM: 31 GB DDR4
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-1.7B (dense, 2B)
Quant: q5-k-m (1.2 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-1.7B-Q5_K_M.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3 (or the first JIT compilation timed out); falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
User specified ctx=2048; skipping cache
User override: ctx=2K ... ⚠️ Warmup failed: user-specified ctx=2K failed to start (OOM?)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM; lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive startup failures; cannot run even at the minimum context (4K)

NVIDIA GeForce GTX 1660 Ti: 6144 MB VRAM
Model Qwen3-1.7B: ~1228 MB
KV cache (4K, q4_0): ~112 MB
Estimated total required: ~2364 MB

Suggestions:
1. Choose a smaller quantization (Q2_K)
2. Choose a smaller model
3. Use a MoE offload model (experts kept in CPU RAM)
Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for optimal parameters

It's a 7-year-old machine with a 1660 Ti 😮‍💨
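
One way to narrow down an OOM like this is to compare the tool's estimate with the VRAM that is actually free at launch time. The sketch below parses the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. The helper names and the sample line are made up for illustration; only the ~2364 MB requirement comes from the log above.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, values in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse lines of the form 'name, total, used, free' into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used, free = [f.strip() for f in line.rsplit(",", 3)]
        gpus.append(
            {"name": name, "total_mb": int(total), "used_mb": int(used), "free_mb": int(free)}
        )
    return gpus


def fits(free_mb: int, required_mb: int) -> bool:
    """True if the currently free VRAM covers the estimated requirement."""
    return free_mb >= required_mb


def query_gpu_memory() -> list[dict]:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver install)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Hypothetical sample output, NOT taken from the machine in this thread.
SAMPLE = "NVIDIA GeForce GTX 1660 Ti, 6144, 5000, 1144\n"
REQUIRED_MB = 2364  # "Estimated total required" for Qwen3-1.7B in the log above

gpu = parse_gpu_memory(SAMPLE)[0]
print(gpu["name"], "free:", gpu["free_mb"], "MB, fits:", fits(gpu["free_mb"], REQUIRED_MB))
```

On a real machine, `query_gpu_memory()` would replace the sample string. If free VRAM is well above the estimate and the server still fails, the launch flags are a more likely cause than memory pressure from other processes.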

April 12

Replied to a topic created by DopaminePlz › Broadband Syndrome › fast.com is actually faster than https://10000.gd.cn/#/speed

Your gigabit line probably isn't up to spec..
[![18a09a9d53dd2a0542cf9b62c1dc27d5](https://wx2.vv1234.cn/s1-pic/2026/04/a5f446ba5ba0c9c2ca51e8ec52fe9e77.jpg)](https://wx2.vv1234.cn/s1-pic/2026/04/a5f446ba5ba0c9c2ca51e8ec52fe9e77.jpg)

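December 28, 2025

Replied to a topic created by cxbdasheng › NAS › D-NET supports Alibaba Cloud ESA, enabling a free IPv6 acceleration scheme (IPv4/IPv6 access)

At its core, the approach skips origin-pull through a DDNS domain and updates the CDN origin IP directly, which reduces resolution latency and points of failure, while cleverly using ESA's free IPv4/IPv6 dual-stack access. Packaging DDNS, dynamic IPv6 monitoring, and automated CDN configuration into the D-NET tool really does simplify operations a lot. A practical and creative idea.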
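
The "update the CDN origin IP directly" idea can be sketched as a small change-detection step. This is not D-NET's code and it does not call any real ESA API: `push_origin` is a hypothetical callback standing in for whatever request actually updates the origin, and the addresses are examples.

```python
import ipaddress
from typing import Callable, Optional


def is_public_ipv6(addr: str) -> bool:
    """True only for syntactically valid, globally routable IPv6 addresses."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return ip.version == 6 and ip.is_global


def sync_origin(
    current: str,
    last_pushed: Optional[str],
    push_origin: Callable[[str], None],
) -> Optional[str]:
    """Push `current` as the CDN origin only if it is public and has changed.

    Returns the address that is considered pushed after this call.
    """
    if not is_public_ipv6(current):
        return last_pushed  # ignore link-local, private, or malformed addresses
    normalized = str(ipaddress.ip_address(current))
    if normalized == last_pushed:
        return last_pushed  # nothing changed, skip the API call
    push_origin(normalized)  # hypothetical: update the CDN origin record
    return normalized


# Demo with a list standing in for the CDN API.
pushed: list[str] = []
state = sync_origin("2400:3200::1", None, pushed.append)   # first address: pushed
state = sync_origin("2400:3200::1", state, pushed.append)  # unchanged: skipped
state = sync_origin("fe80::1", state, pushed.append)       # link-local: ignored
print(pushed, state)
```

Keeping the comparison separate from the push makes it easy to run the check on a timer without hitting the CDN API when the address has not changed.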

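August 29, 2025

Replied to a topic created by dsd2077 › Programmers › Using AI on production servers: what do you think?

The prerequisite is that you absolutely must know what you are doing, rather than blindly following whatever the AI says. AI hallucinations can sometimes be fatal.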

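June 5, 2025

Replied to a topic created by 1145148964 › Q&A › I want to get broadband in Henan and can't decide between China Unicom and China Telecom. Looking for advice.

China Telecom is a bit cheaper. If you're renting, gigabit is roughly 360 yuan a year~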