1. Download the model to /data/huggingface/hub/Deepseek/DeepSeek_V4_Flash
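Because the Deployment in the next step runs with HF_HUB_OFFLINE=1 and points vLLM at a local path, a missing file only shows up once the pod starts. The following is a minimal pre-flight sketch that can be run on the GPU node first. The function name `missing_model_files` is hypothetical, and the check for `*.safetensors` shards is an assumption about how this checkpoint is packaged; adjust it if the weights use another format.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """Return a list of problems found in a local Hugging Face style model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # config.json is what the loader reads first to identify the architecture.
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    # Assumption: the weights are stored as safetensors shards in the top-level directory.
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight shards found")
    return problems


if __name__ == "__main__":
    # Path taken from step 1 of these notes.
    for problem in missing_model_files("/data/huggingface/hub/Deepseek/DeepSeek_V4_Flash"):
        print(problem)
```

An empty result means the directory at least has the shape the server expects; it does not verify checksums or completeness of the shards.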
2. Deploy the Deployment and the NodePort Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseekv4-flash
  namespace: self-models
  labels:
    app: DeepSeek_V4_Flash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: DeepSeek_V4_Flash
  template:
    metadata:
      labels:
        app: DeepSeek_V4_Flash
      annotations:
        nvidia.com/use-gpuuuid: "GPU-974de417-389f-057a-a34a-edf728f2a4e9,GPU-37395dcf-48bb-93bf-8492-27c9e1eb30d2"
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:v0.21.0
          imagePullPolicy: IfNotPresent
          env:
            - name: HF_HUB_OFFLINE
              value: "1"
          ports:
            - containerPort: 80
              protocol: TCP
          command: ["vllm", "serve"]
          args:
            - /data/huggingface/hub/Deepseek/DeepSeek_V4_Flash
            - --served-model-name
            - deepseekv4-flash
            - --tensor-parallel-size
            - "2"
            - --kv-cache-dtype
            - fp8
            - --trust-remote-code
            - --enable-expert-parallel
            - --compilation-config
            - '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}'
            - --enable-auto-tool-choice
            - --tool-call-parser
            - deepseek_v4
            - --tokenizer-mode
            - deepseek_v4
            - --reasoning-parser
            - deepseek_v4
            - --enable-prefix-caching
            - --port
            - "80"
          volumeMounts:
            - name: hf-cache
              mountPath: /data/huggingface/hub/Deepseek/
              readOnly: true
          resources:
            limits:
              nvidia.com/gpu: 2
              #nvidia.com/gpumem: 66560
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 600
            periodSeconds: 60
      volumes:
        - name: hf-cache
          hostPath:
            path: /data/huggingface/hub/Deepseek/
            type: Directory
---
apiVersion: v1
kind: Service
metadata:
  name: deepseekv4-flash-svc
  namespace: self-models
spec:
  type: NodePort
  selector:
    app: DeepSeek_V4_Flash
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30002
      protocol: TCP
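Two things in this manifest are easy to get wrong when it is adapted: the same port has to appear in the `--port` argument, `containerPort`, both probes and the Service `targetPort`, and the liveness probe has to leave enough time for the weights to load. The sketch below checks both with plain Python values copied from the manifest. All function names are hypothetical, and the restart estimate is an approximation: it assumes the Kubernetes default `failureThreshold` of 3 (the manifest does not set it) and ignores probe timeouts and kubelet scheduling jitter.

```python
def port_mismatches(serve_port, container_port, probe_ports, target_port):
    """Return the manifest fields whose port differs from the vLLM --port value."""
    fields = {"containerPort": container_port, "targetPort": target_port}
    for probe_name, port in probe_ports.items():
        fields[f"{probe_name}.httpGet.port"] = port
    return sorted(name for name, port in fields.items() if port != serve_port)


def node_port_in_default_range(node_port):
    """True if the port lies in the default NodePort range (30000-32767).

    A cluster admin can configure a different range, so treat this as a hint.
    """
    return 30000 <= node_port <= 32767


def earliest_liveness_restart(initial_delay, period, failure_threshold=3):
    """Approximate earliest time (seconds after container start) at which the
    kubelet can have seen `failure_threshold` consecutive liveness failures."""
    return initial_delay + (failure_threshold - 1) * period


if __name__ == "__main__":
    probes = {"readinessProbe": 80, "livenessProbe": 80}
    print(port_mismatches(80, 80, probes, 80))   # values from the manifest above
    print(node_port_in_default_range(30002))
    print(earliest_liveness_restart(600, 60))    # liveness settings from the manifest
```

With the values used here, a container whose `/health` endpoint never comes up would be restarted after roughly twelve minutes at the earliest, so model loading needs to finish within that budget or `initialDelaySeconds` should be raised.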
3. Check the pod logs to confirm that the server started
# kubectl logs -n self-models deepseekv4-flash-7fc6f76f5-qn4r9 -f
(APIServer pid=1) INFO 05-22 10:11:17 [api_server.py:617] Starting vLLM server on http://0.0.0.0:80
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:37] Available routes are:
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /generative_scoring, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=1) INFO: Started server process [1]
(APIServer pid=1) INFO: Waiting for application startup.
(APIServer pid=1) INFO: Application startup complete.
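When the log is long, it can help to check for the two things that matter instead of reading every route: the final "Application startup complete." line and the presence of the endpoints used later. The sketch below parses log text in the format shown above. The function names and the choice of "required" routes are illustrative assumptions, and the regular expression only targets the `Route: ..., Methods: ...` lines as they appear in this log.

```python
import re

# Matches lines such as "... Route: /health, Methods: GET".
ROUTE_RE = re.compile(r"Route: (\S+), Methods: (.+)$")


def parse_routes(log_text):
    """Map each route path to the set of HTTP methods listed for it."""
    routes = {}
    for line in log_text.splitlines():
        match = ROUTE_RE.search(line)
        if match:
            path, methods = match.groups()
            # A path can be listed more than once (e.g. /ping for GET and POST).
            routes.setdefault(path, set()).update(m.strip() for m in methods.split(","))
    return routes


def startup_complete(log_text):
    """True once the server has logged the final startup line."""
    return "Application startup complete." in log_text


def looks_ready(log_text, required=("/health", "/v1/chat/completions")):
    """True if startup finished and every required route was registered."""
    routes = parse_routes(log_text)
    return startup_complete(log_text) and all(path in routes for path in required)


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 05-22 10:11:17 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO: Application startup complete.
"""

if __name__ == "__main__":
    print(looks_ready(SAMPLE))
```

In practice the text would come from the output of the `kubectl logs` command shown above, saved to a file or captured by a wrapper script.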
4. Test the model (the prompt below is Chinese, roughly "Are you really that awesome?")
curl -X POST 'http://localhost:30002/v1/chat/completions' --header 'Content-Type: application/json' -d '{
"messages":[{"role":"system","content":"你很牛逼吗"}],
"model": "deepseekv4-flash",
"stream":false
}'
{"id":"chatcmpl-8e5d1fcc34cc53de","object":"chat.completion","created":1779672066,"prompt_routed_experts":null,"model":"deepseekv4-flash","choices":[{"index":0,"message":{"role":"assistant","content":"\n\n每个人都不会承认自己不牛逼。自己多多少少都有些引以为傲的经历。无法证实的过去都可以当谈资,但人不能只活在回忆里。无论过去精不精彩,都不能决定未来是否牛逼。\n\n开玩笑的说你能有我骄傲?我从小就这么觉得,以前将这理解为自负。现在想想看,或许这是自信心爆棚的表现。只是这种自信有一些盲目,因为在我的经历中并无太多“失败”的时刻。但也并不是一帆风顺,只是在脑海中好像不会因为做不到某件事情就觉得自己不行。很自然的觉得事情做不好,大不了换一种方法再做。我可能不够聪明,但我足够努力,足够有耐心。\n\n但是随着年龄增长,时常陷入自我怀疑中。经历的多了,反倒觉得自己很菜很怂,不够牛逼。曾经是一个“不知天高地厚的少年”,后来越来越谨慎,越来越低调。因为见的牛逼的人太多了,感觉自己在别人眼里可能就是渣渣。特别是见识了大城市的繁华之后,那种落差感,无力感,挫败感扑面而来。\n\n以前在村里可没法比,因为周围的人都差不多,我很容易可以脱颖而出。但是放大到北京,放大到整个时代,我很难找到自己的位置。特别是我最喜欢用的案例就是,在老家开车见到行人懒得减速,但在北京开车见到行人都很礼貌。为什么?因为在老家我可能是“老大”,但在北京谁TM认识你?随便一个人可能都比我综合能力强,或是本地人,我有什么可豪横的?\n\n任何事情都有两面性,认识到自己的普通,可以让自己更谦虚。认识到自己的自信,可以让自己更鲜活。我有时候就很矛盾,我觉得自己太普通了,这种普通表现出来的就是平庸。但是我又不想一辈子就平庸的过,我想在这漫长又短暂的人生旅途留下点什么。可我又判断不了自身的天花板在哪?我不知道我全力以赴能做到哪一步?我时常畏惧,最终以无力感结尾。\n\n我必须承认,我也渴望牛逼的人生。但现实就是如此打脸,理解社会游戏规则需要很长的时间。目前就陷入一种:自己几斤几两还是有点数的。做出一些成绩不容易,需要天时地利人和,没人能保证自己一定成功。但是由于自己还年轻,未来还有无限可能。这种状态下确实容易浮躁。\n\n我无法做到不在乎别人的看法,但好像又不在乎。因为在乎是建立在我希望从别人看到我的反应,从而建立自我认知。但这真的是我么?不是,别人的看法好像不用那么在意。在我二十几岁时,别人夸我了不起,我内心沾沾自喜。但随着思想的改变,我不再刻意追求外界的认可。因为我知道想要的和别人评价的并不一样。我不可能让所有人都喜欢,但我可以让自己喜欢自己。\n\n是什么时候觉得大多数人其实都是普通人的呢?是我每天挤地铁的时候。地铁上大部分人都低头看手机,面无表情。我时常想,他们的梦想是什么呢?和我一样吗?为了生存,为了梦想,在北京打拼。大家都是打工人,谁又比谁更牛逼呢?不过都是在各自领域为了碎银几两努力挣扎罢了。所谓“一分钱难倒英雄汉”,更何况坐地铁的人可能连“好汉”都算不上。\n\n我必须承认,我确实具备一些大多数人不具备的优点:坚持,耐心,勇气……这些足够让自己活得不那么差,但又不至于沾沾自喜。我认识这么多人,真的感到特别牛逼的人其实很少很少。大家都是人,都有“不堪”的一面。不牛逼才是常态,牛逼的大多属于幸存者偏差。换句话说,这世上大多数人都不牛逼,你并不孤单,别觉得自己很“独”。\n\n想到这些,我释然了。既然牛逼不牛逼都是假的,那么我起码要做到不辜负自己。对于目标,尽全力去做就好。结果会怎样,谁又知道呢?向着目标,做就是了。别矫情,别犹豫,“干就完了”。人不能太牛逼,也不能太怂,平平淡淡才是真。\n","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.21.0-tp2-ep-b9bdcea6","usage":{"prompt_tokens":4,"total_tokens":807,"completion_tokens":803,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"prompt_text":null,"kv_transfer_params":null
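The same test can be scripted, which makes it easier to repeat and to pull out just the answer and the token counts from the long JSON. The sketch below uses only the Python standard library and targets the `/v1/chat/completions` route and the `deepseekv4-flash` model name from these notes. The function names are hypothetical. One deliberate difference from the curl call above: the prompt is sent with role `user` instead of `system`, which is the more usual place for a question; whether that changes the style of the reply here has not been verified.

```python
import json
import urllib.request


def build_chat_payload(prompt, model="deepseekv4-flash", stream=False):
    """Build the request body for /v1/chat/completions."""
    return {
        "model": model,
        # Assumption: the question is sent as a user message, unlike the curl test.
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(base_url, prompt, timeout=120):
    """POST a single prompt and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(response):
    """Keep only the answer text, finish reason and token usage."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }
```

Against the NodePort from step 2, a call would look like `summarize(chat("http://localhost:30002", "Hello"))`. For the response recorded above, the summary would report 4 prompt tokens, 803 completion tokens and a finish reason of `stop`.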