English 简体中文 繁體中文 한국 사람 日本語 Deutsch русский بالعربية TÜRKÇE português คนไทย french
查看: 3|回复: 0

bin格式转safetensors

[复制链接]
查看: 3|回复: 0

bin格式转safetensors

[复制链接]
查看: 3|回复: 0

239

主题

0

回帖

727

积分

高级会员

积分
727
Y6lAWrtvdDmU

239

主题

0

回帖

727

积分

高级会员

积分
727
2025-2-24 16:41:51 | 显示全部楼层 |阅读模式
技术背景

本文主要介绍在Hugging Face上把bin格式的模型文件转为safetensors格式的模型文件,并下载到本地的方法。
bin转safetensors

首先安装safetensors:
$ python3 -m pip install safetensors --upgrade然后把Github的safetensors仓库克隆下来:
$ git clone https://github.com/huggingface/safetensors.git正克隆到 'safetensors'...remote: Enumerating objects: 4812, done.remote: Counting objects: 100% (1486/1486), done.remote: Compressing objects: 100% (406/406), done.remote: Total 4812 (delta 1340), reused 1082 (delta 1079), pack-reused 3326 (from 2)接收对象中: 100% (4812/4812), 1.15 MiB | 1.22 MiB/s, 完成.处理 delta 中: 100% (2457/2457), 完成.进入子目录:
$ cd safetensors/bindings/python/$ ll总用量 84drwxrwxr-x 6 dechin dechin  4096 2月  21 16:37 ./drwxrwxr-x 3 dechin dechin  4096 2月  21 16:37 ../drwxrwxr-x 2 dechin dechin  4096 2月  21 16:37 benches/-rw-rw-r-- 1 dechin dechin   476 2月  21 16:37 Cargo.toml-rw-rw-r-- 1 dechin dechin  1454 2月  21 16:37 convert_all.py-rw-rw-r-- 1 dechin dechin 14769 2月  21 16:37 convert.py-rw-rw-r-- 1 dechin dechin   729 2月  21 16:37 fuzz.py-rw-rw-r-- 1 dechin dechin   685 2月  21 16:37 .gitignore-rw-rw-r-- 1 dechin dechin  1103 2月  21 16:37 Makefile-rw-rw-r-- 1 dechin dechin   190 2月  21 16:37 MANIFEST.in-rw-rw-r-- 1 dechin dechin  2419 2月  21 16:37 pyproject.tomldrwxrwxr-x 3 dechin dechin  4096 2月  21 16:37 py_src/-rw-rw-r-- 1 dechin dechin   852 2月  21 16:37 README.md-rw-rw-r-- 1 dechin dechin   891 2月  21 16:37 setup.cfgdrwxrwxr-x 2 dechin dechin  4096 2月  21 16:37 src/-rw-rw-r-- 1 dechin dechin  5612 2月  21 16:37 stub.pydrwxrwxr-x 3 dechin dechin  4096 2月  21 16:37 tests/其中有一个convert.py的格式转换脚本。查看用法:
$ python3 convert.py --helpusage: convert.py [-h] [--revision REVISION] [--force] [-y] model_idSimple utility tool to convert automatically some weights on the hub to `safetensors` format. It is PyTorchexclusive for now. It works by downloading the weights (PT), converting them locally, and uploading them back as aPR on the hub.positional arguments:  model_id             The name of the model on the hub to convert. E.g. `gpt2` or `facebook/wav2vec2-base-960h`options:  -h, --help           show this help message and exit  --revision REVISION  The revision to convert  --force              Create the PR even if it already exists of if the model was already converted.  -y                   Ignore safety prompt这个脚本可以将指定路径的模型文件转为safetensors模型,但是如果直接运行会报错:
$ python3 convert.py --force -y Salesforce/blip-image-captioning-baseconfig.json: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 15.3MB/s]pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:06<00:00, 7.82MB/s]Traceback (most recent call last):  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status    response.raise_for_status()  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status    raise HTTPError(http_error_msg, response=self)requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1The above exception was the direct cause of the following exception:Traceback (most recent call last):  File "/datb/DeepSeek/safetensors/bindings/python/convert.py", line 369, in <module>    commit_info, errors = convert(api, model_id, revision=args.revision, force=args.force)  File "/datb/DeepSeek/safetensors/bindings/python/convert.py", line 313, in convert    new_pr = api.create_commit(  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn    return fn(*args, **kwargs)  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1524, in _inner    return fn(self, *args, **kwargs)  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3961, in create_commit    self.preupload_lfs_files(  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4184, in preupload_lfs_files    _fetch_upload_modes(  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn    return fn(*args, **kwargs)  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/_commit_api.py", line 542, in _fetch_upload_modes    hf_raise_for_status(resp)  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status    raise _format(RepositoryNotFoundError, message, response) from ehuggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67b83d70-5af65b805cde0ba55c72abd1;ce2be295-8da7-4230-a8db-f505919ddb85)Repository Not Found for url: https://huggingface.co/api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1.Please make sure you specified the correct `repo_id` and `repo_type`.If you are trying to access a private or gated repo, make sure you are authenticated.Invalid username or password.Note: Creating a commit assumes that the repo already exists on the Huggingface Hub. Please use `create_repo` if it's not the case.这需要我们先注册一个Hugging Face的账号,然后免费获取一个token:
       

把token加到convert.py的前两行:
$ head -n 2 convert.py from huggingface_hub import loginlogin("your_token")再次执行该转换脚本:
$ python3 convert.py --force -y Salesforce/blip-image-captioning-baseconfig.json: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 16.2MB/s]pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:04<00:00, 7.94MB/s]Pr created at https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42### Success 🔥Yay! This model was successfully converted and a PR was open using your token, here:[https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42](https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42)成功构建safetensor文件。并提交了一个Pull Request,那么就算这个PR没有合并,我们也可以从自己的PR下载相关模型文件。
从HF下载仓库

我们可以使用git-lfs从Hugging Face的PR里面下载模型文件。首先我们从主分支下载所有的小型文件(非LFS文件):
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Salesforce/blip-image-captioning-base正克隆到 'blip-image-captioning-base'...remote: Enumerating objects: 76, done.remote: Counting objects: 100% (76/76), done.remote: Compressing objects: 100% (38/38), done.remote: Total 76 (delta 39), reused 72 (delta 37), pack-reused 0 (from 0)展开对象中: 100% (76/76), 323.20 KiB | 1.05 MiB/s, 完成.下载完轻量级文件,进入到下载的路径下:
$ cd blip-image-captioning-base/$ ll总用量 976drwxrwxr-x 3 dechin dechin   4096 2月  21 17:22 ./drwxrwxr-x 3 dechin dechin   4096 2月  21 17:22 ../-rw-rw-r-- 1 dechin dechin   4563 2月  21 17:22 config.jsondrwxrwxr-x 9 dechin dechin   4096 2月  21 17:22 .git/-rw-rw-r-- 1 dechin dechin   1477 2月  21 17:22 .gitattributes-rw-rw-r-- 1 dechin dechin    287 2月  21 17:22 preprocessor_config.json-rw-rw-r-- 1 dechin dechin    134 2月  21 17:22 pytorch_model.bin-rw-rw-r-- 1 dechin dechin   6359 2月  21 17:22 README.md-rw-rw-r-- 1 dechin dechin    125 2月  21 17:22 special_tokens_map.json-rw-rw-r-- 1 dechin dechin    134 2月  21 17:22 tf_model.h5-rw-rw-r-- 1 dechin dechin    506 2月  21 17:22 tokenizer_config.json-rw-rw-r-- 1 dechin dechin 711396 2月  21 17:22 tokenizer.json-rw-rw-r-- 1 dechin dechin 231508 2月  21 17:22 vocab.txt可以看到此时的大模型文件是没有被下载下来的,然后在这个路径下pull我们自己的PR分支的内容:
$ git pull origin refs/pr/42remote: Enumerating objects: 4, done.remote: Counting objects: 100% (4/4), done.remote: Compressing objects: 100% (3/3), done.remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)展开对象中: 100% (3/3), 1.42 KiB | 1.42 MiB/s, 完成.来自 https://huggingface.co/Salesforce/blip-image-captioning-base * branch            refs/pr/42 -> FETCH_HEAD更新 82a3776..0f2f8a0Fast-forward model.safetensors | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 model.safetensorsorigin后面跟着的是我们的PR的名称,在Hugging Face相关分支主页可以查看:
       

再次查看本地路径:
$ ll总用量 967508drwxrwxr-x 3 dechin dechin      4096 2月  21 17:24 ./drwxrwxr-x 3 dechin dechin      4096 2月  21 17:22 ../-rw-rw-r-- 1 dechin dechin      4563 2月  21 17:22 config.jsondrwxrwxr-x 9 dechin dechin      4096 2月  21 17:24 .git/-rw-rw-r-- 1 dechin dechin      1477 2月  21 17:22 .gitattributes-rw-rw-r-- 1 dechin dechin 989721336 2月  21 17:24 model.safetensors-rw-rw-r-- 1 dechin dechin       287 2月  21 17:22 preprocessor_config.json-rw-rw-r-- 1 dechin dechin       134 2月  21 17:22 pytorch_model.bin-rw-rw-r-- 1 dechin dechin      6359 2月  21 17:22 README.md-rw-rw-r-- 1 dechin dechin       125 2月  21 17:22 special_tokens_map.json-rw-rw-r-- 1 dechin dechin       134 2月  21 17:22 tf_model.h5-rw-rw-r-- 1 dechin dechin       506 2月  21 17:22 tokenizer_config.json-rw-rw-r-- 1 dechin dechin    711396 2月  21 17:22 tokenizer.json-rw-rw-r-- 1 dechin dechin    231508 2月  21 17:22 vocab.txt可以看到safetensors模型文件下载成功,这样我们就完成了线上模型格式转换,再下载到本地的过程。
总结概要

本文介绍了一种将Hugging Face上bin格式的大模型文件,在线转换为safetensors文件格式,然后下载到本地的方法。
版权声明

本文首发链接为:https://www.cnblogs.com/dechinphy/p/bin-safetensors.html
作者ID:DechinPhy
更多原著文章:https://www.cnblogs.com/dechinphy/
请博主喝咖啡:https://www.cnblogs.com/dechinphy/gallery/image/379634.html
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

239

主题

0

回帖

727

积分

高级会员

积分
727

QQ|智能设备 | 粤ICP备2024353841号-1

GMT+8, 2025-3-10 19:06 , Processed in 3.273348 second(s), 29 queries .

Powered by 智能设备

©2025

|网站地图