PaddleHub: an out-of-the-box model library

PaddleHub

https://github.com/PaddlePaddle/PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle (260+ models including Image, Text, Audio and Video, with easy inference and serving deployment).

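Provides rich, high-quality, ready-to-use pre-trained models

No deep learning background required

Covers four major categories: image, text, audio, and video

Open source and free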

Introduction

PaddleHub aims to provide developers with rich, high-quality, and directly usable pre-trained models.

No deep learning background is needed: you can use AI models quickly and enjoy the dividends of the artificial intelligence era.

It covers four major categories (Image, Text, Audio, and Video) and supports one-click prediction, easy service deployment, and transfer learning.

All models are open source and free to download and use in offline scenarios.

Specific models for specific scenarios

https://www.paddlepaddle.org.cn/hub

PaddleHub

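PaddleHub makes it easy to obtain pre-trained models from the PaddlePaddle ecosystem, manage them, and run predictions with one click. Combined with the Fine-tune API, it lets you quickly perform transfer learning on top of large-scale pre-trained models, so that those models better serve your specific application scenarios.

One-click model usage, with no data or training required

One-click conversion of a model into a service

Easy-to-use transfer learning

A rich set of pre-trained models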

Install the two libraries

!pip install --upgrade paddlepaddle -i https://mirror.baidu.com/pypi/simple
!pip install --upgrade paddlehub -i https://mirror.baidu.com/pypi/simple
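
To confirm that both packages were actually installed, one option is to query the installed distributions from Python. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_versions is hypothetical and not part of PaddleHub. Note that a GPU build of PaddlePaddle is distributed under a different package name (paddlepaddle-gpu), so it would be reported as not installed by this check.

```python
from importlib import metadata


def installed_versions(packages=("paddlepaddle", "paddlehub")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```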

Example

It takes only a few lines of code.

The following uses the Chinese word segmentation tool (LAC):


import paddlehub as hub

lac = hub.Module(name="lac")
test_text = ["今天是个好天气。"]

results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
print(results)
# {'word': ['今天', '是', '个', '好天气', '。'], 'tag': ['TIME', 'v', 'q', 'n', 'w']}
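
The result holds two parallel lists, one of words and one of tags, so a common next step is to pair them up. The sketch below works on a hard-coded copy of the sample output above, so it runs without PaddleHub; the helper name pair_words_with_tags is hypothetical. It assumes the tag w marks punctuation, as it does for the final "。" in the sample. Depending on the PaddleHub version, lac.cut may return a list with one such dictionary per input text; in that case, apply the helper to each element.

```python
def pair_words_with_tags(result):
    """Zip the parallel 'word' and 'tag' lists into (word, tag) tuples."""
    return list(zip(result["word"], result["tag"]))


# Sample copied from the output comment above.
sample = {
    "word": ["今天", "是", "个", "好天气", "。"],
    "tag": ["TIME", "v", "q", "n", "w"],
}

pairs = pair_words_with_tags(sample)

# Drop tokens tagged "w" (assumed here to mean punctuation).
tokens = [word for word, tag in pairs if tag != "w"]
print(tokens)
```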

Model library (modelbase)

https://www.paddlepaddle.org.cn/modelbase

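Computer vision (PaddleCV)

Image classification

Object detection

Image segmentation

Keypoint detection

Image generation

Scene text recognition

Metric learning

Video

Text processing (PaddleNLP)

NLP basic technologies

NLP core technologies

NLP system applications

Recommendation (PaddleRec)

Speech (PaddleSpeech)

Other models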

chinese_ocr_db_crnn_mobile

https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition

An OCR model that supports Chinese.

It can be used in three ways:

Command-line prediction

$ hub run chinese_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"

API call

import paddlehub as hub
import cv2

ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])

# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])

Service deployment

Start PaddleHub Serving

Run the start command:

$ hub serving start -m chinese_ocr_db_crnn_mobile

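Send a prediction request

Once the server is configured, the following few lines of code send a prediction request and retrieve the result: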

import requests
import json
import cv2
import base64

def cv2_to_base64(image):
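    # Encode the image as JPEG, then base64-encode the resulting bytes.
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring().
    return base64.b64encode(data.tobytes()).decode('utf8')

# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}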
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Print the prediction result
print(r.json()["results"])
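
The request body is simply a JSON object whose images field is a list of base64 strings. The sketch below builds that body from image bytes that are already encoded (for example, the raw contents of a .jpg file), which avoids decoding and re-encoding with cv2 on the client. This is an assumption rather than documented behavior: it presumes the server decodes the base64 payload as an encoded image, as it must for the JPEG data sent above. The helper name build_predict_payload is hypothetical, and placeholder bytes stand in for real image data.

```python
import base64
import json


def build_predict_payload(encoded_images):
    """Build the JSON body for the predict request.

    encoded_images: iterable of bytes objects, each holding an already
    encoded image (for example, the raw contents of a .jpg file).
    """
    images = [base64.b64encode(raw).decode("utf8") for raw in encoded_images]
    return json.dumps({"images": images})


# Placeholder bytes stand in for real image data in this illustration.
body = build_predict_payload([b"fake-image-bytes"])
print(body)
```

The returned string can be passed as the data argument of requests.post, in place of json.dumps(data) in the code above.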

DEMO

https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/paddle/ocr.py

Extracting digits from a CAPTCHA

import paddlehub as hub
import cv2

ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
result = ocr.recognize_text(images=[cv2.imread('./test2.png')])

print(result)

The printed output is shown below; the extracted digits are the value of 'text' in the last line.

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0310 14:03:05.659210 11502 default_variables.cpp:429] Fail to open /proc/self/io: No such file or directory [2]
/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[2021-03-10 14:03:14,697] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
W0310 14:03:14.708413 11502 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
[2021-03-10 14:03:15,063] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
[{'save_path': '', 'data': [{'text': '6067', 'confidence': 0.8805994987487793, 'text_box_position': [[9, 2], [52, 2], [52, 16], [9, 16]]}]}]
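
To pull the digits out of this structure programmatically, one possible approach is to walk the data list of each image and keep the digit runs from entries whose confidence is high enough. The sketch below runs on a hard-coded copy of the result printed above, so it does not need PaddleHub; the helper name extract_digits and the 0.5 default threshold are assumptions chosen for illustration, not part of the module's API.

```python
import re


def extract_digits(results, min_confidence=0.5):
    """Collect digit strings from recognized text above a confidence threshold."""
    digits = []
    for image_result in results:          # one entry per input image
        for item in image_result["data"]:  # one entry per detected text box
            if item["confidence"] >= min_confidence:
                digits.extend(re.findall(r"\d+", item["text"]))
    return digits


# Sample copied from the last line of the output above.
sample = [{
    "save_path": "",
    "data": [{
        "text": "6067",
        "confidence": 0.8805994987487793,
        "text_box_position": [[9, 2], [52, 2], [52, 16], [9, 16]],
    }],
}]

print(extract_digits(sample))  # ['6067']
```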