Bank card number localization and recognition with CRNN + EAST
Source code: https://github.com/ShawnHXH/BankCard-Recognizer
Tools: Python 3.6, Windows 10, Keras (TensorFlow backend)
CRNN:
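Requirements analysis:
1. Bank card numbers do not have a fixed length: some have 20 characters, others only 19. The model therefore has to recognize card numbers of variable length;
2. The model takes an image as input and outputs text, so it needs both a CNN and an RNN, hence the name CRNN.
Model selection:
1. For variable-length recognition, CTC is currently the most widely used loss function;
2. The CNN follows a VGG-style design; the RNN can be a bidirectional LSTM (BLSTM) or a GRU.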
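To make the variable-length point concrete, here is a minimal sketch of greedy CTC decoding: take the most likely class at each time step, collapse consecutive repeats, then drop the blank. It is an illustration only, not code from the repository. The 11-class layout (digits 0-9 plus a blank at index 10) and the helper `one_hot_rows` are assumptions made for the example.

```python
def ctc_greedy_decode(probs, blank_index):
    """Greedy CTC decoding of a (time_steps x num_classes) list of rows."""
    decoded = []
    previous = None
    for row in probs:
        # argmax over the classes for this time step
        best = max(range(len(row)), key=row.__getitem__)
        # keep a class only if it differs from the previous step and is not blank
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


def one_hot_rows(path, num_classes=11):
    """Hypothetical helper: fake per-step 'softmax' rows from a class path."""
    return [[1.0 if c == k else 0.0 for c in range(num_classes)] for k in path]


# Assumed layout: digits 0-9 are classes 0-9, class 10 is the CTC blank.
BLANK = 10
short_card = one_hot_rows([6, 6, BLANK, 2, 2, BLANK, 2, BLANK, BLANK, 8])
long_card = one_hot_rows([6, BLANK, 2, BLANK, 2, 8, 8, BLANK, 0, 1])

print(ctc_greedy_decode(short_card, BLANK))  # [6, 2, 2, 8]
print(ctc_greedy_decode(long_card, BLANK))   # [6, 2, 2, 8, 0, 1]
```

Both inputs have ten time steps, yet they decode to outputs of different lengths. The blank between the two 2s is what lets a repeated digit survive the collapse step.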
Model preview:
1. CNN part:
    from keras import initializers
    from keras.layers import (Activation, BatchNormalization, Conv2D, Input,
                              MaxPooling2D, Permute)


    def PatternUnits(inputs, index, activation="relu"):
        # Batch normalization followed by the activation, applied after each convolution.
        inputs = BatchNormalization(name="BN_%d" % index)(inputs)
        inputs = Activation(activation, name="Relu_%d" % index)(inputs)
        return inputs


    # img_height and img_width are defined elsewhere in the project.
    initializer = initializers.he_normal()
    inputs = Input(shape=(img_height, img_width, 1), name='img_inputs')
    x = Conv2D(64, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_1')(inputs)
    x = PatternUnits(x, 1)
    x = MaxPooling2D(strides=2, name='Maxpool_1')(x)
    x = Conv2D(128, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_2')(x)
    x = PatternUnits(x, 2)
    x = MaxPooling2D(strides=2, name='Maxpool_2')(x)

    x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_3')(x)
    x = PatternUnits(x, 3)
    x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_4')(x)
    x = PatternUnits(x, 4)
    # From here on, pooling halves only the height and keeps the width.
    x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_3')(x)

    x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_5')(x)
    x = PatternUnits(x, 5)
    x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_6')(x)
    x = PatternUnits(x, 6)
    x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_4')(x)

    x = Conv2D(512, (2, 2), padding='same', activation='relu', kernel_initializer=initializer, name='Conv2d_7')(x)
    x = PatternUnits(x, 7)
    conv_output = MaxPooling2D(pool_size=(2, 1), name="Conv_output")(x)
    # Reorder (height, width, channels) to (width, channels, height) so that width becomes the time axis.
    x = Permute((2, 3, 1), name='Permute')(conv_output)
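The pooling layout above decides how many time steps the RNN sees, so it can help to trace the feature-map size by hand. The sketch below does that in plain Python. It assumes Keras defaults for the unspecified arguments (a 2x2 pool when only `strides=2` is given, strides equal to the pool size when only `pool_size` is given, and "valid" padding). The 32x256 input size is a hypothetical value, not one taken from the repository.

```python
def pool_out(size, pool, stride):
    """Output length along one axis for 'valid' max pooling."""
    return (size - pool) // stride + 1


def cnn_output_shape(img_height, img_width):
    """Trace (height, width, channels) through the five pooling layers."""
    # (pool_size, strides) per layer, each given as (height, width).
    pools = [
        ((2, 2), (2, 2)),  # Maxpool_1
        ((2, 2), (2, 2)),  # Maxpool_2
        ((2, 1), (2, 1)),  # Maxpool_3
        ((2, 1), (2, 1)),  # Maxpool_4
        ((2, 1), (2, 1)),  # Conv_output
    ]
    h, w = img_height, img_width
    for (ph, pw), (sh, sw) in pools:
        h, w = pool_out(h, ph, sh), pool_out(w, pw, sw)
    # The 'same'-padded convolutions keep height and width; Conv2d_7 has 512 filters.
    return h, w, 512


# Hypothetical input size, chosen only for illustration.
print(cnn_output_shape(32, 256))  # (1, 64, 512)
```

Under these assumptions, a 32x256 image gives the RNN 64 time steps of 512 features each after the Permute and per-step Flatten. The width is halved only twice, which keeps enough time steps for a card number of about 20 characters.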
2. RNN part (using BLSTM):
    from keras.layers import (LSTM, BatchNormalization, Bidirectional, Dense,
                              Flatten, TimeDistributed)

    # Each column of the feature map becomes one time step.
    rnn_input = TimeDistributed(Flatten(), name='Flatten_by_time')(x)
    y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True),
                      merge_mode='sum', name='LSTM_1')(rnn_input)
    y = BatchNormalization(name='BN_8')(y)
    y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True),
                      name='LSTM_2')(y)
    # num_classes is defined elsewhere in the project.
    y_pred = Dense(num_classes, activation='softmax', name='y_pred')(y)
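The two BLSTM layers merge their directions differently: the first uses `merge_mode='sum'`, the second falls back to the Keras default, `'concat'`. The sketch below mimics only the merge step in plain Python to show the effect on feature width. `merge_bidirectional` is a hypothetical helper, not a Keras function, and the sizes (64 time steps, 256 units) are assumed for illustration.

```python
def merge_bidirectional(forward, backward, merge_mode="concat"):
    """Merge per-time-step forward/backward outputs (lists of feature lists)."""
    if merge_mode == "sum":
        # Element-wise sum: feature width stays at `units`.
        return [[f + b for f, b in zip(fs, bs)] for fs, bs in zip(forward, backward)]
    if merge_mode == "concat":
        # Concatenation: feature width doubles to 2 * units.
        return [fs + bs for fs, bs in zip(forward, backward)]
    raise ValueError("unsupported merge_mode: %r" % merge_mode)


# Hypothetical sizes: 64 time steps, 256 units per direction.
time_steps, units = 64, 256
forward = [[0.5] * units for _ in range(time_steps)]
backward = [[0.25] * units for _ in range(time_steps)]

summed = merge_bidirectional(forward, backward, merge_mode="sum")
concatenated = merge_bidirectional(forward, backward)
print(len(summed[0]), len(concatenated[0]))  # 256 512
```

So LSTM_1 passes 256 features per time step to the batch normalization layer, while LSTM_2 passes 512 features per time step to the final Dense layer.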
3. CTC loss function:
    from keras import backend as K
    from keras.layers import Input, Lambda


    def ctc_loss_layer(args):
        """
        y_true: True label.
        y_pred: Predict label.
        pred_length: Predict label length.
        label_length: True label length.
        :param args: (y_true, y_pred, pred_length, label_length).
        :return: batch_cost with shape (batch_size, 1).
        """
        y_true, y_pred, pred_length, label_length = args
        batch_cost = K.ctc_batch_cost(y_true, y_pred, pred_length, label_length)
        return batch_cost


    # max_label_length is defined elsewhere in the project.
    y_true = Input(shape=[max_label_length], name='y_true')
    y_pred_length = Input(shape=[1], name='y_pred_length')
    y_true_length = Input(shape=[1], name='y_true_length')
    # The loss is computed inside the graph, so the layer's output is the CTC cost itself.
    ctc_loss_output = Lambda(ctc_loss_layer, output_shape=(1,), name='ctc_loss_output')([y_true, y_pred, y_pred_length, y_true_length])
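The three extra inputs above have to be filled for every batch. The following is a minimal sketch of how that could look, not the repository's data generator: `build_ctc_batch` is a hypothetical helper, and the values `max_label_length=5` and `time_steps=64` are assumptions. The dictionary keys match the `Input` names in the code above.

```python
def build_ctc_batch(labels, max_label_length, time_steps, pad_value=0):
    """Pad integer label sequences and build the auxiliary CTC inputs."""
    y_true, y_true_length, y_pred_length = [], [], []
    for label in labels:
        if len(label) > max_label_length:
            raise ValueError("label longer than max_label_length")
        padding = [pad_value] * (max_label_length - len(label))
        y_true.append(list(label) + padding)   # shape (max_label_length,)
        y_true_length.append([len(label)])     # real length, shape (1,)
        y_pred_length.append([time_steps])     # RNN time steps, shape (1,)
    return {
        "y_true": y_true,
        "y_pred_length": y_pred_length,
        "y_true_length": y_true_length,
    }


# Two labels of different lengths, encoded as class indices.
batch = build_ctc_batch([[6, 2, 2, 8], [1, 2]], max_label_length=5, time_steps=64)
print(batch["y_true"])         # [[6, 2, 2, 8, 0], [1, 2, 0, 0, 0]]
print(batch["y_true_length"])  # [[4], [2]]
```

In practice these lists would be converted to NumPy arrays and passed alongside the image batch. The padding value should not matter, since `K.ctc_batch_cost` uses the label length to discard it. Because the model's output is already the loss, a common Keras pattern (not confirmed for this repository) is to compile with a loss function that simply returns `y_pred`.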
EAST:
CTPN is arguably the most popular text-localization algorithm right now; other options include Faster RCNN, Seg-Link, Mask RCNN, and EAST.
For details, see: https://github.com/huoyijie/AdvancedEAST
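Once the detector has located the card number, the region has to be cut out and handed to the CRNN. The sketch below shows one simple way to do that, assuming the detector yields a four-point quadrilateral in pixel coordinates. It is not the AdvancedEAST API: `quad_to_crop_box`, the sample coordinates, and the margin are all hypothetical.

```python
import math


def quad_to_crop_box(quad, img_width, img_height, margin=0):
    """Axis-aligned (left, top, right, bottom) box enclosing a 4-point quad."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    # Expand by `margin` pixels, then clamp to the image bounds.
    left = max(math.floor(min(xs)) - margin, 0)
    top = max(math.floor(min(ys)) - margin, 0)
    right = min(math.ceil(max(xs)) + margin, img_width)
    bottom = min(math.ceil(max(ys)) + margin, img_height)
    return left, top, right, bottom


# Hypothetical detection on a 400x250 card image.
quad = [(40.2, 118.7), (360.9, 121.3), (359.5, 150.8), (39.1, 148.0)]
box = quad_to_crop_box(quad, img_width=400, img_height=250, margin=2)
print(box)  # (37, 116, 363, 153)
```

With the image held as a NumPy array, the crop would then be `image[top:bottom, left:right]`, resized to the CRNN's input size. A strongly rotated quadrilateral would need a perspective warp rather than an axis-aligned crop.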