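RDKit is a Python library for cheminformatics. This post uses support vector regression (SVR) to predict logP. The input features are the Morgan fingerprints of the molecules, and the output is logP. Note that the logP labels here are computed with RDKit's Crippen method (MolLogP), not taken from experimental measurements.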


Code example:


    # Import dependencies
    import numpy as np
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem
    from rdkit.Chem.Crippen import MolLogP
    from sklearn.svm import SVR
    from sklearn.metrics import mean_squared_error, r2_score
    from scipy import stats
    import matplotlib.pyplot as plt

Load the SMILES library and compute the Morgan fingerprints and logP

    num_mols = 5000
    with open('smiles.txt', 'r') as f:
        contents = f.readlines()
    fps_total = []
    logP_total = []
    for line in contents[:num_mols]:
        smi = line.split()[0]  # the SMILES string is the first column
        m = Chem.MolFromSmiles(smi)
        if m is None:  # skip SMILES that RDKit cannot parse
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(m, 2)  # radius 2, default bit length
        arr = np.zeros((1,))
        DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr to the fingerprint length
        fps_total.append(arr)
        logP_total.append(MolLogP(m))  # Crippen logP used as the regression target
    fps_total = np.asarray(fps_total)
    logP_total = np.asarray(logP_total)

Split the data into training and test sets

    num_total = fps_total.shape[0]
    num_train = int(num_total * 0.8)  # 80% for training, the rest for testing
    print(num_total, num_train, num_total - num_train)

    fps_train = fps_total[0:num_train]
    logP_train = logP_total[0:num_train]
    fps_test = fps_total[num_train:]
    logP_test = logP_total[num_train:]
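The slices above take the first 80% of the molecules in file order, which may bias the split if smiles.txt happens to be sorted in some way. As a hedged alternative, the sketch below shuffles the indices with a fixed seed before splitting. The helper name shuffled_split_indices and its parameters are illustrative assumptions, not part of the original code.

```python
import random


def shuffled_split_indices(num_total, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) as two disjoint lists of shuffled indices.

    Hypothetical helper for illustration; not part of the original post.
    """
    indices = list(range(num_total))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    num_train = int(num_total * train_fraction)
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = shuffled_split_indices(10)
print(len(train_idx), len(test_idx))
```

With NumPy arrays, fps_total[train_idx] and logP_total[train_idx] would then select the training rows, and the same with test_idx for the test rows.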

Fit the SVR regression model

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

    _gamma = 5.0
    clf = SVR(kernel='poly', gamma=_gamma)  # polynomial kernel
    clf.fit(fps_train, logP_train)
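To make the role of gamma concrete: with kernel='poly', scikit-learn's SVR uses the kernel (gamma * dot(x, y) + coef0) ** degree, and the defaults are degree=3 and coef0=0.0. For binary fingerprints the dot product is simply the number of bits set in both vectors. The function below is a minimal pure-Python sketch of that formula for illustration only; poly_kernel is a hypothetical name, and SVR evaluates the kernel internally.

```python
def poly_kernel(x, y, gamma=5.0, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * dot(x, y) + coef0) ** degree.

    Illustrative sketch only; SVR evaluates the kernel internally.
    """
    dot = sum(a * b for a, b in zip(x, y))  # for 0/1 vectors: number of shared on-bits
    return (gamma * dot + coef0) ** degree


# Two toy 8-bit "fingerprints" that share two on-bits (positions 0 and 3)
fp_a = [1, 0, 1, 1, 0, 0, 0, 1]
fp_b = [1, 0, 0, 1, 0, 1, 0, 0]
print(poly_kernel(fp_a, fp_b))
```

Because the kernel value grows with the cube of the number of shared bits, gamma is likely worth tuning rather than fixing at 5.0.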

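After training, check how accurate the predictions are. For evaluation we use two metrics: the R² score and the mean squared error (MSE).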

    logP_pred = clf.predict(fps_test)
    r2 = r2_score(logP_test, logP_pred)
    mse = mean_squared_error(logP_test, logP_pred)
    print(r2, mse)
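For reference, the two metrics can be written out from their textbook definitions: MSE is the mean of the squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal pure-Python version on toy numbers; mse_and_r2 is a hypothetical helper, and the values are not results from the model above.

```python
def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) computed from their textbook definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: how far predictions are from the true values
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return ss_res / n, 1.0 - ss_res / ss_tot


# Toy values, not results from the model above
mse, r2 = mse_and_r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(mse, r2)
```

An R² of 1 means perfect predictions, while a value near 0 means the model does no better than always predicting the mean of the true values.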

Visualize the model results

    # Fit a straight line through the (true, predicted) pairs
    slope, intercept, r_value, p_value, std_error = stats.linregress(logP_test, logP_pred)
    yy = slope * logP_test + intercept
    plt.scatter(logP_test, logP_pred, color='black', s=1)
    plt.plot(logP_test, yy, label='Predicted logP = ' + str(round(slope, 2)) + '*True logP + ' + str(round(intercept, 2)))
    plt.xlabel('True logP')
    plt.ylabel('Predicted logP')
    plt.legend()
    plt.show()
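The line drawn in the plot is an ordinary least-squares fit of predicted against true logP; a slope near 1 and an intercept near 0 would suggest the predictions track the true values closely. As a minimal sketch of what that fit computes, the slope is the covariance of x and y divided by the variance of x. The helper least_squares_line is a hypothetical name used for illustration, and the data are toy values.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Toy points lying exactly on y = 2x + 1
slope, intercept = least_squares_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(slope, intercept)
```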

RDKit: predicting logP with support vector regression (SVR)

References:

https://github.com/SeongokRyu/CH485---Artificial-Intelligence-and-Chemistry

https://blog.csdn.net/zb123455445/article/details/78354489
