Converting an Object Type Column to numeric, string, etc. (answers)

[Question title]: Converting an Object Type Column to numeric, string, etc.
[Posted]: 2017-08-13 18:57:14
[Question]:

I created a dataframe in Python from data pulled from AWS.

I will be using 3 of the 67 columns, and I noticed that the data type of these columns is object.

I would like to know how to change these columns from object to another data type.

I have tried many approaches, but none of them work.

I load my data like this:

formation_tops = pd.read_csv("C:/Users/juan/Documents/revonos-ds-sandbox/formation_tops/regulatory_agency=COGCC/000000_0",
                             sep='\t', header = None, names= cols1, index_col = False, dtype='unicode')
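Because the file marks missing values with the literal text \N, one option is to tell pd.read_csv about it through na_values and drop dtype='unicode', so the numeric columns are parsed as numbers straight away. A minimal sketch, assuming a tab-separated file with no header row; the sample rows and the column names in cols are hypothetical stand-ins for the real 67-column file and cols1:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample shaped like the question's data;
# "\\N" in this (non-raw) literal is the two characters backslash + N.
raw = (
    "05-001-05000\tBENTONITE\t\\N\t5118\n"
    "05-001-05000\tD SAND\t\\N\t5211\n"
    "05-001-05001\tCARLILE\t\\N\t4720\n"
)
cols = ["UWI", "formation_name", "log_bottom", "log_top"]  # assumed names

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=cols,
    index_col=False,
    na_values=[r"\N"],  # treat the literal text \N as a missing value
)

# log_bottom becomes float64 (all NaN here), log_top an integer column
print(df.dtypes)
```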

Then I created a separate dataframe with the 3 columns I want:

            formation_name log_bottom log_top
UWI                                           
05-001-05000      BENTONITE         \N    5118
05-001-05000         D SAND         \N    5211
05-001-05000      GREENHORN         \N    4908
05-001-05000         J SAND         \N    5260
05-001-05000       NIOBRARA         \N    4380
05-001-05001        CARLILE         \N    4720
05-001-05001         D SAND         \N    5131
05-001-05001      GREENHORN         \N    4821
05-001-05001         J SAND         \N    5179
05-001-05001          MOWRY         \N    5034
05-001-05001       NIOBRARA         \N    4227

I tried different ways of changing the data type, but I get the following errors:

File "pandas\_libs\src\inference.pyx", line 1047, in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56433)

ValueError: Unable to parse string "\N" at position 0

and

 cleaned_dataframe['log_bottom']=  cleaned_dataframe.log_bottom.str.replace('\N', '')
                                                                              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape
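The ValueError comes from a strict numeric conversion hitting the placeholder "\N". pd.to_numeric with errors="coerce" turns such unparseable values into NaN instead of raising. A minimal sketch on a hypothetical Series shaped like log_bottom:

```python
import pandas as pd

# Hypothetical values as stored when the file is read with dtype='unicode'
log_bottom = pd.Series([r"\N", r"\N", "5260"])

# The default (errors="raise") fails on the first "\N"
try:
    pd.to_numeric(log_bottom)
except ValueError as exc:
    print("strict conversion failed:", exc)

# errors="coerce" maps anything unparseable to NaN and returns float64
numeric = pd.to_numeric(log_bottom, errors="coerce")
print(numeric.tolist())
```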

I assumed that, since this is a unicode error, I should somehow encode the data into a readable format.
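The SyntaxError is raised by the Python source code, not by the data: in a Python 3 string literal, \N introduces a named Unicode escape such as \N{BULLET}, so '\N' on its own cannot be compiled. Writing the pattern as a raw string r'\N' (or as '\\N') avoids this. A minimal sketch on a hypothetical two-row frame; regex=False makes str.replace match the pattern literally:

```python
import pandas as pd

# Hypothetical frame; r"\N" is the two characters backslash + N, same as "\\N"
cleaned_dataframe = pd.DataFrame({"log_bottom": [r"\N", "5118"]})

# Literal (non-regex) replacement of the placeholder with an empty string
cleaned_dataframe["log_bottom"] = cleaned_dataframe["log_bottom"].str.replace(
    r"\N", "", regex=False
)

# Empty strings are still not numbers, so coerce them to NaN when converting
cleaned_dataframe["log_bottom"] = pd.to_numeric(
    cleaned_dataframe["log_bottom"], errors="coerce"
)
print(cleaned_dataframe)
```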

Any help would be appreciated.

[Discussion]:

Which columns do you want to change? What do you want to convert them from and to?
Why are you passing dtype='unicode'? Just remove that argument from pd.read_csv.
The first one to string and the other two to numbers (float or int) would be fine.
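A sketch of the conversion described in the comment above (first column as text, the other two as numbers), on a hypothetical frame shaped like the question's 3-column subset with every value still stored as text:

```python
import pandas as pd

# Hypothetical data mirroring the question's table; all values are strings
df = pd.DataFrame(
    {
        "formation_name": ["BENTONITE", "D SAND", "CARLILE"],
        "log_bottom": [r"\N", r"\N", r"\N"],
        "log_top": ["5118", "5211", "4720"],
    },
    index=pd.Index(["05-001-05000", "05-001-05000", "05-001-05001"], name="UWI"),
)

# First column: keep it as text
df["formation_name"] = df["formation_name"].astype(str)

# Other two: numeric, with "\N" coerced to NaN
for col in ["log_bottom", "log_top"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

for col in df:
    print(col, df[col].dtypes)
```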
for col in width_cleaned:
    print(col, width_cleaned[col].dtypes)

formation_name object
log_bottom object
log_top object
I ran this code, and the original type is object. I took out that argument, but it still doesn't work.

Tags: python pandas dataframe unicode

[Solution 1]:

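I was able to convert the dataframe with df['column'].convert_objects(convert_numeric=True). (convert_objects was deprecated in pandas 0.17 in favour of pd.to_numeric and has since been removed.)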

This converts the column to float64, turning \N into NaN. After calling df.dropna(), my dataframe is now clean.
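A minimal sketch of the same cleanup using pd.to_numeric, on a hypothetical three-row frame (the log_bottom value "5300" is made up for illustration). It also shows why dropna() may need a subset: a bare dropna() drops any row with a NaN in any column, so if log_bottom is \N on every row, as in the sample shown in the question, the whole frame would be emptied:

```python
import pandas as pd

# Hypothetical data; "\N" marks missing values as in the question
df = pd.DataFrame(
    {
        "formation_name": ["BENTONITE", "J SAND", "MOWRY"],
        "log_bottom": [r"\N", "5300", r"\N"],
        "log_top": ["5118", "5260", r"\N"],
    }
)

# Parse the numeric columns, turning unparseable strings such as "\N" into NaN
num_cols = ["log_bottom", "log_top"]
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")

# Drop only rows missing the column that must be present
cleaned = df.dropna(subset=["log_top"])
print(cleaned)
```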

[Discussion]: