Check if the length of values in a specific column exceeds 11

【Question title】: Check if the length of values in a specific column exceeds 11
【Posted】: 2021-09-03 13:17:51
【Question】:

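I am trying to write a script (see the code below) that checks whether any value in the 'Mobile Phone Number' column is longer than 11 digits. If one is, the script should print the index of that value and delete the whole record at that index from the DataFrame. However, the program does not execute this line as I expect: if len(data['Mobile Phone Number']) > 11:, even though the condition is met. I need to delete the two phone numbers that are longer than 11 digits.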

import pandas as pd

data = {
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
}

df = pd.DataFrame(data)

print(df)

for i in range(len(data)):
    if len(data['Mobile Phone Number']) > 11:
        print('Number at index ', i, 'is incorrect')
        data = data.drop(['Mobile Phone Number'][i], axis=1)
    else:
        print('\nNo length of > 11 found in Mobile Phone Numbers')

Here is the output of the code above:

     Name  Mobile Phone Number
0     Tom          13805647925
1  Joseph      145792860326480
2   Krish      184629730518469
3    John          18218706491

No length of > 11 found in Mobile Phone Numbers

No length of > 11 found in Mobile Phone Numbers

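【Comments】:

len(data['Mobile Phone Number']) returns how many phone numbers are in your column, not the length of each one.
Your example may be flawed: if your phone numbers can start with a leading 0 (as is the case in my country), you will lose that digit because the column's dtype is int.
Also, you are operating on the dictionary (data); I think you should use the DataFrame (df) instead.
Should the mobile numbers in your DataFrame be strings or numbers?
@accdias, thanks! Good catch!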
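
The points raised in these comments can be checked with a short sketch. It reuses the question's sample data; the name `lengths` is introduced here for illustration and is not part of the original code.

```python
import pandas as pd

data = {
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
}
df = pd.DataFrame(data)

# len() of a dict counts its keys, so range(len(data)) only yields 0 and 1.
print(len(data))                        # 2

# len() of a column counts its rows, not the digits of each value.
print(len(df['Mobile Phone Number']))   # 4

# Per-value lengths: convert each number to text, then measure the text.
lengths = df['Mobile Phone Number'].astype(str).str.len()
print(lengths.tolist())                 # [11, 15, 15, 11]

# An integer cannot keep a leading zero; a string can.
print(len(str(int('013805647925'))))    # 11, the leading zero is gone
print(len('013805647925'))              # 12
```

This also accounts for the output in the question: the loop runs twice (once per dictionary key), and each time it compares 4 with 11, so the else branch prints both times.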

Tags: python pandas

【Solution 1】:

With the following DataFrame as input:

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

#      Name  Mobile Phone Number
# 0     Tom          13805647925
# 1  Joseph      145792860326480
# 2   Krish      184629730518469
# 3    John          18218706491

You can try this:

df = df[df['Mobile Phone Number'].apply(lambda x: len(str(x)) <= 11)]
df

It gives this output:

    Name    Mobile Phone Number
0   Tom     13805647925
3   John    18218706491
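
As a possible alternative to `apply` with a lambda, the same filter can be written with the vectorized `.str.len()` accessor. This is a sketch that assumes the column holds non-negative integers; the names `keep` and `filtered` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

# True for rows whose number has at most 11 digits.
keep = df['Mobile Phone Number'].astype(str).str.len() <= 11

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[keep]
print(filtered)
```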

Edit: if you want to show an error when a number is longer than 11 digits, you can try this:

if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)):
    print("Error! you have number > 11")

Second edit: if you want to show the error message and then delete the numbers longer than 11 digits, use the code below:

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

print(df)

if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)):
    print("\n Error! you have number > 11 \n")
    df = df[df['Mobile Phone Number'].apply(lambda x: len(str(x)) <= 11)]

print(df)

Output:

     Name  Mobile Phone Number
0     Tom          13805647925
1  Joseph      145792860326480
2   Krish      184629730518469
3    John          18218706491


 Error! you have number > 11 


   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491

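【Discussion】:

Hi @user1740577, this works! But could you explain why the condition in the lambda is 11?
@hello, you want to delete the numbers longer than 11 digits, so you want to keep the ones with length <= 11. That is why I set the condition to <= 11: it returns True for those rows, and only the rows that are True are kept.
@hello, if this is correct, please upvote and read this link: meta.stackexchange.com/questions/5234/…
What if I also want to print an error message when a number is longer than 11 digits?
@hello, I edited the answer and added this line: if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)) Does this answer your question?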
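
The explanation of the `<= 11` condition can be made concrete by looking at the boolean mask itself. This is only an illustrative sketch; the names `too_long`, `rejected` and `kept` are introduced here and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

# One True/False value per row: is this number longer than 11 digits?
too_long = df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)
print(too_long.tolist())    # [False, True, True, False]

# Indexing with a mask keeps the rows where it is True ...
rejected = df[too_long]

# ... so negating the mask with ~ keeps the rows with at most 11 digits.
kept = df[~too_long]
print(kept)
```

Filtering with `len(str(x)) <= 11`, as the answer does, selects the same rows as `~too_long`.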

【Solution 2】:

You can try this:

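mobile_longer_than_11 = df[df["Mobile Phone Number"].astype(str)\
                                                    .apply(len).gt(11)].index

# Keep the index labels that were not flagged above
# (pandas 2.x no longer accepts a set as an indexer for .loc).
print(df.loc[df.index.difference(mobile_longer_than_11)])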

Output:

    Name    Mobile Phone Number
0   Tom     13805647925
3   John    18218706491

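【Discussion】:

You don't need that \ to tell Python to continue on the next line.
@accdias What do you mean?
The trailing backslash in `.astype(str)\` is not needed.
@accdias Oh! I thought it was required when chaining pandas methods across separate lines. Thanks!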
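
The point about the backslash follows from Python's implicit line joining: an expression inside (), [] or {} may span several lines without a continuation character. A small sketch, where the names `too_long_index` and `lengths` are introduced for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

# The chain sits inside df[...], so the line break needs no backslash.
too_long_index = df[df["Mobile Phone Number"].astype(str)
                                             .apply(len).gt(11)].index
print(too_long_index.tolist())

# Outside of brackets, wrapping the chain in parentheses has the same effect.
lengths = (
    df["Mobile Phone Number"]
    .astype(str)
    .apply(len)
)
print(lengths.tolist())
```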

【Solution 3】:

This is a combination of the previous answers that gives the result the OP expected. Credit goes to the other authors.

import pandas as pd

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

invalid_phones = df['Mobile Phone Number'].astype(str).apply(len).gt(11)

if invalid_phones.any():
    for idx in df[invalid_phones].index:
        print(f'Number at index {idx} is incorrect')
else:
    print('No length of > 11 found in Mobile Phone Numbers')

The code above produces the following output:

Number at index 1 is incorrect
Number at index 2 is incorrect

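To remove the invalid phones from df, you can use:

# Index.difference keeps the labels that are not flagged
# (pandas 2.x no longer accepts a set as an indexer for .loc).
df = df.loc[df.index.difference(df[invalid_phones].index)]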

Or:

df = df.drop(df[invalid_phones].index)

Or even better:

df.drop(df[invalid_phones].index, inplace=True)

This gives the following result:

print(df)
   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491

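【Discussion】:

Hi @accdias, thanks for the update! Would it also be possible to remove the incorrect ones after the check?
Sure! I will update the answer and add that to the code.
Did you get a warning message on the line that removes the invalid phones? - 'UserWarning: Boolean Series key will be reindexed to match DataFrame index.'
There is no warning with the sample data you provided.
OK, no worries. I guess I will post a new question on StackOverflow about the warning message. Thanks for your help!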
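
The warning quoted in this discussion is not reproduced by the sample data. One way it can arise, offered here as an assumption about the commenter's situation rather than a diagnosis, is reusing a boolean mask after the DataFrame it was built from has lost rows, so the mask's index no longer matches. The names `stale`, `fresh_mask` and `still_invalid` are introduced for illustration.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

invalid_phones = df['Mobile Phone Number'].astype(str).str.len().gt(11)
df = df.drop(df[invalid_phones].index)    # df now has index labels 0 and 3

# The old mask still carries labels 0..3, so pandas has to realign it
# and typically reports that with a UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    stale = df[invalid_phones]
print([str(w.message) for w in caught])

# Recomputing the mask from the current df keeps both indexes aligned.
fresh_mask = df['Mobile Phone Number'].astype(str).str.len().gt(11)
still_invalid = df[fresh_mask]
print(still_invalid.empty)
```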

【Solution 4】:

I believe that in your case you can compare the numbers directly.

mask = df['Mobile Phone Number'] >= 1e11

if mask.any():
    for i in df[mask].index:
        print('Number at index ', i, 'is incorrect')
else:
    print('\nNo length of > 11 found in Mobile Phone Numbers')
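
The threshold works because 10**11 is the smallest integer with 12 digits, so any value at or above it is longer than 11 digits. A short sketch checking that the numeric comparison agrees with the string-length test on the sample data, assuming the column holds non-negative integers (a minus sign would count as a character in the string version); `LIMIT` and `by_length` are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

LIMIT = 10 ** 11    # integer form of 1e11
print(len(str(LIMIT)), len(str(LIMIT - 1)))    # 12 11

# Numeric test: 12 or more digits.
mask = df['Mobile Phone Number'] >= LIMIT

# String-length test used in the other answers.
by_length = df['Mobile Phone Number'].astype(str).str.len() > 11

print(mask.tolist())
print(mask.tolist() == by_length.tolist())
```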

【Discussion】: