【问题标题】:Extract pattern from column从列中提取模式
【发布时间】:2015-09-20 14:25:50
【问题描述】:

我正在处理一个巨大的 Excel 工作表(有 200K 行),我需要从某个列 (B) 列表中提取字符串中存在的所有电子邮件地址。

我想要达到的目标:

  1. 从字符串中提取电子邮件
  2. (at) 转换为@(dot) 转换为.
  3. 将姓名和电子邮件保存在不同的列中

B 列示例:

Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com>
Puneet Arora <ar.puneetarora(at)gmail(dot)com>
Anand Upadhyay <001.anand(at)gmail(dot)com>
Rajat Gupta <rajatgupta0889(at)gmail(dot)com>
Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.

虽然我希望能够在 Excel 上执行此操作,但任何其他基于 Windows 的实用程序建议都会有所帮助。

【问题讨论】:

  • 您更喜欢哪种语言?如果是 python,我应该使用 openpyxl。

标签: regex excel pattern-matching vba


【解决方案1】:

这可以假设它们都采用相同的格式并且每个单元格只添加 1 封电子邮件

=SUBSTITUTE(SUBSTITUTE(MID(B1,FIND("

【讨论】:

    【解决方案2】:

    试试这个:

    Sub splitter()
       Dim r As Range, v As String
    
       For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange)
          v = r.Text
          If v <> "" Then
             ary = Split(v, " <")
             r.Offset(0, 1).Value = ary(0)
             r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "@"), "(dot)", ".")
          End If
       Next r
    End Sub
    

    这个子使用列 CD 作为输出。修改代码以满足您的需求。

    【讨论】:

      【解决方案3】:

      要提取名称,请尝试 =TRIM(LEFT(B1,FIND("

      【讨论】:

        【解决方案4】:

        您也可以通过正则表达式轻松完成此操作(您需要添加对 Microsoft VBScript 正则表达式的引用):

        Private Sub ExtractEmailInfo(value As String)
        
            Dim expr As New RegExp
            Dim result As Object
            Dim user As String
            Dim addr As String
        
            expr.Pattern = "(.+)(<.+>)"
            Set result = expr.Execute(value)
            If result.Count > 0 Then
                user = result(0).SubMatches(0)
                addr = result(0).SubMatches(1)
                'Strip the < and >
                addr = Mid$(addr, 2, Len(addr) - 2)
                addr = Replace$(addr, "(at)", "@")
                addr = Replace$(addr, "(dot)", ".")
            End If
        
            Debug.Print user
            Debug.Print addr
        
        End Sub
        

        Debug.Print 调用替换为您需要执行的任何操作以将它们放入单元格中。

        【讨论】:

          【解决方案5】:

          这可以在不到 15 秒的时间内完成 200 K 行:

          Option Explicit
          
          Sub extractPattern()
              Dim ws As Worksheet, ur As Range, rng As Range, t As Double
              Dim fr As Long, fc As Long, lr As Long, lc As Long
          
              Set ws = Application.ThisWorkbook.Worksheets("Sheet1")
              Set ur = ws.UsedRange
              fr = 1
              fc = 1
              lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row
              lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column
          
              Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc))
          
              enableXL False
              t = Timer
              rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _
                                TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _
                                Space:=True
              With ws.Columns(lc + 3)
                  .Replace What:="(at)", Replacement:="@", LookAt:=xlPart
                  .Replace What:="(dot)", Replacement:=".", LookAt:=xlPart
                  .Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart
                  .Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart
              End With
              ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit
              Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds"
              enableXL    'Total rows: 200,000, Duration: 14.4296875 seconds
          End Sub
          
          Private Sub enableXL(Optional ByVal opt As Boolean = True)
              Application.ScreenUpdating = opt
              Application.EnableEvents = opt
              Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
          End Sub
          

          它将新数据放在末尾的第一个未使用的列中(也拆分名称)

          【讨论】:

            猜你喜欢
            • 2021-12-09
            • 2021-06-16
            • 1970-01-01
            • 1970-01-01
            • 1970-01-01
            • 1970-01-01
            • 1970-01-01
            • 2022-06-10
            • 2023-02-18
            相关资源
            最近更新 更多