【Question title】: Newline between delimiters in awk
【Posted】: 2017-12-11 07:17:56
【Question】:

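I am parsing a file that contains strings with nginx GET request bodies. Sometimes a newline appears between two parts of the same request, so I cannot parse such requests with awk.

I use two delimiters via awk -F'delimiter1: |delimiter2'. Perhaps I can somehow tell awk that there may be a newline between these delimiters, so that it treats the two lines as one?

Thanks in advance.

Sample input (the Java errors are random examples):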

[2017-12-04 20:53:07] [ERROR] [ID-XX] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&c33=427&d28=
Like
&je=0&s4d=4-b&c32=(not set)&ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16&time=04/Dec/2017:20:52:02 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
          at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
          at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
          at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
          at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
          at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
          at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
          at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
          at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
          at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
          at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
          at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
          at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
          at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
          at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
          at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
          at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
          at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
  Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
          ... 12 more
[2017-12-04 21:03:07] [ERROR] [ID-YY] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&em=Exception: Error: [$sc:ind] Aborting!&ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16&time=04/Dec/2017:21:03:07 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
          at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
          at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
          at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
          at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
          at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
          at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
          at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
          at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
          at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
          at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
          at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
          at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
          at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
          at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
          at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
          at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
          at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
  Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
          ... 12 more
[2017-12-04 19:40:02] [ERROR] [ID-ZZ] Get: el=search&dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)&id=104777577&a=770227875&t=pageview&ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36&time=04/Dec/2017:19:39:04 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
          at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
          at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
          at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
          at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
          at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
          at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
          at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
          at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
          at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
          at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
          at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
          at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
          at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
          at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
          at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
          at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
          at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
  Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
          ... 12 more

Desired output (print the ID and the body (in "") on one line, and replace & with _&_):

ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like_&_je=0_&_s4d=4-b_&_c32=(not set)_&_ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:20:52:02 +0200_&_qtype=get"
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"

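There are not many of these torn request bodies; most sit on a single line, as expected. Also, only GET requests come with errors, so the search pattern does not have to include Get (it is not required).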

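【Question comments】:

  • Thanks for your attention, @John1024. I hope my clarification is clear.
  • @cardinal-gray: are there always three such groups of lines?
  • @Inian No, there are not many of those torn request bodies; most are on one line, as expected.
  • @RomanPerekhrest Only GET requests come with errors, so yes.
  • @cardinal-gray, a single record is not enough; post several Get requests ... a mix of single-line and multi-line ones

Tags: regex bash awk sed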


【Solution 1】:

Awk solution:

awk 'f{ if (/^\[/) { printf "\042\n"; f=0 } else printf("%s", $0) }
     / Get:/{ 
         f=1; gsub(/[\[\]]/, "", $4); id=$4; sub(/^.* Get: /, "");
         gsub("&", "_&_"); printf "%s \042%s",id,$0 
     }
     END{ if (f) printf "\042\n" }' file
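  • / Get:/ - matches a line that contains a Get request
    • f=1 - f is a flag marking that we are inside a request body, so the following lines are treated as continuations
    • id=$4 - captures the ID field (e.g. ID-XX)

Output: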
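To make the field handling in the bullets above concrete, here is a minimal sketch that runs the two key calls on a single header line. The line is a shortened copy of the first sample record, and the variable names line, id and body are only for this illustration.

```shell
# Sample header line copied from the question (body shortened for the demo).
line='[2017-12-04 20:53:07] [ERROR] [ID-XX] Get: sr=342x487&v=1'

# Default whitespace splitting: $1=[2017-12-04 $2=20:53:07] $3=[ERROR] $4=[ID-XX]
# gsub() then strips the square brackets from the fourth field.
id=$(printf '%s\n' "$line" | awk '{ gsub(/[\[\]]/, "", $4); print $4 }')

# sub() without a target works on $0 and drops everything up to "Get: ".
body=$(printf '%s\n' "$line" | awk '{ sub(/^.* Get: /, ""); print }')

printf '%s "%s"\n' "$id" "$body"
```

This relies on the ID always being the fourth whitespace-separated field, which holds for the sample lines shown in the question.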

ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like&je=0&s4d=4-b&c32=(not set)&ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16&time=04/Dec/2017:20:52:02 +0200&qtype=get"
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"

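【Comments】:

  • Thanks for your help, man, it works great. However, as I mentioned, this is a Java log, so every error message is followed by a lot of useless output with class names and so on. Do you happen to know how I can keep it from being appended to the end of each ID "request_body" string? I will add some sample strings to the question. Thanks in advance.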
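One possible way to address the comment above, as a sketch rather than the answerer's method: stop collecting as soon as the body is complete, so the stack-trace lines are never appended. It assumes every request body ends with =get (the same assumption the sed answer below makes) and that the ID is the fourth field. The function name join_get_bodies and the variable names are made up for this example. Unlike the program above, it applies the & replacement to the whole joined body, so continuation lines are covered too.

```shell
join_get_bodies() {
  awk '
    # A new record starts on a line containing " Get: ".
    / Get: / {
        id = $4
        gsub(/[\[\]]/, "", id)        # strip the square brackets around the ID
        body = $0
        sub(/^.* Get: /, "", body)    # keep only the text after "Get: "
        collecting = 1
    }
    # While a body is incomplete, append continuation lines to it.
    collecting && !/ Get: / { body = body $0 }
    # Assumption: a body is complete once it ends with "=get".
    collecting && body ~ /=get$/ {
        gsub(/&/, "_&_", body)        # in the replacement, & is the matched text
        printf "%s \"%s\"\n", id, body
        collecting = 0                # later lines (stack trace) are ignored
    }
  ' "$@"
}

# Small demo: one torn record followed by stack-trace lines, then a one-line record.
printf '%s\n' \
  '[2017-12-04 20:53:07] [ERROR] [ID-XX] Get: sr=1&d28=' \
  'Like' \
  '&je=0&qtype=get' \
  'com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1' \
  '          at java.lang.String.substring(String.java:1967)' \
  '[2017-12-04 21:03:07] [ERROR] [ID-YY] Get: v=1&qtype=get' | join_get_bodies
```

On a real log it would be called as join_get_bodies file. If a body can legitimately contain =get at the end of an intermediate line, the completion test needs a stricter pattern.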
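【Solution 2】:

I understand that you want to keep the lines with ERROR and format them.

We do not know what the delimiters are.

It is odd that get sits at the end of the line.

You can try this sed script: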

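sed '
/.*ERROR] \[/!d                     # delete every line that does not contain "ERROR] ["
s///                                # empty regex reuses the last one: strip everything up to the ID
:A
/=get$/!{N;bA}                      # while the record does not end with "=get", append the next line
s/\([^]]*\)[^:]*: \(.*\)/\1 "\2"/   # drop "] Get: " and wrap the body in double quotes
s/\n//g                             # remove the embedded newlines
s/&/_&_/g                           # replace & with _&_
' infile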
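Two details of the script above are easy to misread, so here is a small sketch that isolates them on made-up one-line inputs (the strings and the variable names plain, escaped and stripped are only for illustration). It checks that s/&/_&_/g really yields _&_ and shows what the empty s/// removes.

```shell
# In a sed replacement an unescaped & stands for the matched text. The match
# here is a literal "&", so both spellings give the same result.
plain=$(printf '%s\n' 'a=1&b=2' | sed 's/&/_&_/g')
escaped=$(printf '%s\n' 'a=1&b=2' | sed 's/&/_\&_/g')
printf '%s\n%s\n' "$plain" "$escaped"

# An empty regex in s/// reuses the most recently used regex, so this strips
# everything up to and including "ERROR] [".
stripped=$(printf '%s\n' 'x [ERROR] [ID-1] rest' | sed -e '/.*ERROR] \[/!d' -e 's///')
printf '%s\n' "$stripped"
```

The escaped form s/&/_\&_/g is arguably clearer to a later reader, since it does not depend on the match being a single &.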

【Comments】:
