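What I learned

  • try/catch is not heavy as long as no exception is thrown
    • The penalty when an exception is thrown is large
  • It is best to weigh how unreliable the input is and how often the code runs when designing.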
    • I had assumed that merely setting up a try/catch block was costly; the measurements did not support that.

⇒ In the end, "don't guess, measure" is the best advice

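What I wanted to check

  • Encoding.GetEncoding(string) in System.Text has no TryGet counterpart
  • I want to obtain an encoding from a user-supplied string, so the input has to be treated as untrusted.
    • Especially for Japanese, a name such as "ISO-2022-JP" may or may not work depending on the runtime environment
    • It is easy to misspell "shift_jis" as "shift-jis".
  • My impression was that try/catch is heavy, so I thought of running LINQ over the EncodingInfo[] returned by GetEncodings() instead

⇒ I wanted to know what actually happens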
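
Since GetEncoding(string) has no TryGet counterpart, one option is to wrap it yourself. The sketch below is a minimal, hypothetical helper: the names EncodingLookup and TryGetEncoding are made up for illustration, and falling back to UTF-8 is an assumption. It hides the try/catch from callers, but an unknown name still goes through an exception internally.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of System.Text: exposes GetEncoding(string)
// through the Try pattern so callers do not write their own try/catch.
public static class EncodingLookup
{
    public static bool TryGetEncoding(string name, out Encoding encoding)
    {
        // Assumed fallback value: UTF-8.
        encoding = Encoding.UTF8;
        if (string.IsNullOrWhiteSpace(name))
        {
            return false;
        }

        try
        {
            encoding = Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported name: keep the fallback.
            return false;
        }
    }
}
```

A caller would write something like EncodingLookup.TryGetEncoding(userInput, out var enc) and branch on the returned bool, using enc either way.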

Benchmark

Measurement environment

  • BenchmarkDotNet
  • M2 MacBook Air

The exact environment is as follows

BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.6 (21G115) [Darwin 21.6.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.402
  [Host]     : .NET 6.0.7 (6.0.722.32202), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 6.0.10 (6.0.1022.47605), Arm64 RyuJIT AdvSIMD
  • Note: BenchmarkDotNet cannot be used unless the .NET SDK is installed with: brew install --cask dotnet-sdk

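Results

  • When GetEncoding(string) succeeds, it is very fast even inside a try/catch block

    • When it fails and the exception is caught, on the other hand, it is about 700 times slower.
  • With GetEncodings().FirstOrDefault(), the time is almost the same whatever name is being looked up (EncodingInfo[] is an array and small enough that stopping the scan at the first match is unlikely to be much of an advantage)

    • It is still about 40 times slower than a successful GetEncoding(string) (about 899 ns versus 22 ns in the first run)
  • First run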

|                          Method |         Mean |     Error |    StdDev |
|-------------------------------- |-------------:|----------:|----------:|
|    GetValidEncodingWithTryCatch |     22.26 ns |  0.035 ns |  0.032 ns |
|  GetInvalidEncodingWithTryCatch | 15,183.12 ns | 37.232 ns | 34.827 ns |
|   GetValidEncodingWithIteration |    899.17 ns |  1.814 ns |  1.697 ns |
| GetInvalidEncodingWithIteration |    894.75 ns |  6.936 ns |  5.792 ns |
  • Second run
|                          Method |         Mean |      Error |     StdDev |
|-------------------------------- |-------------:|-----------:|-----------:|
|    GetValidEncodingWithTryCatch |     23.11 ns |   0.445 ns |   0.416 ns |
|  GetInvalidEncodingWithTryCatch | 16,103.76 ns | 316.660 ns | 311.003 ns |
|   GetValidEncodingWithIteration |    949.95 ns |  18.979 ns |  31.183 ns |
| GetInvalidEncodingWithIteration |    942.63 ns |  15.243 ns |  13.512 ns |
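
The ratios behind these observations can be recomputed from the mean times of the first run. A small sketch, with the class and member names made up for illustration and the constants copied from the first table:

```csharp
// Hypothetical helper for checking the ratios; values are the first run's means in ns.
public static class BenchmarkRatios
{
    public const double ValidTryCatchNs = 22.26;
    public const double InvalidTryCatchNs = 15183.12;
    public const double ValidIterationNs = 899.17;

    // Cost of a caught exception relative to a successful GetEncoding(string).
    public static double ThrowPenalty => InvalidTryCatchNs / ValidTryCatchNs;

    // Cost of GetEncodings().FirstOrDefault() relative to a successful GetEncoding(string).
    public static double IterationPenalty => ValidIterationNs / ValidTryCatchNs;
}
```

With these inputs ThrowPenalty comes out at roughly 682 and IterationPenalty at roughly 40.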

Code

  • Comparison code
using System.Text;
using BenchmarkDotNet.Attributes;

namespace Benchy;

public class GetEncodingTest
{
    private string validEncode { get; } = "UTF-8";
    private string invalidEncode { get; } = "utf8";

    public GetEncodingTest()
    {
    }

    [Benchmark]
    public Encoding GetValidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(validEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
    [Benchmark]
    public Encoding GetInvalidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(invalidEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
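    [Benchmark]
    public Encoding GetValidEncodingWithIteration()
    {
        // Caveat: == is an ordinal, case-sensitive comparison. EncodingInfo.Name is
        // normally the lowercase IANA name (e.g. "utf-8"), so "UTF-8" probably never
        // matches here and the result comes from the ?? fallback.
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == validEncode)?.GetEncoding() ?? Encoding.UTF8;
    }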
    [Benchmark]
    public Encoding GetInvalidEncodingWithIteration()
    {
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == invalidEncode)?.GetEncoding() ?? Encoding.UTF8;
    }
}
  • Entry point
using BenchmarkDotNet.Running;
using Benchy;

BenchmarkRunner.Run<GetEncodingTest>();

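Aside

The results looked too fast, so I suspected the call was being optimized away at the IL level and changed the code to read the names from files, but it made no difference.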

|                          Method |         Mean |     Error |    StdDev |
|-------------------------------- |-------------:|----------:|----------:|
|    GetValidEncodingWithTryCatch |     22.88 ns |  0.069 ns |  0.061 ns |
|  GetInvalidEncodingWithTryCatch | 15,525.24 ns | 20.935 ns | 18.558 ns |
|   GetValidEncodingWithIteration |    918.30 ns |  1.785 ns |  1.670 ns |
| GetInvalidEncodingWithIteration |    916.27 ns |  1.619 ns |  1.435 ns |

using System.Text;
using BenchmarkDotNet.Attributes;

namespace Benchy;

public class GetEncodingTest
{
    private string validEncode { get; }
    private string invalidEncode { get; }

    public GetEncodingTest()
    {
        validEncode = System.IO.File.ReadAllText("valid.txt").Trim();
        invalidEncode = System.IO.File.ReadAllText("invalid.txt").Trim();
    }

    [Benchmark]
    public Encoding GetValidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(validEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
    [Benchmark]
    public Encoding GetInvalidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(invalidEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
    [Benchmark]
    public Encoding GetValidEncodingWithIteration()
    {
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == validEncode)?.GetEncoding() ?? Encoding.UTF8;
    }
    [Benchmark]
    public Encoding GetInvalidEncodingWithIteration()
    {
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == invalidEncode)?.GetEncoding() ?? Encoding.UTF8;
    }
}
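
Another way to keep the encoding name from being a compile-time constant is BenchmarkDotNet's [Params] attribute, which runs each benchmark once per listed value. This is only a sketch and not what was measured above; the class name GetEncodingParamsTest and the property name Name are made up for illustration.

```csharp
using System;
using System.Text;
using BenchmarkDotNet.Attributes;

// Hypothetical variant of the benchmark class: the name is supplied by
// BenchmarkDotNet at run time, once per value listed in [Params].
public class GetEncodingParamsTest
{
    [Params("UTF-8", "utf8")]
    public string Name { get; set; } = "UTF-8";

    [Benchmark]
    public Encoding GetEncodingWithTryCatch()
    {
        try
        {
            return Encoding.GetEncoding(Name);
        }
        catch (ArgumentException)
        {
            // Same fallback as the original benchmarks.
            return Encoding.UTF8;
        }
    }
}
```

The results table then gets one row per parameter value, so the valid and invalid cases no longer need separate methods.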

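Postscript

Following a comment from @albireo I checked, and caching is clearly advantageous for this lookup.

Even building a new Dictionary on every call takes only about 1.25 times as long as running FirstOrDefault every time (about 1,125 ns versus 896 ns), so caching is likely to pay off even when the lookup is repeated only a few times.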

|                          Method |          Mean |      Error |     StdDev |
|-------------------------------- |--------------:|-----------:|-----------:|
|    GetValidEncodingWithTryCatch |     22.295 ns |  0.0540 ns |  0.0479 ns |
|  GetInvalidEncodingWithTryCatch | 15,081.121 ns | 20.7711 ns | 19.4293 ns |
|   GetValidEncodingWithIteration |    894.545 ns |  2.0607 ns |  1.9276 ns |
| GetInvalidEncodingWithIteration |    905.887 ns | 16.0912 ns | 15.8037 ns |
|       GetValidEncodingWithCache |      7.990 ns |  0.0057 ns |  0.0045 ns |

|                          Method |          Mean |      Error |     StdDev |
|-------------------------------- |--------------:|-----------:|-----------:|
|    GetValidEncodingWithTryCatch |     22.312 ns |  0.0384 ns |  0.0341 ns |
|  GetInvalidEncodingWithTryCatch | 15,092.002 ns | 33.9453 ns | 30.0916 ns |
|   GetValidEncodingWithIteration |    893.170 ns |  1.2647 ns |  1.1830 ns |
| GetInvalidEncodingWithIteration |    848.864 ns |  1.5862 ns |  1.4837 ns |
|     GetInvalidEncodingWithCache |      7.993 ns |  0.0045 ns |  0.0042 ns |

|                          Method |         Mean |     Error |    StdDev |
|-------------------------------- |-------------:|----------:|----------:|
|    GetValidEncodingWithTryCatch |     22.30 ns |  0.031 ns |  0.028 ns |
|  GetInvalidEncodingWithTryCatch | 15,119.33 ns | 15.193 ns | 13.468 ns |
|   GetValidEncodingWithIteration |    896.30 ns |  1.740 ns |  1.627 ns |
| GetInvalidEncodingWithIteration |    897.54 ns |  2.107 ns |  1.971 ns |
|         GetValidEncodingWithDic |  1,124.98 ns |  1.886 ns |  1.764 ns |
  • Verification code
using System.Text;
using BenchmarkDotNet.Attributes;

namespace Benchy;

public class GetEncodingTest
{
    private string validEncode { get; }
    private string invalidEncode { get; }
    
    private static Dictionary<string, Encoding> encodingCache = null;

    public GetEncodingTest()
    {
        validEncode = System.IO.File.ReadAllText("valid.txt").Trim();
        invalidEncode = System.IO.File.ReadAllText("invalid.txt").Trim();
    }

    [Benchmark]
    public Encoding GetValidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(validEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
    [Benchmark]
    public Encoding GetInvalidEncodingWithTryCatch()
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(invalidEncode);
        }
        catch (ArgumentException e)
        {
            enc = Encoding.UTF8;
        }
        return enc;
    }
    [Benchmark]
    public Encoding GetValidEncodingWithIteration()
    {
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == validEncode)?.GetEncoding() ?? Encoding.UTF8;
    }
    [Benchmark]
    public Encoding GetInvalidEncodingWithIteration()
    {
        return Encoding.GetEncodings().FirstOrDefault(o => o.Name == invalidEncode)?.GetEncoding() ?? Encoding.UTF8;
    }
    
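    // As posted, this method carries no [Benchmark] attribute; it was presumably
    // enabled only for the run that produced the GetValidEncodingWithCache row.
    // The dictionary uses the default ordinal comparer, so a key such as "UTF-8"
    // probably misses the lowercase "utf-8" entry and falls back to Encoding.UTF8.
    public Encoding GetValidEncodingWithCache()
    {
        if (encodingCache is null)
        {
            encodingCache = new Dictionary<string, Encoding>(
                Encoding.GetEncodings().Select(
                    encInf => new KeyValuePair<string, Encoding>(encInf.Name, encInf.GetEncoding())
                )
            );
        }

        return encodingCache.TryGetValue(validEncode, out Encoding enc) ? enc : Encoding.UTF8;
    }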

    public Encoding GetInvalidEncodingWithCache()
    {
        if (encodingCache is null)
        {
            encodingCache = new Dictionary<string, Encoding>(
                Encoding.GetEncodings().Select(
                    encInf => new KeyValuePair<string, Encoding>(encInf.Name, encInf.GetEncoding())
                )
            );
        }

        return encodingCache.TryGetValue(invalidEncode, out Encoding enc) ? enc : Encoding.UTF8;
    }
    
    [Benchmark]
    public Encoding GetValidEncodingWithDic()
    {
        encodingCache = new Dictionary<string, Encoding>(
            Encoding.GetEncodings().Select(
                encInf => new KeyValuePair<string, Encoding>(encInf.Name, encInf.GetEncoding())
            )
        );
        return encodingCache.TryGetValue(validEncode, out Encoding enc) ? enc : Encoding.UTF8;
    }
}
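
For use outside a benchmark, the cache could be built once and made tolerant of letter case. The following is a minimal sketch under stated assumptions: EncodingCache and GetOrDefault are hypothetical names, the case-insensitive comparer is my choice rather than something measured above, and only the primary names reported by EncodingInfo.Name are covered, not every alias that GetEncoding(string) accepts.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical cache: maps EncodingInfo.Name to Encoding, built on first use.
public static class EncodingCache
{
    // Lazy<T> is thread-safe by default, so the map is built only once.
    private static readonly Lazy<Dictionary<string, Encoding>> Cache =
        new Lazy<Dictionary<string, Encoding>>(Build);

    private static Dictionary<string, Encoding> Build()
    {
        // Case-insensitive keys, so "UTF-8" and "utf-8" hit the same entry.
        var map = new Dictionary<string, Encoding>(StringComparer.OrdinalIgnoreCase);
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            // Indexer assignment tolerates duplicate names, unlike Add.
            map[info.Name] = info.GetEncoding();
        }
        return map;
    }

    public static Encoding GetOrDefault(string name, Encoding fallback)
    {
        if (name != null && Cache.Value.TryGetValue(name, out var found))
        {
            return found;
        }
        return fallback;
    }
}
```

A call such as EncodingCache.GetOrDefault(userInput, Encoding.UTF8) never throws for an unknown name. Encodings registered after the first call are not picked up, which is a trade-off of caching.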

Original article: https://www.likecs.com/show-308631685.html
