Spring Batch: Azure SQL performance is inconsistent (answers)

【Question title】: Spring Batch: Azure SQL Performance is inconsistent
【Posted】: 2021-11-08 14:56:13
【Question】:

I have a Spring Batch application that uses Azure SQL Server as its backend, and I am using Hibernate to update the database.

I am reading data from a CSV file with FlatFileItemReader and writing it to Azure SQL Server with an ItemWriter, as shown below.

Below is my Hibernate configuration:

<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
    http://www.springframework.org/schema/tx
    http://www.springframework.org/schema/tx/spring-tx.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-3.0.xsd
    ">
    
    <context:annotation-config/>
    <context:component-scan base-package="com.demo.entity" />

    <bean id="itemWriter" class="com.demo.batch.jobs.csv.Writer" >
        <constructor-arg ref = "hibernateItemWriter"/>
    </bean>

    <bean id="hibernateItemWriter" class="org.springframework.batch.item.database.HibernateItemWriter">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate5.LocalSessionFactoryBean" >
        <property name="dataSource" ref="irusDataSource"/>
        <property name="hibernateProperties" ref="hibernateProperties"/>
        <property name="packagesToScan">
            <list>
                <value>com.demo.*</value>
            </list>
        </property>
    </bean>

    <!-- DATA SOURCE -->
    <bean id="irusDataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
        <property name="url" value="jdbc:sqlserver://sqlserver85.database.windows.net:1433;database=sqldb;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;" />
        <property name="username" value="ddddd" />
        <property name="password" value="`JNp" />
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.hibernate5.HibernateTransactionManager" lazy-init="true">
        <property name="sessionFactory" ref="sessionFactory" />
    </bean>

    <tx:annotation-driven transaction-manager="transactionManager"/>
    <bean id="hibernateProperties" class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="properties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2012Dialect</prop>
            </props>
        </property>
    </bean>
</beans>
<bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:cvs/input/students.csv" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean
                    class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="student" />
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.demo.batch.mapper.StudentMapper" />
            </property>
        </bean>
    </property>
</bean>
<batch:step id="starterJob">
    <batch:tasklet>
        <batch:chunk
                reader="cvsFileItemReader"
                processor="itemProcessor"
                writer="itemWriter"
                commit-interval="100">
        </batch:chunk>
    </batch:tasklet>
</batch:step>

Below is the ItemProcessor:

import com.demo.entity.Student;
import org.springframework.batch.item.ItemProcessor;

public class Processor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student item) throws Exception {
        
        System.out.println("Processing..." + item);
        //Thread.sleep(50);
        return item;
    }

}

Below is the ItemWriter:

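import com.demo.entity.Student;
import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.HibernateItemWriter;

// Thin wrapper that delegates every chunk to the injected HibernateItemWriter.
public class Writer implements ItemWriter<Student> {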
    private HibernateItemWriter<Student> hibernateItemWriter;

    public Writer(HibernateItemWriter<Student> hibernateItemWriter) {
        this.hibernateItemWriter = hibernateItemWriter;

        System.out.println("Hibernate Writer instance is created..: " + hibernateItemWriter.hashCode());
    }

    @Override
    //@Transactional(propagation = Propagation.REQUIRES_NEW)
    public void write(List<? extends Student> list) throws Exception {
        hibernateItemWriter.write(list);
    }
}

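The CSV file has 6k records. Sometimes the job takes 4 minutes to complete.

At other times, it takes 15 minutes.

How can simple code running on the same VM produce such different performance results? Is this caused by inconsistent Azure SQL performance? How can I make sure the job always runs in 4 minutes?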

Statistics for the 4-minute run:

19100 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15127700 nanoseconds spent preparing 200 JDBC statements;
4103544900 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
2099567900 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);

Statistics for the 15-minute run:

19400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15668300 nanoseconds spent preparing 200 JDBC statements;
13671456600 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
6881730800 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);
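To put the two metrics blocks on a common scale, the sketch below converts the "executing 200 JDBC statements" totals into a cost per statement and projects a whole-job time. It assumes that each block covers one 100-item chunk (the configured commit-interval) and that the block shown is typical of its run; the class and method names are made up for illustration. Under those assumptions the fast run costs about 20.5 ms per statement and the slow run about 68.4 ms, and 60 chunks project to roughly 246 s and 820 s, which is close to the reported 4 and 15 minutes.

```java
final class PerStatementCost {

    // Average cost of one JDBC statement, in milliseconds.
    static double millisPerStatement(long totalNanos, int statements) {
        return totalNanos / 1_000_000.0 / statements;
    }

    // Projected time spent executing statements for the whole job, in seconds,
    // assuming every chunk costs the same as the sampled one.
    static double projectedJobSeconds(long nanosPerChunk, int records, int chunkSize) {
        int chunks = (records + chunkSize - 1) / chunkSize; // round up
        return chunks * (nanosPerChunk / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long fastNanos = 4_103_544_900L;   // from the 4-minute run
        long slowNanos = 13_671_456_600L;  // from the 15-minute run

        System.out.println("fast ms/statement: " + millisPerStatement(fastNanos, 200));
        System.out.println("slow ms/statement: " + millisPerStatement(slowNanos, 200));
        System.out.println("fast projected s:  " + projectedJobSeconds(fastNanos, 6000, 100));
        System.out.println("slow projected s:  " + projectedJobSeconds(slowNanos, 6000, 100));
    }
}
```

Because statement execution alone accounts for nearly the whole run time in both cases, the difference between runs appears to come from the per-statement cost rather than from reading or processing the CSV.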

【Comments】:

Tags: azure azure-sql-database spring-batch azure-sql-server

【Solution 1】:

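A simple insert of 6K records should not take this long. You can try enabling Hibernate statistics (see here); they may show how much time Hibernate spends on its own internal tasks versus executing SQL. You will see something like the following: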

2021-11-08 21:58:18 - Session Metrics {
    4918700 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    20 nanoseconds spent preparing 1 JDBC statements;
    300 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
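As a minimal sketch of how such session metrics can be switched on, the snippet below adds Hibernate's `hibernate.generate_statistics` setting to an existing set of properties. The helper class and method names are hypothetical; only the property key and the dialect value come from Hibernate and the question's configuration.

```java
import java.util.Properties;

final class HibernateStatsProps {

    // Returns a copy of the given properties with statistics collection enabled.
    static Properties withStatistics(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("hibernate.generate_statistics", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        // Same dialect as in the question's hibernateProperties bean.
        base.setProperty("hibernate.dialect", "org.hibernate.dialect.SQLServer2012Dialect");
        System.out.println(withStatistics(base));
    }
}
```

In the XML configuration from the question, the equivalent would be one more `prop` entry with the key `hibernate.generate_statistics` inside the `hibernateProperties` bean.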

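It is also worth checking the network latency between the VM running the code and the SQL Server. For example, if they are in different regions, you may pay a heavy penalty on every network round trip to the database, especially if your SQL is not batched.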
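One rough way to check the round-trip latency from the VM is to time a trivial query many times and look at the median. The sketch below is only an illustration: the class name, the `SELECT 1` probe, and the command-line arguments (JDBC URL, user, password) are assumptions, and the SQL Server JDBC driver must be on the classpath for `main` to connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

final class RoundTripProbe {

    // Runs the action n times (n > 0) and returns the median elapsed time in
    // milliseconds (the upper median when n is even).
    static double medianMillis(Runnable action, int n) {
        long[] samples = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[n / 2] / 1_000_000.0;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: RoundTripProbe <jdbcUrl> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement()) {
            Runnable ping = () -> {
                try {
                    stmt.execute("SELECT 1"); // one network round trip per call
                } catch (SQLException e) {
                    throw new IllegalStateException(e);
                }
            };
            System.out.println("median round trip (ms): " + medianMillis(ping, 20));
        }
    }
}
```

If the median is in the tens of milliseconds, statements sent one at a time will be dominated by network time, which is where batching helps most.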

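【Comments】:

I have updated the question with the statistics; query execution and flushing are what take the extra time. However, I don't know why. Can you advise?
What is the difference between query execution and flushing?
So, in your case, most of the time is spent on the VM itself, consuming CPU and memory. Query execution is the time spent running the queries; flushing is how Hibernate checks objects and their relationships for dirtiness (that is, for what has changed). Also, it looks like you are not using SQL batching.
"13671456600 nanoseconds spent executing 200 JDBC statements": that only happens when your SQL is not batched and the statements are executed one by one.
See docs.microsoft.com/en-us/azure/azure-sql/… for the implications and best practices of batching with Azure SQL.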
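The batching mentioned in the comments can be sketched as a set of Hibernate properties. `hibernate.jdbc.batch_size`, `hibernate.order_inserts`, and `hibernate.order_updates` are standard Hibernate settings; the helper class is hypothetical, and the batch size of 100 is an assumption chosen to match the step's commit-interval. Whether this helps here depends on the `Student` mapping, which the question does not show: Hibernate does not batch inserts for entities that use IDENTITY id generation.

```java
import java.util.Properties;

final class HibernateBatchingProps {

    // Builds the properties that ask Hibernate to group statements into JDBC batches.
    static Properties batchingProperties(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Properties props = new Properties();
        // Maximum number of statements sent to the driver in one batch.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Group statements by entity type so consecutive ones can share a batch.
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        // 100 mirrors commit-interval="100" from the question's step definition.
        System.out.println(batchingProperties(100));
    }
}
```

With batching active, the session metrics should report a non-zero count on the "executing ... JDBC batches" line, which is zero in both runs posted in the question.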