【问题标题】:CSV into MySQL using load data使用加载数据将 CSV 导入 MySQL
【发布时间】:2017-08-18 10:18:46
【问题描述】:

我希望使用 MySQL 加载文件将 CSV 的内容插入到我的表中,但是,每次我使用以下命令执行此操作时都会删除一些行

LOAD DATA INFILE 'new.csv' INTO TABLE Example
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

我希望插入的 CSV 数据是 Companies House (http://download.companieshouse.gov.uk/en_output.html) 提供的免费公司产品数据

非常感谢任何帮助。 谢谢!

【问题讨论】:

  • 嗨@PutraPurba。感谢您的建议,但是,使用 INSERT 非常慢。 CSV 有大约 4 000 000 行。谢谢!
  • @JordanFry 我正在编写一个脚本来上传这些数据 - 我明天会发布它
  • 太好了,非常感谢@Phil

标签: mysql csv insert load


【解决方案1】:

鉴于此表定义,来自提供的数据规范here

create table companieshouse
(
CompanyName varchar(160),
CompanyNumber varchar(8),
RegAddressCareOf varchar(100),
RegAddressPOBox varchar(10),
RegAddressAddressLine1 varchar(300),
RegAddressAddressLine2 varchar(300),
RegAddressPostTown varchar(50),
RegAddressCounty varchar(60),
RegAddressCountry varchar(50),
RegAddressPostCode varchar(20),
CompanyCategory varchar(100),
CompanyStatus varchar(70),
CountryOfOrigin varchar(50),
DissolutionDate date,
IncorporationDate date,
AccountsAccountRefDay integer,
AccountsAccountRefMonth integer,
AccountsNextDueDate date,
AccountsLastMadeUpDate date,
AccountsAccountCategory varchar(30),
ReturnsNextDueDate date,
ReturnsLastMadeUpDate date,
MortgagesNumMortCharges integer,
MortgagesNumMortOutstanding integer,
MortgagesNumMortPartSatisfied integer,
MortgagesNumMortSatisfied integer,
SICCodeSicText_1 varchar(170),
SICCodeSicText_2 varchar(170),
SICCodeSicText_3 varchar(170),
SICCodeSicText_4 varchar(170),
LimitedPartnershipsNumGenPartners integer,
LimitedPartnershipsNumLimPartners integer,
URI varchar(47),
PreviousName1CONDATE date,
PreviousName1CompanyName varchar(160),
PreviousName2CONDATE date,
PreviousName2CompanyName varchar(160),
PreviousName3CONDATE date,
PreviousName3CompanyName varchar(160),
PreviousName4CONDATE date,
PreviousName4CompanyName varchar(160),
PreviousName5CONDATE date,
PreviousName5CompanyName varchar(160),
PreviousName6CONDATE date, 
PreviousName6CompanyName varchar(160),
PreviousName7CONDATE date,
PreviousName7CompanyName varchar(160),
PreviousName8CONDATE date,
PreviousName8CompanyName varchar(160),
PreviousName9CONDATE date, 
PreviousName9CompanyName varchar(160),
PreviousName10CONDATE date, 
PreviousName10CompanyName varchar(160),
ConfStmtNextDueDate date, 
ConfStmtLastMadeUpDate date
);

这会将提供的 .csv 文件中的数据加载到表中;

LOAD DATA INFILE '/var/lib/mysql-files/BasicCompanyData-2017-03-06-part1_5.csv' 
INTO TABLE companieshouse 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
ESCAPED BY ""
LINES TERMINATED BY '\n'
 IGNORE 1 LINES
(
CompanyName,
CompanyNumber,
RegAddressCareOf,
RegAddressPOBox,
RegAddressAddressLine1,
RegAddressAddressLine2,
RegAddressPostTown,
RegAddressCounty,
RegAddressCountry,
RegAddressPostCode,
CompanyCategory,
CompanyStatus,
CountryOfOrigin,
@DissolutionDate,
@IncorporationDate,
@AccountsAccountRefDay,
@AccountsAccountRefMonth,
@AccountsNextDueDate,
@AccountsLastMadeUpDate,
AccountsAccountCategory,
@ReturnsNextDueDate,
@ReturnsLastMadeUpDate,
@MortgagesNumMortCharges,
@MortgagesNumMortOutstanding,
@MortgagesNumMortPartSatisfied,
@MortgagesNumMortSatisfied,
SICCodeSicText_1,
SICCodeSicText_2,
SICCodeSicText_3,
SICCodeSicText_4,
@LimitedPartnershipsNumGenPartners,
@LimitedPartnershipsNumLimPartners,
URI,
@PreviousName1CONDATE,
PreviousName1CompanyName,
@PreviousName2CONDATE,
PreviousName2CompanyName,
@PreviousName3CONDATE,
PreviousName3CompanyName,
@PreviousName4CONDATE,
PreviousName4CompanyName,
@PreviousName5CONDATE,
PreviousName5CompanyName,
@PreviousName6CONDATE, 
PreviousName6CompanyName,
@PreviousName7CONDATE,
PreviousName7CompanyName,
@PreviousName8CONDATE,
PreviousName8CompanyName,
@PreviousName9CONDATE, 
PreviousName9CompanyName,
@PreviousName10CONDATE, 
PreviousName10CompanyName,
@ConfStmtNextDueDate, 
@ConfStmtLastMadeUpDate)
SET DissolutionDate = IF(@DissolutionDate = '', NULL, STR_TO_DATE(@DissolutionDate, '%d/%m/%Y')),
IncorporationDate = IF(@IncorporationDate = '', NULL, STR_TO_DATE(@IncorporationDate, '%d/%m/%Y')),
AccountsNextDueDate = IF(@AccountsNextDueDate = '', NULL, STR_TO_DATE(@AccountsNextDueDate, '%d/%m/%Y')),
AccountsLastMadeUpDate = IF(@AccountsLastMadeUpDate = '', NULL, STR_TO_DATE(@AccountsLastMadeUpDate, '%d/%m/%Y')),
ReturnsNextDueDate = IF(@ReturnsNextDueDate = '', NULL,  STR_TO_DATE(@ReturnsNextDueDate, '%d/%m/%Y')),
ReturnsLastMadeUpDate = IF(@ReturnsLastMadeUpDate = '', NULL, STR_TO_DATE(@ReturnsLastMadeUpDate, '%d/%m/%Y')),
PreviousName1CONDATE = IF(@PreviousName1CONDATE = '', NULL, STR_TO_DATE(@PreviousName1CONDATE, '%d/%m/%Y')),
PreviousName2CONDATE = IF(@PreviousName2CONDATE = '', NULL, STR_TO_DATE(@PreviousName2CONDATE, '%d/%m/%Y')),
PreviousName3CONDATE = IF(@PreviousName3CONDATE = '', NULL, STR_TO_DATE(@PreviousName3CONDATE, '%d/%m/%Y')),
PreviousName4CONDATE = IF(@PreviousName4CONDATE = '', NULL, STR_TO_DATE(@PreviousName4CONDATE, '%d/%m/%Y')),
PreviousName5CONDATE = IF(@PreviousName5CONDATE = '', NULL, STR_TO_DATE(@PreviousName5CONDATE, '%d/%m/%Y')),
PreviousName6CONDATE = IF(@PreviousName6CONDATE = '', NULL, STR_TO_DATE(@PreviousName6CONDATE, '%d/%m/%Y')),
PreviousName7CONDATE = IF(@PreviousName7CONDATE = '', NULL, STR_TO_DATE(@PreviousName7CONDATE, '%d/%m/%Y')),
PreviousName8CONDATE = IF(@PreviousName8CONDATE = '', NULL, STR_TO_DATE(@PreviousName8CONDATE, '%d/%m/%Y')),
PreviousName9CONDATE = IF(@PreviousName9CONDATE = '', NULL, STR_TO_DATE(@PreviousName9CONDATE, '%d/%m/%Y')),
PreviousName10CONDATE = IF(@PreviousName10CONDATE = '', NULL, STR_TO_DATE(@PreviousName10CONDATE, '%d/%m/%Y')),
AccountsAccountRefDay = NULLIF(@AccountsAccountRefDay, ''),
AccountsAccountRefMonth = NULLIF(@AccountsAccountRefMonth, '') ,
MortgagesNumMortCharges = NULLIF(@MortgagesNumMortCharges, ''),
MortgagesNumMortOutstanding = NULLIF(@MortgagesNumMortOutstanding, ''),
MortgagesNumMortPartSatisfied = NULLIF(@MortgagesNumMortPartSatisfied, ''),
MortgagesNumMortSatisfied = NULLIF(@MortgagesNumMortSatisfied, ''),
LimitedPartnershipsNumGenPartners = NULLIF(@LimitedPartnershipsNumGenPartners, ''),
LimitedPartnershipsNumLimPartners = NULLIF(@LimitedPartnershipsNumLimPartners, '')
;

...加载数据时没有任何错误或警告:

Query OK, 849999 rows affected (19.43 sec)
Records: 849999  Deleted: 0  Skipped: 0  Warnings: 0

【讨论】:

  • 这绝对是完美的——我感激不尽!
  • 我可以通过用反斜杠更改每个双引号来修改插入代码,以便让他作为 PHP 脚本的一部分执行吗?由 '\"' 封闭由 \"\" 转义跨度>
【解决方案2】:

从一方面 - 加载数据输入,禁食方法,但现实生活中总是需要一些检查、转换和其他业务逻辑。

和正常做法 - 使用 ETL 工具而不是直接导入

或多阶段处理 - 清理数据(在第一步检查、记录错误、转换、添加计算列等),而不是导入最终结果。

现在有很多优秀的开源工具可以做到这一点:

转换您的导入/导出逻辑您可以实现所有步骤,例如:

【讨论】:

    猜你喜欢
    • 2014-07-04
    • 2013-10-27
    • 2020-08-06
    • 1970-01-01
    • 2013-12-21
    • 2015-02-07
    • 1970-01-01
    • 1970-01-01
    • 2012-07-11
    相关资源
    最近更新 更多