【Question title】: How to create an SSIS import raw file package?
【Posted】: 2011-11-08 03:20:39
【Question description】:

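I am very new to SSIS, and I am using SSIS 2008. I see that many SSIS tools perform the same function as some SQL operators. When should I use the SSIS tools versus the TSQL operators? Also, are there any recommendations here for a more efficient solution?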

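Below is the TSQL query I supplied in the SSIS Import/Export Wizard. My current solution therefore uses no SSIS tools other than one data flow source and one data flow destination.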

SELECT
   en.uniqueid_c AS enrollment_id,
   CONVERT(nvarchar (20),c.clientcode_c) AS client_id, --Legacy CDT# (note this has to be the same value as client_id on the other tables)
   CONVERT(nvarchar (20),CASE --Program codes included here for enrollment data (Excludes enrollments with modifiers)
                         WHEN en.agency_c = 'ADO' THEN 'ADO'
                         WHEN en.agency_c = 'ADOT' THEN 'ADO'
                         WHEN en.agency_c in ('MRDD/IHS','MRDD/PSH','MRDD/REP','MRDD/RTC') THEN 'CHOICES'
                         WHEN en.agency_c = 'FPP' THEN 'CM' 
                         WHEN en.agency_c = 'CMGT' THEN 'CM'
                         WHEN en.agency_c = 'EDU' THEN 'EDU'
                         WHEN en.agency_c = 'COM' THEN 'GH'
                         WHEN en.agency_c = 'CSUP' THEN 'INT'
                         WHEN en.agency_c = 'IHS' THEN 'INT'
                         WHEN en.agency_c = 'IHST' THEN 'INT'
                         WHEN en.agency_c = 'MST' THEN 'MST'
                         WHEN en.agency_c = 'OMHS' THEN 'OMHS'
                         WHEN en.agency_c = 'ORTC' THEN 'RESA'
                         WHEN en.agency_c = 'MRTC' THEN 'RESA' 
                         WHEN en.agency_c = 'RTC' THEN 'RESA'
                         WHEN en.agency_c = 'RFC' THEN 'RFC'
                         WHEN en.agency_c in ('SCCR','SCMN','SCRP') THEN 'SCS'
                         WHEN en.agency_c = 'SUB' THEN  'SUB' --uncertain about this one KMH - 06/23/10
                         WHEN en.agency_c = 'STFC' THEN 'TFC'
                         WHEN en.agency_c = 'MTFC' THEN 'TFC' 
                         WHEN en.agency_c = 'TFC' THEN 'TFC'
                         WHEN en.agency_c = 'TL' THEN 'TL'
                         WHEN en.agency_c = 'TLT' THEN 'TL'
                         ELSE en.agency_c
                         END) AS program_code,
-------------------------------------------------------------------------------------------------------------------------------------------                      
   --2nd Program_code entry handles program_modifier_code.
   --The codes need to be grouped and cased out to match the Evolv codes. --This was fixed.
   -- NOTE!!!! The codes below will need to be replaced with the finance modifiers just for TN. --per deneen and diane.
   UPPER(CONVERT(nvarchar(20), CASE --PROGRAM MODIFIERS --This is pulled into the program_code in the 2nd run. These should exclude non-modifiers.
                         WHEN en.agency_c in ('ADO','ADOT','IHST','TLT') THEN 'TRANS'
                         WHEN en.agency_c in ('CHOM','CGMT','COM','CSUP','EDU','IHS','LHS','MRTC','MST','MTFC','RTC', --use this to exclude recs
                                              'SCCR','SCRP','SUB','TFC','TL','ZADMIN','ZAWOL','ZDET','ZHOSP') THEN en.agency_c
                         WHEN en.agency_c = 'FPP' THEN 'FPP' 
                         WHEN en.agency_c = 'RFC' THEN 'RFC'
                         WHEN en.agency_c = 'TFC' THEN 'TFC'                    
                         WHEN en.agency_c = 'MRDD/IHS' THEN 'MRIHS'
                         WHEN en.agency_c = 'MRDD/PSH' THEN 'MRPSH'
                         WHEN en.agency_c = 'MRDD/REP' THEN 'MRREP'
                         WHEN en.agency_c = 'MRDD/RTC' THEN 'MRRTC'
                         WHEN en.agency_c = 'PD40' THEN 'PD40'
                         WHEN en.agency_c = 'PDET' THEN 'PDET'
                         WHEN en.agency_c = 'PINT' THEN 'PINT'
                         WHEN en.agency_c = 'PLV4' THEN 'PLV4'
                         WHEN en.agency_c = 'PWIL' THEN 'PWIL'
                         WHEN en.agency_c = 'PYDC' THEN 'PYDC'
                         WHEN en.agency_c = 'SCMN' THEN 'SCMN'
                         WHEN en.agency_c = 'STFC' THEN 'STFC'
                         ELSE en.agency_c                      
                         END)) AS program_modifier_code,
 -------------------------------------------------------------------------------------------------------------------------------------------  
 /*
 Group Homes and Inner Harbour locations were added on 7/26/10 - KMH
 */                   
   CONVERT(nvarchar(20),
   CASE
       WHEN en.location_c = 'ANNI' THEN 'AL-ANNI' 
       WHEN en.location_c = 'ASHE' THEN 'NC-ASHE'
       WHEN en.location_c = 'ATL' THEN 'GA-ATL'
       WHEN en.location_c = 'RMBT' THEN 'RTC-TN-BC'       
       WHEN en.location_c = 'BIL' THEN 'MS-BIL'
       WHEN en.location_c = 'BIRM' THEN 'AL-BIRM'
       WHEN en.location_c = 'BOST' THEN 'MA-BOST'
       WHEN en.location_c = 'CIRT' THEN 'RTC-TN-CIRT'
       WHEN en.location_c = 'CHAR' THEN 'NC-CHAR'
       WHEN en.location_c = 'CHAT' THEN 'TN-CHAT'
       WHEN en.location_c = 'CLAR' THEN 'TN-CHAR'
       WHEN en.location_c = 'COL' THEN 'TN-COL'
       WHEN en.location_c = 'CMS' THEN 'MS-COL'
       WHEN en.location_c = 'CCRD' THEN 'NC-CCRD'
       WHEN en.location_c = 'COOK' THEN 'TN-COOK'
       WHEN en.location_c = 'DAL' THEN 'TX-DAL'
       WHEN en.location_c = 'RDV' THEN 'RTC-TN-DV'
       WHEN en.location_c = 'DKSN' THEN 'TN-DKSN'
       WHEN en.location_c = 'RDW' THEN 'RTC-TN-DW'
       WHEN en.location_c = 'DOTH' THEN 'AL-DOTH'
       WHEN en.location_c = 'DUR' THEN 'NC-DURH'
       WHEN en.location_c = 'DYER' THEN 'TN-DYER'
       WHEN en.location_c = 'FAYE' THEN 'NC-FAYE'
       WHEN en.location_c = 'GCRT' THEN 'RTC-TN-GCRT'
       WHEN en.location_c = 'GRNB' THEN 'NC-GRNB'
       WHEN en.location_c = 'GRNV' THEN 'NC-GRNV'
       WHEN en.location_c = 'HMS' THEN 'MS-HMS'
       WHEN en.location_c = 'DMS' THEN 'MD-DMS'
       WHEN en.location_c = 'HICK' THEN 'NC-HICK'
       WHEN en.location_c = 'HILL' THEN 'NC-HILL'
       WHEN en.location_c = 'HUNT' THEN 'AL-HUNT'
       WHEN en.location_c = 'INNH' THEN 'RTC-GA-INNH'
       WHEN en.location_c = 'JMS' THEN 'MS-JMS'
       WHEN en.location_c = 'JTN' THEN 'TN-JTN'
       WHEN en.location_c = 'JCTN' THEN 'TN-JCTN'
       WHEN en.location_c = 'KNOX' THEN 'TN-KNOX'
       WHEN en.location_c = 'LAKE' THEN 'FL-LAKE'
       WHEN en.location_c = 'LAWR' THEN 'MA-LAWR'
       WHEN en.location_c = 'MANC' THEN 'NH-MANC'
       WHEN en.location_c = 'MCB' THEN 'MS-MCC'
       WHEN en.location_c = 'MEM' THEN 'TN-MEM'
       WHEN en.location_c = 'MMS' THEN 'MS-MMS'
       WHEN en.location_c = 'MIAM' THEN 'FL-MIAM'
       WHEN en.location_c = 'MIDM' THEN 'TN-MIDM'
       WHEN en.location_c = 'MOBI' THEN 'AL-MOBI'
       WHEN en.location_c = 'MONT' THEN 'AL-MONT'
       WHEN en.location_c = 'MRSN' THEN 'TN-MRSN'
       WHEN en.location_c = 'NASH' THEN 'TN-NASH'
       WHEN en.location_c = 'OCAL' THEN 'FL-OCAL'
       WHEN en.location_c = 'PAR' THEN 'TN-PAR'
       WHEN en.location_c = 'PINE' THEN 'NC-PINE'
       WHEN en.location_c = 'ROAN' THEN 'VA-ROAN'
       WHEN en.location_c = 'SPRG' THEN 'MA-SPRI/HOLY'
       WHEN en.location_c = 'PETE' THEN 'FL-STPET'
       WHEN en.location_c = 'TAMP' THEN 'FL-TAMP'
       WHEN en.location_c = 'TUP' THEN 'MS-TUP'
       WHEN en.location_c = 'WDC' THEN 'DC-WDC'
       WHEN en.location_c = 'WILM' THEN 'NC-WILM'
       WHEN en.location_c = 'WBRN' THEN 'MA-WBRN'
       WHEN en.location_c = 'WORC' THEN 'MA-WORC'
       WHEN en.location_c = 'GM' THEN 'TN-GM'
       WHEN en.location_c = 'GN' THEN 'TN-GN'
   ELSE en.location_c
   END)
       as service_facility_code,
   en.startdate_d AS start_date,
   en.enddate_d AS end_date,
   c.refdate_d AS referral_date,
   ep.enddate_d AS overall_discharge_date, --Episode end date
   CONVERT(nvarchar(20),c.altclientcode_vc) AS org_id,-- TNKIDS#
   UPPER(CONVERT(nvarchar(50), CASE
                         WHEN en.enddate_d = ep.enddate_d THEN ep.accountnumber_vc
                         WHEN en.enddate_d < ep.enddate_d THEN 'TWA'
                         END)) AS discharged_to_type,
   UPPER(CONVERT (nvarchar(20), CASE
                         WHEN ep.accountnumber_vc in ('DORM','INDEP/SUP','INDEP/SELF','INDEP/NR','INDEP/FR') THEN 07
                         WHEN ep.accountnumber_vc in ('JAIL','DET') THEN 01
                         WHEN ep.accountnumber_vc in ('BIOL') THEN 02
                         WHEN ep.accountnumber_vc in ('ADOPT/DCS','ADOPT/PAR','ADOPT/YV') THEN 06
                         WHEN ep.accountnumber_vc in ('REL') THEN 03
                         WHEN ep.accountnumber_vc in ('PSYCH','EMER','RTC') THEN 04
                         ELSE 99
                         END)) AS discharged_to_type_code,
   CONVERT(nvarchar(300),'cd.enrollments') AS original_table_name,
   CONVERT(nvarchar (400), en.alerts_vc) AS remarks,
   CONVERT(varchar(50),  CASE 
                         WHEN en.disreason_c = 'ADMI' THEN 'Administrative'
                         WHEN en.disreason_c = 'AMA' THEN 'Against Medical Advice'
                         WHEN en.disreason_c = 'AWOL' THEN 'Absent Without Leave'
                         WHEN en.disreason_c = 'DCSD' THEN 'Deceased'
                         WHEN en.disreason_c = 'JC' THEN 'Juvenille Court'
                         WHEN en.disreason_c = 'NP' THEN 'No Progress'
                         WHEN en.disreason_c = 'TMED' THEN 'Transfer to Medical Treatment Facility'
                         WHEN en.disreason_c = 'TPSY' THEN 'Transfer to Inpatient Psychiatric Facility'
                         WHEN en.disreason_c = 'TW' THEN 'Transfer within Agency'
                         WHEN en.disreason_c = 'WMA' THEN 'With Medical Advice'
                         ELSE 'Other'
                         END)AS outcome,
   CONVERT(varchar(5),   CASE
                         WHEN en.disreason_c in ('ADMI','AMA','AWOL','NP') THEN 'CBT'
                         WHEN en.disreason_c in ('DCSD','WMA') THEN 'DLR'
                         WHEN en.disreason_c in ('JC') THEN 'RSF'
                         WHEN en.disreason_c in ('TMED','TPSY') THEN 'DMR'
                         WHEN en.disreason_c in ('TW') THEN 'RPA'
                         ELSE 'CBT' 
                         END) AS outcome_code,
--Populate service_facility_unit table and add case statement for loading CDT program_c into client_enrollment room_number 7/27/10 KMH
   UPPER(CONVERT(varchar(10),  CASE
                         WHEN en.program_c = 'BT1L' THEN 'BC1L'
                         WHEN en.program_c = 'BT1R' THEN 'BC1R'
                         WHEN en.program_c = 'BT2L' THEN 'BC2L'
                         WHEN en.program_c = 'BT2R' THEN 'BC2R'
                         WHEN en.program_c = 'BT3' THEN 'BC3'
                         WHEN en.program_c = 'BT3L' THEN 'BC3L'
                         WHEN en.program_c = 'BT3R' THEN 'BC3R'
                         WHEN en.program_c = 'BT4L' THEN 'BC4L'
                         WHEN en.program_c = 'BT4R' THEN 'BC4R'
                         WHEN en.program_c = 'BT5' THEN 'BC5'
                         WHEN en.program_c = 'BT6' THEN 'BC6'
                         WHEN en.program_c = 'CRT1' and en.location_c = 'CIRT' THEN 'BCRT1'
                         WHEN en.program_c = 'CRT2' and en.location_c = 'CIRT' THEN 'BCRT2'
                         WHEN en.program_c = 'CRT3' and en.location_c = 'CIRT' THEN 'BCRT3'
                         WHEN en.program_c = 'CRT4' and en.location_c = 'CIRT' THEN 'BCRT4'
                         WHEN en.program_c = 'CRT1' and en.location_c = 'GCRT' THEN 'GCRT1'
                         WHEN en.program_c = 'CRT2' and en.location_c = 'GCRT' THEN 'GCRT2'
                         WHEN en.program_c = 'CRT3' and en.location_c = 'GCRT' THEN 'GCRT3'
                         WHEN en.program_c = 'CRT4' and en.location_c = 'GCRT' THEN 'GCRT4'
                         WHEN en.program_c = 'DVC' THEN 'DV1'
                         WHEN en.program_c = 'DVM' THEN 'DV2'
                         WHEN en.program_c = 'DVN' THEN 'DV3'
                         WHEN en.program_c = 'DVP' THEN 'DV4'
                         WHEN en.program_c in ('DW1','DW2','DW3','DW4','DW5','DW6','DW7','DW8') and en.location_c = 'RDW' THEN en.program_c
                         WHEN en.program_c in ('IH01','IH02','IH03','IH04','IH05','IH06','IH07') and en.location_c = 'INNH' THEN 'IH3'
                         WHEN en.program_c = 'IH08' and en.location_c = 'INNH' THEN 'IH1'
                         WHEN en.program_c = 'IH09' and en.location_c = 'INNH' THEN 'IH2'
                         ELSE 'NA'                       
                         END)) as room_number

FROM
   ar.client c 
   INNER JOIN cd.enrollments en ON (c.uniqueid_c = en.clientid_c)
   INNER JOIN cd.episode ep ON (ep.uniqueid_c = en.episodeid_c and ep.clientid_c = c.uniqueid_c)

WHERE
   (ep.enddate_d is NULL OR ep.enddate_d >= getdate()-730) and
   en.location_c in (select code from dbo.yv_LKUP_OfficeLocation where state in ('TX', 'FL'))
order by 2

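【Comments】:

  • What do you mean by raw file in the question title? At this point you are not using a raw file in the package. Also, can you edit the question to be more precise? What problem are you trying to solve, what have you done so far, and what errors or obstacles have you run into?
  • Thanks. Raw file = source file. What about my question is unclear? I am just trying to work out when to use SSIS functions versus TSQL. So far I have only tried the TSQL approach above. Since that sounds simpler and more efficient than trying to reproduce the same logic in SSIS, I will probably go with the TSQL solution.
  • The SSIS toolbox has a Raw File Destination and a Raw File Source. The title of this question therefore led me to believe that, in addition to the other points in the question, you were having trouble with the raw file SSIS components.
  • Linking to the OP's question on the MSDN forums, in case any interesting answers turn up there.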

Tags: tsql ssis


【Solution 1】:

Generic SSIS advice while I wait for clarification of the requirements.

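When should you use the SSIS tools vs. the TSQL operators

People often gravitate toward the out-of-the-box transformations because that looks like the right thing to do. Pick a table in the drop-down, add a sort, add another data source, sort that too, merge join, maybe an aggregate.

When the problem domain is small, say tens to hundreds of thousands of rows, the difference in processing is negligible. People probably won't notice if a package runs in 2 minutes instead of 1, or consumes 80% of the server's memory instead of 40% while it processes.

But when the data volume hits the tipping point, poor package design decisions will eat your lunch.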

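Sorting

When your source RDBMS gets a request for sorted data, there is likely a clustered index or something else in the database that saves the time of actually sorting the data. When SSIS gets a request for sorted data, you are going to pay for that operation many times over.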
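As a rough sketch of pushing the sort into the source, the SQL command of the source component can carry the ORDER BY itself. The column list below is trimmed from the question's query for illustration, and whether an index actually backs this particular sort is an assumption about the schema.

```sql
-- Sketch only: let the database engine sort instead of adding an SSIS Sort transformation.
-- Table and column names come from the question's query; the column list is trimmed.
SELECT
   en.uniqueid_c AS enrollment_id,
   CONVERT(nvarchar(20), c.clientcode_c) AS client_id,
   en.startdate_d AS start_date,
   en.enddate_d AS end_date
FROM
   ar.client c
   INNER JOIN cd.enrollments en ON (c.uniqueid_c = en.clientid_c)
ORDER BY
   client_id;  -- name the sort column rather than relying on an ordinal such as ORDER BY 2
```

If a downstream component such as Merge Join needs sorted input, the source output's IsSorted property and the column's SortKeyPosition can be set in the Advanced Editor so that SSIS knows the rows already arrive in order. SSIS does not verify this, so those settings have to match the ORDER BY.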

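A sort in SSIS is a fully blocking asynchronous operation. That means all of the data flowing through that point must arrive at the transformation before any of it can be acted on and sent downstream. With a bajillion rows coming through, or a really slow source, you will definitely notice when the flow hits one of these operations. Maybe you say you can wait, because you really do need the data sorted, but time is not the only resource you are spending. Because an asynchronous transformation has to copy the data from one buffer to another, your memory requirement has just doubled as well.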

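Perhaps you are still OK with paying in time and memory for the convenience of the OOB items, but you may not be done paying yet. Say your server has 32GB of memory and SSIS can use all of it. Each row costs you a kilobyte and you have 16M rows flowing through your data flow. The data hits the sort and starts stacking up. By the time the last row arrives, you have consumed 16GB of memory for the raw data. The sort operation starts, it goes to copy that 16GB into another 16GB of memory, and oops, SSIS has run out of memory.

You now pay a third price: temporary file storage. When the execution engine comes under memory pressure, it eventually starts paging to disk. Once that happens, the game is over if you care about performance, but your pain may not be. If you have not set the BufferTempStoragePath and BLOBTempStoragePath properties on each data flow, that file gets written to the default temporary storage location, probably C:\something or other. Your sysadmin carved out a really lean C partition because only the OS runs there, so suddenly writing a 16GB swap file to that drive consumes all the free space, then the OS gets unhappy, the package fails, and the finger pointing starts. Not that I have ever been there.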

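Moral of the story

Do as much as you can in the source system. The scenario above is about sorting, but the lesson applies to all of the "shared" operators, meaning the ones that exist in both SSIS and TSQL.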

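Any recommendations here on a more efficient solution?

As for cleaning up the query, those mappings would drive me nuts. Is there any chance you can create N lookup tables (or inline table-valued functions) that provide the mapping between stored values and presentation values? Then you could abstract away all of the CASE logic.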
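A minimal sketch of the lookup-table idea for one of the mappings follows. The table name dbo.yv_LKUP_ProgramCode and its columns are hypothetical, loosely following the dbo.yv_LKUP_OfficeLocation table the question already uses. Only a handful of rows from the question's CASE expression are shown, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical lookup table: the name and columns are illustrative, not from the original post.
CREATE TABLE dbo.yv_LKUP_ProgramCode
(
   agency_c     nvarchar(20) NOT NULL PRIMARY KEY,  -- value as stored in cd.enrollments
   program_code nvarchar(20) NOT NULL               -- value expected by the target system
);

-- A few rows copied from the program_code CASE expression in the question.
INSERT INTO dbo.yv_LKUP_ProgramCode (agency_c, program_code)
VALUES ('ADO', 'ADO'),
       ('ADOT', 'ADO'),
       ('FPP', 'CM'),
       ('CMGT', 'CM'),
       ('MRDD/IHS', 'CHOICES');

-- The CASE collapses to a join; COALESCE keeps the original "ELSE en.agency_c" behaviour
-- for agency codes that have no row in the lookup table.
SELECT
   en.uniqueid_c AS enrollment_id,
   CONVERT(nvarchar(20), COALESCE(pc.program_code, en.agency_c)) AS program_code
FROM
   cd.enrollments en
   LEFT OUTER JOIN dbo.yv_LKUP_ProgramCode pc ON (pc.agency_c = en.agency_c);
```

The same pattern would apply to the location, discharge reason, and room number mappings, with one lookup table each. A new code then becomes a row insert rather than an edit to the package's source query.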

Finally, the numbers in this post are wildly hardware and workload dependent.
