【问题标题】:PowerShell how add column to CSV file from another CSV file by ID?PowerShell 如何通过 ID 从另一个 CSV 文件向 CSV 文件添加列?
【发布时间】:2019-09-17 16:34:38
【问题描述】:

我有两个 CSV 文件。第一个文件可能包含不同数量的行。每行都有ID。在这种情况下 - place_id。 我想从第二个开始在这个文件中添加列。

"place_id";"osm_type";"osm_id";"place_rank";"boundingbox";"lat";"lon";"display_name";"class";"type";"importance";"icon";"postcode";"city";"town";"village";"hamlet";"allotments";"neighbourhood";"suburb";"city_district";"state_district";"building";"address100";"address26";"address27";"address29";"county";"state";"country";"country_code";"place";"population";"wikidata";"wikipedia";"name";"official_name"
"100073243";"way";"108738557";"19";"56.1330951,56.1377776,35.7857419,35.7966764";"56.1354281";"35.7903646";"Bolshoe Syrkovo, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, RF";"place";"hamlet";"0.45401456808503";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"";"Bolshoe Syrkovo";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"RF";"ru";"hamlet";"19";"Q4092451";"ru:Bolshoe Syrkovo";"Bolshoe Syrkovo";""
"100073263";"way";"108729132";"19";"56.1542386,56.156816,36.3303962,36.3383278";"56.15552975";"36.3343542260811";"Kondratovo, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, RF";"place";"hamlet";"0.385";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"";"Kondratovo";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"RF";"ru";"";"";"";"";"Kondratovo";""
"100073265";"way";"108738571";"19";"56.009293,56.0205996,36.2239313,36.2390323";"56.015194";"36.2290485";"Gryady, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, Rossiya";"place";"village";"0.36089190172262";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"Gryady";"";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"Rossiya";"ru";"village";"841";"Q4151063";"ru:Gryady (Moskovskaya oblast)";"Gryady";""

还有第二个文件。该文件包含地理坐标的完整基础。每行都有place_id 列,与第一个文件中的place_id 行匹配。 第二个文件 - 我想从geojson 列复制字符串并通过place_id 添加到第一个文件。 这个文件比第一个大。 (第一个大约 5 Mb,第二个大约 50 Mb。)

"place_id";"osm_id";"geojson"
"100059669";"111492916";"{""type"":""Polygon"",""coordinates"":[[[37.6221208,56.0629951],[37.6227846,56.0617338],[37.6235702,56.0612884],[37.6241549,56.0610708],[37.625994,56.06052],[37.627407,56.0616613],[37.6250022,56.0628003],[37.624107,56.0632933],[37.6244298,56.06364],[37.6240209,56.0640423],[37.6238138,56.0639879],[37.6238869,56.0635391],[37.6236798,56.0634711],[37.6221208,56.0629951]]]}"
"100066930";"108048163";"{""type"":""Polygon"",""coordinates"":[[[37.488797,54.9187857],[37.489145,54.9178087],[37.4916813,54.9161087],[37.4915675,54.914397],[37.4923037,54.9141008],[37.4938964,54.9139008],[37.4946333,54.9135329],[37.4950753,54.9135998],[37.4958462,54.9135723],[37.4961204,54.9133221],[37.4963465,54.9127529],[37.4976451,54.912619],[37.49836,54.9121783],[37.4984597,54.9124224],[37.4989822,54.9128384],[37.4986734,54.9131341],[37.4984106,54.9135755],[37.4984278,54.9141491],[37.4988218,54.9145627],[37.5001752,54.9148064],[37.5005392,54.9147547],[37.5005076,54.9157027],[37.5005411,54.9169758],[37.5003203,54.9183989],[37.500086,54.9191066],[37.4999331,54.919399],[37.4992204,54.9195132],[37.4991362,54.9199856],[37.4977175,54.9199433],[37.497684,54.9204933],[37.4959374,54.9204279],[37.4937625,54.9202703],[37.493187,54.9202895],[37.4925111,54.9202126],[37.4917951,54.9202741],[37.4903496,54.9202356],[37.4899949,54.920301],[37.4891785,54.9207395],[37.488884,54.9204202],[37.4888506,54.9200203],[37.488797,54.9194703],[37.488797,54.9187857]]]}"
"100073243";"108738557";"{""type"":""Polygon"",""coordinates"":[[[35.7857419,56.1346341],[35.7870207,56.1330951],[35.7960034,56.1354737],[35.7964486,56.136383],[35.7966764,56.1371216],[35.796451,56.1375923],[35.7940459,56.1377776],[35.7872053,56.1362698],[35.7860927,56.135251],[35.7857419,56.1346341]]]}"
"100073263";"108729132";"{""type"":""Polygon"",""coordinates"":[[[36.3303962,56.1556187],[36.3327609,56.1549359],[36.3332297,56.1553915],[36.3371409,56.1542386],[36.3383278,56.1554408],[36.3356724,56.1561707],[36.3352194,56.1557934],[36.3314321,56.156816],[36.3303962,56.1556187]]]}"
"100073265";"108738571";"{""type"":""Polygon"",""coordinates"":[[[36.2239313,56.0144832],[36.2261932,56.011495],[36.2284626,56.0095073],[36.2321529,56.009293],[36.2331509,56.0117168],[36.2341666,56.0135926],[36.2390323,56.0144832],[36.2385065,56.0167726],[36.2357356,56.0167906],[36.2334461,56.0197924],[36.2263531,56.0205996],[36.2251121,56.020199],[36.2239313,56.0144832]]]}"
"100075231";"110068197";"{""type"":""Polygon"",""coordinates"":[[[38.2935489,54.7729509],[38.2939625,54.7719488],[38.2950008,54.7717047],[38.2966022,54.7717389],[38.2968603,54.7712015],[38.2960165,54.7691138],[38.2982481,54.7689281],[38.3005051,54.7687673],[38.3025611,54.7678635],[38.3045996,54.7650658],[38.305887,54.7649297],[38.3081401,54.7650906],[38.3085907,54.7656105],[38.3078585,54.7664648],[38.3092498,54.7671973],[38.3097709,54.7679502],[38.3082259,54.7681977],[38.3082688,54.7688538],[38.3074105,54.7691138],[38.3073891,54.7696708],[38.3081616,54.7712304],[38.3070458,54.7719978],[38.3052433,54.7713294],[38.3036984,54.7705991],[38.3024753,54.7723691],[38.2999862,54.7725548],[38.2993425,54.7731984],[38.2958449,54.7734459],[38.2942355,54.77404],[38.2935489,54.7729509]]]}"
"100083347";"108773218";"{""type"":""Polygon"",""coordinates"":[[[37.363052,55.2929074],[37.3641893,55.2923393],[37.3680087,55.2950118],[37.3709592,55.2961602],[37.3732015,55.2966977],[37.3755511,55.2974185],[37.3748001,55.2984019],[37.3730727,55.2976933],[37.3689743,55.2966549],[37.3660883,55.2951706],[37.363052,55.2929074]]]}"
"100088132";"108787848";"{""type"":""Polygon"",""coordinates"":[[[36.3930954,56.1869244],[36.3949447,56.1858475],[36.4025567,56.1881928],[36.4037609,56.1903944],[36.4019117,56.1907295],[36.3982131,56.1894851],[36.3930954,56.1869244]]]}"
"100088151";"108787862";"{""type"":""Polygon"",""coordinates"":[[[36.4786893,56.0795892],[36.4788543,56.0782741],[36.4790085,56.0775791],[36.4790382,56.0775181],[36.4791316,56.0774071],[36.4790562,56.0772801],[36.4790339,56.0770308],[36.4814648,56.0770996],[36.48379,56.0816509],[36.4817819,56.0817197],[36.478929,56.0802664],[36.4786893,56.0795892]]]}"

我想这对于知识渊博的程序员来说并不难。我不是那样的)

我尝试了很多代码。但没有一个不适合我。 下面我将列出我尝试过的代码。我不擅长编程。而且可能有些代码的意思我没看懂。

请帮忙处理我的案子。

###
Get-ChildItem -Filter .\comb\*.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\combinedcsvs.csv -NoTypeInformation -Append
###

###
$DevData = (Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8)[1..10]
$ProdData = (Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8)[1..10]
# throw one set into a hashtable
# we can use this as a lookup table for the other set
$ProdTable = @{}
foreach($line in $ProdData){
    $ProdTable[$line.place_id] = $line.ID
}
# Output the DevData with the appropriate ProdData value
$DevData | Select-Object @{Label='DevID';Expression={$_.ID}},@{Label='ProdID';Expression={$ProdTable[$_.place_id]}},place_id | Export-Csv .\new2.csv -NoTypeInformation -Delimiter ";" -Encoding:UTF8
###

###
$f1=(Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 -header "place_id","osm_type","osm_id","place_rank","boundingbox","lat","lon","display_name","class","type","importance","icon","postcode","city","town","village","hamlet","allotments","neighbourhood","suburb","city_district","state_district","building","address100","address26","address27","address29","county","state","country","country_code","place","population","wikidata","wikipedia","name","official_name")[1..1]
$f1
$f2=(Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 -header samname,"place_id","osm_id","geojson")[1..1]
$f1|
   %{
      $geojson=$_.geojson
      $m=$f2|?{$_.geojson -eq $geojson}
      $_.place_id=$m.place_id
    }
$f1
###


###
#Make an empty hash table for the first file
$File1Values = @{}
#Import the first file and save the rows in the hash table indexed on "place_id"
Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object {
  $File1Values.Add($_.place_id, $_)
}
#Import the second file and make a custom object with properties from both files
Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object {
  [PsCustomObject]@{
    ABC = $File1Values[$_.KeyColumn].ABC;
    DEF = $File1Values[$_.KeyColumn].DEF;
    UVW = $_.UVW;
    XYZ = $_.XYZ;
  }
} | Export-Csv -Path c:\OutFile.csv
###

###
$Poproperties = @(
'worker_name',
'requester_name',
@{E={$Lookup_Hash.($_.field_834)};L='field_834'},
@{E={$Lookup_Hash.($_.field_835)};L='field_835'},
@{E={$Lookup_Hash.($_.field_836)};L='field_836'},
@{E={$Lookup_Hash.($_.field_837};L='field_837'},
@{E={$Lookup_Hash.($_.field_838)};L='field_838'}
)
Import-Csv -Path C:\S_FilePath | Select-Object -Property $Poproperties
###


###
$Lookup_Hash = Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object -Process { $_.place_id = $_.name }
$S_File = Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 | Select-Object -Property *,@{E={$Lookup_Hash.($_.place_id)};L='place_id'} | Export-Csv ".\pars_full_5_combine_geo.csv" -NoTypeInformation -Delimiter ";" -Encoding:UTF8
###

【问题讨论】:

    标签: powershell csv


    【解决方案1】:

    这是我创建的一个工作示例,它显示了可以完成此操作的一种方法

    我创建了两个 csv 文件

    file1.csv

    "id";"score"
    "1";"90"
    "3";"100"
    

    file2.csv

    "id";"firstname";"lastname"
    "1";"steve";"jobs"
    "2";"bill";"gates"
    "3";"santa";"claus"
    

    然后是我的 powershell 脚本 test.ps1

    $csv1=(import-csv file1.csv -Delimiter ";")
    $csv2=(import-csv file2.csv -Delimiter ";")
    $csv1 |
        ForEach-Object{
            $row = $_
            if($mtch = $csv2|?{$_.id -eq $row.id}){
                    $out = [pscustomobject]@{ id =  $row.id; firstname = $mtch.firstname; lastname = $mtch.lastname; score = $row.score }
                    $out
               }
         } | Export-Csv csv3.csv -NoTypeInformation
    

    这就是我运行脚本的方式(在与 csv 文件相同的目录中

    powershell -ExecutionPolicy RemoteSigned .\test.ps1
    

    这是结果,csv3.csv

    "id","firstname","lastname","score"
    "1","steve","jobs","90"
    "3","santa","claus","100"
    

    【讨论】:

    • 谢谢!这非常适用于小文件!我的文件大约 50 mb。现在已经40分钟了,过程还没有结束……我需要寻找解决方法。谢谢你对我的问题的反应!
    • 为什么不导入数据库?
    • 另外,可以将 csv1 设置为较大的文件,这样您就可以遍历较大的文件并检查较小的文件。
    • 关于导入数据库 - 这是个好问题)我认为如果我找不到可接受的选项,这将是我的决定。我知道如何在 Windows 环境中为我的任务准备所需的 csv 文件。我不使用基地。会看到...我怎样才能使 csv1 文件更大?
    • 我添加了自己的决定,这对我的任务非常有效。)一切都很好)谢谢!
    【解决方案2】:

    添加适合我的任务的代码。我将方案分为 3 个步骤。

    $FileWithOutGeom = Import-Csv ".\FileWithOutGeom.csv" -Delimiter ';' -Encoding UTF8
    
    # step 1. getting all IDs from file without coordinates - sort by ID and select place_id column values. I use join with delimiter '|' to bring data in a suitable format for next step. (for where-obgect -match)
    $ID = [string]::Join("|",( $FileWithOutGeom | sort place_id | Select-Object -ExpandProperty 'place_id'))
    
    # step 2. take second file with all coordinates and select from them only those rows which ID have in first file and sort by ID too
    $FileWithAllGeom = Import-Csv ".\FileWithAllGeom.csv" -Delimiter ';' -Encoding UTF8 | Where-Object -property place_id -Match $ID | sort place_id
    
    # step 3. take first file without geom and add-member - new column name (geojson) and values for this column from step 2 with add increment for each-object
    $FileWithOutGeom | ForEach-Object -Begin {$i = 0} {$_ | Add-Member -MemberType NoteProperty -Name 'geojson' -Value ($FileWithAllGeom)[$i++].geojson -PassThru 
    } | Export-Csv ".\CombinedFile.csv" -NoTypeInformation -Delimiter ";" -Encoding:UTF8
    

    在出口处,我在第一个文件的末尾有一个带有“geojson”列的文件。 对不起,也许是糟糕的代码。我将这段代码从网络中找到的片段组合在一起。 这个方案对我的任务来说工作得非常快。大约 50 mb 和另外 20 mb 的文件 - 在 10 秒内处理完毕。

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2021-09-30
      • 1970-01-01
      • 2022-01-04
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2021-06-29
      相关资源
      最近更新 更多