【Question title】: How to assign new coordinates to a multiindex in xarray
【Posted】: 2020-05-30 11:32:11
【Question】:

I am trying to assign new coordinates to the multiIndex of an xarray DataArray.

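I have a DataArray with two main dimensions ("longitude", "latitude") and one extra coordinate, "states", defined over both of them. (I call "states" a multiindex below, although in xarray terms it is a 2-D non-index coordinate.)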

This is the DataArray structure:

print(dataArray)

<xarray.DataArray (longitude: 5000, latitude: 3000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude  (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
  * latitude   (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
    states     (latitude, longitude) float64 nan nan nan nan ... nan nan nan nan
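In the repr above only `longitude` and `latitude` carry the `*` marker, so `states` is a plain two-dimensional coordinate rather than a pandas MultiIndex. The sketch below rebuilds that layout on a small hypothetical grid (the sizes and code values are made up) to show that xarray builds indexes only for the dimension coordinates.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid standing in for the real 5000 x 3000 one
lon = np.linspace(-145, -15, num=4)
lat = np.linspace(-85, 25, num=3)
da = xr.DataArray(
    np.zeros((lon.size, lat.size)),
    coords={"longitude": lon, "latitude": lat},
    dims=("longitude", "latitude"),
)

# Attach a 2-D float-code coordinate, like the rasterized 'states' (NaN = no country)
codes = np.array([[0.0, 1.0, np.nan, 1.0],
                  [np.nan, 0.0, 0.0, 1.0],
                  [1.0, np.nan, 0.0, 0.0]])
da = da.assign_coords(states=(("latitude", "longitude"), codes))

print(list(da.indexes))  # only the dimension coordinates are indexed
print(da.states.dims)    # 'states' spans both dimensions
```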

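The "states" multiindex contains only integer codes (stored as float64, with NaN outside every country). I would like to convert them into named coordinates, or add a second multiindex holding the names (e.g. USA, Italy, Germany, Brazil, ...).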

Once the "states" multiindex has names, it becomes easy to select a given state by its proper name from the available index.
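Until names are attached, a country can already be selected by going through a name-to-id dictionary such as `state_ids` in the script below. This is a minimal sketch on a hypothetical 2 x 3 grid, assuming (as the script does via `enumerate`) that each id is the 0-based position of the country in the table.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid; 'states' holds float ids (NaN = no country),
# the same kind of values the script's rasterize() helper produces
lon = np.array([-70.0, -60.0, -50.0])
lat = np.array([-30.0, -10.0])
codes = np.array([[0.0, 1.0, 1.0],
                  [np.nan, 0.0, 1.0]])
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"latitude": lat, "longitude": lon,
            "states": (("latitude", "longitude"), codes)},
    dims=("latitude", "longitude"),
)

# Name -> id lookup, built the same way as state_ids in the script
state_ids = {name: i for i, name in enumerate(["Argentina", "Brazil"])}

# Keep only the cells whose code matches the requested country; others become NaN
brazil = da.where(da.states == state_ids["Brazil"])
print(brazil.values)
```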

Below is a reproducible script, taken from here.

import pandas as pd
pd.set_option('display.width', 50000)
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 5000)
import geopandas
from rasterio import features
from affine import Affine
import numpy as np
import xarray as xr
from cartopy.io import shapereader



def transform_from_latlon(lat, lon):
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    trans = Affine.translation(lon[0], lat[0])
    scale = Affine.scale(lon[1] - lon[0], lat[1] - lat[0])
    return trans * scale

def rasterize(shapes, coords, fill=np.nan, **kwargs):
    """Rasterize a list of (geometry, fill_value) tuples onto the given
    xray coordinates. This only works for 1d latitude and longitude
    arrays.
    """
    transform = transform_from_latlon(coords['latitude'], coords['longitude'])
    out_shape = (len(coords['latitude']), len(coords['longitude']))
    raster = features.rasterize(shapes, out_shape=out_shape,
                                fill=fill, transform=transform,
                                dtype=float, **kwargs)
    return xr.DataArray(raster, coords=coords, dims=('latitude', 'longitude'))



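if __name__ == '__main__':

    # This shapefile is from Natural Earth data. Note that the URL below is for the
    # admin-1 states/provinces layer, while this script loads 'admin_0_countries'.
    # http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/
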
    resolution = '10m'
    category = 'cultural'
    name = 'admin_0_countries'

    shpfilename = shapereader.natural_earth(resolution, category, name)

    # read the shapefile using geopandas
    states = geopandas.read_file(shpfilename)

    South_America = states[states['SUBREGION'] == 'South America'].reset_index(drop=True)


    state_ids = {k: i for i, k in enumerate(South_America['NAME_LONG'])}
    shapes = [(shape, n) for n, shape in enumerate(South_America.geometry)]

    LONGITUDE = np.linspace(-145, -15, num=5000)
    LATITUDE = np.linspace(-85, 25, num=3000)


    ds = xr.DataArray(coords=(LONGITUDE, LATITUDE), dims=['longitude', 'latitude'])


    ds['states'] = rasterize(shapes, ds.coords)


    # Trying to assign the country names as new coordinates to 'states' fails:
    try:
        ds = ds.assign_coords(states=South_America['NAME_LONG'])
    except ValueError as err:
        # "cannot add coordinates with new dimensions to a DataArray"
        print("Error:", err)



    # ds = ds.expand_dims({'names':South_America['NAME_LONG']})  # --> this does not work

    Array = np.random.randn(LATITUDE.size, LONGITUDE.size)

    dArray_Brazil = xr.DataArray(Array, coords=(LATITUDE, LONGITUDE), dims=['latitude', 'longitude'])



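    import matplotlib.pyplot as plt
    quadmesh = dArray_Brazil.plot()
    # Goal: select a country by name. 'states' holds float ids, so comparing it
    # with the string 'Brazil' cannot pick out Brazil's cells.
    ax = ds.states.where(ds.states != 'Brazil').plot(ax=quadmesh.axes)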

    plt.show()

Ideally, I would like the DataArray to have one of the following two structures:

Option 1)

<xarray.DataArray (longitude: 5000, latitude: 3000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude  (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
  * latitude   (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
    states     (latitude, longitude) string Brazil, USA, Germany ...

Option 2)

<xarray.DataArray (longitude: 5000, latitude: 3000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude     (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
  * latitude      (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
    states        (latitude, longitude) float64 nan nan nan nan ... nan nan nan nan
    Named_states  (latitude, longitude) string Brazil, USA, Germany ...
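One way to reach Option 2 without a Python loop is to use the float codes as positions into an array of names. This sketch assumes, as in the script, that code `n` is the `n`-th row of the country table (`enumerate(South_America.geometry)`); the three names and the tiny grid are hypothetical stand-ins for `South_America['NAME_LONG']` and the real raster.

```python
import numpy as np
import xarray as xr

# Hypothetical country list; in the script this would be South_America['NAME_LONG']
names = np.array(["Argentina", "Brazil", "Chile"])

lon = np.array([-70.0, -60.0, -50.0])
lat = np.array([-30.0, -10.0])
codes = np.array([[0.0, 1.0, 2.0],
                  [np.nan, 1.0, 1.0]])  # NaN marks cells outside every polygon
da = xr.DataArray(
    np.zeros((lon.size, lat.size)),
    coords={"longitude": lon, "latitude": lat,
            "states": (("latitude", "longitude"), codes)},
    dims=("longitude", "latitude"),
)

# Vectorized id -> name lookup: index the names array with the valid codes
state_codes = da.states.values
valid = ~np.isnan(state_codes)
named = np.full(state_codes.shape, "", dtype=names.dtype)  # "" means no country
named[valid] = names[state_codes[valid].astype(int)]

# Option 2: keep the numeric codes and add a parallel string coordinate
da = da.assign_coords(Named_states=(("latitude", "longitude"), named))

# Selecting by name now works directly
brazil = da.where(da.Named_states == "Brazil")
print(da.Named_states.values)
```

For Option 1, pass the same `named` array as `states=` instead, which replaces the numeric codes; Option 2 keeps them available alongside the names.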

【Question comments】:

    Tags: python-3.x matplotlib python-xarray


    【Solution 1】:

    Here is one way to replace the coordinate values:

    import numpy as np
    import pandas as pd
    import xarray as xr

    temp = 15 + 8 * np.random.randn(2, 2, 3)
    precip = 10 * np.random.rand(2, 2, 3)
    lon = [[-99.83, -99.32], [-99.79, -99.23]]
    lat = [[42.25, 42.21], [42.63, 42.59]]
    states = [[1,2],[3,2]]
    
    ds = xr.Dataset({'temperature': (['x', 'y', 'time'],  temp),
            'precipitation': (['x', 'y', 'time'], precip)},
            coords={'lon': (['x', 'y'], lon),
            'lat': (['x', 'y'], lat),
            'time': pd.date_range('2014-09-06', periods=3),
            'states': (['x','y'], states)})
    
    ds
    
    <xarray.Dataset>
    Dimensions:        (time: 3, x: 2, y: 2)
    Coordinates:
        lon            (x, y) float64 -99.83 -99.32 -99.79 -99.23
        lat            (x, y) float64 42.25 42.21 42.63 42.59
      * time           (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
        states         (x, y) int64 1 2 3 2
    Dimensions without coordinates: x, y
    Data variables:
        temperature    (x, y, time) float64 1.096 19.28 16.27 ... 19.25 20.38 4.981
        precipitation  (x, y, time) float64 9.09 7.486 2.288 ... 3.639 0.6625 8.19
    
    transdict = {'1': 'Brazil', '2': 'Germany', '3': 'USA'}  # need a dictionary entry for every code
    # integer codes -> '1', '2', '3' (float codes would become '1.0', 'nan', ...)
    ds.states.values = ds.states.values.astype(str)
    
    for key, value in transdict.items():
        ds.states.values = np.where(ds.states.values == key, value, ds.states.values)
    ds
    
    <xarray.Dataset>
    Dimensions:        (time: 3, x: 2, y: 2)
    Coordinates:
        lon            (x, y) float64 -99.83 -99.32 -99.79 -99.23
        lat            (x, y) float64 42.25 42.21 42.63 42.59
      * time           (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
        states         (x, y) <U21 'Brazil' 'Germany' 'USA' 'Germany'
    Dimensions without coordinates: x, y
    Data variables:
        temperature    (x, y, time) float64 1.096 19.28 16.27 ... 19.25 20.38 4.981
        precipitation  (x, y, time) float64 9.09 7.486 2.288 ... 3.639 0.6625 8.19
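The loop above makes one full pass over the array for every dictionary key. A loop-free variant, sketched here under the assumption that the codes are plain integers (so the hypothetical `transdict` below is keyed by `int` instead of `str`), maps each distinct code once with `np.unique(..., return_inverse=True)` and scatters the names back.

```python
import numpy as np
import pandas as pd
import xarray as xr

states = np.array([[1, 2], [3, 2]])
ds = xr.Dataset(
    {"temperature": (("x", "y", "time"), np.zeros((2, 2, 3)))},
    coords={"states": (("x", "y"), states),
            "time": pd.date_range("2014-09-06", periods=3)},
)

transdict = {1: "Brazil", 2: "Germany", 3: "USA"}  # assumption: integer keys

# Only the distinct codes go through Python; inverse maps each cell to its code
uniq, inverse = np.unique(ds.states.values.ravel(), return_inverse=True)
lookup = np.array([transdict.get(int(u), "") for u in uniq])
names = lookup[inverse].reshape(ds.states.shape)

ds = ds.assign_coords(states=(("x", "y"), names))
print(ds.states.values)
```

On the rasterized grid from the question the codes are floats with NaN outside every country, so those cells would need to be masked before the `int(u)` conversion, which raises on NaN.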
    

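    【Comments】:

    • Dear BWC, thank you for your reply. It looks like there is no simple way to broadcast the mapping operation; speeding it up would be great. Nonetheless, your solution is really great and solves the problem. I will still look for some way to join, merge, or assign the coordinates.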