Trying to crop an image and save DICOM with pydicom, answers

【Question title】: Trying to crop image and save dicom with pydicom
【Posted】: 2021-04-01 23:49:09
【Question description】:

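I'm trying to load a chest X-ray DICOM file that uses JPEG2000 compression, extract the pixel array, crop it, and then save it as a new DICOM file. I've tried this on both Windows 10 and macOS machines but get similar errors. I'm running Python 3.6.13, GDCM 2.8.0, OpenJPEG 2.3.1 and Pillow 8.1.2 in a conda environment (OpenJPEG and GDCM were installed before Pillow and pydicom).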

My initial code:

import os
import pydicom

# basepath: folder containing the DICOM files (defined elsewhere)
file_list = [f.path for f in os.scandir(basepath)]
ds = pydicom.dcmread(file_list[0])
arr = ds.pixel_array
arr = arr[500:1500, 500:1500]
ds.Rows = arr.shape[0]
ds.Columns = arr.shape[1]
ds.PixelData = arr.tobytes()
outputpath = os.path.join(basepath, "test.dcm")
ds.save_as(outputpath)

The resulting error: ValueError: With tag (7fe0, 0010) got exception: (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required. See pydicom.encaps.encapsulate() for more information

I then tried changing the ds.PixelData line to ds.PixelData = pydicom.encaps.encapsulate([arr.tobytes()]). This creates the .dcm without an error, but when I open the .dcm to view it, no image is shown (it is all black).

My next attempt was to see whether I somehow needed to compress the data back to JPEG2000, so I tried:

import io
from PIL import Image

arr = Image.fromarray(arr)
output = io.BytesIO()
arr.save(output, format='JPEG2000')

Then I get the error OSError: encoder jpeg2k not available. I also tried format='JPEG', but that gives OSError: cannot write mode I;16 as JPEG ...
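The first Pillow error means the installed Pillow build has no JPEG 2000 encoder; Pillow's jpeg2k codec depends on OpenJPEG being available when Pillow itself is built. Separately, Pillow's JPEG writer does not accept 16-bit modes such as I;16, hence the second error. A quick way to check what the installed Pillow supports; `pillow_can_write_jpeg2000` is a hypothetical wrapper around `PIL.features.check_codec`:

```python
from PIL import features


def pillow_can_write_jpeg2000() -> bool:
    """Hypothetical wrapper: True if this Pillow build has the jpeg2k codec.

    When this is False, saving with format='JPEG2000' fails with
    "encoder jpeg2k not available".
    """
    return features.check_codec("jpg_2000")


print("Pillow JPEG 2000 support:", pillow_can_write_jpeg2000())
```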

Any help is greatly appreciated!

【Question discussion】:

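Forgot to mention that I also tried the original code (without encapsulation) with an extra ds.decompress() line after the pydicom.dcmread line. That also saved a .dcm, but again the result was a blank image. I should also mention the imports: import pydicom, from PIL import Image, import io and import numpy as np.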
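Decompressing the dataset and writing back uncompressed pixel data is certainly the easiest approach; working with compressed datasets can be tricky. If the image you get is black, I'd guess the problem lies in the data itself or in how it is represented.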
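I noticed that if I use pydicom's apply_voi_lut() function, the image goes from black to something visible, but the contrast/brightness is off and I can't change the image contrast in any DICOM viewer. Maybe I need to modify some DICOM tags to make this work, but I assumed ds.decompress() would update the relevant tags automatically. I think I've found a solution using from imagecodecs import jpeg2k_encode, and I'll post back once I've confirmed it works as expected.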
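apply_modality_lut()/apply_voi_lut() are meant for displaying DICOM data yourself, e.g. to create plot data from the image data. A DICOM viewer applies these itself, so it should receive the unchanged values.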
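To illustrate the point about VOI LUTs being for your own display: the sketch below applies the DICOM LINEAR window function (Window Center/Window Width, defined in PS3.3 section C.11.2.1.2) to produce an 8-bit array for plotting, without touching the stored values that get written back to Pixel Data. `window_for_display` is a hypothetical hand-written helper, not pydicom's `apply_voi_lut`, and strictly the window is applied after any Modality LUT or rescale:

```python
import numpy as np


def window_for_display(arr, center: float, width: float) -> np.ndarray:
    """Hypothetical helper: map stored values to 0..255 for display only.

    Implements the DICOM LINEAR window (Window Center / Window Width).
    Never modifies the input array; returns a new uint8 array.
    """
    if width < 1:
        raise ValueError("Window Width must be >= 1")
    x = np.asarray(arr, dtype=np.float64)
    if width == 1:
        # Degenerate window: a hard threshold at center - 0.5
        return np.where(x > center - 0.5, 255, 0).astype(np.uint8)
    y = ((x - (center - 0.5)) / (width - 1) + 0.5) * 255.0
    # Values below/above the window saturate at 0/255
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)
```

For example, with matplotlib: `plt.imshow(window_for_display(ds.pixel_array, center, width), cmap="gray")`, where `center` and `width` would come from `ds.WindowCenter` and `ds.WindowWidth` (which may be multi-valued).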

Tags: python python-imaging-library dicom pydicom jpeg2000

【Solution 1】:

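I was able to make this work using the imagecodecs library and its jpeg2k_encode function. One potential pitfall is that you need to .copy() the array to satisfy the function's C-contiguous requirement; you can confirm this by checking arr_crop.flags if needed. Here is the updated code that worked best for me: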

import os
import pydicom
from pydicom.encaps import encapsulate
from imagecodecs import jpeg2k_encode

# basepath: folder containing the source DICOM files (defined elsewhere)
file_list = [f.path for f in os.scandir(basepath)]
ds = pydicom.dcmread(file_list[0])
arr = ds.pixel_array
# copy() is needed to meet jpeg2k_encode's C-contiguous requirement
arr_crop = arr[500:1500, 500:1500].copy()
# jpeg2k_encode performs the JPEG 2000 compression
arr_jpeg2k = jpeg2k_encode(arr_crop)
# convert from bytearray to bytes before assigning to PixelData
arr_jpeg2k = bytes(arr_jpeg2k)
ds.Rows = arr_crop.shape[0]
ds.Columns = arr_crop.shape[1]
# wrap the compressed frame in the encapsulation required by the
# (unchanged) JPEG 2000 transfer syntax of the source file
ds.PixelData = encapsulate([arr_jpeg2k])
outputpath = os.path.join(basepath, "test.dcm")
ds.save_as(outputpath)
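The `.copy()` in the code above matters because basic slicing of a 2D array returns a view whose rows are strided through the original buffer, so it is not C-contiguous. A small demonstration with a zero-filled stand-in for `ds.pixel_array` (the 2000 x 2000 size is an assumption):

```python
import numpy as np

# Zero-filled stand-in for ds.pixel_array (2000 x 2000, 16-bit)
arr = np.zeros((2000, 2000), dtype=np.uint16)

view = arr[500:1500, 500:1500]          # basic slicing returns a view into arr
print(view.flags["C_CONTIGUOUS"])       # False: each row is part of a longer row
print(np.shares_memory(view, arr))      # True

arr_crop = view.copy()                  # copy() defaults to C (row-major) order
print(arr_crop.flags["C_CONTIGUOUS"])   # True
print(np.shares_memory(arr_crop, arr))  # False

# Alternative: copies only when the input is not already C-contiguous
arr_crop2 = np.ascontiguousarray(view)
print(arr_crop2.flags["C_CONTIGUOUS"])  # True
```

`np.ascontiguousarray` is a reasonable alternative to `.copy()` when the array may or may not already be contiguous.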

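I also ended up using the interactivecrop package to get the crop indices I needed relatively quickly (a tip in case future readers try this in Jupyter). In case it helps, here is a snippet (run before the code above):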

from interactivecrop.interactivecrop import main as crop

file_names = [os.path.split(f)[1].split(".")[0] for f in file_list]
image_list = []
for path in file_list:
    ds = pydicom.dcmread(path)
    arr = ds.pixel_array
    image_list.append(arr)
crop(image_list, file_names, optimize=True)
# After cropping all images, crop() prints a dictionary;
# copy and paste it into a new cell as crop_dict.

# Convert one entry of crop_dict into row and column index ranges.
# As used here, entries 1 and 3 give the row start and height,
# and entries 0 and 2 give the column start and width.
def convert_crop_to_index(fname, crop_dict):
    x = [crop_dict[fname][1], crop_dict[fname][1] + crop_dict[fname][3]]
    y = [crop_dict[fname][0], crop_dict[fname][0] + crop_dict[fname][2]]
    return x, y

# Example for the first file: x holds the row range, y the column range
fname = file_names[0]
arr = pydicom.dcmread(file_list[0]).pixel_array
x, y = convert_crop_to_index(fname, crop_dict)
arr_crop = arr[x[0]:x[1], y[0]:y[1]].copy()

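I never figured out why ds.decompress() followed by saving the decompressed DICOM produced an all-black image. It feels like that should be the simplest approach, but the method above ended up working for me, so I'm glad I got it sorted out.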

【Discussion】:

There seem to be problems with GDCM and JPEG2K pixel data that contains JP2 headers (which it shouldn't). See this issue for another example.
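In DICOM, each JPEG 2000 frame in encapsulated Pixel Data is expected to be a bare J2K codestream, which starts with the SOC marker (FF4F) immediately followed by SIZ (FF51); a JP2 file instead starts with the 12-byte JPEG 2000 signature box. A hypothetical helper to tell the two apart from the first bytes of a frame, for example to check a source file or an encoder's output. In pydicom 2.x the frames of an encapsulated value can be pulled out with `pydicom.encaps.generate_pixel_data_frame`:

```python
# JPEG 2000 signature box that starts every JP2 file
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"
# SOC (FF4F) followed by SIZ (FF51) starts every J2K codestream
J2K_START = b"\xff\x4f\xff\x51"


def jpeg2000_frame_kind(frame: bytes) -> str:
    """Hypothetical helper: classify a frame by its first bytes."""
    if frame.startswith(J2K_START):
        return "j2k"  # bare codestream, as DICOM expects
    if frame.startswith(JP2_SIGNATURE):
        return "jp2"  # wrapped in a JP2 file header
    return "unknown"


print(jpeg2000_frame_kind(J2K_START + b"\x00" * 8))      # j2k
print(jpeg2000_frame_kind(JP2_SIGNATURE + b"\x00" * 8))  # jp2
print(jpeg2000_frame_kind(b"\x00" * 12))                 # unknown
```

Applied to `bytes(arr_jpeg2k)` from the solution code, this would show which of the two forms `jpeg2k_encode` produced with its default settings.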
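That's interesting, thanks for sharing! I didn't know how to work directly with the byte form of PixelData. It's probably unrelated to this particular problem, though, since I had no trouble viewing/decoding the JPEG2K-encoded data; the problem was in the writing/encoding part.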