Alternative method to compare original image with edited one that may not need the original: answers

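【Question title】: Alternative method to compare original image with edited one that may not need the original
【Posted】: 2015-07-29 20:49:50
【Question】:

A while back I wrote a Python script to store data in images. It has a small problem, though, and I'm wondering if anyone can think of an alternative method.

The basic idea is that it pickles something, and in the first version it wrote the resulting byte values directly as pixels (since every byte falls between 0 and 255). That produces an image that looks a bit like TV noise.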
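
A minimal sketch of that first version, assuming Python 3 and plain lists of greyscale values rather than a real image file. `to_pixels` and `from_pixels` are hypothetical names, not taken from the original script, and the 4-byte length header is an assumption of this sketch so that row padding can be stripped again; the original script may have handled this differently.

```python
import pickle

def to_pixels(obj, width):
    """Pickle obj and lay the bytes out as rows of 0-255 greyscale values."""
    data = pickle.dumps(obj)
    # 4-byte big-endian length header so the reader knows where the data ends
    raw = len(data).to_bytes(4, "big") + data
    # Pad the last row with zeros so every row has the same width
    raw += bytes(-len(raw) % width)
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]

def from_pixels(rows):
    """Reverse to_pixels: flatten the rows, read the length, unpickle."""
    raw = bytes(value for row in rows for value in row)
    length = int.from_bytes(raw[:4], "big")
    return pickle.loads(raw[4:4 + length])

rows = to_pixels({"message": "test"}, width=8)
restored = from_pixels(rows)
```

Every value in `rows` is a valid 8-bit pixel, which is why the result looks like noise when saved as an image.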

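When writing into an actual image, it detects the minimum number of bits per pixel it needs to adjust so the change is not noticeable to the human eye. It then splits the data and adds or subtracts a few bits from each pixel, with the first pixel storing the method in use. I then store the URL of the original image as a file in the image, and can reverse the process by comparing the original image from the URL with the current image, using the rules given in the first pixel.

Some Python pseudocode, in case I explained it badly: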

import pickle

original_image = [200, 200, 200, 100, 210, 255]  # ...and so on
stuff_to_store = "test"
# Convert anything into a list of 8-bit binary strings, one per pickled byte
data_bytes = [format(x, '08b') for x in bytearray(pickle.dumps(stuff_to_store))]

# This is calculated by the code, but for now it's 2
bits_per_pixel = 2
store_mode = 'subtract'

# Join the bits and split them every 2nd character
new_bits = "".join(data_bytes)
new_bits_split = [new_bits[i:i+bits_per_pixel] for i in range(0, len(new_bits), bits_per_pixel)]

# Edit the pixels (by subtraction in this case).
# zip stops at the shorter sequence, so a too-small image silently truncates the data.
pixel_data = []
for pixel, bits in zip(original_image, new_bits_split):
    pixel_data.append(pixel - int(bits, 2))
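
Reversing the subtraction then only needs the per-pixel difference between the two images. A rough Python 3 sketch, assuming 2 bits per pixel, the 'subtract' mode, a bit string whose length is a multiple of the bits per pixel, and that no pixel drops below 0. `embed` and `extract` are hypothetical helper names, not functions from the original script.

```python
def embed(original, bits, bits_per_pixel=2):
    """Subtract each group of bits (read as an integer) from one pixel value."""
    chunks = [bits[i:i + bits_per_pixel] for i in range(0, len(bits), bits_per_pixel)]
    if len(chunks) > len(original):
        raise ValueError("not enough pixels for the data")
    edited = list(original)
    for i, chunk in enumerate(chunks):
        edited[i] -= int(chunk, 2)
    return edited

def extract(original, edited, n_chunks, bits_per_pixel=2):
    """Recover the bit string from the differences original - edited."""
    diffs = [o - e for o, e in zip(original, edited)][:n_chunks]
    return "".join(format(d, "0{}b".format(bits_per_pixel)) for d in diffs)

original = [200, 200, 200, 100, 210, 255, 180, 90]
bits = format(ord("t"), "08b")        # one character as 8 bits
edited = embed(original, bits)
recovered = extract(original, edited, n_chunks=4)
```

The call to `extract` needs `original`, which is exactly the dependency on a second image that the question is trying to remove.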

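However, since the whole point of the script is to store things by modifying the pixels, storing the original image URL as a file feels like cheating. I thought of storing the URL in the first few pixels, but that ends up as a noticeable line whenever the image isn't grey. This approach is also very inefficient, since it needs two images to work properly, so it would be great if anyone has an idea of how to avoid that.

The original code is here if anyone is interested. I wrote it before I learned about writing documentation, so it's a bit hard to figure out. I'm asking now because I plan to rewrite it and would like to do it better.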

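【Question comments】:

Note that using simple addition or subtraction to modify pixels isn't safe, because the pixel values can go out of range. To prevent that you'd need to use modular arithmetic, or a different operation such as bitwise exclusive-OR, ^. But there's a better way to do this kind of steganography: store the data bits in the least significant bits of the image pixels.
I know, I just used that as a basic example; I think I wrote about 8 different modes for different cases (such as pulling towards or pushing away from 128). So instead of just adjusting the least significant bits, you're suggesting overwriting them completely? That definitely sounds doable :)
Yes, completely overwrite the LSBs. Fiddling with bits isn't easy or particularly efficient in Python, but it's not too bad. FWIW, a few months ago I wrote a Python 2 program that uses PIL to do this type of steganography. I guess I could post it as an answer to this question...
Nice, I used PIL too. I think I wrote mine over Christmas and then stopped working on it, but I'd love to see how yours works. Just out of curiosity, how does it affect the image with large amounts of data, say 5+ bits per pixel? Mine makes it look a bit like a photo taken at a very high ISO.
To avoid a noticeable effect on the image, I use only 1 bit per color channel of each pixel. In the first version of my program I stored data only in the blue channel, since human vision is least sensitive to blue. I compress the data before embedding it, not only to save space but also because compressed data approximates white noise, which further reduces the risk of the data being noticed by eye.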
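
To make the points in this thread concrete, here is a small Python 3 sketch using plain integers as stand-ins for 8-bit channel values. The function names are made up for illustration. It shows plain subtraction leaving the 0-255 range, XOR staying in range but still needing the original to undo, and LSB overwriting, which stays in range and can be read back without the original image.

```python
def subtract_mode(pixel, value):
    """Naive subtraction: the result can leave the valid 0-255 range."""
    return pixel - value

def xor_mode(pixel, value):
    """XOR with a small value stays in range, but undoing it needs the original."""
    return pixel ^ value

def set_lsb(pixel, bit):
    """Overwrite the least significant bit with a data bit."""
    return (pixel & 0xFE) | bit

def get_lsb(pixel):
    """Read the data bit back; no original image required."""
    return pixel & 1

dark_pixel = 1
out_of_range = subtract_mode(dark_pixel, 3)   # negative, not a valid pixel
xored = xor_mode(dark_pixel, 3)               # stays within 0-255
xor_recovered = xored ^ dark_pixel            # needs dark_pixel to get 3 back

pixels = [200, 201, 0, 255]
data_bits = [1, 0, 1, 0]
stego = [set_lsb(p, b) for p, b in zip(pixels, data_bits)]
read_back = [get_lsb(p) for p in stego]
```

With one bit per value, each channel changes by at most 1, which is why the result is hard to see.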

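Tags: python image

【Solution 1】:

Here is one way to embed data in the least significant bit of each color channel of each pixel in an 8-bit-per-channel RGB image file, using PIL for the image handling.

The code below illustrates bit stream handling in Python. It's reasonably efficient (as far as such operations can be efficient in Python), but it sacrifices efficiency for readability and simplicity of use where necessary. :)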

#! /usr/bin/env python

''' Steganography with PIL (really Pillow)

    Encodes / decodes bits of a binary data file into the LSB of each color 
    value of each pixel of a non-palette-mapped image.

    Written by PM 2Ring 2015.02.03
'''

import sys
import getopt
import struct
from PIL import Image


def readbits(bytes):
    ''' Generate single bits from bytearray '''
    r = range(7, -1, -1)
    for n in bytes:
        for m in r:
            yield (n>>m) & 1

def encode(image_bytes, mode, size, dname, oname):
    print 'Encoding...'
    with open(dname, 'rb') as dfile:
        payload = bytearray(dfile.read())

    #Prepend encoded data length to payload
    datalen = len(payload)
    print 'Data length:', datalen

    #datalen = bytearray.fromhex(u'%06x' % datalen)
    datalen = bytearray(struct.pack('>L', datalen)[1:])
    payload = datalen + payload

    databits = readbits(payload)
    for i, b in enumerate(databits):
        image_bytes[i] = (image_bytes[i] & 0xfe) | b

    img = Image.frombytes(mode, size, str(image_bytes))
    img.save(oname)


def bin8(i): 
    return bin(i)[2:].zfill(8)

bit_dict = dict((tuple(int(c) for c in bin8(i)), i) for i in xrange(256))

def decode_bytes(data):
    return [bit_dict[t] for t in zip(*[iter(c&1 for c in data)] * 8)]

def decode(image_bytes, dname):
    print 'Decoding...'
    t = decode_bytes(image_bytes[:24])
    datalen = (t[0] << 16) | (t[1] << 8) | t[2]
    print 'Data length:', datalen

    t = decode_bytes(image_bytes[24:24 + 8*datalen])

    with open(dname, 'wb') as dfile:
        dfile.write(str(bytearray(t)))


def process(iname, dname, oname):
    with Image.open(iname) as img:
        mode = img.mode
        if mode == 'P':
            raise ValueError, '%s is a palette-mapped image' % iname
        size = img.size
        image_bytes = bytearray(img.tobytes())
    #del img

    #One bit per image byte, minus the 24 bits used by the length header
    print 'Data capacity:', (len(image_bytes) - 24) // 8

    if oname:
        encode(image_bytes, mode, size, dname, oname)
    elif dname:
        decode(image_bytes, dname)


def main():
    #input image filename
    iname = None
    #data filename
    dname = None
    #output image filename
    oname = None

    def usage(msg=None):
        s = msg + '\n\n' if msg else ''
        s += '''Embed data into or extract data from the low-order bits of an image file.

Usage:

%s [-h] -i input_image [-d data_file] [-o output_image]

To encode, you must specify all 3 file names.
To decode, just specify the input image and the data file names.
If only the input image is given, its capacity will be printed,
i.e., the maximum size (in bytes) of data that it can hold.

Uses PIL (Pillow) to read and write the image data.
Do NOT use lossy image formats for output, eg JPEG, or the data WILL get scrambled.
The program will abort if the input image is palette-mapped, as such images
are not suitable.
'''
        print >>sys.stderr, s % sys.argv[0]
        raise SystemExit, msg!=None

    try:
        opts, args = getopt.getopt(sys.argv[1:], "hi:d:o:")
    except getopt.GetoptError, e:
        usage(e.msg)

    for o, a in opts:
        if o == '-h': usage(None)
        elif o == '-i': iname = a
        elif o == '-d': dname = a
        elif o == '-o': oname = a

    if iname:
        print 'Input image:', iname
    else:
        usage('No input image specified!')

    if dname:
        print 'Data file:', dname

    if oname:
        print 'Output image:', oname

    process(iname, dname, oname)


if __name__ == '__main__':
    main()
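
For readers on Python 3, the core of the scheme above can be exercised without an image file. This is a condensed sketch, not a port of the full program: it works on a plain `bytearray` standing in for `img.tobytes()`, keeps the 3-byte big-endian length header, and adds a capacity check that the original `encode` does not have. The names `embed`, `extract` and `bits_to_bytes` are chosen here for illustration.

```python
def read_bits(data):
    """Yield the bits of each byte, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def embed(image_bytes, payload):
    """Write a 3-byte length header plus payload into the LSBs of image_bytes."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too long for a 3-byte length header")
    framed = len(payload).to_bytes(3, "big") + payload
    if len(framed) * 8 > len(image_bytes):
        raise ValueError("payload does not fit in this image")
    out = bytearray(image_bytes)
    for i, bit in enumerate(read_bits(framed)):
        out[i] = (out[i] & 0xFE) | bit
    return out

def bits_to_bytes(values):
    """Pack the LSBs of `values` into bytes, eight at a time, MSB first."""
    result = bytearray()
    for i in range(0, len(values) - 7, 8):
        byte = 0
        for v in values[i:i + 8]:
            byte = (byte << 1) | (v & 1)
        result.append(byte)
    return bytes(result)

def extract(image_bytes):
    """Read the length header, then that many payload bytes."""
    length = int.from_bytes(bits_to_bytes(image_bytes[:24]), "big")
    return bits_to_bytes(image_bytes[24:24 + 8 * length])

cover = bytearray(range(256)) * 2     # 512 stand-in channel values
stego = embed(cover, b"hello")
message = extract(stego)
```

With Pillow, `cover` would come from `bytearray(img.tobytes())` and the result would go back through `Image.frombytes(mode, size, bytes(stego))`, saved in a lossless format such as PNG.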

【Comments】: