【Question title】: How can I input specific arguments to argparse?
【Posted】: 2020-05-08 03:46:02
【Question】:

I found code online that parses a specific kind of text file, which looks like this:

# RELION; version 3.0

data_images

loop_ 
_rlnCoordinateX #1 
_rlnCoordinateY #2 
_rlnHelicalTubeID #3 
_rlnAngleTiltPrior #4 
_rlnAnglePsiPrior #5 
_rlnHelicalTrackLength #6 
_rlnAnglePsiFlipRatio #7 
_rlnImageName #8 
_rlnMicrographName #9 
_rlnMagnification #10 
_rlnDetectorPixelSize #11 
_rlnCtfMaxResolution #12 
_rlnCtfFigureOfMerit #13 
_rlnVoltage #14 
_rlnDefocusU #15 
_rlnDefocusV #16 
_rlnDefocusAngle #17 
_rlnSphericalAberration #18 
_rlnCtfBfactor #19 
_rlnCtfScalefactor #20 
_rlnPhaseShift #21 
_rlnAmplitudeContrast #22 
_rlnOriginX #23 
_rlnOriginY #24 
 3041.398896  3692.419723            1    90.000000    63.534898     0.000000     0.500000 000001@Extract/job011/Movies/Microtubules_02563.mrcs MotionCorr/job003/Movies/Microtubules_02563.mrc 10000.000000     5.480000     5.830000     0.124704   300.000000  7457.819824  6964.129883    33.520000     2.700000     0.000000     1.000000     0.000000     0.100000     0.031176 2.475269e-32 
 3068.235643  3638.511334            1    90.000000    63.534898    60.218978     0.500000 000002@Extract/job011/Movies/Microtubules_02563.mrcs MotionCorr/job003/Movies/Microtubules_02563.mrc 10000.000000     5.480000     5.830000     0.124704   300.000000  7457.819824  6964.129883    33.520000     2.700000     0.000000     1.000000     0.000000     0.100000     0.000000     0.000000 
 3095.072390  3584.602946            1    90.000000    63.534898   120.437956     0.500000 000003@Extract/job011/Movies/Microtubules_02563.mrcs MotionCorr/job003/Movies/Microtubules_02563.mrc 10000.000000     5.480000     5.830000     0.124704   300.000000  7457.819824  6964.129883    33.520000     2.700000     0.000000     1.000000     0.000000     0.100000     0.000000     0.000000 
 3121.909136  3530.694558            1    90.000000    63.534898   180.656934     0.500000 000004@Extract/job011/Movies/Microtubules_02563.mrcs MotionCorr/job003/Movies/Microtubules_02563.mrc 10000.000000     5.480000     5.830000     0.124704   300.000000  7457.819824  6964.129883    33.520000     2.700000     0.000000     1.000000     0.000000     0.100000     0.000000     0.000000 

The code (two classes, plus a few lines at the end that call them):

import os
import sys
import argparse
from collections import OrderedDict, namedtuple


class Column:
    def __init__(self, name, type=None):
        self._name = name
        # Get the type from the LABELS dict, assume str by default
        self._type = type

    def __str__(self):
        return self._name

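    def __eq__(self, other):
        # Python 3 ignores __cmp__, so compare by column name here.
        return self._name == str(other)

    def __hash__(self):
        # Defining __eq__ drops the default hash; restore one by name.
        return hash(self._name)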


class Table:
    """
    Class to hold and manipulate tabular data for EM processing programs.
    """
    def __init__(self, **kwargs):
        self.clear()

        if 'fileName' in kwargs:
            if 'columns' in kwargs:
                raise Exception("Please provide either 'columns' or 'fileName',"
                                " but not both.")
            fileName = kwargs.get('fileName')
            tableName = kwargs.get('tableName', None)
            self.read(fileName, tableName)
        elif 'columns' in kwargs:
            self._createColums(kwargs['columns'])

    def clear(self):
        self.Row = None
        self._columns = OrderedDict()
        self._rows = []

    def clearRows(self):
        """ Remove all the rows from the table, but keep its columns. """
        self._rows = []

    def addRow(self, *args, **kwargs):
        self._rows.append(self.Row(*args, **kwargs))

    def readStar(self, inputFile, tableName=None):
        """
        :param inputFile: Provide the input file from where to read the data.
            The file pointer will be moved until the last data line of the
            requested table.
        :return:
        """
        self.clear()
        dataStr = 'data_%s' % (tableName or '')

        self._findDataLine(inputFile, dataStr)

        # Find first column line and parse all columns
        line, foundLoop = self._findLabelLine(inputFile)
        colNames = []
        values = []

        while line.startswith('_'):
            parts = line.split()
            colNames.append(parts[0][1:])
            if not foundLoop:
                values.append(parts[1])
            line = inputFile.readline().strip()

        self._createColums(colNames)

        if not foundLoop:
            self.addRow(*values)
        else:
            # Parse all data lines
            while line:
                self.addRow(*line.split())
                line = inputFile.readline().strip()

    def read(self, fileName, tableName=None):
        with open(fileName) as f:
            self.readStar(f, tableName)

    def writeStar(self, outputFile, tableName=None, singleRow=False):
        """
        Write a Table in Star format to the given file.
        :param outputFile: File handler that should be already opened and
            in the position to write.
        :param tableName: The name of the table to write.
        :param singleRow: If True, don't write loop_, just label - value pairs.
        """
        outputFile.write("\ndata_%s\n\n" % (tableName or ''))

        if self.size() == 0:
            return

        if singleRow:
            m = max([len(c) for c in self._columns.keys()]) + 5
            lineFormat = "_{:<%d} {:>10}\n" % m
            row = self._rows[0]
            for col, value in row._asdict().items():
                outputFile.write(lineFormat.format(col, value))
            outputFile.write('\n\n')
            return

        outputFile.write("loop_\n")

        # Write column names
        for col in self._columns.values():
            outputFile.write("_%s \n" % col)

        # Take a hint for the columns width from the first row

        widths = [len(str(v)) for v in self._rows[0]]
        # Check middle and last row, just in case ;)
        for index in [len(self)//2, -1]:
            for i, v in enumerate(self._rows[index]):
                w = len(str(v))
                if w > widths[i]:
                    widths[i] = w

        lineFormat = " ".join("{:>%d} " % (w + 1) for w in widths)

        # Write data rows
        for row in self._rows:
            outputFile.write(lineFormat.format(*row))
            outputFile.write('\n')

        outputFile.write('\n')

    def write(self, output_star, tableName=None):
        with open(output_star, 'w') as output_file:
            self.writeStar(output_file, tableName)

    def printStar(self, tableName=None):
        self.writeStar(sys.stdout, tableName)

    def size(self):
        return len(self._rows)

    def getColumns(self):
        return self._columns.values()

    def getColumnValues(self, colName):
        """
        Return the values of a given column
        :param colName: The name of an existing column to retrieve values.
        :return: A list with all values of that column.
        """
        if colName not in self._columns:
            raise Exception("Not existing column: %s" % colName)
        return [getattr(row, colName) for row in self._rows]

    def __len__(self):
        return self.size()

    def __iter__(self):
        for item in self._rows:
            yield item

    def __getitem__(self, item):
        return self._rows[item]

    def __setitem__(self, key, value):
        self._rows[key] = value

    # --------- Internal implementation methods ------------------------

    def _addColumn(self, nameOrTuple):
        """
        :param nameOrTuple: This parameter should be either a string or
            a tuple (string, type).
        """
        if isinstance(nameOrTuple, str):
            col = Column(nameOrTuple)
        elif isinstance(nameOrTuple, tuple):
            col = Column(nameOrTuple[0], nameOrTuple[1])
        else:
            raise Exception("Invalid input as column, "
                            "should be either string or tuple.")
        self._columns[str(col)] = col

    def _createColums(self, columnList):
        self.clear()
        for col in columnList:
            self._addColumn(col)
        self._createRowClass()

    def _createRowClass(self):
        self.Row = namedtuple('Row', [str(c) for c in self._columns])

    def _findDataLine(self, inputFile, dataStr):
        """ Raise an exception if the desired data string is not found.
        Move the line pointer after the desired line if found.
        """
        line = inputFile.readline()
        while line:
            if line.startswith(dataStr):
                return line
            line = inputFile.readline()

        raise Exception("%s block was not found" % dataStr)

    def _findLabelLine(self, inputFile):
        line = ''
        foundLoop = False

        l = inputFile.readline()
        while l:
            if l.startswith('_'):
                line = l
                break
            elif l.startswith('loop_'):
                foundLoop = True
            l = inputFile.readline()

        return line.strip(), foundLoop




if __name__ == '__main__':

    parser = argparse.ArgumentParser(
        description="Script to manipulate metadata files.")

    add = parser.add_argument  # shortcut
    add("input", help="Input metadata filename. ", nargs='?', default="")
    add("output",
        help="Output metadata filename, if no provided, print to stdout. ",
        nargs='?', default="")

    add("-l", "--limit", type=int, default=0,
        help="Limit the number of rows processed, useful for testing. ")

    #add("-v", "--verbosity", action="count", default=0)

    args = parser.parse_args()

    if '@' in args.input:
        tableName, fileName = args.input.split('@')
    else:
        tableName, fileName = None, args.input

    if not os.path.exists(fileName):
        raise Exception("Input file '%s' does not exists. " % fileName)

    tableIn = Table(fileName=fileName, tableName=tableName)




    # Create another table with same columns
    tableOut = Table(columns=[str(c) for c in tableIn.getColumns()])

    limit = args.limit

    for i, row in enumerate(tableIn):
        if limit > 0 and i == limit:
            break

        tableOut.addRow(*row)

    if args.output:
        tableOut.write(args.output, tableName)
    else:
        tableOut.printStar(tableName)

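The problem, and a few questions: I want to use this code to change the values of a certain column and rewrite the text file above. (P.S. This would be easy with a simple Python script, but I would like to do it in an object-oriented way using the code above.)

I also have some questions about the code itself. The only argument I give this script is the text file. However, I see the line if args.output: tableOut.write(args.output, tableName), which suggests that another argument is possible and that it would make the script write a file instead of just printing the table, which is what happens when you run it on the text file alone. Also, what is the purpose of the condition if '@' in args.input:? Why would I put an @ in this script's argument?! (The code comes without documentation.)

My attempt at a solution (I don't practice object-oriented Python often): I thought I could add the following method to the Table class, and then add the lines after it to the final script, to change a certain column's values to a value the user enters.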

    def _changeColumnsValue(self, colName, colValue):
        if colName not in self._columns:
            raise Exception("Not existing column: %s" % colName)
        for row in self._rows:
            setattr(row, colName, colValue) 


    change_value  = input('Which column would you like to change, and to what value? Example input: _rlnVoltage 200')
    tableOut._changeColumnsValue(change_value.split()[0], change_value.split()[1])

【Question comments】:

Tags: python-3.x oop argparse


【Solution 1】:

The parser's help (also available with '-h'):

In [3]: parser.print_help()                                                                      
usage: ipython3 [-h] [-l LIMIT] [input] [output]

Script to manipulate metadata files.

positional arguments:
  input                 Input metadata filename.
  output                Output metadata filename, if no provided, print to
                        stdout.

optional arguments:
  -h, --help            show this help message and exit
  -l LIMIT, --limit LIMIT
                        Limit the number of rows processed, useful for
                        testing.

With no arguments, args.input and args.output both get their default value, an empty string.

With one string, it is assigned to input; with two strings, the second is assigned to output.

That is all there is to the argparse part.
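To see this behaviour without touching the command line, here is a minimal sketch that rebuilds the same three arguments and passes explicit lists to parse_args(). The file names particles.star and out.star are made up for illustration.

```python
import argparse


def build_parser():
    # Same argument definitions as the script in the question.
    parser = argparse.ArgumentParser(
        description="Script to manipulate metadata files.")
    parser.add_argument("input", nargs="?", default="",
                        help="Input metadata filename.")
    parser.add_argument("output", nargs="?", default="",
                        help="Output metadata filename.")
    parser.add_argument("-l", "--limit", type=int, default=0,
                        help="Limit the number of rows processed.")
    return parser


parser = build_parser()

# A list passed to parse_args() stands in for sys.argv[1:].
no_args = parser.parse_args([])
one_arg = parser.parse_args(["particles.star"])
two_args = parser.parse_args(["particles.star", "out.star", "-l", "2"])

print(repr(no_args.input), repr(no_args.output))    # both empty strings
print(one_arg.input, repr(one_arg.output))          # output still empty
print(two_args.input, two_args.output, two_args.limit)
```

So running the script with a second positional argument is what switches it from printing to writing a file.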

As for the @, apparently you can specify the input in either of two forms:

tableName@fileName
filename

If you don't supply an input string, I expect you get the Input file '%s' does not exists error, since the empty default fails the os.path.exists check.
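The following sketch isolates that '@' handling in a small helper. The name split_input and the file name particles.star are hypothetical; the logic is copied from the script. It also prints the string that readStar() searches for, which is how the table name selects a data_ block in the file.

```python
def split_input(arg):
    """Mirror the script's handling of the optional 'tableName@fileName' form."""
    if '@' in arg:
        table_name, file_name = arg.split('@')
    else:
        table_name, file_name = None, arg
    return table_name, file_name


for arg in ("images@particles.star", "particles.star"):
    table_name, file_name = split_input(arg)
    # readStar() looks for the first line starting with this string.
    data_str = 'data_%s' % (table_name or '')
    print(arg, '->', table_name, file_name, data_str)
```

With the sample file from the question, images@particles.star would look for the data_images block, while the bare file name matches the first line that starts with data_.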

What those two names mean depends on the Table class.

if args.output:
    tableOut.write(args.output, tableName)
else:
    tableOut.printStar(tableName)

If output is not the default '', that string is passed to write; otherwise printStar is used.

I can imagine adding a change_value argument to the parser:

add('--change_value', nargs=2, help='column and value that you want to change')

and later:

if args.change_value is not None:
      col_name, col_value = args.change_value
      tableOut._changeColumnsValue(col_name, col_value)
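One caveat when wiring this up: the rows in Table are namedtuple instances, which are immutable, so a setattr() call on a row raises AttributeError. The sketch below is one possible workaround, not the original author's method. It uses a stand-in Row type and a hypothetical helper change_column_value that builds modified copies with _replace(). It also strips the leading underscore, because readStar() stores the labels without it.

```python
import argparse
from collections import namedtuple

# Stand-in for Table.Row, which the question's code creates with namedtuple.
Row = namedtuple('Row', ['rlnVoltage', 'rlnDefocusU'])
rows = [Row('300.000000', '7457.819824'), Row('300.000000', '6964.129883')]


def change_column_value(rows, col_name, col_value):
    """Return new rows with col_name set to col_value in every row."""
    col_name = col_name.lstrip('_')  # labels are stored without the leading '_'
    if col_name not in Row._fields:
        raise Exception("Not existing column: %s" % col_name)
    # namedtuples cannot be modified in place; _replace() returns a copy.
    return [row._replace(**{col_name: col_value}) for row in rows]


parser = argparse.ArgumentParser()
parser.add_argument('--change_value', nargs=2,
                    help='column and value that you want to change')
# An explicit list stands in for the command line.
args = parser.parse_args(['--change_value', '_rlnVoltage', '200'])

if args.change_value is not None:
    col_name, col_value = args.change_value
    rows = change_column_value(rows, col_name, col_value)

print(rows)
```

Inside the Table class the same idea would mean reassigning self._rows to the list of replaced rows rather than mutating each row.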

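【Comments】:

  • Thanks, I'll try this. I have another question. Since it isn't very clear what this script is for, I wonder whether I can call some Table methods on the input file to get useful output from it. I assume that would go through argparse as you explained, but is there another way? If possible, could you give another example that uses argparse?
  • If you import the module, it loads the two classes, Column and Table. You can then call Table from your own script, just as the argparse section does. In fact, to explore the code I would open an interactive Python session, import the module, and try calling Table and so on. The argparse part is just one way of specifying the file and table names. I haven't worked out what Table does.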