【Posted】: 2015-10-07 10:25:39
【Question】:
I'd like some help/advice on how to parse this Gene Ontology (.obo) file.
I'm building a visualization in D3 and need to produce a "tree" file in JSON format:
{
  "name": "flare",
  "description": "flare",
  "children": [
    {
      "name": "analytic",
      "description": "analytics",
      "children": [
        {
          "name": "cluster",
          "description": "cluster",
          "children": [
            {"name": "Agglomer", "description": "AgglomerativeCluster", "size": 3938},
            {"name": "Communit", "description": "CommunityStructure", "size": 3812},
            {"name": "Hierarch", "description": "HierarchicalCluster", "size": 6714},
            {"name": "MergeEdg", "description": "MergeEdge", "size": 743}
          ]
        }, etc.
This format seems easy enough to replicate with Python dictionaries, with three fields per entry: name, description and children[].
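A minimal sketch of that mapping, using only the standard-library `json` module. `make_node` is a hypothetical helper, and, following the flare example, only leaf nodes carry a `size`:

```python
import json


def make_node(name, description, children=None, size=None):
    """Build one D3-style tree node (hypothetical helper)."""
    node = {"name": name, "description": description}
    if children:
        node["children"] = children  # internal node: list of nested nodes
    if size is not None:
        node["size"] = size  # leaves in the flare example carry a size
    return node


cluster = make_node("cluster", "cluster", [
    make_node("Agglomer", "AgglomerativeCluster", size=3938),
    make_node("MergeEdg", "MergeEdge", size=743),
])
root = make_node("flare", "flare", [make_node("analytic", "analytics", [cluster])])

# Nested dicts/lists serialize directly to the JSON shape D3 expects.
print(json.dumps(root, indent=2))
```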
My real problem is how to extract the data. The "objects" in the file linked above are structured like this:
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
I need the id, is_a and name fields. I've tried parsing this with Python, but I can't find a way to pick out each object.
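One way to pick out each object, sketched under the assumption that every stanza starts with a bracketed header such as `[Term]` and that its fields are `tag: value` lines, is to read the file line by line and start a new record at each header. `parse_obo_terms` is a hypothetical name; other stanza types such as `[Typedef]` are skipped, only `id`, `name` and `is_a` are kept, and the `! ...` comment after an `is_a` ID is dropped:

```python
def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza with its id, name and is_a parents."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Any stanza header ends the previous stanza.
            if term is not None:
                yield term
            # Only [Term] stanzas are collected; [Typedef] etc. are skipped.
            term = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
            continue
        if term is None or ":" not in line:
            continue  # file header, skipped stanza, or blank line
        # Split on the first colon only, so "id: GO:0000001" keeps its ID intact.
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            term["id"] = value
        elif tag == "name":
            term["name"] = value
        elif tag == "is_a":
            # "GO:0048308 ! organelle inheritance" -> "GO:0048308"
            term["is_a"].append(value.split("!", 1)[0].strip())
    if term is not None:
        yield term
```

Because it accepts any iterable of lines, it can be fed an open file directly, e.g. `with open("go.obo") as f: terms = list(parse_obo_terms(f))` (the file name `go.obo` is an assumption).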
Any ideas?
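To turn such records into the D3 shape, each `is_a: PARENT` line can be read as an edge from PARENT down to the current term. The sketch below assumes a list of dicts with `id`, `name` and `is_a` keys; `build_tree` is hypothetical, and putting the GO name in `name` and the ID in `description` is just one possible choice:

```python
from collections import defaultdict


def build_tree(terms, root_id):
    """Nest terms under their is_a parents, starting from root_id."""
    by_id = {t["id"]: t for t in terms}
    # Invert the child -> parents (is_a) links into parent -> children.
    children_of = defaultdict(list)
    for t in terms:
        for parent in t["is_a"]:
            children_of[parent].append(t["id"])

    def node(term_id):
        result = {"name": by_id[term_id]["name"], "description": term_id}
        kids = [node(child) for child in children_of[term_id]]
        if kids:
            result["children"] = kids  # leaves get no "children" key
        return result

    return node(root_id)
```

Because GO is a directed acyclic graph rather than a strict tree, a term with several `is_a` parents appears once under each of them, so a tree built from a whole namespace root (e.g. `GO:0008150`, biological_process) can get very large. The result can then be written out with `json.dump(tree, out_file)`.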
【Discussion】: