What is Long-Read Sequencing?

Long-read sequencing, also called third-generation sequencing, is a DNA sequencing technique currently being researched that can determine the nucleotide sequence of long stretches of DNA, between 10,000 and 100,000 base pairs at a time. This removes the need to cut up and then amplify the DNA, which is normally required in other DNA sequencing techniques.

Image Credits: Gio.tto / Shutterstock.com

History of DNA sequencing

One of the most basic forms of DNA sequencing is Sanger sequencing. This method can sequence relatively small fragments of DNA of up to about 900 base pairs. The DNA is copied many times, producing fragments of varying lengths that each carry a fluorescent tag on one end. Ordering these tagged fragments by length reveals the exact sequence of the original DNA.
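The read-out step described above can be illustrated with a minimal sketch. It assumes an idealized case in which one fragment exists for every possible length and each tag identifies the fragment's final base; the function names and the example strand are hypothetical and chosen only for illustration.

```python
import random

def make_terminated_fragments(strand):
    """Return (length, tagged_end_base) pairs, one per position, in random order."""
    fragments = [(i + 1, base) for i, base in enumerate(strand)]
    random.shuffle(fragments)  # fragments start out as an unordered mixture
    return fragments

def read_sequence(fragments):
    """Sort fragments by length and read off the tagged end base of each one."""
    return "".join(base for _, base in sorted(fragments))

strand = "ATGCGTAC"  # made-up example strand
print(read_sequence(make_terminated_fragments(strand)))  # ATGCGTAC
```

Because every fragment length is unique, sorting by length alone is enough to restore the order of the bases.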

The more modern forms of DNA sequencing are called next-generation sequencing. These techniques are faster, cheaper and can much more efficiently determine long DNA sequences compared to Sanger sequencing. This is achieved through high-throughput analysis of many different DNA fragments at once.

These DNA fragments tend to range from 50-700 base pairs in length, but the techniques used can determine DNA sequences made up of millions of base pairs.

Long-read sequencing, sometimes also called third-generation sequencing, is a very recent DNA sequencing technique that can read much longer DNA fragments at a time. Reads normally range from 10,000 to 100,000 base pairs, but reads of 1-2 million base pairs have also been demonstrated.

How does long-read sequencing work?

Long-read sequencing has been described as solving a jigsaw puzzle with large pieces. The DNA fragments produced in this technique are easier to assemble into a complete DNA sequence than in other sequencing techniques.
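To put a rough number on the jigsaw analogy, the sketch below estimates how many reads are needed to cover a genome at a given average depth. The genome size, the coverage depth and the two read lengths are hypothetical values chosen for illustration, not figures from any particular instrument.

```python
def reads_needed(genome_bp, read_len_bp, coverage):
    """Reads required so that total bases sequenced = coverage x genome size."""
    total_bases = genome_bp * coverage
    return -(-total_bases // read_len_bp)  # ceiling division on integers

GENOME_BP = 3_000_000  # hypothetical genome size in base pairs
COVERAGE = 30          # hypothetical: each base is read 30 times on average

print(reads_needed(GENOME_BP, 150, COVERAGE))     # 600000 short reads
print(reads_needed(GENOME_BP, 20_000, COVERAGE))  # 4500 long reads
```

Under these assumptions the long-read puzzle has over a hundred times fewer pieces to fit together.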

There are two main technologies within scientific research which utilize long-read sequencing: Oxford Nanopore sequencing, and PacBio single-molecule real-time (SMRT) sequencing. These techniques implement different methodologies, but are both capable of sequencing long lengths of DNA.

Nanopore sequencing measures changes in ionic current as single-stranded DNA fragments are moved through nanopores, which are very small pore-forming proteins embedded within a membrane. Different DNA sequences produce different levels of resistance as they pass through these pores, so the exact nucleotide sequence can be determined from the current signal.

SMRT sequencing works by detecting the different fluorescence signals generated when a target DNA sequence is replicated with modified nucleotides. This occurs in a series of wells and is limited by the quality of the DNA polymerase in use.

Advantages of long-read sequencing

Long-read sequencing has several distinct advantages compared to next-generation sequencing technologies.

One of the major advantages is that long-read sequencing can much more accurately sequence DNA containing repeats, which are regions where the same sections of DNA are repeated within the genome. Sanger sequencing and next-generation sequencing often struggle with these repeats when assembling their DNA fragments.
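A small sketch shows why repeats are hard for short fragments. It assumes idealized, error-free reads and two made-up toy genomes that differ only in how many copies of a three-base unit they contain; the flanking sequences and read lengths are hypothetical.

```python
def all_reads(genome, read_len):
    """Return the set of every substring of length read_len (idealized reads)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

LEFT, RIGHT = "ATGGCTTACG", "TTCGATCCGA"  # made-up flanking sequences
genome_a = LEFT + "CAG" * 5 + RIGHT       # 5 copies of the repeat unit
genome_b = LEFT + "CAG" * 8 + RIGHT       # 8 copies of the repeat unit

# Reads shorter than the repeat region never span it from flank to flank.
short_same = all_reads(genome_a, 10) == all_reads(genome_b, 10)
# Reads long enough to span the whole region carry both flanks at once.
long_same = all_reads(genome_a, 30) == all_reads(genome_b, 30)

print(short_same, long_same)  # True False
```

With 10-base reads the two genomes yield exactly the same set of fragments, so this idealized comparison cannot tell them apart; 30-base reads that cross the whole repeat region can.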

These repeats, or copy number variations, are much easier to detect with long-read sequencing, which matters clinically. For example, in Huntington’s disease, the copy number of the DNA sequence ‘CAG’ dictates whether a person is likely to develop the disease. Determining this copy number can therefore have large implications for the diagnosis or prediction of genetic disease.
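Once a single read spans the entire repeat region, counting the copies is straightforward. The sketch below is a simplified illustration that assumes an error-free read; the function name and the example read are hypothetical, and real analysis tools also have to cope with sequencing errors.

```python
import re

def longest_cag_run(read):
    """Return the largest number of back-to-back 'CAG' units found in the read."""
    runs = re.findall(r"(?:CAG)+", read.upper())
    return max((len(run) // 3 for run in runs), default=0)

# Made-up read: flanking bases around 42 uninterrupted CAG copies.
read = "GCCTTC" + "CAG" * 42 + "CAACAGCCGCCA"
print(longest_cag_run(read))  # 42
```

The isolated CAG in the right-hand flank is found as a separate run of length one, so it does not inflate the count.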

This sequencing technology can also more accurately detect larger-scale mutations, where long sections of DNA are deleted or moved. These structural variants often have roles in genetic disorders but have not been extensively studied in the past due to the lack of technology available.

What has been achieved with long-read sequencing?

In 2018, Jain et al., including researchers from the University of California, used long-read sequencing to accurately map the centromere of the human Y chromosome. The centromere is a section of every chromosome that plays a vital role in cell division, and its dysregulation has been linked to cancer formation and to several genetic syndromes such as Down’s Syndrome and Turner Syndrome.

Nanopore sequencing has been used to detect and identify pathogens within clinical environments in as little as 6 hours from when the samples were taken.

Nanopore sequencing was also used during the Ebola outbreak to rapidly and efficiently test blood samples for the presence of the virus. The equipment was flown into West Africa and used directly on-site to monitor the epidemic.

Sources

Heather, J. M., & Chain, B. (2016). The sequence of sequencers: The history of sequencing DNA. Genomics. https://doi.org/10.1016/j.ygeno.2015.11.003

PHG Foundation. Long read sequencing technologies. (2018). www.phgfoundation.org/.../long-read-sequencing-ready-for-implementation

Koren, S., & Phillippy, A. M. (2015). One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Current opinion in microbiology. https://doi.org/10.1016/j.mib.2014.11.014

Amarasinghe, S. L., et al. (2020). Opportunities and challenges in long-read sequencing data analysis. Genome biology. https://doi.org/10.1186/s13059-020-1935-5

Eid, J., et al. (2009). Real-time DNA sequencing from single polymerase molecules. Science. https://doi.org/10.1126/science.1162986

Jain, M., et al. (2016). The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome biology. https://doi.org/10.1186/s13059-016-1103-0