<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Garg, Shilpa</style></author><author><style face="normal" font="default" size="100%">Fungtammasan, Arkarachai</style></author><author><style face="normal" font="default" size="100%">Carroll, Andrew</style></author><author><style face="normal" font="default" size="100%">Chou, Mike</style></author><author><style face="normal" font="default" size="100%">Schmitt, Anthony</style></author><author><style face="normal" font="default" size="100%">Zhou, Xiang</style></author><author><style face="normal" font="default" size="100%">Mac, Stephen</style></author><author><style face="normal" font="default" size="100%">Peluso, Paul</style></author><author><style face="normal" font="default" size="100%">Hatas, Emily</style></author><author><style face="normal" font="default" size="100%">Ghurye, Jay</style></author><author><style face="normal" font="default" size="100%">Maguire, Jared</style></author><author><style face="normal" font="default" size="100%">Mahmoud, Medhat</style></author><author><style face="normal" font="default" size="100%">Cheng, Haoyu</style></author><author><style face="normal" font="default" size="100%">Heller, David</style></author><author><style face="normal" font="default" size="100%">Zook, Justin M</style></author><author><style face="normal" font="default" size="100%">Moemke, Tobias</style></author><author><style face="normal" font="default" size="100%">Marschall, Tobias</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Aach, John</style></author><author><style face="normal" font="default" size="100%">Chin, Chen-Shan</style></author><author><style face="normal" font="default" size="100%">Church, George 
M</style></author><author><style face="normal" font="default" size="100%">Li, Heng</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Chromosome-scale, haplotype-resolved assembly of human genomes.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Biotechnol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Biotechnol</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Algorithms</style></keyword><keyword><style  face="normal" font="default" size="100%">Chromosomes, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Haplotypes</style></keyword><keyword><style  face="normal" font="default" size="100%">Heterozygote</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2021 03</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">39</style></volume><pages><style face="normal" font="default" size="100%">309-312</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. 
We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/33288905?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Ranallo-Benavidez, T Rhyker</style></author><author><style face="normal" font="default" size="100%">Lemmon, Zachary</style></author><author><style face="normal" font="default" size="100%">Soyk, Sebastian</style></author><author><style face="normal" font="default" size="100%">Aganezov, Sergey</style></author><author><style face="normal" font="default" size="100%">Salerno, William J</style></author><author><style face="normal" font="default" size="100%">McCoy, Rajiv C</style></author><author><style face="normal" font="default" size="100%">Lippman, Zachary 
B</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Optimized sample selection for cost-efficient long-read population sequencing.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Res</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Res</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2021 May</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">31</style></volume><pages><style face="normal" font="default" size="100%">910-918</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, human variation has historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. 
To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector on simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show the rankings it computes are more representative than alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">5</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/33811084?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">De Coster, Wouter</style></author><author><style face="normal" font="default" size="100%">Weissensteiner, Matthias H</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Towards population-scale long-read sequencing.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Rev Genet</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Rev Genet</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Computational Biology</style></keyword><keyword><style  face="normal" 
font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">High-Throughput Nucleotide Sequencing</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Industrial Development</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2021 09</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">22</style></volume><pages><style face="normal" font="default" size="100%">572-587</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies and approaches for de novo assemblies and reference-based mapping approaches. 
Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact and emphasize challenges remaining in achieving long-read sequencing at a population scale.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">9</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/34050336?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sekar, Shobana</style></author><author><style face="normal" font="default" size="100%">Tomasini, Livia</style></author><author><style face="normal" font="default" size="100%">Proukakis, Christos</style></author><author><style face="normal" font="default" size="100%">Bae, Taejeong</style></author><author><style face="normal" font="default" size="100%">Manlove, Logan</style></author><author><style face="normal" font="default" size="100%">Jang, Yeongjun</style></author><author><style face="normal" font="default" size="100%">Scuderi, Soraya</style></author><author><style face="normal" font="default" size="100%">Zhou, Bo</style></author><author><style face="normal" font="default" size="100%">Kalyva, Maria</style></author><author><style face="normal" font="default" size="100%">Amiri, Anahita</style></author><author><style face="normal" font="default" size="100%">Mariani, Jessica</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Urban, Alexander E</style></author><author><style face="normal" font="default" size="100%">Vaccarino, Flora M</style></author><author><style face="normal" font="default" size="100%">Abyzov, Alexej</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Complex mosaic structural 
variations in human fetal brains.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Res</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Res</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 12</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">30</style></volume><pages><style face="normal" font="default" size="100%">1695-1704</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ∼14 wk postconception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extrachromosomal circular DNA event. 
Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ∼10% of neurons, SVs in developing human brain affect a comparable number of bases in the genome (∼6200 vs. ∼4000 bp), implying that they may have similar functional consequences.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">12</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/33122304?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Weissensteiner, Matthias H</style></author><author><style face="normal" font="default" size="100%">Bunikis, Ignas</style></author><author><style face="normal" font="default" size="100%">Catalán, Ana</style></author><author><style face="normal" font="default" size="100%">Francoijs, Kees-Jan</style></author><author><style face="normal" font="default" size="100%">Knief, Ulrich</style></author><author><style face="normal" font="default" size="100%">Heim, Wieland</style></author><author><style face="normal" font="default" size="100%">Peona, Valentina</style></author><author><style face="normal" font="default" size="100%">Pophaly, Saurabh D</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Suh, Alexander</style></author><author><style face="normal" font="default" size="100%">Warmuth, Vera M</style></author><author><style face="normal" font="default" size="100%">Wolf, Jochen B W</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Discovery and population genomics of structural variation in a songbird 
genus.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Commun</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Commun</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Animals</style></keyword><keyword><style  face="normal" font="default" size="100%">Chromosome Inversion</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Deletion</style></keyword><keyword><style  face="normal" font="default" size="100%">Genetic Variation</style></keyword><keyword><style  face="normal" font="default" size="100%">Genetics, Population</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomic Structural Variation</style></keyword><keyword><style  face="normal" font="default" size="100%">Genotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Phylogeny</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword><keyword><style  face="normal" font="default" size="100%">Retroelements</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword><keyword><style  face="normal" font="default" size="100%">Songbirds</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 07 07</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">11</style></volume><pages><style face="normal" font="default" size="100%">3403</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Structural variation (SV) constitutes an important type of genetic mutations 
providing the raw material for evolution. Here, we uncover the genome-wide spectrum of intra- and interspecific SV segregating in natural populations of seven songbird species in the genus Corvus. Combining short-read (N = 127) and long-read re-sequencing (N = 31), as well as optical mapping (N = 16), we apply both assembly- and read mapping approaches to detect SV and characterize a total of 220,452 insertions, deletions and inversions. We exploit sampling across wide phylogenetic timescales to validate SV genotypes and assess the contribution of SV to evolutionary processes in an avian model of incipient speciation. We reveal an evolutionary young (~530,000 years) cis-acting 2.25-kb LTR retrotransposon insertion reducing expression of the NDP gene with consequences for premating isolation. Our results attest to the wealth and evolutionary significance of SV segregating in natural populations and highlight the need for reliable SV genotyping.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/32636372?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Alonge, Michael</style></author><author><style face="normal" font="default" size="100%">Wang, Xingang</style></author><author><style face="normal" font="default" size="100%">Benoit, Matthias</style></author><author><style face="normal" font="default" size="100%">Soyk, Sebastian</style></author><author><style face="normal" font="default" size="100%">Pereira, Lara</style></author><author><style face="normal" font="default" size="100%">Zhang, Lei</style></author><author><style face="normal" font="default" size="100%">Suresh, Hamsini</style></author><author><style face="normal" font="default" size="100%">Ramakrishnan, 
Srividya</style></author><author><style face="normal" font="default" size="100%">Maumus, Florian</style></author><author><style face="normal" font="default" size="100%">Ciren, Danielle</style></author><author><style face="normal" font="default" size="100%">Levy, Yuval</style></author><author><style face="normal" font="default" size="100%">Harel, Tom Hai</style></author><author><style face="normal" font="default" size="100%">Shalev-Schlosser, Gili</style></author><author><style face="normal" font="default" size="100%">Amsellem, Ziva</style></author><author><style face="normal" font="default" size="100%">Razifard, Hamid</style></author><author><style face="normal" font="default" size="100%">Caicedo, Ana L</style></author><author><style face="normal" font="default" size="100%">Tieman, Denise M</style></author><author><style face="normal" font="default" size="100%">Klee, Harry</style></author><author><style face="normal" font="default" size="100%">Kirsche, Melanie</style></author><author><style face="normal" font="default" size="100%">Aganezov, Sergey</style></author><author><style face="normal" font="default" size="100%">Ranallo-Benavidez, T Rhyker</style></author><author><style face="normal" font="default" size="100%">Lemmon, Zachary H</style></author><author><style face="normal" font="default" size="100%">Kim, Jennifer</style></author><author><style face="normal" font="default" size="100%">Robitaille, Gina</style></author><author><style face="normal" font="default" size="100%">Kramer, Melissa</style></author><author><style face="normal" font="default" size="100%">Goodwin, Sara</style></author><author><style face="normal" font="default" size="100%">McCombie, W Richard</style></author><author><style face="normal" font="default" size="100%">Hutton, Samuel</style></author><author><style face="normal" font="default" size="100%">Van Eck, Joyce</style></author><author><style face="normal" font="default" size="100%">Gillis, Jesse</style></author><author><style face="normal" 
font="default" size="100%">Eshed, Yuval</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">van der Knaap, Esther</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author><author><style face="normal" font="default" size="100%">Lippman, Zachary B</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato.</style></title><secondary-title><style face="normal" font="default" size="100%">Cell</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Cell</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Alleles</style></keyword><keyword><style  face="normal" font="default" size="100%">Crops, Agricultural</style></keyword><keyword><style  face="normal" font="default" size="100%">Cytochrome P-450 Enzyme System</style></keyword><keyword><style  face="normal" font="default" size="100%">Ecotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Epistasis, Genetic</style></keyword><keyword><style  face="normal" font="default" size="100%">Fruit</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Duplication</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Expression Regulation, Plant</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Plant</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomic Structural Variation</style></keyword><keyword><style  face="normal" font="default" size="100%">Genotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Inbreeding</style></keyword><keyword><style  face="normal" font="default" size="100%">Lycopersicon 
esculentum</style></keyword><keyword><style  face="normal" font="default" size="100%">Molecular Sequence Annotation</style></keyword><keyword><style  face="normal" font="default" size="100%">Phenotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Plant Breeding</style></keyword><keyword><style  face="normal" font="default" size="100%">Quantitative Trait Loci</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 07 09</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">182</style></volume><pages><style face="normal" font="default" size="100%">145-161.e23</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. 
Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/32553272?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Zarate, Samantha</style></author><author><style face="normal" font="default" size="100%">Carroll, Andrew</style></author><author><style face="normal" font="default" size="100%">Mahmoud, Medhat</style></author><author><style face="normal" font="default" size="100%">Krasheninina, Olga</style></author><author><style face="normal" font="default" size="100%">Jun, Goo</style></author><author><style face="normal" font="default" size="100%">Salerno, William J</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author><author><style face="normal" font="default" size="100%">Boerwinkle, Eric</style></author><author><style face="normal" font="default" size="100%">Gibbs, Richard A</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Parliament2: Accurate structural variant calling at scale.</style></title><secondary-title><style face="normal" font="default" size="100%">Gigascience</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Gigascience</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 12 21</style></date></pub-dates></dates><volume><style face="normal" 
font="default" size="100%">9</style></volume><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;&lt;b&gt;BACKGROUND: &lt;/b&gt;Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, which remain challenging to detect, especially at the scale of hundreds to thousands of samples.&lt;/p&gt;&lt;p&gt;&lt;b&gt;FINDINGS: &lt;/b&gt;We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in &lt;1 day on GRCH38. 
Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available.&lt;/p&gt;&lt;p&gt;&lt;b&gt;CONCLUSION: &lt;/b&gt;Parliament2 provides both a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">12</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/33347570?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Majidian, Sina</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">PhaseME: Automatic rapid assessment of phasing quality and phasing improvement.</style></title><secondary-title><style face="normal" font="default" size="100%">Gigascience</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Gigascience</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 07 01</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">9</style></volume><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;&lt;b&gt;BACKGROUND: &lt;/b&gt;The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. 
Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess.&lt;/p&gt;&lt;p&gt;&lt;b&gt;FINDINGS: &lt;/b&gt;Here we present PhaseME, a versatile method to provide insights into and improvement of sample phasing results based on linkage data. We showcase the performance and the importance of PhaseME by comparing phasing information obtained from Pacific Biosciences including both continuous long reads and high-quality consensus reads, Oxford Nanopore Technologies, 10x Genomics, and Illumina sequencing technologies. We found that 10x Genomics and Oxford Nanopore phasing can be significantly improved while retaining a high N50 and completeness of phase blocks. PhaseME generates reports and summary plots to provide insights into phasing performance and correctness. We observed unique phasing issues for each of the sequencing technologies, highlighting the necessity of quality assessments. PhaseME is able to decrease the Hamming error rate significantly by 22.4% on average across all 5 technologies. Additionally, a significant improvement is obtained in the reduction of long switch errors. Especially for high-quality consensus reads, the improvement is 54.6% in return for only a 5% decrease in phase block N50 length.&lt;/p&gt;&lt;p&gt;&lt;b&gt;CONCLUSIONS: &lt;/b&gt;PhaseME is a universal method to assess the phasing quality and accuracy and improves the quality of phasing using linkage information. 
The package is freely available at https://github.com/smajidian/phaseme.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">7</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/32706368?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Wenger, Aaron M</style></author><author><style face="normal" font="default" size="100%">Peluso, Paul</style></author><author><style face="normal" font="default" size="100%">Rowell, William J</style></author><author><style face="normal" font="default" size="100%">Chang, Pi-Chuan</style></author><author><style face="normal" font="default" size="100%">Hall, Richard J</style></author><author><style face="normal" font="default" size="100%">Concepcion, Gregory T</style></author><author><style face="normal" font="default" size="100%">Ebler, Jana</style></author><author><style face="normal" font="default" size="100%">Fungtammasan, Arkarachai</style></author><author><style face="normal" font="default" size="100%">Kolesnikov, Alexey</style></author><author><style face="normal" font="default" size="100%">Olson, Nathan D</style></author><author><style face="normal" font="default" size="100%">Töpfer, Armin</style></author><author><style face="normal" font="default" size="100%">Alonge, Michael</style></author><author><style face="normal" font="default" size="100%">Mahmoud, Medhat</style></author><author><style face="normal" font="default" size="100%">Qian, Yufeng</style></author><author><style face="normal" font="default" size="100%">Chin, Chen-Shan</style></author><author><style face="normal" font="default" size="100%">Phillippy, Adam M</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author><author><style face="normal" font="default" 
size="100%">Myers, Gene</style></author><author><style face="normal" font="default" size="100%">DePristo, Mark A</style></author><author><style face="normal" font="default" size="100%">Ruan, Jue</style></author><author><style face="normal" font="default" size="100%">Marschall, Tobias</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Zook, Justin M</style></author><author><style face="normal" font="default" size="100%">Li, Heng</style></author><author><style face="normal" font="default" size="100%">Koren, Sergey</style></author><author><style face="normal" font="default" size="100%">Carroll, Andrew</style></author><author><style face="normal" font="default" size="100%">Rank, David R</style></author><author><style face="normal" font="default" size="100%">Hunkapiller, Michael W</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Biotechnol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat. Biotechnol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 Oct</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">37</style></volume><pages><style face="normal" font="default" size="100%">1155-1162</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. 
We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions &lt;50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of &gt;15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">10</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/31406327?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Tusso, Sergio</style></author><author><style face="normal" font="default" size="100%">Nieuwenhuis, Bart P S</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Davey, John W</style></author><author><style face="normal" font="default" size="100%">Jeffares, Daniel C</style></author><author><style face="normal" font="default" size="100%">Wolf, Jochen B 
W</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Ancestral Admixture Is the Main Determinant of Global Biodiversity in Fission Yeast.</style></title><secondary-title><style face="normal" font="default" size="100%">Mol Biol Evol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Mol. Biol. Evol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 09 01</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">36</style></volume><pages><style face="normal" font="default" size="100%">1975-1989</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Mutation and recombination are key evolutionary processes governing phenotypic variation and reproductive isolation. We here demonstrate that biodiversity within all globally known strains of Schizosaccharomyces pombe arose through admixture between two divergent ancestral lineages. Initial hybridization was inferred to have occurred ∼20-60 sexual outcrossing generations ago consistent with recent, human-induced migration at the onset of intensified transcontinental trade. Species-wide heritable phenotypic variation was explained near-exclusively by strain-specific arrangements of alternating ancestry components with evidence for transgressive segregation. Reproductive compatibility between strains was likewise predicted by the degree of shared ancestry. To assess the genetic determinants of ancestry block distribution across the genome, we characterized the type, frequency, and position of structural genomic variation using nanopore and single-molecule real-time sequencing. 
Despite being associated with double-strand break initiation points, over 800 segregating structural variants exerted overall little influence on the introgression landscape or on reproductive compatibility between strains. In contrast, we found strong ancestry disequilibrium consistent with negative epistatic selection shaping genomic ancestry combinations during the course of hybridization. This study provides a detailed, experimentally tractable example that genomes of natural populations are mosaics reflecting different evolutionary histories. Exploiting genome-wide heterogeneity in the history of ancestral recombination and lineage-specific mutations sheds new light on the population history of S. pombe and highlights the importance of hybridization as a creative force in generating biodiversity.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">9</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/31225876?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Chiang, Theodore</style></author><author><style face="normal" font="default" size="100%">Liu, Xiuping</style></author><author><style face="normal" font="default" size="100%">Wu, Tsung-Jung</style></author><author><style face="normal" font="default" size="100%">Hu, Jianhong</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">White, Simon</style></author><author><style face="normal" font="default" size="100%">Schaid, Daniel</style></author><author><style face="normal" font="default" size="100%">Andrade, Mariza de</style></author><author><style face="normal" font="default" size="100%">Jarvik, Gail P</style></author><author><style face="normal" 
font="default" size="100%">Crosslin, David</style></author><author><style face="normal" font="default" size="100%">Stanaway, Ian</style></author><author><style face="normal" font="default" size="100%">Carrell, David S</style></author><author><style face="normal" font="default" size="100%">Connolly, John J</style></author><author><style face="normal" font="default" size="100%">Hakonarson, Hakon</style></author><author><style face="normal" font="default" size="100%">Groopman, Emily E</style></author><author><style face="normal" font="default" size="100%">Gharavi, Ali G</style></author><author><style face="normal" font="default" size="100%">Fedotov, Alexander</style></author><author><style face="normal" font="default" size="100%">Bi, Weimin</style></author><author><style face="normal" font="default" size="100%">Leduc, Magalie S</style></author><author><style face="normal" font="default" size="100%">Murdock, David R</style></author><author><style face="normal" font="default" size="100%">Jiang, Yunyun</style></author><author><style face="normal" font="default" size="100%">Meng, Linyan</style></author><author><style face="normal" font="default" size="100%">Eng, Christine M</style></author><author><style face="normal" font="default" size="100%">Wen, Shu</style></author><author><style face="normal" font="default" size="100%">Yang, Yaping</style></author><author><style face="normal" font="default" size="100%">Muzny, Donna M</style></author><author><style face="normal" font="default" size="100%">Boerwinkle, Eric</style></author><author><style face="normal" font="default" size="100%">Salerno, William</style></author><author><style face="normal" font="default" size="100%">Venner, Eric</style></author><author><style face="normal" font="default" size="100%">Gibbs, Richard A</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Atlas-CNV: a validated approach to call single-exon CNVs in the eMERGESeq gene 
panel.</style></title><secondary-title><style face="normal" font="default" size="100%">Genet Med</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genet. Med.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 Sep</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">21</style></volume><pages><style face="normal" font="default" size="100%">2135-2144</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;&lt;b&gt;PURPOSE: &lt;/b&gt;To provide a validated method to confidently identify exon-containing copy-number variants (CNVs), with a low false discovery rate (FDR), in targeted sequencing data from a clinical laboratory with particular focus on single-exon CNVs.&lt;/p&gt;&lt;p&gt;&lt;b&gt;METHODS: &lt;/b&gt;DNA sequence coverage data are normalized within each sample and subsequently exonic CNVs are identified in a batch of samples, when the target log ratio of the sample to the batch median exceeds defined thresholds. The quality of exonic CNV calls is assessed by C-scores (Z-like scores) using thresholds derived from gold standard samples and simulation studies. We integrate an ExonQC threshold to lower FDR and compare performance with alternate software (VisCap).&lt;/p&gt;&lt;p&gt;&lt;b&gt;RESULTS: &lt;/b&gt;Thirteen CNVs were used as a truth set to validate Atlas-CNV and compared with VisCap. We demonstrated FDR reduction in validation, simulation, and 10,926 eMERGESeq samples without sensitivity loss. 
Sixty-four multiexon and 29 single-exon CNVs with high C-scores were assessed by Multiplex Ligation-dependent Probe Amplification (MLPA).&lt;/p&gt;&lt;p&gt;&lt;b&gt;CONCLUSION: &lt;/b&gt;Atlas-CNV is validated as a method to identify exonic CNVs in targeted sequencing data generated in the clinical laboratory. The ExonQC and C-score assignment can reduce FDR (identification of targets with high variance) and improve calling accuracy of single-exon CNVs respectively. We propose guidelines and criteria to identify high confidence single-exon CNVs.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">9</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/30890783?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Leija-Salazar, Melissa</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Toffoli, Marco</style></author><author><style face="normal" font="default" size="100%">Mullin, Stephen</style></author><author><style face="normal" font="default" size="100%">Mokretar, Katya</style></author><author><style face="normal" font="default" size="100%">Athanasopoulou, Maria</style></author><author><style face="normal" font="default" size="100%">Donald, Aimee</style></author><author><style face="normal" font="default" size="100%">Sharma, Reena</style></author><author><style face="normal" font="default" size="100%">Hughes, Derralynn</style></author><author><style face="normal" font="default" size="100%">Schapira, Anthony H V</style></author><author><style face="normal" font="default" size="100%">Proukakis, Christos</style></author></authors></contributors><titles><title><style face="normal" font="default" 
size="100%">Evaluation of the detection of GBA missense mutations and other variants using the Oxford Nanopore MinION.</style></title><secondary-title><style face="normal" font="default" size="100%">Mol Genet Genomic Med</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Mol Genet Genomic Med</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 03</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">7</style></volume><pages><style face="normal" font="default" size="100%">e564</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;&lt;b&gt;BACKGROUND: &lt;/b&gt;Mutations in GBA cause Gaucher disease when biallelic and are strong risk factors for Parkinson's disease when heterozygous. GBA analysis is complicated by the nearby pseudogene. We aimed to design and validate a method for sequencing GBA using long reads.&lt;/p&gt;&lt;p&gt;&lt;b&gt;METHODS: &lt;/b&gt;We sequenced GBA on the Oxford Nanopore MinION as an 8.9 kb amplicon from 102 individuals, including patients with Parkinson's and Gaucher diseases. We used NanoOK for quality metrics, NGMLR to align data (after comparing with GraphMap), Nanopolish and Sniffles to call variants, and WhatsHap for phasing.&lt;/p&gt;&lt;p&gt;&lt;b&gt;RESULTS: &lt;/b&gt;We detected all known missense mutations in these samples, including the common p.N409S (N370S) and p.L483P (L444P) in multiple samples, and nine rarer ones, as well as a splicing and a truncating mutation, and intronic SNPs. We demonstrated the ability to phase mutations, confirm compound heterozygosity, and assign haplotypes. We also detected two known risk variants in some Parkinson's patients. 
Rare false positives were easily identified and filtered, with the Nanopolish quality score adjusted for the number of reads a very robust discriminator. In two individuals carrying a recombinant allele, we were able to detect and fully define it in one carrier, where it included a 55-base pair deletion, but not in another one, suggesting a limitation of the PCR enrichment method. Missense mutations were detected at the correct zygosity, except for the case where the RecNciI one was missed.&lt;/p&gt;&lt;p&gt;&lt;b&gt;CONCLUSION: &lt;/b&gt;The Oxford Nanopore MinION can detect missense mutations and an exonic deletion in this difficult gene, with the added advantages of phasing and intronic analysis. It can be used as an efficient research tool, but additional work is required to exclude all recombinants.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/30637984?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Dennenmoser, Stefan</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author><author><style face="normal" font="default" size="100%">Altmüller, Janine</style></author><author><style face="normal" font="default" size="100%">Zytnicki, Matthias</style></author><author><style face="normal" font="default" size="100%">Nolte, Arne W</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Genome-wide patterns of transposon proliferation in an evolutionary young hybrid fish.</style></title><secondary-title><style face="normal" font="default" size="100%">Mol 
Ecol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Mol. Ecol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 Mar</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">28</style></volume><pages><style face="normal" font="default" size="100%">1491-1505</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Hybridization can induce transposons to jump into new genomic positions, which may result in their accumulation across the genome. Alternatively, transposon copy numbers may increase through nonallelic (ectopic) homologous recombination in highly repetitive regions of the genome. The relative contribution of transposition bursts versus recombination-based mechanisms to evolutionary processes remains unclear because studies on transposon dynamics in natural systems are rare. We assessed the genomewide distribution of transposon insertions in a young hybrid lineage (&quot;invasive Cottus&quot;, n = 11) and its parental species Cottus rhenanus (n = 17) and Cottus perifretum (n = 9) using a reference genome assembled from long single molecule PacBio reads. An inventory of transposable elements was reconstructed from the same data and annotated. Transposon copy numbers in the hybrid lineage increased in 120 (15.9%) out of 757 transposons studied here. The copy number increased on average by 69% (range: 10%-197%). Given the age of the hybrid lineage, this suggests that they have proliferated within a few hundred generations since admixture began. However, frequency spectra of transposon insertions revealed no increase in novel and rare insertions across assembled parts of the genome. 
This implies that transposons were added to repetitive regions of the genome that remain difficult to assemble. Future studies will need to evaluate whether recombination-based mechanisms rather than genomewide transposition may explain the majority of the recent transposon proliferation in the hybrid lineage. Irrespectively of the underlying mechanism, the observed overabundance in repetitive parts of the genome suggests that gene-rich regions are unlikely to be directly affected.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">6</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/30520198?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Luo, Ruibang</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Lam, Tak-Wah</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">A multi-task convolutional deep neural network for variant calling in single molecule sequencing.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Commun</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Commun</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Base Sequence</style></keyword><keyword><style  face="normal" font="default" size="100%">Computational Biology</style></keyword><keyword><style  face="normal" font="default" size="100%">DNA Mutational Analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style 
 face="normal" font="default" size="100%">Genome-Wide Association Study</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">Genotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Genotyping Techniques</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">INDEL Mutation</style></keyword><keyword><style  face="normal" font="default" size="100%">Nanopores</style></keyword><keyword><style  face="normal" font="default" size="100%">Neural Networks (Computer)</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword><keyword><style  face="normal" font="default" size="100%">Software</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 03 01</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">10</style></volume><pages><style face="normal" font="default" size="100%">998</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. 
For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source ( https://github.com/aquaskyline/Clairvoyante ), with modules to train, utilize and visualize the model.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/30824707?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Alonge, Michael</style></author><author><style face="normal" font="default" size="100%">Soyk, Sebastian</style></author><author><style face="normal" font="default" size="100%">Ramakrishnan, Srividya</style></author><author><style face="normal" font="default" size="100%">Wang, Xingang</style></author><author><style face="normal" font="default" size="100%">Goodwin, Sara</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Lippman, Zachary B</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">RaGOO: fast and accurate reference-guided scaffolding of draft genomes.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome 
Biol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Biol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 Oct 28</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">20</style></volume><pages><style face="normal" font="default" size="100%">224</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. 
RaGOO is available open source at https://github.com/malonge/RaGOO .&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/31661016?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Mahmoud, Medhat</style></author><author><style face="normal" font="default" size="100%">Gobet, Nastassia</style></author><author><style face="normal" font="default" size="100%">Cruz-Dávalos, Diana Ivette</style></author><author><style face="normal" font="default" size="100%">Mounier, Ninon</style></author><author><style face="normal" font="default" size="100%">Dessimoz, Christophe</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Structural variant calling: the long and the short of it.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Biol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Biol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2019 11 20</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">20</style></volume><pages><style face="normal" font="default" size="100%">246</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation 
of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/31747936?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Rescheneder, Philipp</style></author><author><style face="normal" font="default" size="100%">Smolka, Moritz</style></author><author><style face="normal" font="default" size="100%">Fang, Han</style></author><author><style face="normal" font="default" size="100%">Nattestad, Maria</style></author><author><style face="normal" font="default" size="100%">von Haeseler, Arndt</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Accurate detection of complex structural variations using single-molecule sequencing.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Methods</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Methods</style></alt-title></titles><keywords><keyword><style  face="normal" 
font="default" size="100%">DNA Mutational Analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">High-Throughput Nucleotide Sequencing</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2018 Jun</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">15</style></volume><pages><style face="normal" font="default" size="100%">461-468</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. 
NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">6</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/29713083?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Nattestad, Maria</style></author><author><style face="normal" font="default" size="100%">Goodwin, Sara</style></author><author><style face="normal" font="default" size="100%">Ng, Karen</style></author><author><style face="normal" font="default" size="100%">Baslan, Timour</style></author><author><style face="normal" font="default" size="100%">Sedlazeck, Fritz J</style></author><author><style face="normal" font="default" size="100%">Rescheneder, Philipp</style></author><author><style face="normal" font="default" size="100%">Garvin, Tyler</style></author><author><style face="normal" font="default" size="100%">Fang, Han</style></author><author><style face="normal" font="default" size="100%">Gurtowski, James</style></author><author><style face="normal" font="default" size="100%">Hutton, Elizabeth</style></author><author><style face="normal" font="default" size="100%">Tseng, Elizabeth</style></author><author><style face="normal" font="default" size="100%">Chin, Chen-Shan</style></author><author><style face="normal" font="default" size="100%">Beck, Timothy</style></author><author><style face="normal" font="default" size="100%">Sundaravadanam, Yogi</style></author><author><style face="normal" font="default" size="100%">Kramer, Melissa</style></author><author><style face="normal" font="default" size="100%">Antoniou, Eric</style></author><author><style 
face="normal" font="default" size="100%">McPherson, John D</style></author><author><style face="normal" font="default" size="100%">Hicks, James</style></author><author><style face="normal" font="default" size="100%">McCombie, W Richard</style></author><author><style face="normal" font="default" size="100%">Schatz, Michael C</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Res</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Res.</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Breast Neoplasms</style></keyword><keyword><style  face="normal" font="default" size="100%">Female</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Amplification</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Rearrangement</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomic Structural Variation</style></keyword><keyword><style  face="normal" font="default" size="100%">High-Throughput Nucleotide Sequencing</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">MCF-7 Cells</style></keyword><keyword><style  face="normal" font="default" size="100%">Oncogenes</style></keyword><keyword><style  face="normal" font="default" size="100%">Receptor, ErbB-2</style></keyword><keyword><style  face="normal" font="default" size="100%">Repetitive Sequences, Nucleic Acid</style></keyword><keyword><style  face="normal" font="default" 
size="100%">Transcriptome</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2018 08</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">28</style></volume><pages><style face="normal" font="default" size="100%">1126-1135</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and develop one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">8</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/29954844?dopt=Abstract</style></custom1></record></records></xml>