<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Costello, Maura</style></author><author><style face="normal" font="default" size="100%">Fleharty, Mark</style></author><author><style face="normal" font="default" size="100%">Abreu, Justin</style></author><author><style face="normal" font="default" size="100%">Farjoun, Yossi</style></author><author><style face="normal" font="default" size="100%">Ferriera, Steven</style></author><author><style face="normal" font="default" size="100%">Holmes, Laurie</style></author><author><style face="normal" font="default" size="100%">Granger, Brian</style></author><author><style face="normal" font="default" size="100%">Green, Lisa</style></author><author><style face="normal" font="default" size="100%">Howd, Tom</style></author><author><style face="normal" font="default" size="100%">Mason, Tamara</style></author><author><style face="normal" font="default" size="100%">Vicente, Gina</style></author><author><style face="normal" font="default" size="100%">Dasilva, Michael</style></author><author><style face="normal" font="default" size="100%">Brodeur, Wendy</style></author><author><style face="normal" font="default" size="100%">DeSmet, Timothy</style></author><author><style face="normal" font="default" size="100%">Dodge, Sheila</style></author><author><style face="normal" font="default" size="100%">Lennon, Niall J</style></author><author><style face="normal" font="default" size="100%">Gabriel, Stacey</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms.</style></title><secondary-title><style face="normal" font="default" size="100%">BMC Genomics</style></secondary-title><alt-title><style face="normal" font="default" size="100%">BMC Genomics</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">DNA</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Library</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">High-Throughput Nucleotide Sequencing</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2018 May 08</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">19</style></volume><pages><style face="normal" font="default" size="100%">332</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;&lt;b&gt;BACKGROUND: &lt;/b&gt;Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps.&lt;/p&gt;&lt;p&gt;&lt;b&gt;RESULTS: &lt;/b&gt;Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping.&lt;/p&gt;&lt;p&gt;&lt;b&gt;CONCLUSIONS: &lt;/b&gt;Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/29739332?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Regier, Allison A</style></author><author><style face="normal" font="default" size="100%">Farjoun, Yossi</style></author><author><style face="normal" font="default" size="100%">Larson, David E</style></author><author><style face="normal" font="default" size="100%">Krasheninina, Olga</style></author><author><style face="normal" font="default" size="100%">Kang, Hyun Min</style></author><author><style face="normal" font="default" size="100%">Howrigan, Daniel P</style></author><author><style face="normal" font="default" size="100%">Chen, Bo-Juen</style></author><author><style face="normal" font="default" size="100%">Kher, Manisha</style></author><author><style face="normal" font="default" size="100%">Banks, Eric</style></author><author><style face="normal" font="default" size="100%">Ames, Darren C</style></author><author><style face="normal" font="default" size="100%">English, Adam C</style></author><author><style face="normal" font="default" size="100%">Li, Heng</style></author><author><style face="normal" font="default" size="100%">Xing, Jinchuan</style></author><author><style face="normal" font="default" size="100%">Zhang, Yeting</style></author><author><style face="normal" font="default" size="100%">Matise, Tara</style></author><author><style face="normal" font="default" size="100%">Abecasis, Goncalo R</style></author><author><style face="normal" font="default" size="100%">Salerno, Will</style></author><author><style face="normal" font="default" size="100%">Zody, Michael C</style></author><author><style face="normal" font="default" size="100%">Neale, Benjamin M</style></author><author><style face="normal" font="default" size="100%">Hall, Ira M</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Commun</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Commun</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Human Genetics</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Whole Genome Sequencing</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2018 10 02</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">9</style></volume><pages><style face="normal" font="default" size="100%">4038</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/30279509?dopt=Abstract</style></custom1></record></records></xml>