<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Shen, Feichen</style></author><author><style face="normal" font="default" size="100%">Kidd, Jeffrey M</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2.</style></title><secondary-title><style face="normal" font="default" size="100%">Genes (Basel)</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genes (Basel)</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Algorithms</style></keyword><keyword><style  face="normal" font="default" size="100%">Computational Biology</style></keyword><keyword><style  face="normal" font="default" size="100%">DNA Copy Number Variations</style></keyword><keyword><style  face="normal" font="default" size="100%">Evolution, Molecular</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Duplication</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2020 01 29</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">11</style></volume><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number 
variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. 
We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">2</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/32013076?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Huang, Yi-Fei</style></author><author><style face="normal" font="default" size="100%">Gulko, Brad</style></author><author><style face="normal" font="default" size="100%">Siepel, Adam</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data.</style></title><secondary-title><style face="normal" font="default" size="100%">Nat Genet</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Nat Genet</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Animals</style></keyword><keyword><style  face="normal" font="default" size="100%">Base Sequence</style></keyword><keyword><style  face="normal" font="default" size="100%">Computational Biology</style></keyword><keyword><style  face="normal" font="default" size="100%">Evolution, Molecular</style></keyword><keyword><style  face="normal" font="default" size="100%">Genetic Variation</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Mammals</style></keyword><keyword><style  
face="normal" font="default" size="100%">Metagenomics</style></keyword><keyword><style  face="normal" font="default" size="100%">Phenotype</style></keyword><keyword><style  face="normal" font="default" size="100%">Primates</style></keyword><keyword><style  face="normal" font="default" size="100%">Vertebrates</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2017</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2017 Apr</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">49</style></volume><pages><style face="normal" font="default" size="100%">618-624</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. 
In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">4</style></issue><custom1><style face="normal" font="default" size="100%">https://www.ncbi.nlm.nih.gov/pubmed/28288115?dopt=Abstract</style></custom1></record></records></xml>