So you've read the paper, are excited by the results, and now you want to explore some of your own questions in our data sets or using our software. On this page you can download data sets and software generated by our lab, as well as find links to resources provided by other groups that we have found useful. Can't find what you are after? E-mail Trevor Pemberton and he'll do his best to point you in the right direction.
Variation at 138 orthologous human-chimpanzee microsatellites
- M Kwong, TJ Pemberton (2014) "Sequence differences at orthologous microsatellites inflate estimates of human-chimpanzee differentiation." BMC Genomics 15:p.990
Additional File 5 from the above paper contains genotypes at 138 microsatellite loci from Marshfield Screening Set 13 in 5,795 individuals from 267 worldwide human populations together with 84 chimpanzees, as reported by Pemberton et al. The genotypes were converted from PCR fragment lengths (in nucleotides) to numbers of repeats by calibration with the human and chimpanzee genome reference sequences, as described in the above paper. In this data set, all human and chimpanzee genotypes are commensurable between species.
Additional File 5.
Microsatellite variation in 267 worldwide human populations
- TJ Pemberton, M DeGiorgio, NA Rosenberg (2013) "Population structure in a comprehensive genomic data set on human microsatellite variation." G3:Genes|Genomes|Genetics 3(5):pp.903-919
Supplementary File S1 from the above paper contains two genotype data sets: (1) 645 microsatellite loci in 5,795 individuals from 267 worldwide human populations, and (2) 246 microsatellite loci in 5,795 individuals from 267 worldwide human populations together with 84 chimpanzees.
Supplementary File S1. [Updated 24 March 2016 to include sex assignments for all individuals]
Please note that, under the terms of the IRB approval provided by the Papua New Guinea (PNG) Medical Research Advisory Committee in Port Moresby, the Pacific Islander data included in the above dataset can only be used in accordance with the following conditions:
1. Individual anonymity must be maintained.
2. The data or samples must not be used in for-profit research.
3. There should be no stigmatization of individuals or groups within these data.
4. Copies of any resulting manuscripts should be forwarded to the PNG Medical Research Advisory Committee immediately upon publication. Please email a PDF of the manuscript to the Director of the Institute of Medical Research, Dr. William Pomat, requesting that he forward it to the PNG Medical Research Advisory Committee.
Anyone who wishes to use these data for activities that would violate any of the above conditions must remove the Pacific Islander data before conducting their analyses. Please see the "pembertonEtAl2013.subsets.txt" file included in Supplementary File S1 for the list of samples that should be removed.
Should you have any questions or concerns, please contact Dr. Trevor Pemberton.
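If you do need to remove samples before an analysis, the filtering step can be sketched in a few lines of Python. This is only an illustration: the layout of "pembertonEtAl2013.subsets.txt" and of the genotype files is not described on this page, so the sketch assumes you have already read the sample IDs to exclude into a set and the genotype file into rows of fields. The function name, the ID column position, and the toy rows are all hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Set


def drop_samples(
    rows: Iterable[Sequence[str]],
    excluded_ids: Set[str],
    id_column: int = 0,  # assumption: the sample ID sits in the first field
) -> Iterator[Sequence[str]]:
    """Yield only the rows whose sample ID is not in excluded_ids."""
    for row in rows:
        if row[id_column] not in excluded_ids:
            yield row


# Toy data, not taken from the real files.
rows = [
    ["1001", "PopA", "120", "124"],
    ["1002", "PopB", "118", "118"],
    ["1003", "PopA", "124", "128"],
]
excluded = {"1002"}  # hypothetical ID standing in for a sample to remove

kept = list(drop_samples(rows, excluded))
```

Check the column that actually holds the sample identifier in your copy of the data before relying on the default.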
Genome-wide homozygosity in 64 worldwide HGDP and HapMap populations
- TJ Pemberton, D Absher, MW Feldman, RM Myers, NA Rosenberg, JZ Li (2012) "Genomic patterns of homozygosity in worldwide human populations." American Journal of Human Genetics 91(2):pp.275-292
- Explore Genomic Patterns in the Autozygous Segments reported by this Study
Supplementary Tables S2-S5 from the above paper contain genome-wide homozygosity frequencies given separately for each of seven geographic regions (Africa, Middle East, Europe, Central/South Asia, East Asia, Oceania, and the Americas) as well as across all individuals in the data set.
Supplementary Table S2. Class A ROH frequencies at each SNP in the data set.
Supplementary Table S3. Class B ROH frequencies at each SNP in the data set.
Supplementary Table S4. Class C ROH frequencies at each SNP in the data set.
Supplementary Table S5. ROH frequencies at each SNP in the data set calculated over all three size classes.
Provided below is a file listing all ROH detected across all individuals in the 64 worldwide populations - 53 HGDP-CEPH populations and 11 HapMap Phase III populations - grouped by their length classification, as described in the above paper.
List of all ROH detected. [UCSC Genome Browser BED format]
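For anyone new to the BED format, a minimal Python sketch of reading such a file is given here. BED intervals are zero-based and half-open, so the length of a segment is simply end minus start. The sketch sums segment lengths per chromosome; which optional columns the ROH file carries (for example an individual or size-class label) is not stated on this page, so only the three mandatory columns are used, and the example lines are made up.

```python
from typing import Dict, Iterable, List, Tuple


def parse_bed_line(line: str) -> Tuple[str, int, int, List[str]]:
    """Split one BED record into chrom, start, end, and any extra fields."""
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start, end, fields[3:]


def total_length_by_chrom(lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths (end - start) for each chromosome."""
    totals: Dict[str, int] = {}
    for line in lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, _ = parse_bed_line(line)
        totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals


# Made-up records for illustration only.
example = [
    "track name=ROH",
    "chr1\t100\t600\tA",
    "chr1\t1000\t1500\tB",
    "chr2\t0\t250\tC",
]
totals = total_length_by_chrom(example)
```

To use it on a downloaded file, pass an open file handle instead of the example list.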
Provided below are genome-wide homozygosity frequencies given separately for each of 64 worldwide populations - 53 HGDP-CEPH populations and 11 HapMap Phase III populations - and for each of seven geographic regions (Africa, Middle East, Europe, Central/South Asia, East Asia, Oceania, and the Americas), as well as across all individuals in the data set, as described in the above paper.
Genomic homozygosity frequency data separately in each population. [UCSC Genome Browser format]
Genomic homozygosity frequency data separately in each geographic region. [UCSC Genome Browser format]
Microsatellite and mtDNA HVS1 variation in Asian Indians
- TJ Pemberton, F-Y Li, EK Hanson, NU Mehta, S Choi, J Ballantyne, JW Belmont, NA Rosenberg, C Tyler-Smith, PI Patel (2012) "Impact of restricted marital practices on genetic variation in an endogamous Gujarati group." American Journal of Physical Anthropology 149(1):pp.92-103
Genotypes for 729 autosomal microsatellite loci, 471 autosomal insertion/deletion loci, 24 Y-chromosomal microsatellite loci, and 26 mitochondrial HVS1 variable sites genotyped in 607, 607, 140, and 138 Asian Indian individuals, respectively, as described in the above paper.
Microsatellite and mtDNA HVS1 genotypes.
The mtDNA HVS1 sequences in which single-nucleotide variants were called.
2,810 single-nucleotide polymorphisms in 1,107 individuals from 63 human populations
- L Huang, M Jakobsson, TJ Pemberton, JK Pritchard, SA Tishkoff, NA Rosenberg (2011) "Haplotype variation and genotype imputation in African populations." Genetic Epidemiology 35(8):pp.766-780
- TJ Pemberton*, M Jakobsson*, DF Conrad, G Coop, JD Wall, JK Pritchard, PI Patel, NA Rosenberg (2008) "Using population mixtures to optimize the utility of genomic databases: association study design in India." Annals of Human Genetics 72(4):pp.535-546
Genotypes for 2,810 SNPs in 1,107 individuals from 63 worldwide human populations - 927 individuals from the 53 HGDP-CEPH populations, 30 individuals from two Asian Indian language groups (Bengali and Tamil), and 150 individuals from 8 African populations. These are the exact data described in Huang et al., which subsume the data described in Pemberton et al.; the latter study did not consider the 8 African populations.
Genotype data used in Huang et al.
Genotype data used in Pemberton et al.
Standardized subsets of the HapMap Phase III individuals controlling for relatedness
- TJ Pemberton, C Wang, JZ Li, NA Rosenberg (2010) "Inference of unexpected genetic relatedness among individuals in HapMap Phase III." American Journal of Human Genetics 87(4):pp.457-464
Standardized subsets of the HapMap Phase III individuals, accounting for duplicated samples and pairs of close relatives, as described in the above paper.
HAP1161 and HAP1117 standardized subsets (tab-delimited text format).
Sequence properties of 627 human microsatellites
- TJ Pemberton, CI Sandefur, M Jakobsson, NA Rosenberg (2009) "Sequence determinants of human microsatellite variability." BMC Genomics 10:p.612
For 627 microsatellite loci from Marshfield Screening Sets 13 and 52, these files provide sequence properties, such as the structure of the repeat motif and the GC content of the flanking region, as described in the above paper. Also provided are genotypes for these 627 microsatellites in 1,048 individuals from 53 worldwide human populations, as reported by Ramachandran et al., converted from PCR fragment lengths (in nucleotides) to numbers of repeats by calibration with the human genome reference sequence.
Microsatellite sequence properties tables (tab-delimited text format).
HGDP-CEPH microsatellite genotype data.
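To give a feel for what converting fragment lengths to repeat numbers involves, here is a rough Python sketch. It is one plausible reading of "calibration with the reference sequence", not the procedure from the paper, which should be consulted for the actual method. It assumes that for each locus you know the fragment length the reference sequence would produce, the number of repeats in the reference, and the motif length; every name and number below is hypothetical.

```python
def fragment_to_repeats(
    fragment_length: int,
    ref_fragment_length: int,
    ref_repeats: float,
    motif_length: int,
) -> float:
    """Convert a PCR fragment length (nt) to a repeat count.

    Assumption: all length differences relative to the reference
    fragment come from gain or loss of whole or partial repeat units.
    """
    offset_nt = fragment_length - ref_fragment_length
    return ref_repeats + offset_nt / motif_length


# Hypothetical locus: the reference fragment is 200 nt long and
# carries 12 copies of a 4-nt motif.
longer_allele = fragment_to_repeats(208, 200, 12, 4)
shorter_allele = fragment_to_repeats(198, 200, 12, 4)
```

Under these assumptions a 208-nt allele corresponds to two extra repeats, and a 198-nt allele to half a repeat fewer than the reference; fractional values like the latter are where a real calibration has to make decisions that this sketch leaves out.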
Gene expression during enamel/root formation in the developing mouse tooth
- TJ Pemberton*, FY Li*, S Oka, GA Mendoza-Fandino, YH Hsu, P Bringas Jr, Y Chai, ML Snead, R Mehrian-Shai, PI Patel (2007) "Identification of novel genes expressed during mouse tooth development by microarray gene expression analysis." Developmental Dynamics 236(8):pp.2245-2257
Affymetrix Mouse Genome Expression 430 2.0 microarray gene expression data obtained for mouse molar teeth extracted from Swiss Webster mouse pups between 1 and 10 days postnatal, as described in the above paper.
The mouse tooth gene expression data.
The control tissue gene expression data.
- ZA Szpiech, A Blant, TJ Pemberton (2017) "GARLIC: Genomic Autozygosity Regions Likelihood-based Inference and Classification." Bioinformatics 33(13):pp.2059-2062
- A Blant, M Kwong, ZA Szpiech, TJ Pemberton (2017) "Weighted likelihood inference of genomic autozygosity patterns in dense genotype data." BioRxiv doi:10.1101/177352
[ BioRxiv ]
GARLIC is a program for calling and classifying genomic regions of autozygosity (ROA), sometimes referred to as runs of homozygosity (ROH), in genotype data in the popular TPED/TFAM format. It implements the ROA calling and classification methods of Pemberton et al. (2012) and Blant et al. (2017).
GARLIC software on GitHub.
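If you have not worked with TPED files before, the Python sketch below shows how one line is laid out. In the PLINK transposed format each TPED line describes one marker: chromosome, marker ID, genetic distance, base-pair position, and then two allele fields per individual, with "0" as the usual missing-allele code. The individuals themselves are listed, in the same order, in the accompanying TFAM file. This is only meant to illustrate the input layout; it is not how GARLIC calls ROA, and the marker in the example is made up.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]


def parse_tped_line(line: str) -> Tuple[str, str, float, int, List[Genotype]]:
    """Parse one TPED line into marker information and per-individual genotypes."""
    fields = line.split()
    chrom, marker_id, genetic_dist, position = fields[:4]
    alleles = fields[4:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele fields per individual")
    # Pair up consecutive allele fields: (a1, a2) for each individual.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return chrom, marker_id, float(genetic_dist), int(position), genotypes


def is_homozygous(genotype: Genotype, missing: str = "0") -> Optional[bool]:
    """Return True/False for called genotypes and None if either allele is missing."""
    first, second = genotype
    if first == missing or second == missing:
        return None
    return first == second


# Made-up marker with three individuals: A/A, A/G, and missing.
chrom, marker_id, cm, bp, genotypes = parse_tped_line("1 rs123 0 752566 A A A G 0 0")
calls = [is_homozygous(g) for g in genotypes]
```

The number of genotype pairs on every TPED line should equal the number of lines in the TFAM file, which is a quick sanity check worth running before handing data to any ROA caller.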
- Human Genome Diversity Panel
- 121 African populations
- 29 Native American populations
- 13 Latin American Mestizo populations
- 84 Chimpanzees and Bonobos
- Single-nucleotide-polymorphism (SNP)
- Human Genome Diversity Panel
- 13 worldwide populations (Jorde)
- The International HapMap Project
- Singapore Genome Variation Project
- POPRES: Population Reference Sample
- Next-Generation Sequencing (NGS)
- The 1000 Genomes Project
- Complete Genomics