Mitochondrial DNA Data
A total of 1703 samples (80%) were successful in the mtDNA genotyping for the 34 markers, equally from all parts of the country (Supporting Table 1). Five haplogroup frequencies (Table 2) had a statistically significant correlation with latitude: U5b (r = 0.69, pn < 0.0001, pc = 0.0069), I (r = 0.49, pn = 0.019, pc = 0.40), U∗ (xU2–7,K) (r = 0.44, pn = 0.035, pc = 0.58), and U5a (r = 0.44, pn = 0.036, pc = 0.61) were more abundant in the north, and X had a higher frequency in the south (r = −0.42, pn = 0.043, pc = 0.72). Even though only one of the corrected p-values was below 0.05, the empirical probability of obtaining five r’s of this magnitude by chance was only 0.0065. The 95% confidence intervals of the frequencies of these haplogroups are shown in Supporting Figure 1. None of the haplogroups had a statistically significant correlation with the proportion of immigrants, although H6 and L∗ (xM,N) were nominally significant. Genetic diversity measured as the mean number of pairwise differences (π) was 3.27 for the whole country with subpopulation values ranging from 2.75 to 3.59, being usually lower in Götaland (Table 2). In the principal component analysis of the Swedish subpopulations alone (Fig. 3a, Fig. 4a, Supporting Figure 2a), the second component showed a statistically significant correlation with latitude (r = −0.50, pn = 0.020) mostly due to the contribution of haplogroup HV∗ (xH1,2,3,V), and PC3 was correlated with the proportion of immigrants (r = −0.45, pn = 0.039). The SAMOVA analysis supported the PCA, separating individual populations from the main group (data not shown), with the proportion of variance explained by these groupings being below 2%.
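As an illustration of the diversity measure used above, the mean number of pairwise differences (π) can be sketched as the average count of differing markers over all pairs of sampled haplotypes. The function name, the string encoding of haplotypes, and the toy data below are assumptions made for this example only; they do not reproduce the software or the data used in the study.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of differing marker states over all pairs of haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    total = 0
    n_pairs = 0
    for a, b in combinations(haplotypes, 2):
        if len(a) != len(b):
            raise ValueError("haplotypes must be typed at the same markers")
        # Count the marker positions at which the two haplotypes differ.
        total += sum(x != y for x, y in zip(a, b))
        n_pairs += 1
    return total / n_pairs


# Hypothetical toy sample: four individuals typed at five binary markers.
toy_sample = ["00000", "00011", "00011", "10011"]
pi = mean_pairwise_differences(toy_sample)
print(pi)
```

In this toy sample the six pairs differ at 2, 2, 3, 0, 1 and 1 markers, giving π = 9/6 = 1.5. Lower values, such as those reported for several counties in Götaland, indicate that randomly drawn haplotypes are on average more similar to each other.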
In PCA with the other populations from the Baltic Sea region (Fig. 3b, Fig. 4b, Supporting Figure 2b), the Swedish populations, including the sample of Swedes without recent immigration in their familial background (Lappalainen et al. 2008), clustered close to Karelia and the Baltic states. According to the AMOVA analysis, the grouping of counties into the regions of Norrland, Svealand and Götaland accounted for a small but statistically significant proportion of the variance (0.32%, p = 0.0020). A Mantel test revealed a significant correlation coefficient, r = 0.28 (p = 0.019), between geographical and genetic distances of all counties. To investigate the contribution of genetic drift and immigration to the distribution of variation in the country in more detail, we also performed the Mantel test without the four counties with π < 3, to exclude those most affected by drift, and without Stockholm, Malmö and Gothenburg, which have large immigrant populations. Neither of these had a major effect on the Mantel test statistic (r = 0.27, p = 0.037 and r = 0.34, p = 0.008, respectively). The spatial autocorrelation analysis (Fig. 5), based on birth hospital rather than county level information, yielded a statistically significant correlogram (p < 0.01) both for the total mtDNA data and for the data without the counties of the lowest diversities. We observed 98 mtDNA haplotypes that had coding region genotypes characteristic of more than one haplogroup (Supporting Table 2). Analysing 4637 previously published mtDNA genomes revealed that 74 of these haplotypes have been observed before, and novel haplotypes had in almost all cases arisen from markers where recurrent mutation has been observed in earlier studies. There were an additional 25 haplotypes with recurrent mutations in the HVS. There were no discrepancies between the results from Sequenom and the re-genotyping by RFLP assays.
Y-chromosomal Data
A total of 883 samples, 41% of the samples of unknown gender, were successful in the Y-chromosomal genotyping for the 14 SNPs (Supporting Table 1). The haplogroup frequencies are given in Table 3. Four haplogroups had a positive correlation with the proportion of immigrants: I1b (r = 0.67, pn = 0.003, pc = 0.072), R1b (r = 0.63, pn = 0.006, pc = 0.14), F∗ (r = 0.56, pn = 0.017, pc = 0.26), and K∗ (r = 0.54, pn = 0.039, pc = 0.32), whereas for I1a the correlation was strongly negative (r = −0.79, pn = 0.0003, pc = 0.011). The empirical probability of obtaining these five r’s by chance was <0.0001. Furthermore, R1a1 had high frequencies in some western counties, and N3 was common in the eastern parts of northern and central Sweden. The confidence intervals for the above-mentioned haplogroup frequencies are shown in Supporting Figure 3. None of the haplogroups showed a statistically significant north-south cline, although for I1c and R1b the correlation was nominally significant. The mean number of pairwise differences (π) of the total Y-chromosomal data set was 2.70, with the lowest value in Västra Götaland and the highest in Halland (Table 3). In the principal component analysis of the Swedes (Fig. 3c, Fig. 4c, Supporting Figure 2c), the first PC had a strong correlation with the proportion of immigrants (r = −0.80, pn = 0.0001) mostly due to frequency variation of I1a, also observed in the correlation analysis of haplogroup frequencies. Haplogroup R1a1 contributed most to the divergence of four western and southern populations in PC2. The SAMOVA results supported the separation of the populations most diverged in the PCA (data not shown).
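The correlations reported above relate a per-county haplogroup frequency to a covariate such as latitude or the proportion of immigrants. The following is a minimal sketch of how such a correlation and a permutation-based p-value could be computed. It is an assumption that a permutation scheme is appropriate here: the text does not state how the nominal (pn) or corrected (pc) values were obtained, and the function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: latitude and haplogroup frequency for eight counties.
latitude = [55.6, 57.7, 59.3, 60.7, 62.4, 63.8, 65.6, 67.9]
frequency = [0.02, 0.03, 0.03, 0.05, 0.04, 0.07, 0.08, 0.10]
r = pearson_r(latitude, frequency)
p = permutation_p_value(latitude, frequency)
print(round(r, 2), p)
```

When many haplogroups are tested against the same covariate, each nominal p-value has to be adjusted for multiple testing, which is why most of the corrected values reported here are far larger than the nominal ones.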
The neighboring populations showed a clear difference from the Swedes in PCA (Fig. 3d, Fig. 4d, Supporting Figure 2d) mostly due to the lower frequency of N3 among the Swedes. The AMOVA analysis did not support the grouping into Norrland, Svealand and Götaland (−0.19% of variation, p = 0.68), and even the proportion of variance among the subpopulations within a group was nonsignificant (0.46%, p = 0.13). No correlation between geographic and genetic distances was observed in the Mantel test of all populations (r = −0.06, p = 0.622). While exclusion of the three largest cities had little effect in the Mantel test (r = −0.15, p = 0.86), removing the four populations with the strongest signs of genetic drift, as measured by the mean number of pairwise differences (π < 2.60), increased the correlation to 0.38 (p = 0.011). This pattern was supported by the autocorrelation analysis, where this subset of populations yielded a statistically significant (p < 0.05) correlogram (Fig. 5), while the total dataset had no pattern of isolation by distance.
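The Mantel tests reported in this section compare a matrix of geographic distances with a matrix of genetic distances between the same populations. The sketch below shows the general idea under stated assumptions: a one-sided test, Pearson correlation over the upper triangles, and joint permutation of the rows and columns of one matrix. The function names, the number of permutations, and the toy matrices are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling populations by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(geo, gen, n_perm=999, seed=0):
    """Return (r, one-sided p) for the correlation between two distance matrices."""
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    observed = pearson_r(geo_vec, upper_triangle(gen, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        # Shuffling population labels permutes rows and columns together.
        rng.shuffle(order)
        if pearson_r(geo_vec, upper_triangle(gen, order)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: five populations along a transect (positions in km),
# with genetic distance growing as the square root of geographic distance.
positions = [0, 100, 250, 400, 700]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[(d ** 0.5) / 100 for d in row] for row in geo]
r_obs, p_val = mantel_test(geo, gen)
print(round(r_obs, 2), p_val)
```

Permuting whole populations rather than individual matrix entries preserves the dependence among distances that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test on the pairwise distances.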