Views of the Working group on future DNA testing strategies relating to archaeological human remains
There has been considerable discussion of the methods used to sequence ancient DNA. These are typically either 1240K SNP capture arrays, which have sometimes been described, perhaps rather misleadingly, as genome-wide sequencing, or Whole Genome Sequencing (WGS). Initially there was a considerable cost differential between these two methods, which is one of the reasons why 1240K tests have been most commonly adopted, but this differential has reduced considerably, making WGS testing more feasible. The advantages and disadvantages of these two approaches can be summarised as follows:
1240K Advantages:
- High throughput genome-wide data (DNA sequences from across the whole genome)
- Coverage of non-random but significantly informative areas of the genome
- Most genome-wide ancient data derives from 1240K Capture and provides immediate comparability, particularly for samples with poor DNA quality
- Data usually good enough to address archaeological questions about genetic sex, ancestry and relatedness
- Economical method of acquiring a minimum standard of information from an ancient sample
- Currently the only practical/economic way of obtaining useful data from samples where DNA is very poorly preserved
1240K Disadvantages:
- Less useful for higher-resolution studies of ancestry and natural selection
- Very partial and may not report on significant data due to lack of comprehensive coverage
- The Capture Array was designed using populations that are better represented in the extant genetic data, particularly people with recent ancestry from Europe. It therefore shows some bias and may not work as well for groups with recent ancestry from outside Europe, or for ancient populations from Europe whose ancestry differs from that of the present-day inhabitants. There can also be bias towards sequences represented in the standard reference genome.
- These biases can affect the efficacy of imputation approaches, which computationally fill in missing parts of ancient genomes based on predictions from the variants that do occur.
- Removal of ‘off-target’ sequences, including most human sequences and those related to infectious disease and the depositional environment.
- 1240K Capture data is limited to predefined variants, which means that it may not be usable to address all future research questions, i.e. this data has a reduced legacy value. For instance, if new variants directly relevant to human evolution are discovered in the future, it would not be possible to investigate them in data generated by Capture if they are not present on the 1240K Array.
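The point about predefined variants can be illustrated with a minimal Python sketch. The panel positions, the genotype values and the function name `restrict_to_panel` are hypothetical and purely illustrative; real data would be held in standard genotype file formats rather than dictionaries.

```python
# Toy illustration: capture-style data only retains predefined sites.
# All positions and genotypes below are invented for illustration.

def restrict_to_panel(genotypes, panel_sites):
    """Keep only genotype calls at sites that are on the capture panel."""
    return {site: call for site, call in genotypes.items() if site in panel_sites}

# Hypothetical panel of targeted sites as (chromosome, position) pairs.
panel_sites = {("1", 1000), ("1", 2500), ("2", 400)}

# Hypothetical WGS calls, which may fall anywhere in the genome.
wgs_genotypes = {
    ("1", 1000): "A",
    ("1", 1800): "G",  # not on the panel
    ("2", 400): "T",
}

capture_like = restrict_to_panel(wgs_genotypes, panel_sites)

# A variant of later interest at a site that was never on the panel.
new_site = ("1", 1800)
print(new_site in wgs_genotypes)  # is the site available in the WGS data?
print(new_site in capture_like)   # is it available after restriction to the panel?
```

In this sketch the site of later interest remains available in the unrestricted data but is lost once the data is limited to the panel, which is the sense in which Capture data has a reduced legacy value.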
WGS Advantages:
- ‘Gold standard’ of sequencing: much more complete coverage of the genome, with the possibility of reporting on additional data compared to 1240K Capture
- More reliable for questions of natural selection and fine-scale ancestry
- The acquisition of sequences from across the genome without any predetermination, potentially covering all possible genetic variants, means that WGS data should be applicable to any conceivable question asked of that data in the future, and so the data has a much greater legacy value
- No predetermination of sequences removes one source of bias that can affect downstream processes such as imputation
WGS Disadvantages:
- As most ancient data has been generated using 1240K Capture, studies often have to downsample WGS data to the 1240K sites for comparability purposes in their analysis anyway (although the raw WGS data is usually released).
- As only a proportion (and sometimes only a small proportion) of sequences will be ‘on-target’, i.e. from the target organism, and sequencing is often the most expensive part of archaeogenetics, WGS is expensive and in some ways less efficient than Capture. However, this problem is increasingly being ameliorated by falling sequencing costs.
- If a sample is poorly preserved, WGS data may not be easily comparable to published data (either Capture or WGS) because of the lack of overlap in sites. Sites represented in the data generated by 1240K Capture all have the potential to overlap, and therefore in these circumstances Capture may represent a much more efficient way forward.
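The overlap argument in the last point can be made concrete with a rough calculation. The sketch below assumes, as a deliberate simplification, that each sample's recovered positions are spread uniformly at random over its target space; the target sizes are approximations, the figure of 100,000 recovered positions per sample is hypothetical, and the function name `expected_shared_sites` is our own.

```python
def expected_shared_sites(covered_a, covered_b, target_size):
    """Expected number of positions covered in both samples, assuming each
    sample's covered positions are spread uniformly at random over the target."""
    return covered_a * covered_b / target_size

CAPTURE_TARGETS = 1_240_000       # approximate number of 1240K target sites
GENOME_POSITIONS = 3_000_000_000  # rough size of the human genome (assumption)

covered = 100_000  # hypothetical positions recovered from each poorly preserved sample

capture_overlap = expected_shared_sites(covered, covered, CAPTURE_TARGETS)
wgs_overlap = expected_shared_sites(covered, covered, GENOME_POSITIONS)

print(f"Capture: about {capture_overlap:,.0f} shared sites")
print(f"WGS: about {wgs_overlap:,.0f} shared positions")
```

Under these simplified assumptions, two such samples would be expected to share roughly 8,000 sites with Capture but only a handful of positions with WGS. Real coverage is not uniform, so the figures indicate the scale of the difference rather than a prediction.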
Since sequencing is a destructive process, it is important to retrieve the maximum possible data at an early stage, considering that re-testing may not be an option. For samples that have already been enriched using the 1240K capture array, there are usually laboratory products (DNA extracts and libraries) which can be returned to for WGS.
However, on balance, we feel there is now a strong case for the general adoption of a WGS testing strategy for archaeological human remains. It is interesting to note that several organisations have already accepted WGS as standard and we would urge you to give this very serious consideration as the future policy of National Museums Scotland.
As already mentioned, we feel it is important that initial testing extracts as much data from archaeological human remains as possible, considering re-testing may not be possible. Such data has the potential to be of value to many fields of study, including anthropology, archaeology, genetic genealogy, genetics, health and history.