Contamination – What to keep in mind when doing next-generation sequencing (NGS)

The intention of this post is to bring to the surface a problem that is often ignored or even unknown: contamination in the sequences obtained from Next Generation Sequencing.

Contamination due to handling of the samples before sequencing

Of course, we all know that a clean lab is a top priority for good work, especially when you handle DNA or RNA. But keeping your cell culture clean is even more important. Mycoplasma is a notorious candidate for establishing itself in cultured cells and even altering them. It is hard to spot and even harder to get rid of, and you certainly do not want to sequence Mycoplasma instead of your cells.

Contamination due to sequencing itself

But even with a clean lab and an uncontaminated cell culture, contamination may occur during sequencing itself. Today's NGS machines are very capable, but not error-free. Sequencing is a statistical process, and when your samples share a lane with other samples, index reads may occasionally be misread, so that you end up with reads from other users of that lane. This is rare, but it does happen.

A second possible source of contamination is the plate itself. Plates are normally used multiple times, and if they are not washed very carefully, reads from the user of the previous run may show up among your reads.

[Side note: Contamination between species, such as Mycoplasma in a human cell culture, is easy to see, but human contamination in human samples is much harder to identify and usually cannot be spotted by simply checking which species the reads come from. Please keep in mind that this kind of contamination can also happen and may distort the results of your whole experiment, even if you thought everything was all right!]

Why should I check this?

The reason is simple: the “wrong” sequences, i.e. your contamination, are ignored by the tools that align the sequences to the source genome. You may lose anywhere from a few to a great many sequences in this process, a loss that can be critical for the analysis you planned. If you planned for 20,000,000 sequences to cover your genome to a certain depth, perhaps to detect some rare SNVs, you do not want to end up with only 5,000,000 (with which you cannot see the SNVs).
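
To make the arithmetic concrete, here is a minimal Python sketch. The function name usable_reads and the 75% contamination fraction are illustrative assumptions, chosen only so that the numbers match the 20,000,000 versus 5,000,000 example; they are not measured values. The sketch also assumes that every contaminating read fails to align and is lost.

```python
def usable_reads(total_reads: int, contamination_fraction: float) -> int:
    """Return the number of reads left after contaminating reads are lost.

    Assumes, for illustration only, that every contaminating read fails
    to align to the source genome and is therefore discarded.
    """
    if not 0.0 <= contamination_fraction <= 1.0:
        raise ValueError("contamination_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - contamination_fraction))


planned = 20_000_000
# Hypothetical contamination level that reproduces the example in the text.
remaining = usable_reads(planned, 0.75)
print(f"{remaining:,} of {planned:,} reads remain")
# prints "5,000,000 of 20,000,000 reads remain"
```

Turned around, the same calculation tells you how many reads you would have to order to still reach your target if you expect a certain level of contamination.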

What to do

The sequences you get from the sequencing machine can, and should, be checked for contamination, and doing so is easy. For this check, you need only a small number of sequences, randomly sampled from the full set. The origin of those “few” sequences can then be determined and, with our new tool, which will soon be launched as a web service, easily visualized. A single figure clearly shows whether contamination has occurred and what kind.
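
Taking a small random sample from a large read file can be sketched in a few lines of Python. This is not how our tool works internally; it is one common approach (reservoir sampling), shown under the assumptions that the reads are in a plain, uncompressed FASTQ file with exactly four lines per record and no trailing blank lines. The names read_fastq and sample_reads are made up for this example.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

FastqRecord = Tuple[str, ...]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield FASTQ records, assuming exactly four lines per record."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(chunk)


def sample_reads(records: Iterable[FastqRecord], k: int,
                 seed: Optional[int] = None) -> List[FastqRecord]:
    """Pick k records uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Replace a stored record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Tiny in-memory demo with five made-up reads.
demo_lines = []
for n in range(5):
    demo_lines += [f"@read{n}\n", "ACGT\n", "+\n", "IIII\n"]

subset = sample_reads(read_fastq(demo_lines), k=3, seed=1)
print(len(subset))  # prints 3
```

For a real run you would pass an open file handle instead of demo_lines, for example the result of open("reads.fastq") (a hypothetical file name). Because the file is read only once and only k records are kept in memory, this works even for very large files.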

If you have any questions about the science behind these tips, or any other questions about contamination in sequencing, please contact us. If you have suggestions for other blog post topics, please let us know.