Evenness – How to view this with TruePlane
January 16th, 2018
The intention of this post is to help you use our evenness tool TruePlane, which gives you an idea of the evenness of your NGS sequences.
What you need to use TruePlane
To use TruePlane, you need sequences. These can be the sequences delivered directly by the next-generation sequencing machine or sequences you have processed yourself. The sequences must be in uncompressed fastq format; the zipped file format “gz” is not accepted. For a complete overview, you need to supply all the sequences you have (see our splitting tool TrueSplit for easier upload), but you can also get a hint of the evenness with fewer sequences, as you can see when you try our example, which can be downloaded directly from the TruePlane webpage. For now, this tool is restricted to human paired-end samples only!
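If your sequencer or pipeline delivered gzipped files, you have to decompress them before uploading. The following is a minimal sketch using only the Python standard library; the function name gunzip_fastq and the file name in the usage note are hypothetical and not part of TruePlane.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(gz_path):
    """Decompress e.g. sample_R1.fastq.gz to sample_R1.fastq and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    # with_suffix("") drops only the trailing ".gz", so ".fastq" is kept.
    out_path = gz_path.with_suffix("")
    # Stream the data so that large files are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling gunzip_fastq("sample_R1.fastq.gz") would write sample_R1.fastq next to the compressed file and leave the original untouched.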
The input screen
Here you can see our input screen. We need you to provide the data for our statistics and your email address, so that we can notify you when the tool has finished the analysis or an error has occurred. You need to provide paired-end sequence files. In most cases, they are named R1 for the forward reads and R2 for the reverse reads.
As a result, you will get a PDF with several pages:
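If you have many samples, it can help to derive the reverse-read file name from the forward-read file name before uploading. This is only a sketch under the assumption that your files carry an "_R1" token followed by "_" or "."; the helper mate_filename is hypothetical, and your naming scheme may differ.

```python
import re


def mate_filename(r1_name):
    """Guess the R2 file name that belongs to an R1 file name.

    Assumes the name contains "_R1" followed by "_" or "." (an assumption
    about the naming scheme, not something TruePlane requires).
    """
    new_name, replaced = re.subn(r"_R1(?=[_.])", "_R2", r1_name, count=1)
    if replaced == 0:
        raise ValueError(f"no '_R1' token found in {r1_name!r}")
    return new_name
```

For example, mate_filename("sample_R1.fastq") gives "sample_R2.fastq".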
Chromosome size vs aligned reads
The first page gives you a hint about the connection between the unmasked chromosome size and the number of reads aligned to each chromosome. If your reads are completely even, you would expect a high correlation between the size of a chromosome and the number of reads mapped to it (roughly speaking, if every nucleotide is covered by one read, smaller chromosomes get fewer reads). Of course, there may always be some bias from the alignment itself, which is why we allow for some uncertainty in this analysis. Basically, if your chromosomes lie on a line, you have good evenness. If you see a chromosome with a very high or very low number of reads compared to its size and to the rest of the chromosomes, then you have a problem with the evenness.
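The idea behind this page can be sketched in a few lines: under perfect evenness, each chromosome receives a share of the reads equal to its share of the total size. The function name and the toy numbers below are made up for illustration; this is not the code TruePlane runs, and the sizes are not real chromosome sizes.

```python
def observed_vs_expected(sizes, reads):
    """Return observed/expected read ratio per chromosome.

    Under perfect evenness every ratio is 1.0; values far from 1.0
    point to a chromosome with too many or too few reads.
    """
    total_size = sum(sizes.values())
    total_reads = sum(reads.values())
    ratios = {}
    for chrom, size in sizes.items():
        # Expected reads are proportional to the chromosome's share of the size.
        expected = total_reads * size / total_size
        ratios[chrom] = reads[chrom] / expected
    return ratios


# Toy data (hypothetical): chrC carries three times the reads its size suggests.
sizes = {"chrA": 200, "chrB": 100, "chrC": 100}
uneven_reads = {"chrA": 2000, "chrB": 1000, "chrC": 3000}
ratios = observed_vs_expected(sizes, uneven_reads)
```

In this toy example, chrC ends up with a ratio of 2.0, while chrA and chrB fall below 1.0, because the ratios are relative to the total number of reads.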
Statistical test of size vs aligned reads
To give you a more concrete analysis, we included a statistical test of the comparison of unmasked chromosome size and number of reads explained above. These are the results of a simple linear model to check for statistical significance. A low p-value and a high R-squared value indicate that it is highly unlikely that the number of reads on a chromosome is independent of chromosome size.
To further narrow this down, we show you the residuals of the model on page 3. With these residuals, you can see whether there is a problem with the model, which would be the case if you can identify a clear pattern in them (to see what no pattern looks like, try our example set).
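To make the terms concrete, here is a minimal sketch of an ordinary least-squares fit of read counts against chromosome sizes, returning the R-squared value and the residuals. It assumes a plain straight-line model with an intercept; the exact model TruePlane fits may differ, and the p-value is left out because it needs a statistics library. The function name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, residuals).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted by the line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # R-squared is the share of the variance in y explained by the line.
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals
```

With x as the chromosome sizes and y as the read counts, an R-squared value close to 1 and residuals that scatter around zero without a visible trend correspond to the "chromosomes on a line" picture from the first page.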
Single chromosome coverage
The rest of the pages are views of the coverage on the individual chromosomes. Of course, this is not a detailed analysis; together with the first pages, it just gives you an indication of whether something is wrong with the evenness of your sequences. For these views, we count the reads in windows of 1000 nucleotides and display these counts.
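The window counting can be sketched as follows. This assumes 0-based alignment start positions for one chromosome and assigns each read to the window that contains its start; how TruePlane treats reads that span two windows is not described here, so take that rule as an assumption. The function name window_counts is hypothetical.

```python
from collections import Counter


def window_counts(start_positions, window=1000):
    """Count reads per fixed-size window along one chromosome.

    start_positions: 0-based alignment start of each read (assumption).
    Returns a list where entry i is the number of reads starting in
    [i * window, (i + 1) * window).
    """
    counts = Counter(pos // window for pos in start_positions)
    n_windows = max(counts) + 1 if counts else 0
    # Windows without any read are reported as 0 so gaps stay visible.
    return [counts.get(i, 0) for i in range(n_windows)]
```

For the start positions [0, 10, 999, 1000, 2500], this gives three reads in the first window and one read in each of the next two.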
If you have any questions about the science behind these tips, or any other questions about contamination in sequencing, please contact us. If you have suggestions for other blog post topics, please let us know.