TPJ Editor choice - June 2024: Use the needle in the haystack: Spike-ins as a normalization for RNAseq

Discover The Plant Journal editor's choice by Gwendolyn Kirschner

Use the needle in the haystack: Spike-ins as a normalization for RNAseq

In the mid-2000s, high-throughput sequencing replaced microarrays and has since been used to study gene expression, alternative splicing and cis-regulatory elements. Sample preparation affects the total number of transcripts in a sample, and thereby the sequencing coverage. Commonly used tools for differential expression analysis normalize for read depth on the assumption that the expression of most genes does not change between samples, i.e. that overall RNA abundance is the same across samples. However, many experimental conditions affect transcription or RNA stability, and therefore total RNA abundance.

To make differential expression analysis more reliable, Laosuntisuk et al. explored RNA spike-ins as a way of normalizing samples. RNA spike-ins are synthetic control RNAs that are added in known amounts to the isolated RNA of each sample and serve as a scale for the subsequent differential expression analysis (Figure 1). They are commonly used in RNAseq experiments on yeast, animal and human samples, but less frequently in plants.
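To make the idea concrete, here is a minimal Python sketch of the situation shown in Figure 1, using invented read counts rather than data from the study. It assumes one replicate per condition, a fourfold upregulation of a hypothetical "group 1" of genes in condition B, an equal amount of spike-in RNA in both samples, and spike-in rows recognizable by a `spike` prefix. The gene names and helper functions are illustrative only.

```python
# Hypothetical read counts for the Figure 1 scenario (one replicate per condition).
# Condition B raises group-1 genes fourfold, leaves group-2 genes unchanged, and has
# a 1.5x higher technical scaling (more reads). Both samples receive the same amount
# of spike-in RNA; spike-in rows are assumed to start with "spike".
SAMPLE_A = {"g1_a": 1000, "g1_b": 1000, "g2_a": 1000, "g2_b": 1000,
            "spike_1": 500, "spike_2": 500}
SAMPLE_B = {"g1_a": 6000, "g1_b": 6000, "g2_a": 1500, "g2_b": 1500,
            "spike_1": 750, "spike_2": 750}


def is_spike(row):
    return row.startswith("spike")


def total_count_scaled(counts):
    """Scale every row by the sample's total endogenous read count."""
    total = sum(n for row, n in counts.items() if not is_spike(row))
    return {row: n / total for row, n in counts.items()}


def spike_in_scaled(counts):
    """Scale every row by the sample's total spike-in read count."""
    total = sum(n for row, n in counts.items() if is_spike(row))
    return {row: n / total for row, n in counts.items()}


def fold_changes(scale, sample_a, sample_b):
    """Fold change B/A for each endogenous gene after applying `scale`."""
    a, b = scale(sample_a), scale(sample_b)
    return {row: b[row] / a[row] for row in a if not is_spike(row)}


fc_total = fold_changes(total_count_scaled, SAMPLE_A, SAMPLE_B)
fc_spike = fold_changes(spike_in_scaled, SAMPLE_A, SAMPLE_B)
print("total-count:", {row: round(v, 2) for row, v in fc_total.items()})
print("spike-in:   ", {row: round(v, 2) for row, v in fc_spike.items()})
```

Total-count scaling reports the unchanged group-2 genes at 0.4-fold, so they appear downregulated, and the group-1 genes at only 1.6-fold; scaling by the spike-ins recovers the true values of 1.0 and 4.0.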

First, the authors used a synthetic dataset to demonstrate the effect of RNA spike-ins. They created a dataset of samples from two conditions, in which the samples of the second condition contained more reads per sample. The classical 'median of ratios' normalization method computes, for each sample, the median over all genes of the ratio between a gene's read count in that sample and its geometric mean across all samples, and uses this median as the sample's scaling factor. With this method, transcript abundance in the two conditions was scaled to the same level, so the normalized data no longer reflected the higher total transcript abundance under the second condition. Normalization with RNA spike-ins, however, preserved the increased transcript abundance under the second condition, suggesting that the spike-in control removed technical variation but maintained global differences in transcript abundance.
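The two normalizations can be compared in a short Python sketch. It implements median-of-ratios size factors in their basic form and computes them either from all rows or from the spike-in rows only. The abundances, sample names and per-sample technical factors are hypothetical, chosen so that condition B contains twice as much endogenous RNA and has more reads per sample; this illustrates the principle and is not the pipeline used by Laosuntisuk et al.

```python
import math
from statistics import mean, median

# Hypothetical toy data: condition B has twice as much of every endogenous
# transcript; spike-ins are added in the same amount to every sample; each
# sample has its own technical factor (library size / capture), higher for B.
ENDOGENOUS = {"gene1": 50, "gene2": 200, "gene3": 400, "gene4": 800}
SPIKE_INS = {"spike1": 20, "spike2": 100, "spike3": 500}
SAMPLES = ["A1", "A2", "B1", "B2"]
TECH_SCALE = [1.0, 1.2, 1.5, 1.8]  # hypothetical technical factors per sample
BIO_SCALE = [1, 1, 2, 2]           # true global change in endogenous RNA


def simulate_counts():
    """Return {row: [count per sample]} for endogenous genes and spike-ins."""
    counts = {}
    for gene, abundance in ENDOGENOUS.items():
        counts[gene] = [abundance * b * t for b, t in zip(BIO_SCALE, TECH_SCALE)]
    for spike, amount in SPIKE_INS.items():
        counts[spike] = [amount * t for t in TECH_SCALE]  # no biological change
    return counts


def median_of_ratios(counts, rows):
    """Median-of-ratios size factors, using only `rows` as the reference set."""
    n = len(SAMPLES)
    # Geometric mean of each row across samples; rows with a zero count are skipped
    geo = {r: math.exp(sum(math.log(c) for c in counts[r]) / n)
           for r in rows if all(c > 0 for c in counts[r])}
    # Size factor of sample j = median over reference rows of count / geometric mean
    return [median([counts[r][j] / geo[r] for r in geo]) for j in range(n)]


def fold_changes(counts, factors):
    """Mean normalized count in B divided by mean in A, per endogenous gene."""
    result = {}
    for gene in ENDOGENOUS:
        norm = [c / f for c, f in zip(counts[gene], factors)]
        result[gene] = mean(norm[2:]) / mean(norm[:2])
    return result


counts = simulate_counts()
all_rows = list(ENDOGENOUS) + list(SPIKE_INS)
fc_all_genes = fold_changes(counts, median_of_ratios(counts, all_rows))
fc_spike_ins = fold_changes(counts, median_of_ratios(counts, list(SPIKE_INS)))
for gene in ENDOGENOUS:
    print(f"{gene}: all rows {fc_all_genes[gene]:.2f}, spike-ins {fc_spike_ins[gene]:.2f}")
```

With all rows as the reference, every gene comes out at a fold change of about 1.00, because the size factors absorb the global twofold increase; with the spike-ins as the reference, every gene comes out at about 2.00. The only difference between the two runs is the choice of reference rows.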

Next, the authors tested spike-in normalization on an RNA dataset from sorghum leaves sampled under control and chilling stress conditions at both morning and evening time points. Samples collected in the evening had a higher average read count than those collected in the morning. As in the synthetic dataset, the classical normalization method removed the variation between replicates but also aligned all samples to the same level, whereas spike-in normalization removed the variation between replicates while retaining the differences between samples. In addition, spike-in normalization detected genes upregulated in the evening samples that the traditional method did not detect. This suggests that the reduced variation among replicates of a given condition, together with the higher average expression in all evening samples, affected the statistical significance of the differences, the magnitude of the expression changes, or both. Gene Ontology enrichment analysis indicated that evening-specific functions and cellular processes may be under-represented or missed by traditional differentially expressed gene (DEG) analysis. Furthermore, spike-in normalization identified more chilling-downregulated genes in the morning but more chilling-upregulated genes in the evening, whereas the traditional method detected similar numbers at both time points.

In summary, these results suggest that RNA spike-ins improve DEG identification when overall gene expression differs between samples. This may be particularly valuable for single-cell RNAseq experiments, in which transcript abundance varies strongly between samples and not all transcripts can be captured and sequenced because of cell-specific inefficiencies in reverse transcription.

**Figure 1: RNA spike-ins as a normalization method for RNAseq.**

If one experimental condition (Condition B) alters the expression of a specific group of genes and the dataset is normalized to total transcript abundance, the expression of the other genes is underrepresented under that condition (left); RNA spike-ins maintain the proportion between the gene groups (right).

Author

Gwendolyn Kirschner

TPJ Research Highlight Editor, The Plant Journal

Plant molecular biologist with a great passion for root development and signalling pathways. After completing my PhD at the University of Düsseldorf, Germany, I worked at the University of Bonn, Germany, and am currently at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. My work has focused on the molecular mechanisms guiding root development in model plants and cereals such as barley and rice, but also in the parasitic plant Striga and in date palm.

Website: scholar.google.com/citations?user=1stejVkAAAAJ&hl=en
