Drawing of "Symmetric Scatter Plot" (R Language)

In transcriptome analysis, once the differentially expressed genes between two groups have been calculated, how are they usually presented? Your first thought might be a volcano plot, and it is indeed the most frequently used choice. In a volcano plot, differentially expressed genes can be identified easily from each gene's fold change between the two groups and its significance p value. A volcano plot is essentially a scatter plot: the horizontal axis usually shows the log2-transformed fold change, and the vertical axis shows the -log10-transformed p value or adjusted p value (left of the figure below). There is another common scatter plot style for displaying differentially expressed genes, in which the horizontal and vertical axes show the mean expression of each gene in the two groups. This style makes it easier to compare the expression of each gene in the two groups directly.

In this tutorial, let us learn how to draw a "symmetric scatter plot" like the one on the right to show the differential gene expression pattern between groups.

The example file "gene_diff.txt" contains a set of differential gene expression analysis results. It records genes with significantly different expression between the treatment group (treat) and the control group (control). The identification criteria are p < 0.01 and |log2 Fold Change| ≥ 1.

In this file, gene_id is the name of the gene; control and treat are the average expression values of the gene in the two groups; log2FoldChange is the fold difference in expression after log2 transformation; pvalue is the significance p value of the gene; and diff labels the differential genes screened using p < 0.01 and |log2 Fold Change| ≥ 1. In the diff column, "up" means up-regulated, "down" means down-regulated, and "none" means the gene is not differentially expressed.
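As a rough illustration of this layout, the sketch below builds a small made-up table with the same column names and labels each gene with the two criteria above. The numbers are invented, the commented read.delim() call assumes the real file is tab-delimited with a header row, and computing log2FoldChange from the two group means is a simplification; in a real analysis both log2FoldChange and pvalue would come from the differential expression tool.

```r
# Toy stand-in for "gene_diff.txt"; all values are made up for illustration.
gene_diff <- data.frame(
  gene_id = paste0("gene", 1:6),
  control = c(10, 200, 35, 8, 1500, 60),
  treat   = c(45, 40, 36, 30, 1400, 10),
  pvalue  = c(0.001, 0.0005, 0.6, 0.03, 0.4, 0.002),
  stringsAsFactors = FALSE
)
# With the real file one would read it instead (assuming tab-delimited text):
# gene_diff <- read.delim("gene_diff.txt", stringsAsFactors = FALSE)

# log2 fold change of treat relative to control (simplified: ratio of means)
gene_diff$log2FoldChange <- log2(gene_diff$treat / gene_diff$control)

# Label genes using p < 0.01 and |log2 Fold Change| >= 1
gene_diff$diff <- "none"
sig <- gene_diff$pvalue < 0.01
gene_diff$diff[sig & gene_diff$log2FoldChange >= 1]  <- "up"
gene_diff$diff[sig & gene_diff$log2FoldChange <= -1] <- "down"

# Count how many genes fall into each class
print(table(gene_diff$diff))
```

Note that gene4 in the toy table has a large fold change but p = 0.03, so it stays "none": both criteria must hold at the same time.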

Next, using this example file, we will demonstrate how to draw a "symmetric scatter plot" of differential gene expression in R.

First do some preprocessing on the data.

For example, if the gene expression values differ too much in magnitude, apply a logarithmic transformation. The genes are also sorted according to whether they are differential genes, so that their points are not obscured by the points of insignificant genes in the subsequent plots. In other words, the goal of the sorting is to place the points of the significant genes on the top layer of the graph.
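The two preprocessing steps described above could be sketched as follows. The toy table is made up, and the choice of log10 without a pseudocount is an assumption: the text does not say which base was used, and data containing zeros would need extra handling. The sorting relies on ggplot2 drawing points in row order, so rows placed last end up on top.

```r
# Toy stand-in for the example data; all values are made up for illustration.
gene_diff <- data.frame(
  gene_id = paste0("gene", 1:5),
  control = c(10, 200, 35, 1500, 60),
  treat   = c(45, 40, 36, 1400, 10),
  diff    = c("up", "down", "none", "none", "down"),
  stringsAsFactors = FALSE
)

# Log-transform the expression values (log10 is an assumption here)
gene_diff$control <- log10(gene_diff$control)
gene_diff$treat   <- log10(gene_diff$treat)

# Put "none" first in the level order, then sort rows by that order.
# Points are drawn row by row, so "down" and "up" genes are drawn last
# and therefore sit on top of the insignificant ones.
gene_diff$diff <- factor(gene_diff$diff, levels = c("none", "down", "up"))
gene_diff <- gene_diff[order(gene_diff$diff), ]

print(gene_diff)
```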

After that, you can use the preprocessed data to draw graphs.

The first approach is to color genes according to whether they are up-regulated, down-regulated, or insignificant, which makes the differential genes easy to identify in the graph. We use ggplot2 to draw the scatter plot of differential genes.
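A minimal ggplot2 sketch of this first approach is given below. It is not the original tutorial code: the data are simulated, the p values are made up, and placing control on the x axis and treat on the y axis is an assumption. Because both axes use plain log10 values, the condition |log2FC| = 1 corresponds exactly to the two lines y = x ± log10(2).

```r
library(ggplot2)

# Simulated stand-in for the preprocessed data (made up for illustration)
set.seed(1)
n <- 300
control <- 10^runif(n, 0.5, 4)
log2fc  <- rnorm(n, 0, 1.2)
treat   <- control * 2^log2fc
pvalue  <- 10^(-2 * abs(log2fc))  # toy p values, not from a real test

diff <- ifelse(pvalue < 0.01 & log2fc >= 1, "up",
        ifelse(pvalue < 0.01 & log2fc <= -1, "down", "none"))

dat <- data.frame(
  control = log10(control),
  treat   = log10(treat),
  diff    = factor(diff, levels = c("none", "down", "up"))
)
dat <- dat[order(dat$diff), ]  # insignificant genes first, so they are drawn underneath

# Same limits on both axes keep the plot symmetric about the diagonal
lim <- range(c(dat$control, dat$treat))

p <- ggplot(dat, aes(x = control, y = treat, color = diff)) +
  geom_point(size = 1) +
  scale_color_manual(values = c(none = "gray", down = "green3", up = "red")) +
  # |log2FC| = 1 means treat = 2 * control or treat = control / 2
  geom_abline(intercept = log10(2), slope = 1, linetype = "dashed") +
  geom_abline(intercept = -log10(2), slope = 1, linetype = "dashed") +
  coord_fixed(xlim = lim, ylim = lim) +
  labs(x = "control (log10 expression)", y = "treat (log10 expression)") +
  theme_bw()

print(p)
```

Using coord_fixed() with identical limits is one way to obtain the symmetric appearance; other choices, such as setting the limits through the scales, would work as well.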

The two coordinate axes represent the treatment group (treat) and the control group (control) respectively. The points in the figure represent the average expression value of each gene in the two groups (log transformation has been performed). Compared with the control group, the up-regulated genes are shown in red and the down-regulated genes are shown in green. The dashed line in the figure represents the threshold line when |log2FC|=1.

In this figure, we can easily see how the differential genes are distributed overall and compare the numbers of up-regulated and down-regulated genes.

The p value information is not displayed in the figure above. Another approach is therefore to let the color represent the p value, which produces a color gradient in the graph. This plot is also drawn with ggplot2. Compared with the process above, the only difference is how the colors are specified.
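A sketch of this second approach follows, again with simulated data and made-up p values rather than the original tutorial code. Mapping the color to -log10(pvalue) and using scale_color_gradient() from blue to red are assumptions about how the gradient could be specified; the axis assignment (control on x, treat on y) is assumed as before.

```r
library(ggplot2)

# Simulated stand-in for the preprocessed data (made up for illustration)
set.seed(1)
n <- 300
control <- 10^runif(n, 0.5, 4)
log2fc  <- rnorm(n, 0, 1.2)
treat   <- control * 2^log2fc
pvalue  <- 10^(-2 * abs(log2fc))  # toy p values, not from a real test

dat <- data.frame(
  control = log10(control),
  treat   = log10(treat),
  pvalue  = pvalue
)
# Largest p values first, so the most significant genes are drawn on top
dat <- dat[order(dat$pvalue, decreasing = TRUE), ]

lim <- range(c(dat$control, dat$treat))

p <- ggplot(dat, aes(x = control, y = treat, color = -log10(pvalue))) +
  geom_point(size = 1) +
  # Continuous gradient: blue = not significant, red = highly significant
  scale_color_gradient(low = "blue", high = "red", name = "-log10(p)") +
  geom_abline(intercept = log10(2), slope = 1, linetype = "dashed") +
  geom_abline(intercept = -log10(2), slope = 1, linetype = "dashed") +
  coord_fixed(xlim = lim, ylim = lim) +
  labs(x = "control (log10 expression)", y = "treat (log10 expression)") +
  theme_bw()

print(p)
```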

As in the figure above, the two coordinate axes represent the treatment group (treat) and the control group (control) respectively. The points in the figure represent the average expression value of each gene in the two groups (after log transformation), and the dashed line in the figure represents the threshold line at |log2FC|=1.

The difference from the figure above is that the genes are now colored according to the significance p value, with a blue-to-red gradient running from insignificant to significant. In this way, it is easy to see that genes with a larger difference in expression between the two groups tend to have smaller p values, so the two trends are consistent. The focus here is on describing the relationship between the fold difference and the p value.