Skip to the content.

CNR Bioinformatics Workshop

Back to curriculum

3. Secondary Analysis

We are going to analyze aligned scRNAseq data in R, the most popular language for this type of work at the moment.

3.1 Install R libraries with Miniconda

First, let’s install some packages with conda. Fire up the cmd.

conda activate myEnv
conda install -c conda-forge r-base
conda install -c conda-forge r-seurat
conda install -c conda-forge r-ggplot2
conda install -c conda-forge r-dplyr

3.2 Import R libraries, set working directory and data

library(Seurat)
Attaching SeuratObject

Attaching sp

3.3 Load data

df <- Read10X("./outs/filtered_feature_bc_matrix/")
df0 <- CreateSeuratObject(counts = df,
                         assay = "RNA",
                         project = "hCO",
                         min.cells = 10,
                         min.features = 200)
Warning message:
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
df0
An object of class Seurat 
16527 features across 2342 samples within 1 assay 
Active assay: RNA (16527 features, 0 variable features)
df0$percent.mt <- PercentageFeatureSet(df0, pattern = "^MT-")
options(repr.plot.width = 10, repr.plot.height = 5)
VlnPlot(df0,
       log = T,
       pt.size = 0,
       features = c("nFeature_RNA", "nCount_RNA", "percent.mt"))

png

Keep cells with more than 1000 UMI and less than 10% of reads mapped to mitochondrial genome

df0 <- df0[ , df0$nCount_RNA > 1000 & df0$percent.mt < 10]
df0
An object of class Seurat 
16527 features across 2125 samples within 1 assay 
Active assay: RNA (16527 features, 0 variable features)

Remove mitochondrial genes

MTs <- grep("^MT-", rownames(df0), value = T)
df0 <- df0[!(rownames(df0) %in% MTs)]
df0
An object of class Seurat 
16514 features across 2125 samples within 1 assay 
Active assay: RNA (16514 features, 0 variable features)

3.4 Normalization between different read depth (total UMI counts) by cell

df0 <- NormalizeData(df0, normalization.method = "LogNormalize", scale.factor = 10000)

3.5 Find highly variable genes

df0 <- FindVariableFeatures(df0, selection.method = "vst", nfeatures = 2000)
hvg <- VariableFeatures(df0)
top50 <- head(VariableFeatures(df0), 50)
options(repr.plot.width = 10, repr.plot.height = 4)
plot1 <- VariableFeaturePlot( df0, log = T )
plot2 <- LabelPoints(plot = plot1, points = top50, repel = TRUE)
plot1 + NoLegend() + plot2 + NoLegend()
When using repel, set xnudge and ynudge to 0 for optimal results

Warning message:
"ggrepel: 29 unlabeled data points (too many overlaps). Consider increasing max.overlaps"

png

3.6 Scaling data

df0 <- ScaleData(df0, features = hvg)
Centering and scaling data matrix

3.7 Perform principal component analysis

df0 <- RunPCA(df0, npcs = 50, features = hvg, verbose = F)
options(repr.plot.width = 5, repr.plot.height = 5)
ElbowPlot(df0, ndims = 50)

png

3.8 2D representation with UMAP

df0 <- RunUMAP(df0, dims = 1:30)
Warning message:
"The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session"
11:03:49 UMAP embedding parameters a = 0.9922 b = 1.112

11:03:49 Read 2125 rows and found 30 numeric columns

11:03:49 Using Annoy for neighbor search, n_neighbors = 30

11:03:49 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
|

11:03:49 Writing NN index file to temp file C:\Users\ZLI2\AppData\Local\Temp\3\RtmpALR9le\file290435047be8

11:03:49 Searching Annoy index using 1 thread, search_k = 3000

11:03:49 Annoy recall = 100%

11:03:50 Commencing smooth kNN distance calibration using 1 thread

11:03:50 Initializing from normalized Laplacian + noise

11:03:50 Commencing optimization for 500 epochs, with 82866 positive edges

11:03:58 Optimization finished
options(repr.plot.width = 10, repr.plot.height = 10)
DimPlot(df0, pt.size = 1,reduction = "umap")

png

df0 <- FindNeighbors(df0, reduction = "umap", dims = 1:2)
Computing nearest neighbor graph

Computing SNN
df0 <- FindClusters(df0)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 2125
Number of edges: 44905

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9030
Number of communities: 20
Elapsed time: 0 seconds
options(repr.plot.width = 10, repr.plot.height = 10)
DimPlot(df0, label = T, group.by = "seurat_clusters", reduction = "umap")

png

Adjust resolution to get more or fewer clusters

df0 <- FindClusters(df0, resolution = 0.1)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 2125
Number of edges: 44905

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9652
Number of communities: 8
Elapsed time: 0 seconds
DimPlot(df0, label = T, group.by = "seurat_clusters", reduction = "umap")

png

3.9 Differential gene expression analysis

DEX <- FindAllMarkers(df0, only.pos = T)
Calculating cluster 0

For a more efficient implementation of the Wilcoxon Rank Sum Test,
(default method for FindMarkers) please install the limma package
--------------------------------------------
install.packages('BiocManager')
BiocManager::install('limma')
--------------------------------------------
After installation of limma, Seurat will automatically use the more 
efficient implementation (no further action necessary).
This message will be shown once per session

Calculating cluster 1

Calculating cluster 2

Calculating cluster 3

Calculating cluster 4

Calculating cluster 5

Calculating cluster 6

Calculating cluster 7
DEX
A data.frame: 7124 × 7
p_valavg_log2FCpct.1pct.2p_val_adjclustergene
<dbl><dbl><dbl><dbl><dbl><fct><chr>
THSD7A6.045978e-1961.86973580.9360.3449.984329e-1920THSD7A
CDH78.096795e-1881.88788900.6770.0881.337105e-1830CDH7
CADM19.910354e-1651.45155270.9610.5941.636596e-1600CADM1
NTS1.028108e-1562.38117020.7530.1921.697817e-1520NTS
LINC006436.132768e-1561.21989390.8210.2191.012765e-1510LINC00643
CNTNAP21.002729e-1461.33811750.9750.5781.655906e-1420CNTNAP2
NEUROD61.564023e-1451.32104390.9940.5732.582828e-1410NEUROD6
BCL11B5.386061e-1411.29897530.9500.4898.894541e-1370BCL11B
GABRG21.876424e-1331.14507370.7760.2313.098726e-1290GABRG2
KAZN2.155009e-1321.25004930.8570.3853.558781e-1280KAZN
GPR223.700312e-1311.32313710.7710.2196.110695e-1270GPR22
GABRB31.038149e-1301.22828070.8680.3991.714400e-1260GABRB3
DOK62.392019e-1291.19333330.8300.3203.950180e-1250DOK6
OPCML2.384368e-1270.95938670.6660.1563.937545e-1230OPCML
MAP1B1.467895e-1250.95868510.9980.9742.424082e-1210MAP1B
GUCY1A34.140991e-1250.91080900.5130.0696.838433e-1210GUCY1A3
CELF41.823213e-1241.20766580.8960.4183.010853e-1200CELF4
CRYM2.752345e-1231.18465650.7030.1714.545222e-1190CRYM
PCSK21.134087e-1220.83808550.6030.1121.872832e-1180PCSK2
PDE1A1.246173e-1221.11442710.9410.4282.057930e-1180PDE1A
FEZF27.187010e-1211.12844460.9110.4571.186863e-1160FEZF2
VSNL11.687290e-1201.37334700.6450.1422.786390e-1160VSNL1
LMO32.402370e-1181.17535940.9180.4573.967274e-1140LMO3
GRIA21.041490e-1171.12113220.9670.5181.719916e-1130GRIA2
LPL2.722715e-1171.54279830.7420.2844.496291e-1130LPL
ENC16.710271e-1161.32808150.9660.6161.108134e-1110ENC1
BCL11A7.457177e-1151.05864760.9810.6821.231478e-1100BCL11A
LIN7A1.133347e-1130.95623240.7730.2871.871609e-1090LIN7A
EDIL31.759098e-1131.68500540.6550.2032.904974e-1090EDIL3
ADAMTS37.684031e-1130.73524530.4820.0671.268941e-1080ADAMTS3
VEZT10.0063679940.45446070.5940.57517VEZT
EDF10.0063969410.27686220.9280.92517EDF1
CACNB30.0064163180.31978750.3040.20617CACNB3
KCTD1610.0064779610.43579450.2460.14617KCTD16
ZFAND610.0065560650.31088030.6090.52617ZFAND6
FDPS20.0065659190.42124220.6810.68417FDPS
SLC35B40.0067065300.40908990.3330.24117SLC35B4
KAT6B10.0067193250.37024660.6670.63517KAT6B
SECISBP20.0067410260.41912030.5800.55417SECISBP2
PPIB20.0067973690.33603250.8410.86217PPIB
ACAT220.0068378450.32733650.5940.50517ACAT2
TM7SF210.0069477300.33362450.4780.39417TM7SF2
ASCL120.0070457820.48274410.2610.14617ASCL1
FAM129B0.0073811910.25108920.1010.03917FAM129B
TOMM220.0074270350.26217660.6520.60917TOMM22
FKBP1A0.0076074800.37419910.7390.72617FKBP1A
SERINC510.0078750800.43973360.3330.23217SERINC5
SRSF220.0079774500.32774160.7100.70017SRSF2
FADS110.0080171620.38829290.5940.57117FADS1
ZNF9310.0080429450.25574330.2170.12417ZNF93
HNRNPU10.0085786800.34049480.9420.88917HNRNPU
UQCRC120.0088972870.40979740.5940.56717UQCRC1
ADH530.0092962250.29252110.7390.76017ADH5
USP30.0093848210.39556590.4930.45217USP3
CHD60.0096134830.44031420.4780.41017CHD6
DUBR0.0096584390.34390890.1880.10617DUBR
STK17A0.0096947490.55279150.4490.38017STK17A
EZH210.0097403670.29613420.3620.26117EZH2
COMMD50.0098852920.30976480.3040.21617COMMD5
ATP5L0.0099985820.27544770.9570.94217ATP5L

3.10 Save files

write.csv(DEX, "DEX.csv")
saveRDS(df0, file = "df0.rds")

« Previous Next »