Denovo & Reference Based RNASeq

RNA-Seq is becoming a prominent method for studying the entire set of RNA transcripts (mRNA, rRNA, tRNA, and other non-coding RNA). It is a highly sensitive method for measuring gene expression by sequencing cDNA libraries prepared from mRNA. Unlike the genome, the transcriptome changes actively. Whole transcriptome analysis enables the discovery of rare genes, mRNA profiling, gene expression analysis, identification of splice junctions and gene fusions, identification of the pathways in which expressed genes participate, and detection of novel transcripts in both coding and non-coding RNA in model and non-model organisms. Comparing transcriptomes allows the study of differentially expressed genes in distinct cell populations, or in response to therapeutics and other treatments.

Eurofins Genomics offers comprehensive RNASeq sequencing on the Illumina platform. We offer customized experimental design of RNA sequencing for plants, animals, bacteria, etc. We have developed a standard pipeline for advanced bioinformatics analysis of RNA sequencing data, and we provide the support of experienced scientists for end-to-end analysis. RNASeq data analysis can be done on both model and non-model species. De novo transcriptome analysis is performed for non-model organisms, whereas reference-based mapping is preferred for model organisms. Eurofins has built a proprietary software pipeline for de novo assembly of short Illumina reads to provide cost-efficient methods to customers.

Eurofins provides various RNASeq services as listed below:

  • Bacterial/Fungal/Animal/Plant Denovo RNASeq
  • Reference Based RNASeq

 

Workflow and deliverables

a. Quality check of raw reads

The raw reads will be subjected to quality filtering and adapter trimming. Primer sequences, poly(A) tails, and reads derived from ribosomal RNA will be removed. The resulting high-quality data will be used for downstream analysis.
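As an illustration of this kind of filtering, the following minimal Python sketch trims trailing poly(A) runs and discards reads with low mean quality. It is not the Eurofins pipeline: the function names and the thresholds (`min_run`, `min_mean_quality`, `min_length`) are assumptions chosen for the example, Phred+33 encoding is assumed, and adapter and rRNA removal are not shown.

```python
def trim_poly_a(seq, qual, min_run=10):
    """Remove a trailing run of A's if it is at least min_run bases long."""
    stripped = seq.rstrip("A")
    run_length = len(seq) - len(stripped)
    if run_length >= min_run:
        return stripped, qual[:len(stripped)]
    return seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(reads, min_mean_quality=20, min_length=30):
    """Yield (name, seq, qual) for reads that survive trimming and filtering.

    The thresholds are illustrative assumptions, not pipeline settings.
    """
    for name, seq, qual in reads:
        seq, qual = trim_poly_a(seq, qual)
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            yield name, seq, qual
```

In practice a dedicated trimming tool would be used for this step; the sketch only shows the logic of the filter.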

b. Denovo Assembly

The de novo transcriptome assembly of clean, high-quality reads will be carried out using evaluated assemblers. Reads will be assembled into transcripts using optimized parameters such as k-mer length, coverage cutoff, standard deviation, expected coverage, and read tracking. The assembly will be evaluated based on the total transcriptome length obtained, transcript N50, length distribution of transcripts, etc.
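The transcript N50 used in the evaluation is the length L such that transcripts of length L or longer together account for at least half of the total assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` is chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    return 0
```

For example, for transcript lengths 2 through 10 (54 bases in total), the three longest transcripts (10 + 9 + 8 = 27) reach half of the total, so the N50 is 8.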

The non-redundant transcripts will be further clustered together. Clustering results in sequences that can no longer be extended. Such sequences are defined as unigenes.

c. Coding sequence (CDS) Prediction

TransDecoder will be used to predict coding sequences from unigenes. TransDecoder identifies candidate coding regions within unigene sequences.
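To give a sense of what a candidate coding region is, the Python sketch below finds the longest open reading frame (ATG to the first in-frame stop codon) across the six reading frames of a sequence. This is a deliberate simplification for illustration and is not TransDecoder's algorithm, which applies further selection criteria that are not reproduced here. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF (stop included) on either strand.

    Returns an empty string if no complete ORF is found.
    """
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best
```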

d. Functional annotation

The predicted CDS will be annotated against the NCBI non-redundant protein database (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups (COG) databases using the Basic Local Alignment Search Tool (BLASTX).

e. Functional Annotation of KEGG Pathway

To identify the potential involvement of the predicted CDS in biological pathways, the CDS will be mapped to reference canonical pathways in KEGG. The CDS are classified mainly under five categories: Metabolism, Cellular processes, Genetic information processing, Environmental information processing, and Organismal systems. The output of the KEGG analysis includes KEGG Orthology (KO) assignments, the corresponding Enzyme Commission (EC) numbers, and the metabolic pathways of the predicted CDS, obtained using the KEGG Automatic Annotation Server, KAAS (http://www.genome.jp/kaas-bin/kaas_main).

f. Functional annotation of Gene Ontology Analysis

OmicBox analysis of the samples will reveal the most abundant GO terms related to biological processes, molecular functions, and cellular components, together with the number of genes assigned to each. Other highly represented terms in the annotated transcriptome will also be reported. GO terms will be assigned to CDS for functional categorization.

g. Transcription Factor analysis

Transcription factors (TFs) are sequence-specific DNA-binding proteins that interact with the promoter regions of target genes to modulate their expression. These proteins play a very important role in the regulation of plant development, reproduction, intercellular signaling, response to the environment, and the cell cycle, and are also important in modulating secondary metabolite biosynthesis. To identify transcription factors, the predicted CDS will be searched against a plant transcription factor database.

h. Simple Sequence Repeat (SSR) Identification

SSRs, also known as microsatellites, are tandemly repeated motifs of 1–6 bases and are among the most important molecular markers in population and conservation genetics, molecular epidemiology and pathology, and gene mapping. SSRs will be detected in the assembled unigenes.
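A minimal Python sketch of perfect SSR detection for motifs of 1–6 bases is shown below. It is an illustration only, not the tool used in the pipeline. The minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions made for the example; real analyses set these thresholds per project.

```python
import re

# Assumed minimum number of tandem repeats per motif length (illustrative).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" or "ATAT", which belong to a shorter motif class.
            if not is_primitive(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Start positions are 0-based. Compound and imperfect repeats are not handled in this sketch.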

i. Differential Gene Expression Analysis

Differential expression analysis will be carried out using FPKM values. FPKM will be calculated on the basis of the number of reads mapped onto a particular transcript. DESeq will be used to calculate the baseMean value and to identify significantly differentially expressed transcripts between the experimental and control conditions.
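FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes the count for a transcript by both transcript length and sequencing depth. The short Python sketch below shows the standard formula; the function and parameter names are chosen for illustration and do not come from the pipeline.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

For example, 500 fragments mapped to a 2,000 bp transcript in a library of 10 million mapped fragments gives an FPKM of 25.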

Deliverables

Denovo based RNASeq

  • Quality filtration of reads
  • Denovo assembly generating transcript/unigene
  • Summary statistics
  • CDS prediction
  • Functional annotation using NCBI Nr, Swiss-Prot, COG
  • KEGG-pathway analysis
  • Gene ontology analysis
  • SSR Identification
  • Differential gene expression analysis (if more than one sample)
  • Heat map, volcano plot, scatter plot
  • Comprehensive report with publication-standard methodology, graphs and tables.

Reference based RNASeq

  • Quality filtration of reads
  • Mapping on the reference genome
  • Alignment summary statistics
  • Differential gene expression analysis based on RPKM/FPKM
  • Functional annotation of differentially expressed genes
  • List of upregulated and downregulated genes
  • Statistically significant genes
  • Heat map, volcano plot, scatter plot
  • Pathway analysis
  • SNP analysis
  • SSR Identification
  • .bam file for visualization of alignment
  • Comprehensive report with publication-standard methodology, graphs and tables.