Procedure of fundamental data analysis

This page describes how to construct a pipeline that normalizes the data, identifies differentially expressed genes with the t-test, and clusters the data. The example file can be obtained at http://xip.hgc.jp/samples/prostateTudoPronto.csv . This sample data set consists of 57 columns (each column represents one subject) and approximately 24,000 rows (each row represents one microarray probe). The first 32 columns from the left are gene expression values obtained from normal subjects, and the remaining 25 are from subjects with prostate tumors. The pipeline uses the R server, so please set up the R server beforehand.
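The column layout described above can be sketched in Python as follows. This is only an illustration: the name `split_groups` is hypothetical, and the assumption that each row holds exactly 57 numeric values with no label column has not been checked against the sample file.

```python
N_NORMAL = 32  # first 32 columns: normal subjects (from the description above)
N_TUMOR = 25   # remaining 25 columns: subjects with prostate tumors


def split_groups(row):
    """Split one probe's expression values into (normal, tumor) lists.

    Assumes `row` holds exactly 57 numeric values and no label column;
    this layout is an assumption, not verified against the sample file.
    """
    if len(row) != N_NORMAL + N_TUMOR:
        raise ValueError(f"expected {N_NORMAL + N_TUMOR} values, got {len(row)}")
    return list(row[:N_NORMAL]), list(row[N_NORMAL:])


if __name__ == "__main__":
    # Placeholder values standing in for one probe's 57 measurements.
    probe = [float(i) for i in range(57)]
    normal, tumor = split_groups(probe)
    print(len(normal), len(tumor))
```

If the real file carries a header row or a probe-identifier column, those would need to be stripped before calling the function.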

Normalization

Microarray data are usually normalized before any analysis. Here, normalization is carried out with the Fast Loess algorithm, using the Normalize fast Loess component.

Pipeline that performs Normalize fast Loess

XML file of the entire pipeline

Assembling pipeline

As shown in the figure above, connect the following components: Input EDF to R, Normalize fast Loess, Export RMatrix to JMatrix, and General JData viewer & editor. Set the parameters of each component as shown below.

Parameter settings for each component

Run and Result

Matrix after Normalize fast Loess has been applied

The normalized data form a large matrix of approximately 24,000 rows and 57 columns.

t-test

Next, we apply the t-test to the normalized data. In this example, the t-test is applied to a single probe. The component used is T-test.
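For readers who want to see what a two-sample t-test on one probe computes, here is a minimal Python sketch. It assumes Welch's unequal-variance form, which is the default of R's `t.test`; whether the T-test component uses that form is an assumption, not something this page states. The function name `welch_t` and the input values are made up for illustration. The p-value and confidence interval need the t-distribution and are left out to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df, mean_x, mean_y) for Welch's two-sample t-test."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    # Squared standard error contributed by each group (sample variance / n).
    sx, sy = variance(x) / nx, variance(y) / ny
    t = (mx - my) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df, mx, my


if __name__ == "__main__":
    # Made-up values for illustration only, not taken from the sample file.
    normal = [1.0, 2.0, 3.0, 4.0, 5.0]
    tumor = [2.0, 4.0, 6.0, 8.0, 10.0]
    t, df, mx, my = welch_t(normal, tumor)
    print(f"t = {t:.4f}, df = {df:.2f}, mean of x = {mx}, mean of y = {my}")
```

In the pipeline, `x` would correspond to the 32 normal-subject values of the chosen probe and `y` to its 25 tumor-subject values.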

Pipeline that performs the T-test

XML file of the entire pipeline

Assembling pipeline

Delete the Normalize fast Loess component from the pipeline and add the following components to the canvas: T-test, R evaluated result to log, Export RPrimitive to JPrimitive x 2, Export RVector to JVector x 2, and General JData viewer & editor x 4. Connect them as illustrated in the figure above. All of the added components except T-test are used only to make the results easier to view. The parameters of each component are shown below.

Parameter settings for each component

Run and Result

Screenshots of the T-test results: t value, p-value, 95 percent confidence interval, and mean

The results of the t-test are shown in the following tabs: "t value", "p-value", "95 percent confidence interval", and "mean of x, mean of y".

Clustering

Next, we cluster the normalized data with the hierarchical clustering algorithm. Because clustering all 24,000 probes takes a long time, we limit the data to 1,000 probes here. The file containing the 1,000 probes can be obtained at http://xip.hgc.jp/samples/test1000.csv . The components used are Hierarchical clustering and Hierarchical clustering plot.
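As a rough illustration of what hierarchical (agglomerative) clustering does, the following Python sketch repeatedly merges the two closest clusters until a requested number remains. It assumes Euclidean distance and complete linkage, which are the defaults of R's `dist` and `hclust`; the settings of the Hierarchical clustering component itself are not stated on this page. The function names and toy data are hypothetical, and the naive search is suitable only for small inputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering; returns a list of clusters of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy "samples", each described by two made-up expression values.
    samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9]]
    print(complete_linkage(samples, 2))
```

A dendrogram such as the one produced by the Hierarchical clustering plot component records the full sequence of merges, whereas this sketch stops at a fixed number of clusters.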

Pipeline that performs the Hierarchical clustering plot

XML file of the entire pipeline

Assembling pipeline

Construct the pipeline as illustrated in the figure above, using the following components: Edit R script x 2, Hierarchical clustering, and Hierarchical clustering plot.

Parameter settings for each component

Run and Result

Screenshots of the Hierarchical clustering plot: full view and enlarged part

When the pipeline is run, the result of the clustering algorithm is displayed in a dialog window. Labels 0 to 31 represent the normal samples, and labels 32 to 56 represent the tumor samples. In the plot on the right, it can be seen that the tumor samples and the normal samples are each grouped together.
