BioInfoBank Library Acta, Volume: 12, Issue: 1
Gene Ontology is now widely used by researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating its knowledge into gene clustering. However, the growth in the size of the Gene Ontology has caused problems in maintaining and processing it. One way to keep it accessible is to cluster it into fragments. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not straightforward and is a hard algorithmic problem. Therefore, an approach for automatically clustering the Gene Ontology is proposed that incorporates a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of a modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results.
In order to select a small subset of informative genes from gene expression data for cancer classification, many researchers have recently analyzed gene expression data using various computational intelligence methods. However, due to the small number of samples compared with the huge number of genes (high-dimension), irrelevant genes, and noisy genes, many of the computational methods face difficulties in selecting such a small subset. Therefore, we propose an enhancement of binary particle swarm optimization to select the small subset of informative genes that is relevant for classifying cancer samples more accurately. In this method, three approaches have been introduced to increase the probability of the bits in a particle’s position being zero. By performing experiments on two gene expression data sets, we have found that the performance of the proposed method is superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also produces lower running times compared with BPSO.
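For context, the conventional BPSO update that the abstract above uses as its baseline can be sketched as follows. This is a minimal illustration of the textbook sigmoid-based update only; it does not reproduce the three approaches the authors introduce. The function names (`sigmoid`, `bpso_step`) and the parameter values (`w`, `c1`, `c2`, `v_max`) are assumptions chosen for the example, not values taken from the paper.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(position, velocity, pbest, gbest, rng,
              w=0.9, c1=2.0, c2=2.0, v_max=4.0):
    """One conventional BPSO update for a single particle.

    position, pbest, gbest are lists of 0/1 bits (1 = gene selected).
    The parameter values are illustrative assumptions.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        # Standard PSO velocity update, clamped to [-v_max, v_max].
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))
        # In BPSO the bit becomes 1 with probability sigmoid(v).
        bit = 1 if rng.random() < sigmoid(v) else 0
        new_velocity.append(v)
        new_position.append(bit)
    return new_position, new_velocity
```

In this baseline a velocity of zero gives sigmoid(0) = 0.5, so each bit is set with probability one half. That is one plausible reason a method aiming at small gene subsets would want to raise the probability of a bit being zero.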
The application of microarray data for cancer classification has recently gained in popularity. The main problem that needs to be addressed is the selection of a small subset of genes from the thousands of genes in the data that contribute to a disease. This selection process is difficult due to the availability of a small number of samples compared with the huge number of genes, many irrelevant genes, and noisy genes. Therefore, this article proposes an improved binary particle swarm optimization to select a near-optimal (small) subset of informative genes that is relevant for the cancer classification. Experimental results show that the performance of the proposed method is superior to the standard version of particle swarm optimization (PSO) and other previous related work in terms of classification accuracy and the number of selected genes.
The application of microarray data for cancer classification has recently gained in popularity. The main problem that needs to be addressed is the selection of a smaller subset of genes, from the thousands of genes in the data, that contributes to a disease. This selection process is difficult because only a small number of samples is available compared to the huge number of genes, and because many genes are irrelevant or noisy. Therefore, this paper proposes an improved binary particle swarm optimisation to select a near-optimal (smaller) subset of informative genes that is relevant for cancer classification. Experimental results show that the performance of the proposed method is superior to a standard version of particle swarm optimisation and other related previous works in terms of classification accuracy and the number of selected genes.
Microarray data are expected to be useful for cancer classification. However, the process of gene selection for the classification contains a major problem due to properties of the data such as the small number of samples compared with the huge number of genes (higher-dimensional data), irrelevant genes, and noisy data. Hence, this article aims to select a near-optimal (small) subset of informative genes that is most relevant for the cancer classification. To achieve this aim, an iterative approach based on genetic algorithms has been proposed. Experimental results show that the performance of the proposed approach is superior to other previous related work, as well as to four methods tried in this work. In addition, a list of informative genes in the best gene subsets is also presented for biological usage.
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in a biological organism. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. The main problem that needs to be addressed is the selection of a small subset of genes that contributes to a disease from the thousands of genes measured on microarrays, which are inherently noisy. Most approaches in previous works have selected the number of genes manually, which has caused difficulty, especially for beginner biologists. Hence, this paper aims to automatically select a small subset of informative genes that is most relevant for the cancer classification. In order to achieve this aim, a recursive genetic algorithm has been proposed. Experimental results show that the gene subset is small in size and yields better classification accuracy compared with other previous works as well as four methods tested in this work. A list of informative genes in the best subsets is also presented for biological usage.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. In order to select a small subset of informative genes from the data for cancer classification, many researchers have recently been analyzing gene expression data using various computational intelligence methods. However, due to the small number of samples compared to the huge number of genes (high dimension), irrelevant genes, and noisy genes, many of the computational methods face difficulties in selecting the small subset. Thus, we propose an improved (modified) binary particle swarm optimization to select the small subset of informative genes that is relevant for the cancer classification. In this proposed method, we introduce a particle’s speed to give the rate at which the particle changes its position, and we propose a rule for updating particles’ positions. By performing experiments on ten different gene expression datasets, we have found that the performance of the proposed method is superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also produces lower running times compared to BPSO.
Microarray technology has provided biologists with the ability to measure the expression levels of thousands of genes in a single experiment. One of the urgent issues in the use of microarray data is the selection of a smaller subset of genes, from the thousands of genes in the data, that contributes to a disease. This selection process is difficult due to many irrelevant genes, noisy genes, and the small number of samples available compared to the huge number of genes (higher-dimensional data). In this study, we propose an iterative method based on hybrid genetic algorithms to select a near-optimal (smaller) subset of informative genes for classification of the microarray data. The experimental results show that our proposed method is capable of selecting the near-optimal subset and obtains better classification accuracies than other related previous works as well as four methods tested in this work. Additionally, a list of informative genes in the best gene subsets is presented for biological usage.
Recent advances in microarray technology allow scientists to measure expression levels of thousands of genes simultaneously in human tissue samples. This technology has been increasingly used in cancer research because of its potential for classification of the tissue samples based only on gene expression levels. A major problem in these microarray data is that the number of genes greatly exceeds the number of tissue samples. Moreover, these data are noisy. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Thus, this paper aims to select a small subset of informative genes that is most relevant for the cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed on two well-known microarray data sets. The experimental results show that the gene subsets are very small in size and yield better classification accuracy compared with other previous works as well as four methods tested in this work. In addition, a list of informative genes in the best subsets is presented for biological usage.
Gene expression data measured by microarray machines are useful for cancer classification. However, selecting genes for the classification faces several problems due to many irrelevant genes, noisy data, and the small number of samples available compared to the huge number of genes (high-dimensional data). Hence, this paper proposes a two-stage gene selection method to select a smaller (near-optimal) subset of informative genes that is most relevant for the cancer classification. It has two stages: 1) pre-selecting genes using a filter method to produce a subset of genes; 2) optimising the gene subset using a multi-objective hybrid method to automatically yield a smaller subset of informative genes. Three gene expression data sets are used to test the effectiveness of the proposed method. Experimental results show that the performance of the proposed method is superior to other experimental methods and related previous works.
Microarray technology has provided biologists with the ability to measure the expression levels of thousands of genes in a single experiment. One of the urgent issues in the use of microarray data is the selection of a small subset of genes, from the thousands of genes in the data, that contributes to a disease. This selection process is difficult due to many irrelevant genes, noisy genes, and the small number of samples available compared to the huge number of genes (high-dimensional data). In this study, we propose a three-stage gene selection method to select a small subset of informative genes that is most relevant for the cancer classification. It has three stages: 1) pre-selecting genes using a filter method to produce a subset of genes; 2) optimising the gene subset using a multi-objective hybrid method to yield near-optimal gene subsets; 3) analysing the frequency of appearance of each gene in the different near-optimal gene subsets to produce a small subset of informative genes. The experimental results show that our proposed method is capable of selecting the small subset and obtains better classification accuracies than other related previous works as well as five methods tested in this work. Additionally, a list of informative genes in the final gene subsets is presented for biological usage.
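The third stage described above, counting how often each gene appears across the near-optimal subsets, might look roughly like the sketch below. The abstract does not say how the final subset is cut from the frequency ranking, so keeping the `top_n` most frequent genes is an assumption, as are the function name `frequent_genes` and the gene identifiers in the usage example.

```python
from collections import Counter


def frequent_genes(subsets, top_n):
    """Rank genes by how many near-optimal subsets contain them.

    subsets: iterable of gene-identifier collections (one per subset).
    top_n:   how many genes to keep (a hypothetical cut-off rule).
    """
    # set(subset) ensures a gene is counted at most once per subset.
    counts = Counter(gene for subset in subsets for gene in set(subset))
    # Sort by descending frequency, breaking ties by gene identifier.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Hypothetical subsets produced by repeated runs of stage 2.
subsets = [["g1", "g2", "g3"], ["g2", "g3"], ["g3", "g4"]]
print(frequent_genes(subsets, 2))  # ['g3', 'g2']
```

Breaking ties by identifier keeps the result deterministic; any other tie-breaking rule would serve equally well for illustration.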
A microarray machine offers the capacity to measure the expression levels of thousands of genes simultaneously. It is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for cancer classification. However, the urgent problems in the use of gene expression data are the availability of a huge number of genes relative to the small number of available samples, and the fact that many of the genes are not relevant to the classification. It has been shown that selecting a small subset of genes can lead to improved accuracy in the classification. Hence, this paper proposes a solution to the problems by using a multi-objective strategy in a genetic algorithm. This approach was tried on two benchmark gene expression data sets. It obtained encouraging results on those data sets as compared with an approach that used a single-objective strategy in a genetic algorithm.
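One common way to compare candidates under a multi-objective strategy is Pareto dominance. The sketch below assumes two objectives, classification accuracy (to maximise) and number of selected genes (to minimise); the abstract does not state which objectives or which multi-objective scheme the authors used, so both the objectives and the function names (`dominates`, `pareto_front`) are assumptions for illustration.

```python
def dominates(a, b):
    """Return True if candidate a Pareto-dominates candidate b.

    Each candidate is a tuple (accuracy, n_genes): higher accuracy is
    better, fewer genes is better. These objectives are assumed.
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (accuracy, n_genes) pairs for four gene subsets.
candidates = [(0.95, 10), (0.95, 20), (0.90, 5), (0.80, 8)]
print(pareto_front(candidates))  # [(0.95, 10), (0.9, 5)]
```

A single-objective genetic algorithm would instead collapse both criteria into one fitness value, which forces a fixed trade-off between accuracy and subset size before the search starts.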
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression data which will maximize the classification accuracy. A model for gene selection and classification has been developed by using a filter approach, and an improved hybrid of the genetic algorithm and a support vector machine classifier. We show that the classification accuracy of the proposed model is useful for the cancer classification of one widely used gene expression benchmark data set.
Pathway analysis has led to a new era in genomic research by providing more biological process information than traditional single-gene analysis. Besides this advantage, pathway analysis poses some challenges to researchers, one of which is the quality of the pathway data itself. Pathway data are usually defined without reference to a biological context; in a specific biological context (e.g. lung cancer), typically only a few genes within a pathway are responsible for the corresponding cellular process. Some pathways may also include uninformative genes or exclude informative ones. Moreover, many algorithms in pathway analysis neglect these limitations by treating all the genes within pathways as significant. In a previous study, a hybrid of support vector machines and smoothly clipped absolute deviation with group-specific tuning parameters (gSVM-SCAD) was proposed in order to identify and select the informative genes before the pathway evaluation process. However, gSVM-SCAD showed a limitation in terms of classification accuracy. To deal with this limitation, we enhanced the tuning parameter method for gSVM-SCAD by applying the B-type generalized approximate cross validation (BGACV). Experimental analyses using one simulated data set and two gene expression data sets have shown that the proposed method obtains significant results in identifying biologically significant genes and pathways, and in classification accuracy.
Microarray data are expected to be useful for cancer classification. The main problem that needs to be addressed is the selection of a smaller subset of genes, from the thousands of genes in the data, that contributes to cancer. This selection process is difficult due to many irrelevant genes, noisy data, and the small number of samples available compared to the huge number of genes (higher-dimensional data). Hence, this paper aims to select a smaller subset of informative genes that is the most relevant for the cancer classification. To achieve this aim, a cyclic hybrid method has been proposed. Five real microarray data sets are used to test the effectiveness of the method. Experimental results show that the performance of the proposed method is superior to other experimental methods and related previous works in terms of classification accuracy and the number of selected genes. In addition, a scatter gene graph and a list of informative genes in the best gene subsets are also presented for biological usage.
A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, the selection of the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option of selecting the biggest subset is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection has performed better in terms of selecting the smallest subset as well as the biggest subset of informative genes with the lowest out-of-bag error rates through gene selection. Furthermore, the classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared to the existing method and other similar available methods.