Abstract:
Our research aims to make gene data from different experiments easier for scientists to combine and use. Researchers now collect gene information with a variety of methods, which produces datasets that describe the same tissues but were generated under different protocols. The first challenge we address is making gene data comparable across such datasets. We apply Consensus Clustering followed by Average Linkage Hierarchical Clustering so that the data are organized in a consistent form that is straightforward to compare and analyze. The second challenge is that different studies use different names for the same genes, a problem compounded by the fact that studies may cover different sets of genes. To address it, we use the Llama large language model to map gene names onto a common vocabulary, which lets us merge datasets that label genes differently. Together, these steps yield a single, harmonized dataset that provides a solid foundation for building more accurate computational models and for deriving biological insights with current genomic technologies.
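To illustrate the clustering step named above, here is a minimal sketch of consensus clustering followed by average-linkage hierarchical clustering on the resulting consensus matrix. It rests on several assumptions the abstract does not state. Rows of `data` are the items being clustered, for example samples or genes. Distances are Euclidean. Each resample draws 80% of the rows, with 50 resamples in total. The clusterer used inside the resampling loop is also average linkage. The function names (`average_linkage`, `consensus_matrix`, `consensus_cluster`) are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage (illustrative sketch).

    dist: symmetric n x n distance matrix as a list of lists.
    Returns n integer labels in the range 0..k-1.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a; a < b, so index a stays valid
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def consensus_matrix(data, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of rows lands in the same cluster.

    Assumed settings: row subsampling without replacement, Euclidean distance,
    and average linkage as the base clusterer on every subsample.
    """
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]  # times i and j co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were sampled together
    m = min(n, max(k, int(round(frac * n))))
    for _ in range(n_resamples):
        idx = sorted(rng.sample(range(n), m))
        sub_dist = [[math.dist(data[i], data[j]) for j in idx] for i in idx]
        labels = average_linkage(sub_dist, k)
        for p, i in enumerate(idx):
            for q, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[p] == labels[q]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def consensus_cluster(data, k, **kwargs):
    """Final average-linkage clustering on the distance 1 - consensus."""
    cm = consensus_matrix(data, k, **kwargs)
    dist = [[1.0 - v for v in row] for row in cm]
    return average_linkage(dist, k)
```

The design choice here is that stability across resamples, not raw distance, drives the final grouping. Pairs that co-cluster in every resample get a consensus of 1.0 and a distance of 0. Pairs that never co-cluster get a consensus of 0.0 and a distance of 1. A quadratic pure-Python linkage keeps the sketch dependency-free. For real datasets, a library routine such as SciPy's hierarchical clustering with `method="average"` would be the practical choice.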