
Journal articles (peer-reviewed)

2021

Göttl, Q.; Tönges, Y.; Grimm, D.; Burger, J. (2021): Automated Flowsheet Synthesis Using Hierarchical Reinforcement Learning: Proof of Concept. Chemie Ingenieur Technik 2021.

Recently we showed that reinforcement learning can be used to automatically generate process flowsheets without heuristics or prior knowledge. For this purpose, SynGameZero, a novel two-player game, has been developed. In this work we extend SynGameZero by structuring the agent's actions in several hierarchy levels, which improves the approach in terms of scalability and allows the consideration of more sophisticated flowsheet problems. We successfully demonstrate the usability of our novel framework for the fully automated synthesis of an ethyl tert-butyl ether process.

Matschinske, J.; Alcaraz, N.; Benis, A.; Golebiewski, M.; Grimm, D.; Heumos, L.; Kacprowski, T.; Lazareva, O.; List, M.; Louadi, Z.; Pauling, J.; Pfeifer, N.; Röttger, R.; Schwämmle, V.; Sturm, G.; Traverso, A.; Van Steen, K.; Vaz de Freitas, M.; Villalba Silva, G.; Wee, L.; Wenke, N.; Zanin, M.; Zolotareva, O.; Baumbach, J.; Blumenthal, D. (2021): The AIMe registry for artificial intelligence in biomedical research. Nature Methods 2021 (18).

We present the AIMe registry, a community-driven reporting platform for AI in biomedicine. It aims to enhance the accessibility, reproducibility and usability of biomedical AI models, and allows future revisions by the community. 

Göttl, Q.; Grimm, D.; Burger, J. (2021): Automated Synthesis of Steady-State Continuous Processes using Reinforcement Learning. Frontiers of Chemical Science and Engineering.

Automated flowsheet synthesis is an important field in computer-aided process engineering. The present work demonstrates how reinforcement learning can be used for automated flowsheet synthesis without any heuristics or prior knowledge of conceptual design. The environment consists of a steady-state flowsheet simulator that contains all physical knowledge. An agent is trained to take discrete actions and sequentially build up flowsheets that solve a given process problem. A novel method named SynGameZero is developed to ensure good exploration schemes in the complex problem. Therein, flowsheet synthesis is modelled as a game of two competing players. The agent plays this game against itself during training and consists of an artificial neural network and a tree search for forward planning. The method is applied successfully to a reaction-distillation process in a quaternary system.

2020

Genze, N.; Bharti, R.; Grieb, M.; Schultheiss, S.; Grimm, D. (2020): Accurate Machine Learning-Based Germination Detection, Prediction and Quality Assessment of Three Grain Crops. Plant Methods 2020 (16).

Background

Assessment of seed germination is an essential task for seed researchers to measure the quality and performance of seeds. Usually, seed assessments are done manually, which is a cumbersome, time-consuming and error-prone process. Classical image analysis methods are not well suited for large-scale germination experiments, because they often rely on manual adjustments of color-based thresholds. We here propose a machine learning approach using modern artificial neural networks with region proposals for accurate seed germination detection and high-throughput seed germination experiments.

Results

We generated labeled imaging data of the germination process of more than 2400 seeds for three different crops, Zea mays (maize), Secale cereale (rye) and Pennisetum glaucum (pearl millet), with a total of more than 23,000 images. Different state-of-the-art convolutional neural network (CNN) architectures with region proposals have been trained using transfer learning to automatically identify seeds within petri dishes and to predict whether the seeds germinated or not. Our proposed models achieved a high mean average precision (mAP) on a hold-out test data set of approximately 97.9%, 94.2% and 94.3% for Zea mays, Secale cereale and Pennisetum glaucum, respectively. Further, various single-value germination indices, such as Mean Germination Time and Germination Uncertainty, can be computed more accurately with the predictions of our proposed model compared to manual countings.

Conclusion

Our proposed machine learning-based method can help to speed up the assessment of seed germination experiments for different seed cultivars. It has lower error rates and a higher performance compared to conventional and manual methods, leading to more accurate germination indices and quality assessments of seeds.
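The two germination indices named in the Results paragraph can be illustrated with their commonly used textbook definitions. The sketch below is not taken from the paper: the function names, the hourly time points and the seed counts are hypothetical, and it assumes Mean Germination Time is the count-weighted mean of the observation times and Germination Uncertainty is the Shannon entropy (in bits) of the relative germination frequencies.

```python
import math


def mean_germination_time(times, newly_germinated):
    """Count-weighted mean of observation times (assumed standard definition)."""
    total = sum(newly_germinated)
    if total == 0:
        raise ValueError("no seeds germinated")
    return sum(t * n for t, n in zip(times, newly_germinated)) / total


def germination_uncertainty(newly_germinated):
    """Shannon entropy (bits) of the relative germination frequencies."""
    total = sum(newly_germinated)
    if total == 0:
        raise ValueError("no seeds germinated")
    return -sum(
        (n / total) * math.log2(n / total) for n in newly_germinated if n > 0
    )


# Hypothetical example: seeds newly germinated at 24 h, 48 h and 72 h.
times = [24, 48, 72]
counts = [2, 6, 2]
print(mean_germination_time(times, counts))  # 48.0
print(germination_uncertainty([5, 5]))       # 1.0
```

With per-image predictions of "germinated / not germinated", the counts per time point would be derived from the model output before applying these formulas.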

Gumpinger, A.; Rieck, B.; Grimm, D.; Borgwardt, K. (2020): Network-guided search for genetic heterogeneity between gene pairs. Bioinformatics 2021 (1), pp. 57-65.

Togninalli, M.; Seren, Ü.; Freudenthal, J.; Monroe, J.; Meng, D.; Nordborg, M.; Weigel, D.; Borgwardt, K.; Korte, A.; Grimm, D. (2020): AraPheno and the AraGWAS Catalog 2020: a major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucleic Acids Research 48.

Abstract

Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout-mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.

2019

Bharti, R.; Grimm, D. (2019): Current challenges and best-practice protocols for microbiome analysis. Briefings in Bioinformatics.

Abstract

Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).

Brown, P.; ..., ..; Grimm, D.; ..., ..; Zhou, Y. (2019): Large expert-curated database for benchmarking document similarity detection in biomedical literature search. Database 2019.

Abstract

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

Castellanos-Rizaldos, E.; Zhang, X.; Tadigotla, V.; Grimm, D.; Karlovich, C.; Raez, L.; Skog, J. (2019): Exosome-based detection of activating and resistance EGFR mutations from plasma of non-small cell lung cancer patients. Oncotarget 10 (30), pp. 2911-2920.

Non-small cell lung cancer (NSCLC) is the most prevalent form of lung cancer and its molecular landscape has been extensively studied. The most common genetic alterations in NSCLC are mutations within the epidermal growth factor receptor (EGFR) gene, with frequencies between 10-40%. There are several molecular targeted therapies for patients harboring these mutations.

Liquid biopsies constitute a flexible approach to monitor these mutations in real time as opposed to tissue biopsies that represent a single snap-shot in time. However, interrogating cell free DNA (cfDNA) has inherent biological limitations, especially at early or localized disease stages, where there is not enough tumor material released into the patient’s circulation.

We developed a qPCR-based test (ExoDx EGFR) that interrogates mutations within EGFR using Exosomal RNA/DNA and cfDNA (ExoNA) derived from plasma in a cohort of 110 NSCLC patients.

The performance of the assay yielded an overall sensitivity of 90% for L858R, 83% for T790M and 73% for exon 19 indels with specificities of 100%, 100%, and 96% respectively. In a subcohort of patients with extrathoracic disease (M1b and MX) the sensitivities were 92% (L858R), 95% (T790M), and 86% (exon 19 indels) with specificity of 100%, 100% and 94% respectively.

2018

Castellanos-Rizaldos, E.; Grimm, D.; Tadigotla, V.; Hurley, J.; Healy, J.; Neal, P.; Sher, M.; Venkatesan, R.; Karlovich, C.; Raponi, M.; Krug, A.; Noerholm, M.; Tannous, J.; Tannous, B.; Raez, L.; Skog, J. (2018): Exosome-Based Detection of EGFR T790M in Plasma from Non–Small Cell Lung Cancer Patients. Clinical Cancer Research 24 (12).

Purpose: About 60% of non–small cell lung cancer (NSCLC) patients develop resistance to targeted epidermal growth factor receptor (EGFR) inhibitor therapy through the EGFR T790M mutation. Patients with this mutation respond well to third-generation tyrosine kinase inhibitors, but obtaining a tissue biopsy to confirm the mutation poses risks and is often not feasible. Liquid biopsies using circulating free tumor DNA (cfDNA) have emerged as a noninvasive option to detect the mutation; however, sensitivity is low as many patients have too few detectable copies in circulation. Here, we have developed and validated a novel test that overcomes the limited abundance of the mutation by simultaneously capturing and interrogating exosomal RNA/DNA and cfDNA (exoNA) in a single step followed by a sensitive allele-specific qPCR.

Experimental Design: ExoNA was extracted from the plasma of NSCLC patients with biopsy-confirmed T790M-positive (N = 102) and T790M-negative (N = 108) samples. The T790M mutation status was determined using an analytically validated allele-specific qPCR assay in a Clinical Laboratory Improvement Amendment laboratory.

Results: Detection of the T790M mutation on exoNA achieved 92% sensitivity and 89% specificity using tumor biopsy results as gold standard. We also obtained high sensitivity (88%) in patients with intrathoracic disease (M0/M1a), for whom detection by liquid biopsy has been particularly challenging.

Conclusions: The combination of exoRNA/DNA and cfDNA for T790M detection has higher sensitivity and specificity compared with historical cohorts using cfDNA alone. This could further help avoid unnecessary tumor biopsies for T790M mutation testing.
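Sensitivity and specificity, as reported in the Results above, are ratios derived from a confusion matrix against the gold standard (here, tumor biopsy). The following is a minimal sketch of that arithmetic only; the function names and the counts in the example are hypothetical and are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of gold-standard positives that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of gold-standard negatives that the test reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not taken from the publication).
print(sensitivity(true_pos=90, false_neg=10))  # 0.9
print(specificity(true_neg=80, false_pos=20))  # 0.8
```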

Falcke, J.; Bose, N.; Artyukhin, A.; Rödelsperger, C.; Markov, G.; Yim, J.; Grimm, D.; Claassen, M.; Panda, O.; Baccile, J.; Zhang, Y.; Le, H.; Jolic, D.; Schroeder, F.; Sommer, R. (2018): Linking Genomic and Metabolomic Natural Variation Uncovers Nematode Pheromone Biosynthesis. Cell Chemical Biology 6 (21), pp. 787-796.

In the nematodes Caenorhabditis elegans and Pristionchus pacificus, a modular library of small molecules controls behavior, lifespan, and development. However, little is known about the final steps of their biosynthesis, in which diverse building blocks from primary metabolism are attached to glycosides of the dideoxysugar ascarylose, the ascarosides. We combine metabolomic analysis of natural isolates of P. pacificus with genome-wide association mapping to identify a putative carboxylesterase, Ppa-uar-1, that is required for attachment of a pyrimidine-derived moiety in the biosynthesis of ubas#1, a major dauer pheromone component. Comparative metabolomic analysis of wild-type and Ppa-uar-1 mutants showed that Ppa-uar-1 is required specifically for the biosynthesis of ubas#1 and related metabolites. Heterologous expression of Ppa-UAR-1 in C. elegans yielded a non-endogenous ascaroside, whose structure confirmed that Ppa-uar-1 is involved in modification of a specific position in ascarosides. Our study demonstrates the utility of natural variation-based approaches for uncovering biosynthetic pathways.

Exposito-Alonso, M.; Becker, C.; Schuenemann, V.; Reiter, E.; Setzer, C.; Slovak, R.; Brachi, B.; Hagmann, J.; Grimm, D.; Chen, J.; Busch, W.; Bergelson, J.; Ness, R.; Weigel, D. (2018): The rate and potential relevance of new mutations in a colonizing plant lineage. PLoS Genetics 14 (2).

By following the evolution of populations that are initially genetically homogeneous, much can be learned about core biological principles. For example, it allows for detailed studies of the rate of emergence of de novo mutations and their change in frequency due to drift and selection. Unfortunately, in multicellular organisms with generation times of months or years, it is difficult to set up and carry out such experiments over many generations. An alternative is provided by “natural evolution experiments” that started from colonizations or invasions of new habitats by selfing lineages. With limited or missing gene flow from other lineages, new mutations and their effects can be easily detected. North America has been colonized in historic times by the plant Arabidopsis thaliana, and although multiple intercrossing lineages are found today, many of the individuals belong to a single lineage, HPG1. To determine in this lineage the rate of substitutions (the subset of mutations that survived natural selection and drift), we have sequenced genomes from plants collected between 1863 and 2006. We identified 73 modern and 27 herbarium specimens that belonged to HPG1. Using the estimated substitution rate, we infer that the last common HPG1 ancestor lived in the early 17th century, when it was most likely introduced by chance from Europe. Mutations in coding regions are depleted in frequency compared to those in other portions of the genome, consistent with purifying selection. Nevertheless, a handful of mutations is found at high frequency in present-day populations. We link these to detectable phenotypic variance in traits of known ecological importance, life history and growth, which could reflect their adaptive value. Our work showcases how, by applying genomics methods to a combination of modern and historic samples from colonizing lineages, we can directly study new mutations and their potential evolutionary relevance.

Togninalli, M.; Seren, Ü.; Meng, D.; Fitz, J.; Nordborg, M.; Weigel, D.; Borgwardt, K.; Korte, A.; Grimm, D. (2018): The AraGWAS Catalog: a curated and standardized Arabidopsis thaliana GWAS catalog. Nucleic Acids Research 46 (1).

The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with P < 10^-4, of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web-interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.

2017

Krug, A.; Enderle, D.; Karlovich, C.; Priewasser, T.; Bentink, S.; Spiel, A.; Brinkmann, K.; Emenegger, J.; Grimm, D.; Castellanos-Rizaldos, E.; Goldman, J.; Sequist, L.; Soria, J.; Camidge, D.; Gadgeel, S.; Wakelee, H.; Raponi, M.; Noerholm, M.; Skog, J. (2017): Improved EGFR mutation detection using combined exosomal RNA and circulating tumor DNA in NSCLC patient plasma. Annals of Oncology 29 (3), pp. 700-706.

Grimm, D.; Roqueiro, D.; Salomé, P.; Kleeberger, S.; Greshake, B.; Zhu, W.; Liu, C.; Lippert, C.; Stegle, O.; Schölkopf, B.; Weigel, D.; Borgwardt, K. (2017): easyGWAS: A Cloud-Based Platform for Comparing the Results of Genome-Wide Association Studies. The Plant Cell 29 (1).

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.

Seren, Ü.; Grimm, D.; Fitz, J.; Weigel, D.; Nordborg, M.; Borgwardt, K.; Korte, A. (2017): AraPheno: a public database for Arabidopsis thaliana phenotypes. Nucleic Acids Research 45.

2016

Danelle, S.; Chae, E.; Grimm, D.; Pizarro, C.; Habring-Müller, A.; Vasseur, F.; Rakitsch, B.; Borgwardt, K.; Koenig, D.; Weigel, D. (2016): Genetic architecture of nonadditive inheritance in Arabidopsis thaliana hybrids. Proceedings of the National Academy of Sciences (PNAS) 2016.

McGaughran, A.; Rödelsperger, C.; Grimm, D.; Meyer, J.; Moreno, E.; Morgan, K.; Leaver, M.; Serobyan, V.; Rakitsch, B.; Borgwardt, K.; Sommer, R. (2016): Genomic profiles of diversification and genotype-phenotype association in island nematode lineages. Molecular Biology and Evolution.

Alonso-Blanco, C.; .., ..; Grimm, D.; .., ..; Weigel, D.; Zhou, X. (2016): 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana. Cell.

Kawakatsu, T.; .., ..; Grimm, D.; .., ..; Ecker, J. (2016): Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions. Cell.

2015

Eduati, F.; Mangravite, L.; .., ..; Grimm, D.; .., ..; Saez-Rodriguez, J. (2015): Prediction of human population responses to toxic compounds by a collaborative competition. Nature Biotechnology 2015.

Llinares-López, F.; Grimm, D.; Bodenham, D.; Gieraths, U.; Sugiyama, M.; Rowan, B.; Borgwardt, K. (2015): Genome-wide detection of intervals of genetic heterogeneity associated with complex traits. Bioinformatics 2015.

Motivation: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals.

Results: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping.

Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes.

Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html.

Grimm, D.; Azencott, C.; Aicheler, F.; Gieraths, U.; MacArthur, D.; Samocha, K.; Cooper, D.; Stenson, P.; Smoller, J.; Duncan, L.; Borgwardt, K. (2015): The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Human Mutation 2015, pp. 513-523.

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen‐2, SIFT, FatHMM, MutationTaster‐2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Due to the wealth of these methods, an important practical question to answer is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. We here demonstrate in a study of 10 tools on five datasets that such a comparative evaluation of these tools is hindered by two types of circularity: they arise due to (1) the same variants or (2) different variants from the same protein occurring both in the datasets used for training and for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity confounded tools are most accurate among all tools, and may even outperform optimized combinations of tools.

Wang, C.; Liu, C.; Roqueiro, D.; Grimm, D.; Schwab, R.; Becker, C.; Lanz, C.; Weigel, D. (2015): Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Research 2015.

2013

Azencott, C.; Grimm, D.; Sugiyama, M.; Kawahara, Y.; Borgwardt, K. (2013): Efficient network-guided multi-locus association mapping with graph cuts. Bioinformatics 2013.

Grimm, D.; Hagmann, J.; Koenig, D.; Weigel, D.; Borgwardt, K. (2013): Accurate indel prediction using paired-end short reads. BMC Genomics 2013.


Contributions to scientific conferences (peer-reviewed)

2021

Haselbeck, F.; Grimm, D. (2021): EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data. 44th German Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence 2021.

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.

Göttl, Q.; Grimm, D.; Burger, J. (2021): Automated Process Synthesis Using Reinforcement Learning. Proceedings of the 31st European Symposium on Computer Aided Process Engineering (ESCAPE31) 50, pp. 209-214.

2014

Sugiyama, M.; Azencott, C.; Grimm, D.; Kawahara, Y.; Borgwardt, K. (2014): Multi-Task Feature Selection on Multiple Networks via Maximum Flows. Proceedings of the 2014 SIAM International Conference on Data Mining (SDM) 2014. https://epubs.siam.org/doi/abs/10.1137/1.9781611973440.23

2013

Feragen, A.; Petersen, J.; Grimm, D.; Dirksen, A.; Pedersen, J.; Borgwardt, K.; de Bruijne, M. (2013): Geometric tree kernels: Classification of COPD from airway tree geometry. International Conference on Information Processing in Medical Imaging IPMI 2013: Information Processing in Medical Imaging.

2011

Tsafnat, G.; Setzermann, P.; Partridge, S.; Grimm, D. (2011): Computational inference of difficult word boundaries in DNA languages. ISABEL '11: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies 2011.


Journal articles

2021

Ajekwe, R.; Grieb, M.; Genze, N.; Grimm, D. (2021): Beikrauterkennung mit Drohnen und künstlicher Intelligenz [Weed detection with drones and artificial intelligence]. Schule und Beratung 2021 (8-10), pp. 12-15.

Novel technologies, combined with intelligent image analysis, open up great potential for increasing efficiency in agriculture. Using state-of-the-art machine learning methods (e.g. artificial neural networks), drone-based images of sorghum fields are to be analyzed automatically and weeds detected. In Bavaria, sorghum is grown as an energy crop, primarily for biogas production. Its high biomass yield and wide range of varieties, together with its drought tolerance and nutrient efficiency, make sorghum a promising raw-material crop.


Books / Monographs

2021

Bharti, R.; Grimm, D. (2021): Design and Analysis of RNA Sequencing Data. Next Generation Sequencing and Data Analysis. Learning Materials in Biosciences.

In this chapter, we introduce the concept of RNA-Seq analyses. First, we provide an overview of a typical RNA-Seq experiment that includes extraction of sample RNA, enrichment, and cDNA library preparation. Next, we review tools for quality control and data pre-processing followed by a standard workflow to perform RNA-Seq analyses. For this purpose, we discuss two common RNA-Seq strategies, that is, a reference-based alignment and a de novo assembly approach. We learn how to do basic downstream analyses of RNA-Seq data, including quantification of expressed genes, differential gene expression (DE) between different groups as well as functional gene analysis. Finally, we provide a best-practice example for a reference-based RNA-Seq analysis from beginning to end, including all necessary tools and steps on GitHub: https://github.com/grimmlab/BookChapter-RNA-Seq-Analyses.

2018

Gumpinger, A.; Roqueiro, D.; Grimm, D.; Borgwardt, K. (2018): Methods and Tools in Genome-Wide Association Studies. Computational Cell Biology.


Other publications

2021

Schäfer, F.; Walther, M.; Grimm, D.; Hübner, A. (2021): Combining Machine Learning and Optimization for the Operational Patient-Bed Assignment Problem. SSRN 2021.

This paper develops a multi-objective decision support model for solving the patient bed assignment problem. Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown patient arrivals and lengths of stay, in particular for emergency patients. Hospitals therefore need to deal with uncertainty on actual bed requirements and potential shortage situations as bed capacities are limited. This paper contributes by improving the anticipation of emergency patients using machine learning (ML) approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved an up to 17% better root mean square error when using ML methods compared to a baseline approach relying on averages for historical arrival rates. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead heuristic. When applying the hyper-heuristic in test sets we were able to increase the objective function by up to 3% in a single problem instance and up to 4% in a time series analysis compared to current approaches in literature. We achieved an improvement of up to 2.2% compared to a baseline approach from literature by combining the emergency patient admission forecasting and the hyper-heuristic on real-life situations.
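The forecasting comparison in this abstract is stated as a relative improvement in root mean square error (RMSE) over a baseline. The sketch below shows only how such a figure is typically computed; the function names and the arrival numbers are hypothetical and do not reproduce the paper's data or models.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse_reduction(y_true, baseline_pred, model_pred):
    """Fractional RMSE reduction of a model relative to a baseline."""
    base = rmse(y_true, baseline_pred)
    return (base - rmse(y_true, model_pred)) / base


# Hypothetical daily emergency arrivals (illustration only).
observed = [10, 12, 14]
baseline = [12, 12, 12]  # e.g. a historical average
model = [11, 12, 13]     # e.g. an ML forecast
print(relative_rmse_reduction(observed, baseline, model))
```

A value of 0.17 from this function would correspond to the "17% better RMSE" phrasing used in the abstract.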

Haselbeck, F.; Grimm, D. (2021): EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data. arXiv 2021.

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.

Göttl, Q.; Grimm, D.; Burger, J. (2021): Automated Synthesis of Steady-State Continuous Processes using Reinforcement Learning. arXiv 2021.

Automated flowsheet synthesis is an important field in computer-aided process engineering. The present work demonstrates how reinforcement learning can be used for automated flowsheet synthesis without any heuristics or prior knowledge of conceptual design. The environment consists of a steady-state flowsheet simulator that contains all physical knowledge. An agent is trained to take discrete actions and sequentially build up flowsheets that solve a given process problem. A novel method named SynGameZero is developed to ensure good exploration schemes in the complex problem. Therein, flowsheet synthesis is modelled as a game of two competing players. The agent plays this game against itself during training and consists of an artificial neural network and a tree search for forward planning. The method is applied successfully to a reaction-distillation process in a quaternary system.

2015

Grimm, D. (2015): easyGWAS: An Integrated Computational Framework for Advanced Genome-Wide Association Studies. Universitätsbibliothek Tübingen.