

Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions

John, M.; Ankenbrand, M.; Artmann, C.; Freudenthal, J.; Korte, A.; Grimm, D. (2022)

Bioinformatics 2022.
DOI: 10.1093/bioinformatics/btac455


Open Access
 

Motivation: Genome-wide Association Studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear Mixed Models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest, while also accounting for population structure and cryptic relatedness. Assumptions of LMMs include a normal distribution of the residuals and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice they are rarely implemented due to the high computational complexity.

Results: We propose permGWAS, an efficient linear mixed model reformulation based on 4D-tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds yield lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than eight days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results.

Availability: permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS
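
The idea of a permutation-based significance threshold can be illustrated with a minimal sketch. It is not the permGWAS implementation: it skips the linear mixed model and the tensor reformulation entirely and uses the squared Pearson correlation per marker as a stand-in test statistic. The function names, the max-statistic scheme, and the default parameters are assumptions made for illustration only.

```python
import math
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between one marker and the phenotype."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy * sxy / (sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under permutation.

    genotypes: list of markers, each a list of per-sample values.
    phenotype: list of per-sample trait values.
    (Hypothetical helper; a simplified stand-in, not the permGWAS API.)
    """
    rng = random.Random(seed)
    y = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any genotype-phenotype link
        max_stats.append(max(pearson_r2(snp, y) for snp in genotypes))
    max_stats.sort()
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_stats[k]
```

A marker would then be reported only if its observed statistic exceeds the returned threshold. Because the threshold is derived from the phenotype's own permutation distribution, it adapts to skewed phenotypes, whereas a Bonferroni threshold depends only on the number of tests.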


Rat Hepatic Stellate Cell Line CFSC-2G: Genetic Markers and Short Tandem Repeat Profile Useful for Cell Line Authentication

Nanda, I.; Schröder, S.; Steinlein, C.; Haaf, T.; Buhl, E.; Grimm, D....

Cells 2022.
DOI: 10.3390/cells11182900


Open Access

Systematic analysis of the underlying genomic architecture for transcriptional-translational coupling in prokaryotes

Bharti, R.; Siebert, D.; Blombach, B.; Grimm, D. (2022)

NAR Genomics and Bioinformatics 2022.
DOI: 10.1093/nargab/lqac074


Open Access

Deep Learning-based Early Weed Segmentation using Motion Blurred UAV Images of Sorghum Fields

Genze, N.; Ajekwe, R.; Güreli, Z.; Haselbeck, F.; Grieb, M.; Grimm, D. (2022)

Computers and Electronics in Agriculture 2022.
DOI: 10.1016/j.compag.2022.107388


Open Access
 

Weeds are undesired plants in agricultural fields that affect crop yield and quality by competing for nutrients, water, sunlight and space. For centuries, farmers have used several strategies and resources to remove weeds. The use of herbicide is still the most common control strategy. To reduce the amount of herbicide and the impact caused by uniform spraying, site-specific weed management (SSWM) through variable rate herbicide application and mechanical weed control has long been recommended. To implement such precise strategies, accurate detection and classification of weeds in crop fields is a crucial first step. Due to the phenotypic similarity between some weeds and crops as well as changing weather conditions, it is challenging to design an automated system for general weed detection. For efficiency, unmanned aerial vehicles (UAV) are commonly used for image capturing. However, high wind pressure and different drone settings have a severe effect on the capturing quality, which can result in degraded images, e.g., due to motion blur. In this paper, we investigate the generalization capabilities of Deep Learning methods for early weed detection in sorghum fields under such challenging capturing conditions. For this purpose, we developed weed segmentation models using three different state-of-the-art Deep Learning architectures in combination with residual neural networks as feature extractors.

We further publish a manually annotated and expert-curated UAV imagery dataset for weed detection in sorghum fields under challenging conditions. Our results show that our trained models generalize well regarding the detection of weeds, even for degraded captures due to motion blur. A UNet-like architecture with a ResNet-34 feature extractor achieved an F1-score of over 89 % on a hold-out test-set. Further analysis indicates that the trained model performed well in predicting the general plant shape, while most misclassifications appeared at borders of the plants. Beyond that, and in contrast to existing research, our approach can detect intra-row weeds without additional information as well as partly occluded plants.

All data, including the newly generated and annotated UAV imagery dataset, and code are publicly available on GitHub: https://github.com/grimmlab/UAVWeedSegmentation and Mendeley Data: https://doi.org/10.17632/4hh45vkp38.3
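
For readers unfamiliar with the metric, the following sketch shows how an F1-score can be computed pixel-wise for a binary segmentation mask. It is an illustration only: the abstract does not state how the score was aggregated (for example per class or per image), and the function name and the convention for empty masks are assumptions.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for two binary masks given as nested lists of 0/1.

    Returns 1.0 when both masks are empty (an assumed convention).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # weed pixel correctly predicted
            elif p and not t:
                fp += 1  # predicted weed, actually background
            elif t and not p:
                fn += 1  # missed weed pixel
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = f1_score(pred, truth)  # 2 TP, 1 FP, 1 FN
```

Because F1 ignores true negatives, it is not inflated by the large background area that is typical for early-stage weed imagery.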


A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species

John, M.; Haselbeck, F.; Dass, R.; Malisi, C.; Dreischer, C.; Schultheiss, S....

Frontiers in Plant Science 2022.
DOI: 10.3389/fpls.2022.932512


Open Access
 

Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare twelve different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allows us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.


Dynamically Self-Adjusting Gaussian Processes for Data Stream Modelling

Hüwel, J.; Haselbeck, F.; Grimm, D.; Beecks, C. (2022)

KI 2021: Advances in Artificial Intelligence 2022.
DOI: 10.1007/978-3-031-15791-2_10


Open Access
 

One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA's ability to enable a more accurate and, in some settings, also more efficient data stream processing via Gaussian processes.


Code availability: https://github.com/JanHuewel/ETKA


Towards a better understanding of the genetic architecture of complex traits

Grimm, D. (2022)

Keynote @TüBMI 2022, Tübinger Bioinformatics and Medical Informatics Days 2022.


Using Reinforcement Learning in a Game-like Setup for Automated Process Synthesis without Prior Process Knowledge

Göttl, Q.; Grimm, D.; Burger, J. (2022)

Proceedings of the 14th International Symposium on Process Systems Engineering 2022, pp. 1555-1560.
DOI: 10.1016/B978-0-323-85159-6.50259-1

 

The present work uses reinforcement learning (RL) for automated flowsheet synthesis. The task of synthesizing a flowsheet is reformulated into a two-player game, in which an agent learns by self-play without prior knowledge. The hierarchical RL scheme developed in our previous work (Göttl et al., 2021b) is coupled with an improved training process. The training process is analyzed in detail using the synthesis of ethyl tert-butyl ether (ETBE) as an example. This analysis uncovers how the agent’s evolution is driven by the two-player setup.


Genetic Characterization of Rat Hepatic Stellate Cell Line HSC-T6 for In Vitro Cell Line Authentication

Indrajit, N.; Steinlein, C.; Haaf, T.; Buhl, E.; Grimm, D.; Friedman, S....

Cells 2022.
DOI: 10.3390/cells11111783


Open Access
 

Immortalized hepatic stellate cells (HSCs) established from mouse, rat, and humans are valuable in vitro models for the biomedical investigation of liver biology. These cell lines are homogenous, thereby providing consistent and reproducible results. They grow more robustly than primary HSCs and provide an unlimited supply of proteins or nucleic acids for biochemical studies. Moreover, they can overcome ethical concerns associated with the use of animal and human tissue and allow for fostering of the 3R principle of replacement, reduction, and refinement proposed in 1959 by William M. S. Russell and Rex L. Burch. Nevertheless, working with continuous cell lines also has some disadvantages. In particular, there are ample examples in which genetic drift and cell misidentification have led to invalid data. Therefore, many journals and granting agencies now recommend proper cell line authentication. We herein describe the genetic characterization of the rat HSC line HSC-T6, which was introduced as a new in vitro model for the study of retinoid metabolism. The consensus chromosome markers, outlined primarily through multicolor spectral karyotyping (SKY), demonstrate that apart from the large derivative chromosome 1 (RNO1), at least two additional chromosomes (RNO4 and RNO7) are found to be in three copies in all metaphases. Additionally, we have defined a short tandem repeat (STR) profile for HSC-T6, including 31 species-specific markers. The typical features of these cells have been further determined by electron microscopy, Western blotting, and Rhodamine-Phalloidin staining. Finally, we have analyzed the transcriptome of HSC-T6 cells by mRNA sequencing (mRNA-Seq) using next generation sequencing (NGS).


Computational identification of protein complexes from network interactions: Present state, challenges, and the way forward

Omranian, S.; Nikoloski, Z.; Grimm, D. (2022)

Computational and Structural Biotechnology Journal 2022.
DOI: 10.1016/j.csbj.2022.05.049


Open Access
 

Physically interacting proteins form macromolecule complexes that drive diverse cellular processes. Advances in experimental techniques that capture interactions between proteins provide us with protein-protein interaction (PPI) networks from several model organisms. These datasets have enabled the prediction and other computational analyses of protein complexes. Here we provide a systematic review of the state-of-the-art algorithms for protein complex prediction from PPI networks proposed in the past two decades. The existing approaches that solve this problem are categorized into three groups: cluster-quality-based, node affinity-based, and network embedding-based approaches, and we compare and contrast their advantages and disadvantages. We further include a comparative analysis by computing the performance of eighteen methods based on twelve well-established performance measures on four widely used benchmark protein-protein interaction networks. Finally, the limitations and drawbacks of both current data and approaches, along with potential solutions, are discussed, with emphasis on the points that pave the way for future research efforts in this field.


Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions

Haselbeck, F.; Killinger, J.; Menrad, K.; Hannus, T.; Grimm, D. (2022)

Machine Learning with Applications Volume 7, 100239 (2022).
DOI: 10.1016/j.mlwa.2021.100239


Open Access
 

Forecasting future demand is of high importance for many companies as it affects operational decisions. This is especially relevant for products with a short shelf life due to the potential disposal of unsold items. Horticultural products are highly influenced by this, but have received limited attention in forecasting research so far. Beyond that, many forecasting competitions show a competitive performance of classical forecasting methods. For the first time, we empirically compared the performance of nine state-of-the-art machine learning and three classical forecasting algorithms for horticultural sales predictions. We show that machine learning methods were superior in all our experiments, with the gradient boosted ensemble learner XGBoost being the top performer in 14 out of 15 comparisons. This advantage over classical forecasting approaches increased for datasets with multiple seasons. Further, we show that including additional external factors, such as weather and holiday information, as well as meta-features led to a boost in predictive performance. In addition, we investigated whether the algorithms can capture the sudden increase in demand of horticultural products during the SARS-CoV-2 pandemic in 2020. For this special case, XGBoost was also superior. All code and data are publicly available on GitHub: https://github.com/grimmlab/HorticulturalSalesPredictions.


Reinforcement Learning for Automated Flowsheet Synthesis (original title: Reinforcement Learning für die automatisierte Fließbildsynthese)

Grimm, D.; Göttl, Q.; Burger, J. (2021)

AI4Life, KI Symposium.


EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data

Haselbeck, F.; Grimm, D. (2021)

44th German Conference on Artificial Intelligence (Virtual Conference).
DOI: 10.1007/978-3-030-87626-5_11

 

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.
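
The event-triggered refitting idea can be sketched as follows. This is a deliberately simplified stand-in, not EVARS-GPR itself: a two-sided CUSUM detector replaces the paper's online change point detection, a constant mean replaces Gaussian process regression, and the data augmentation step is omitted. The function name and the `warmup`, `drift` and `threshold` parameters are assumptions chosen for illustration.

```python
def event_triggered_means(stream, warmup=3, drift=0.5, threshold=4.0):
    """Predict each sample with a mean model that is refit only on change points.

    Returns (predictions, change_points); a prediction is None while the
    model is being (re)fitted on fresh samples.
    """
    window = []      # samples collected since the last refit trigger
    mean = None      # current model; None means "not fitted yet"
    pos = neg = 0.0  # CUSUM statistics for upward / downward shifts
    predictions, change_points = [], []
    for t, x in enumerate(stream):
        if mean is None:
            # Fitting phase: collect samples, no prediction available.
            window.append(x)
            predictions.append(None)
            if len(window) >= warmup:
                mean = sum(window) / len(window)
            continue
        predictions.append(mean)
        residual = x - mean
        pos = max(0.0, pos + residual - drift)
        neg = max(0.0, neg - residual - drift)
        if pos > threshold or neg > threshold:
            # Event: discard the old model and refit on post-change data only.
            change_points.append(t)
            window, mean = [x], None
            pos = neg = 0.0
    return predictions, change_points


stream = [1.0] * 10 + [5.0] * 10
predictions, change_points = event_triggered_means(stream)
```

Between events the model is left untouched, which is where the runtime advantage over periodic refitting comes from: refitting cost is only paid when the detector reports a shift.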


EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data

Haselbeck, F.; Grimm, D. (2021)

44th German Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence 2021.
DOI: 10.1007/978-3-030-87626-5_11

 

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.


Combining Machine Learning and Optimization for the Operational Patient-Bed Assignment Problem

Schäfer, F.; Walther, M.; Grimm, D.; Hübner, A. (2021)

SSRN 2021.
DOI: 10.2139/ssrn.3919282

 

This paper develops a multi-objective decision support model for solving the patient bed assignment problem. Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown patient arrivals and lengths of stay, in particular for emergency patients. Hospitals therefore need to deal with uncertainty on actual bed requirements and potential shortage situations as bed capacities are limited. This paper contributes by improving the anticipation of emergency patients using machine learning (ML) approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved an up to 17% better root mean square error when using ML methods compared to a baseline approach relying on averages for historical arrival rates. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead heuristic. When applying the hyper-heuristic in test sets we were able to increase the objective function by up to 3% in a single problem instance and up to 4% in a time series analysis compared to current approaches in literature. We achieved an improvement of up to 2.2% compared to a baseline approach from literature by combining the emergency patient admission forecasting and the hyper-heuristic on real-life situations.
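
To make the reported error comparison concrete, the sketch below computes a root mean square error for two forecasts and the relative improvement of one over the other. The arrival counts and both forecasts are made-up illustration values, not data from the paper, and the function names are assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and forecast values."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, model_error):
    """Fraction by which the model error is lower than the baseline error."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical daily emergency arrivals and two forecasts.
arrivals = [10, 12, 8, 11]
baseline_forecast = [10, 10, 10, 10]  # e.g. a constant historical average
model_forecast = [10, 11, 9, 11]      # e.g. an ML-based forecast

gain = relative_improvement(
    rmse(arrivals, baseline_forecast),
    rmse(arrivals, model_forecast),
)
```

A statement such as "up to 17% better RMSE" corresponds to a `gain` of 0.17 under this definition.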


Automated Flowsheet Synthesis Using Hierarchical Reinforcement Learning: Proof of Concept

Göttl, Q.; Tönges, Y.; Grimm, D.; Burger, J. (2021)

Chemie Ingenieur Technik 93 (12), pp. 2010-2018.
DOI: 10.1002/cite.202100086


Open Access
 

Recently we showed that reinforcement learning can be used to automatically generate process flowsheets without heuristics or prior knowledge. For this purpose, SynGameZero, a novel two-player game, has been developed. In this work we extend SynGameZero by structuring the agent's actions in several hierarchy levels, which improves the approach in terms of scalability and allows the consideration of more sophisticated flowsheet problems. We successfully demonstrate the usability of our novel framework for the fully automated synthesis of an ethyl tert-butyl ether process.


The AIMe registry for artificial intelligence in biomedical research

Matschinske, J.; Alcaraz, N.; Benis, A.; Golebiewski, M.; Grimm, D.; Heumos, L....

Nature Methods 18, pp. 1128-1131.
DOI: 10.1038/s41592-021-01241-0

 

We present the AIMe registry, a community-driven reporting platform for AI in biomedicine. It aims to enhance the accessibility, reproducibility and usability of biomedical AI models, and allows future revisions by the community. 


Weed Detection with Drones and Artificial Intelligence (original title: Beikrauterkennung mit Drohnen und künstlicher Intelligenz)

Ajekwe, R.; Grieb, M.; Genze, N.; Grimm, D. (2021)

Schule und Beratung 2021 (8-10), pp. 12-15.

 

Novel technologies combined with intelligent image analysis open up great potential for increasing efficiency in agriculture. Using state-of-the-art machine learning methods (e.g., artificial neural networks), drone-based images of sorghum fields are to be analyzed automatically and weeds detected. In Bavaria, sorghum is grown as an energy crop, primarily for biogas production. Its high biomass yield and wide range of varieties, together with its drought tolerance and nutrient efficiency, make sorghum a promising raw-material crop.


EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data

Haselbeck, F.; Grimm, D. (2021)

arXiv:2101.04422.


Open Access
 

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8 % lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr


Automated Process Synthesis Using Reinforcement Learning

Göttl, Q.; Grimm, D.; Burger, J. (2021)

Proceedings of the 31st European Symposium on Computer Aided Process Engineering (ESCAPE31) 50, pp. 209-214.
DOI: 10.1016/B978-0-323-88506-5.50034-6


Prof. Dr. Dominik Grimm


Hochschule Weihenstephan-Triesdorf

Fakultät Wald und Forstwirtschaft
Hans-Carl-von-Carlowitz-Platz 3
85354 Freising

T +49 9421 187-230
F +49 9421 187-285
dominik.grimm[at]hswt.de