Impulse18
This year we have been invited to give a talk about the BOA project at Impulse 2018 in Leipzig. On 25.10. we will be at the Congress Centrum Leipzig and …
A fast, state-of-the-art bioinformatics pipeline for performing microbiome analysis on Microsoft Azure
With Next-Generation Sequencing (NGS), it is now possible to generate DNA sequences of thousands of animals from environmental samples in a short time and make them available for bioinformatic analysis. This opens the door to new and revolutionary experimental setups and new fields of research. Today, many universities and research institutions working on nature conservation, biodiversity, population genetics and evolution have established new chairs for metagenomics, metatranscriptomics and metabarcoding. In contrast to traditional methods, the innovative metabarcoding approach can deliver timely results on changes in biodiversity across time and geography, which is urgently needed in times of insect decline and climate change. Although metabarcoding is still the subject of research and development, it can already be called a revolutionary method, not only for assessing biodiversity but also for investigating and measuring the effects of influences such as civil engineering, agriculture or renaturation measures. Conventional analysis options require a lot of storage and compute power, are often very time-consuming and are prone to errors. The Bioinformatics on Azure (BOA) project is a platform built on various Azure services that fully utilizes the power and flexibility of the cloud and can thus make results available for research quickly and securely.
The Azure Data Lake Store is our preferred store for structured and semi-structured data in raw and zipped formats, up to several terabytes.
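Raw sequencing reads typically arrive as gzip-compressed FASTQ files. As a minimal sketch, assuming a file has already been downloaded from (or streamed out of) the Data Lake Store into a local binary stream, the Python standard library is enough to decompress and parse it record by record without unpacking it to disk first. The function name `read_fastq_gz` is our own and hypothetical; access to the Data Lake Store itself would go through the Azure SDK and is not shown.

```python
import gzip
import io
from typing import BinaryIO, Iterator, Tuple


def read_fastq_gz(stream: BinaryIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from a gzip-compressed FASTQ stream.

    Assumes the common four-line FASTQ layout (no wrapped sequence lines).
    """
    with gzip.open(stream, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of input
            sequence = handle.readline().rstrip("\n")
            separator = handle.readline().rstrip("\n")
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header[1:], sequence, quality


# Usage: compress a tiny FASTQ file in memory and parse it back.
raw = b"@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nHHHH\n"
records = list(read_fastq_gz(io.BytesIO(gzip.compress(raw))))
print(records)  # [('read1', 'ACGT', 'IIII'), ('read2', 'GGCA', 'HHHH')]
```

Because the parser is a generator, only one record is held in memory at a time, which matters for files in the multi-gigabyte range.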
With Azure Data Lake Analytics, we prepare the various data formats and run initial analyses using the U-SQL C#, R and Python integrations.
Azure Data Factory loads data from the various external data sources and makes it available in the Azure Data Lake Store.
See how NGS analysis can be accelerated with Apache Spark, a widely used cluster-computing framework.
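Spark speeds up workloads like this by splitting the reads into partitions, processing each partition independently and merging the partial results (in PySpark, roughly an RDD `flatMap` followed by `reduceByKey`). The sketch below shows that map/reduce pattern for k-mer counting using only the standard library. It illustrates the pattern under our own assumptions and is not BOA's Spark code; the function names are hypothetical, and a thread pool stands in for the cluster, so it shows the structure but gives little CPU speedup in CPython.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_partition(reads: List[str], k: int) -> Counter:
    """'Map' step: count k-mers within one partition of reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def parallel_kmer_counts(reads: List[str], k: int, n_partitions: int = 4) -> Counter:
    """Split reads into partitions, count each one independently, then merge ('reduce')."""
    partitions = [reads[i::n_partitions] for i in range(n_partitions)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for partial in pool.map(count_partition, partitions, [k] * n_partitions):
            total.update(partial)  # Counter.update adds counts
    return total


# Usage with three toy reads.
reads = ["ACGTAC", "CGTA", "TTT"]
counts = parallel_kmer_counts(reads, k=3, n_partitions=2)
print(counts["CGT"], counts["GTA"], sum(counts.values()))  # 2 2 7
```

The merge step only works because counting is associative: partial counts can be combined in any order, which is exactly what lets a framework like Spark distribute the work.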
Using R, we have access to a large number of bioinformatics packages and can therefore rely on established methods.
Using Python, we can apply machine learning methods to achieve better and faster results when searching against reference databases.
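Which machine learning methods BOA uses is not described here. As a hedged illustration of alignment-free searching against a reference database, the sketch below represents each sequence by its k-mer frequency profile and assigns a query to the reference with the highest cosine similarity, essentially a 1-nearest-neighbour classifier. The function names, reference names and sequences are made up; real reference databases are far larger and would call for an indexed or approximate search.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers; this profile serves as the feature vector."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(references: Dict[str, str], k: int = 4) -> Dict[str, Counter]:
    """Precompute one profile per reference so each query only needs comparisons."""
    return {name: kmer_profile(seq, k) for name, seq in references.items()}


def classify(query: str, index: Dict[str, Counter], k: int = 4) -> Tuple[str, float]:
    """Return the most similar reference (1-nearest neighbour) and its similarity."""
    profile = kmer_profile(query, k)
    scores = {name: cosine_similarity(profile, ref) for name, ref in index.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Usage with two hypothetical reference sequences.
index = build_index({"taxon_A": "ACGTACGTACGTTTGA", "taxon_B": "GGGCCCGGGCCCAATT"})
name, score = classify("ACGTACGTAC", index)
print(name, round(score, 3))  # taxon_A 0.888
```

Building the index once and reusing it for every query is the part that scales; the per-query cost is then only the similarity comparisons.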
Metadata and other project information are stored in an Azure SQL database and made available to the various Azure services.
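A small relational schema is enough to link each sample to its project and to the location of its raw reads. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a local stand-in for Azure SQL Database, which would instead be reached through an ODBC driver. The table layout, column names and values are hypothetical and not BOA's actual schema.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per project and one row per sequenced sample.
SCHEMA = """
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id            INTEGER PRIMARY KEY,
    project_id    INTEGER NOT NULL REFERENCES project(id),
    site          TEXT NOT NULL,
    collected_on  TEXT NOT NULL,  -- ISO 8601 date, so text order equals date order
    raw_data_path TEXT NOT NULL   -- where the raw reads live in the data lake
);
"""


def add_project(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO project (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_sample(conn: sqlite3.Connection, project_id: int, site: str,
               collected_on: str, raw_data_path: str) -> int:
    # Parameter placeholders keep values out of the SQL text (no SQL injection).
    cur = conn.execute(
        "INSERT INTO sample (project_id, site, collected_on, raw_data_path) "
        "VALUES (?, ?, ?, ?)",
        (project_id, site, collected_on, raw_data_path),
    )
    return cur.lastrowid


def samples_for_project(conn: sqlite3.Connection,
                        project_name: str) -> List[Tuple[str, str, str]]:
    """List (site, date, raw data path) for a project, oldest sample first."""
    return conn.execute(
        "SELECT s.site, s.collected_on, s.raw_data_path "
        "FROM sample AS s JOIN project AS p ON p.id = s.project_id "
        "WHERE p.name = ? ORDER BY s.collected_on",
        (project_name,),
    ).fetchall()


# Usage with an in-memory database and made-up values.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pid = add_project(conn, "demo_project")
add_sample(conn, pid, "site_2", "2018-06-01", "raw/site_2.fastq.gz")
add_sample(conn, pid, "site_1", "2018-05-01", "raw/site_1.fastq.gz")
print(samples_for_project(conn, "demo_project"))
# [('site_1', '2018-05-01', 'raw/site_1.fastq.gz'), ('site_2', '2018-06-01', 'raw/site_2.fastq.gz')]
```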
Yes, most of BOA will be open source and published by us on GitHub. Some smaller tools are already available there.
To further analyze and visualize the results of our pipeline, we rely on Microsoft Power BI and various R and Python charting libraries.
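Before charting, per-sample read counts are commonly converted to relative abundances and written as a long ("tidy") table with one row per sample and taxon, a shape that Power BI can import as CSV and that most R and Python charting libraries handle directly. The sketch below shows that preparation step with the standard library only; the input layout, column names and taxon labels are hypothetical, not BOA's actual output format.

```python
import csv
import io
from typing import Dict

Counts = Dict[str, Dict[str, int]]


def relative_abundance(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Divide each taxon's read count by the sample total so each sample sums to 1."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {taxon: (n / total if total else 0.0) for taxon, n in taxa.items()}
    return result


def to_long_csv(abundance: Dict[str, Dict[str, float]]) -> str:
    """Write one row per (sample, taxon) pair, sorted for reproducible output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "taxon", "relative_abundance"])
    for sample in sorted(abundance):
        for taxon in sorted(abundance[sample]):
            writer.writerow([sample, taxon, f"{abundance[sample][taxon]:.4f}"])
    return buffer.getvalue()


# Usage with made-up read counts for two sampling sites.
counts = {"site_1": {"taxon_A": 30, "taxon_B": 10}, "site_2": {"taxon_A": 5, "taxon_B": 15}}
print(to_long_csv(relative_abundance(counts)), end="")
```

Normalizing per sample makes sites with different sequencing depths comparable in a stacked bar chart.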
Many questions in science and research require increasingly complex modeling of complex systems. High Performance Computing (HPC) has therefore become indispensable for many research areas, for example …
We are pleased to present our project “Bioinformatics on Azure” at the Azure Meetup Hamburg on October 13, 2017. The meeting starts at 18:00 at Atos in the Mundsburg Office …
If you need further information about our project BOA, please contact us.
We would be pleased to inform you about the current progress of the project and give you further insight into BOA.
If you think the project might be of interest to your community, meetup, user group or conference, just ask us for possible dates and conditions. We are always happy to present the project worldwide.