BOA – Bioinformatics on Azure

With Next-Generation Sequencing (NGS), it is now possible to generate DNA sequences of thousands of animals from environmental samples in a short time and make them available for bioinformatic analysis. This opens the door to new experimental setups and new fields of research. Many universities and research institutions working on nature conservation, biodiversity, population genetics and evolution have established new chairs for metagenomics, metatranscriptomics and metabarcoding.

Unlike traditional methods, metabarcoding can deliver timely information about changes in biodiversity across time and geography. This is urgently needed in times of insect extinction and climate change. Although metabarcoding is still the subject of research and development, it is already clear that it is a revolutionary method. It can be used not only to assess biodiversity but also to investigate and measure the effects of influences such as civil engineering, agriculture or renaturation measures.

Conventional analysis options require a lot of storage and compute power, are often very time-consuming and are prone to errors. The Bioinformatics on Azure (BOA) project is a platform built on various Azure services. It exploits the power and flexibility of the cloud to make results available to researchers quickly and securely.

Azure Data Lake Store

The Azure Data Lake Store is our preferred storage for structured and semi-structured data of up to several terabytes, in both raw and zipped formats.
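The store only holds the files, so downstream steps still have to read the compressed raw reads. The sketch below assumes the raw data are gzip-compressed FASTQ files, a common NGS output format. The page does not say which formats BOA actually stores. It also assumes a file has already been downloaded or mounted locally. Under those assumptions, Python's standard `gzip` module can stream the records without unpacking the archive first. The names `read_fastq_gz` and `FastqRecord` are hypothetical and are not part of BOA.

```python
import gzip
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One sequencing read: identifier, bases and per-base quality string."""
    read_id: str
    sequence: str
    quality: str


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Stream records from a gzip-compressed FASTQ file (hypothetical helper).

    Assumes the common 4-line FASTQ layout: @id, sequence, '+', quality.
    """
    with gzip.open(path, "rt", encoding="ascii") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                break  # end of file (or a trailing blank line)
            sequence = handle.readline().rstrip("\r\n")
            separator = handle.readline().rstrip("\r\n")
            quality = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            # Keep only the identifier, dropping any description after the first space.
            yield FastqRecord(header[1:].split(" ", 1)[0], sequence, quality)
```

Because the function is a generator, a multi-gigabyte archive is processed one read at a time instead of being loaded into memory.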

Azure Data Lake Analytics

With Azure Data Lake Analytics, we prepare the various data formats and run initial analyses using U-SQL and its C#, R and Python integrations.

Azure Data Factory

Azure Data Factory loads data from various external sources and makes it available in the Azure Data Lake Store.

Databricks

NGS analysis can be accelerated on Databricks with Apache Spark, the widely used cluster-computing framework.

R

R gives us access to a large number of bioinformatics packages, so we can apply established methods.

Python

With Python, we apply machine learning methods to get better and faster results when searching against reference databases.
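The page does not say which machine learning methods BOA uses. As an illustration only, the following sketch shows one of the simplest learning-based ways to match a read against a reference database. It is a 1-nearest-neighbour classifier over k-mer frequency profiles, with profiles compared by cosine similarity. The reference sequences, the k-mer size of 4 and all function names are hypothetical assumptions. Real reference searches use far larger databases and indexed lookups.

```python
import math
from collections import Counter


def kmer_profile(sequence: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a DNA sequence, skipping k-mers that contain N."""
    seq = sequence.upper()
    return Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i : i + k]
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())  # missing k-mers count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query: str, references: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Assign the query to the most similar reference (1-nearest neighbour)."""
    query_profile = kmer_profile(query, k)
    best_label, best_score = "", -1.0
    for label, ref_seq in references.items():
        score = cosine_similarity(query_profile, kmer_profile(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

For brevity, this version recomputes each reference profile for every query. With a real database, the profiles would be computed once and cached, and the comparison would then run in parallel over many reads.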

Azure SQL Database

Metadata and other project information are stored in an Azure SQL Database and made available to the various Azure services.

GitHub

Yes, most of BOA will be open source, and we will publish it on GitHub. Some smaller tools are already available there.

Power BI

To further analyze and visualize the results of our pipeline, we rely on Microsoft Power BI and various R and Python charting libraries.

Presentations

Impulse18

This year we have been invited to present the BOA project at Impulse 2018 in Leipzig. On October 25 we will be at the Congress Centrum Leipzig and …

New strategies and solutions for visionary research projects

Many questions in science and research require increasingly complex models of complex systems. High Performance Computing (HPC) has therefore become indispensable in many research areas, for example …

Azure Meetup Hamburg

We are pleased to present our project “Bioinformatics on Azure” at the Azure Meetup Hamburg on October 13, 2017. The meeting starts at 18:00 at Atos in the Mundsburg Office …

If you need further information about our project BOA, please contact us.
We would be pleased to tell you about the current progress of the project and give you further insight into BOA.
If you think the project might be of interest to your community, meetup, user group or conference, just ask us for possible dates and conditions. We are always happy to present the project worldwide.

oh22information services GmbH

Otto-Hahn-Str. 22, 65520 Bad Camberg

+49.6434.9459.0

info@oh22.is