Bioinformatics Training

This is a list of training courses that the Bioinformatics Group can provide. To express your interest, contact Keywan Hassani-Pak (Head of Bioinformatics).

Bioinformatics 101

This 1-day course provides an introduction to Next Generation Sequencing (NGS) technologies, data formats and public data repositories such as ENA and NCBI. It teaches participants how to use the Geneious bioinformatics platform to perform basic to intermediate tasks, including sequence alignment, assembly and annotation, and how to create and use custom BLAST databases. An in-depth tutorial on designing PCR primers in Geneious will test and extend the skills gained. The course is a mixture of presentations, software demonstrations and hands-on sessions, primarily targeted at bioinformatics beginners.

Introduction to Galaxy for NGS data processing

Galaxy is an open, web-based platform for data-intensive biological research. Galaxy is simple and intuitive, supports collaboration, and brings data and analysis tools together in a single place. This 1-day course introduces the Galaxy user interface and a variety of its features, including data import, histories, jobs and workflows. It shows how Galaxy can be used to streamline the analysis of Next Generation Sequencing data, with a special focus on data preprocessing and read mapping. The course is a mixture of presentations, software demonstrations and hands-on sessions, primarily targeted at researchers who want to learn how to perform basic, shareable and reproducible bioinformatics analyses of their sequencing experiments.

RNA-seq analysis in Galaxy and R

This 2-day course covers the bioinformatics and statistical approaches needed to analyse transcriptome profiling experiments. It aims to give participants a clear understanding of the key issues involved in experimental design, data collection, analysis and interpretation of results for RNA-seq experiments. Approaches are illustrated using Galaxy and the R statistical computing environment, considering a number of available packages and pipelines for the different stages of the process. The course is a mixture of presentations, software demonstrations and hands-on sessions, primarily targeted at researchers who want to understand the computational steps of RNA-seq analysis and how to interpret the outputs.

Introduction to variant calling using GATK in Galaxy

This half-day course introduces the process of calling variants (SNPs and INDELs) in genotyping-by-sequencing, exome capture and whole-genome re-sequencing data. We will teach how to use GATK, the current industry-standard pipeline, to call variants, and show how to filter and annotate the resulting VCF output using tools such as SnpSift and SnpEff. The course is a mixture of presentations, demonstrations and hands-on sessions in Galaxy, primarily targeted at researchers who want to understand the computational steps of variant calling and how to interpret the outputs.

Population genomics, GWAS and genomic prediction

This half-day course provides a brief introduction to key concepts in population and quantitative genetics, followed by demonstrations and hands-on exercises on genome scans aimed at detecting signatures of natural selection and marker-trait associations. Finally, the concept of genomic selection (i.e., trait prediction using a genome-wide set of molecular markers) will be illustrated using real data sets. All analyses will be performed in R on Linux, although no previous experience with either is assumed.

Introduction to phylogenetics

Building a phylogenetic tree can be the clearest way to identify species from molecular data, to place a new gene within a gene family, to reconstruct the evolutionary history of species, or to infer gene duplication and deletion events. But when would you use neighbour-joining, maximum likelihood or Bayesian analysis? JC, HKY or GTR? Outgroup, midpoint or unrooted? This half-day course introduces the theoretical background, together with the methods and software available for sequence alignment, model selection, tree building, clade support calculation and figure preparation.

KnetMiner workshop (Interpreting RNA-seq, GWAS and QTL data)

KnetMiner (formerly known as QTLNetMiner) http://knetminer.rothamsted.ac.uk/ provides an easy-to-use web interface to visualisation and data mining tools for the discovery and evaluation of candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all the relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. This half-day workshop gives an overview of KnetMiner and demonstrates how its components can be used to interpret RNA-seq, GWAS and QTL data. The workshop is primarily targeted at wet lab researchers interested in gene discovery, and is useful to complete beginners and frequent bioinformatics users alike.

Linked Data and Ontologies for Life Science and Plant Biology

Studying life means dealing with entities and phenomena that vary greatly and interact at very different scales, from cell biology to population genetics. Moreover, it increasingly depends on gathering and analysing large amounts of data and knowledge. Methods from computer science, such as ontology-based modelling, allow us to address this complexity and to unify different terminologies and points of view. In recent years, linked data principles and Semantic Web standards have further contributed to the integration of heterogeneous data sources, easing the tasks of data search, collection and analysis. This course is a practical introduction to ontology engineering and to the use of OWL-based ontologies together with the RDF language to build standard representations of data. We will also present existing life science and plant biology data sets, and show how tools such as Protégé and Jena can be used to produce and access linked data.

Introduction to Linux for data processing

Linux is the most important operating system in scientific computing, and the majority of programs used in bioinformatics are run on Linux servers. Linux also offers numerous tools for basic processing of text files. However, its power is not always readily accessible to users, because it is operated mainly through a command-line interface. This half-day workshop aims to make attendees feel comfortable in the Linux environment and to show them how to run programs, perform basic operations on text files, and use simple scripts to automate repetitive tasks such as file downloads. The course is primarily targeted at researchers with little or no knowledge of Linux.

Introduction to Software Management tools and methodology

This half-day course provides an overview of software development best practices, with a focus on version control with Git and GitHub. It covers the different types of version control, Git terminology, popular command-line and GUI-based tools, and the “versioning approaches” to follow when working in small, medium or large software development teams. It also covers best practices including coding standards, code inspection, issue tracking and the benefits of Test-Driven Development. The course is aimed at researchers with basic programming knowledge who are interested in building more robust research software.