Sunday, 5 November 2017

UNSW Genome Annotation workshop, Tuesday 21st November 2017

I am pleased to announce that we will be running a replacement for July’s cancelled Genome Annotation workshop at UNSW on Tuesday 21st November 2017, 1100-1400. This will use WebApollo, which is the genome annotation browser we will be using for community annotation of our snake genomes.

Places are limited but it’s free and you can sign up here through Eventbrite.


This workshop will include a short background lecture on the fundamentals of gene prediction and genome annotation followed by a hands-on component where we will conduct manual curation exercises using Apollo.

The workshop has been organised by EMBL-ABR and will be led by Dr Monica Munoz-Torres from Phoenix Bioinformatics, who is an expert in genome annotation, current chair of the International Society for Biocuration Executive Committee, and former Project Manager of the Apollo Project.

Monica will be joining us direct from the San Francisco Bay Area, and we will have locally trained trainers on hand to help facilitate the workshop.


  • Genome Annotation - why is it important?
  • Gene prediction
    • what is a gene
    • computation
    • annotation
  • Genome curation
    • knowledge
    • curation - why is this necessary?
  • Structural Annotation using Apollo
  • Biological principles for curation with Apollo
  • Apollo functionality: step by step
  • Curation example


Participants must bring their own eduroam-enabled laptop with either Chrome or Firefox installed.

For further information, contact Richard Edwards.

Monday, 28 August 2017

Where do our snakes come from?

The snakes we are sequencing for the BABS Genome project were kindly supplied by Nathan Dunstan at Venom Supplies as a collaborative contribution to Paul Waters and Denis O’Meally when they were at ANU. Thanks Nathan!

We have sequenced two Tiger snake parents, originally caught from the southeast of South Australia (just north of Mt Gambier) in about 2004. They were bred at Venom Supplies, and we have also sequenced one of the babies (sex unknown) born in February 2013.

The brown snake was a female from a clutch of eggs from a gravid (pregnant) female caught locally in the Barossa.

Photo Credits

Tiger Snake (left): Teneche [CC BY-SA 3.0] | Brown Snake (right): Denis O'Meally.

Tuesday, 15 August 2017

Linked read sequencing is go!

We already have over four billion reads and 620 GB of NovaSeq Illumina data for our three tiger snakes; next week’s BABS3291 prac will look at some of the early ABySS assemblies of one of these snakes.

Phase 2 of the sequencing is now go! 10x Chromium linked read libraries were prepared at the Ramaciotti Centre for Genomics last week for one tiger snake and one eastern brown snake. These data promise to make genome assembly much easier and the resulting assemblies more intact. We received notification today that the samples have arrived in the KCCG sequencing laboratory at the Garvan Institute for Illumina HiSeq X (“XTen”) sequencing.

Nobody knows how well linked read sequencing, which is optimised for human genomes, will work in a snake but we look forward to finding out!

Friday, 4 August 2017

Important considerations for sample preparation

Today we had a tutorial on the things you need to think about during a genome sequencing project. The first student suggestion for sample selection and handling is good advice for life:

Thursday, 3 August 2017

Sequencing technologies used for the BABS Genome

Sequencing for the BABS Genome is being performed at the Ramaciotti Centre for Genomics at UNSW, which is one of Australia’s top sequencing centres and has a long, rich history of genome sequencing.

The Gold Standard for genome assembly is currently to combine three technologies:

  1. High coverage short read sequencing for accurate base calling of unique regions.

  2. Long read sequencing for assembling complex and small repetitive regions of the genome.

  3. Long range sequencing for scaffolding contigs across larger repetitive regions of the genome.

We will be using a combination of three of these latest technologies for the BABS genome:

Illumina NovaSeq and HiSeq X

Short read Illumina sequencing is still the starting point for sequencing large (>0.5 Gb) genomes. Although it is impossible to assemble short read data alone into a high-quality genome, it remains the most cost-effective technology in terms of high-quality bases sequenced per dollar. Illumina sequencing struggles with regions of the genome with certain compositional bias and short read assembly fails at repetitive regions. Nevertheless, it is possible to get a useful assembly of a large portion of the “unique” genome, which includes most of the protein-coding genes.

For the 2017 BABS genome, we are using two of the latest - and most cost-effective - Illumina sequencing platforms: the HiSeq X (XTen) and the new NovaSeq. These machines have a phenomenal output per run. The NovaSeq is being used for pure Illumina sequencing, whereas the HiSeq X is being used for the sequencing component of the 10X Genomics Linked Read sequencing (below).

PacBio Sequel

Whole genome sequencing and assembly has been revolutionised by the development of long read sequencing technologies by Pacific Biosciences (PacBio) and Oxford Nanopore (MinION). With typical read lengths a hundred times longer than Illumina reads, long read sequencing enables resolution of many of the shorter repetitive regions in the genome.

Long read sequencing is still comparatively expensive and the budget does not stretch to a pure PacBio assembly this year. However, we will be getting some sequencing done on the new PacBio Sequel, which will help with scaffolding Illumina contigs. We also hope to be able to generate a pure PacBio mitochondrial genome; mitochondria are present in multiple copies per cell, which effectively increases the depth of coverage!
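To see why multiple copies per cell help, here is a rough back-of-the-envelope sketch in Python. It assumes reads are sampled evenly across all DNA in the sample, so depth scales with copy number. The function name and the numbers (5x nuclear depth, 1000 mitochondrial genome copies per cell) are made up for illustration and are not measurements from our snakes.

```python
def expected_mito_depth(nuclear_depth, mito_copies_per_cell, nuclear_copies_per_cell=2):
    """Estimate mitochondrial depth from nuclear depth, assuming depth scales with copy number."""
    # A diploid cell carries 2 copies of each nuclear locus (the default here).
    return nuclear_depth * mito_copies_per_cell / nuclear_copies_per_cell

# Hypothetical values, for illustration only.
depth = expected_mito_depth(nuclear_depth=5, mito_copies_per_cell=1000)
print(f"Expected mitochondrial depth: {depth:.0f}x")
```

Under these assumptions, even a low-coverage long read run could cover the mitochondrial genome very deeply.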

10X Genomics Chromium Linked Reads

Due to the cost (and DNA requirements) of long read sequencing, there has been considerable effort in recent years to combine cost-effective Illumina short read sequencing with additional experimental approaches to leverage long-range information. The long range service offered by the Ramaciotti Centre is 10X Genomics Chromium linked read sequencing. Unlike PacBio or MinION, this does not contiguously sequence a long DNA molecule. Instead, it uses a clever barcoding system to link short reads back to their DNA molecule of origin. 10X Genomics software then uses this linkage to regenerate pseudo-long-reads that can be used for both genome assembly and haplotype phasing.
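The core barcoding idea can be sketched in a few lines of Python. This is only a toy illustration of grouping reads by a shared barcode, not the 10X Genomics software, which does far more. The function name, barcodes and sequences are all hypothetical.

```python
from collections import defaultdict

def group_reads_by_barcode(reads):
    """Collect (barcode, sequence) pairs into lists of sequences per barcode."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)

# Hypothetical toy data: barcodes and sequences are made up for illustration.
toy_reads = [
    ("AAAC", "GATTACA"),
    ("GGTT", "CCGGAAT"),
    ("AAAC", "TTGACCA"),
]
linked = group_reads_by_barcode(toy_reads)
for barcode, sequences in linked.items():
    print(barcode, sequences)
```

Reads that end up in the same group are candidates for having come from the same long stretch of DNA, which is the long-range information that short reads alone lack.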

Friday, 28 July 2017

What would YOU do with six billion sequencing reads?

The Ramaciotti Centre for Genomics, where we get all the sequencing done for the BABS Genome, is holding a competition to win a full sequencing run on their new NovaSeq 6000. This is one of the technologies we are using for our snake genomes - in fact, our three tiger snakes were part of the very first sequencing run on the new machine.

The capacity of this thing is awesome. In addition to the three snake samples, we had three cane toads as part of the ongoing cane toad genome project and, for a control, sequenced one of our yeast strains about 10,000 times!

The Competition

NovaSeq Mini Grant – How would you use 3 billion reads?

To celebrate the opening of our new genomics facility we are pleased to announce a mini grant valued up to $28,000. Researchers with innovative, collaborative projects are invited to submit a 250-word application outlining how 3 billion reads can be utilised to advance their research. The winner will receive an Illumina NovaSeq 6000 S2 100bp PE run (up to 3.3B reads/660Gb), with heavily subsidised library construction. Submit your entry by completing an application form and emailing it to with the subject heading “NovaSeq mini grant”. Terms and conditions apply.
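As a quick sanity check on those numbers, the quoted 660Gb follows from 3.3 billion reads if each "read" is counted as a pair of 100 bp reads. That interpretation is our assumption, and the function name below is just for illustration.

```python
def run_yield_gb(read_pairs, read_length_bp, paired=True):
    """Total bases sequenced, in gigabases, for a given number of reads or read pairs."""
    reads_per_pair = 2 if paired else 1
    return read_pairs * reads_per_pair * read_length_bp / 1e9

# Assumes "3.3B reads" means 3.3 billion read pairs of 2 x 100 bp.
yield_gb = run_yield_gb(3_300_000_000, 100)
print(f"{yield_gb:.0f} Gb")
```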

Monday, 24 July 2017

We're sequencing snakes! (But the competition's still on...)

Today was the first BABS3291 Genes, Genomes and Evolution lecture and the official opening of the new Ramaciotti Centre for Genomics labs in the shiny new E26 Bioscience South building at UNSW. This seemed like an appropriate day to reveal the chosen organisms for the BABS2017 Genome.

And the answer is... two snake species!

And not just any snakes… the Tiger Snake (left) and Brown Snake (right, narrowly avoided by my postdoc Åsa and her partner!) are two of the most deadly snakes in the world. More information will follow in future posts.

Data from the BABS Genome is going to form the core of the seven-week genomics bioinformatics practical at the heart of the coursework for BABS3291. This obviously meant that we needed to start generating data before the course started. We’re still keen to find out what you would like sequenced, though, and the BABS Genome competition remains open for now. Who knows, you may help pick the next genome we do!

Photo Credits

  • Tiger Snake: Teneche - CC BY-SA 3.0. Location: Banyule Flats Reserve, Melbourne, Victoria.

  • Brown Snake: Patrick Dessi. Location: Booroomba Rocks, Namadgi National Park, ACT, Australia.