Monday, July 9, 2007
Helicos Part I
Helicos Biosciences part I (a.k.a. Next Generation Sequencing overview)
Helicos Biosciences Corporation aims to commercialize a novel platform for “next generation” genomics. Their first product, the Heliscope, is designed to enable ultra-high-throughput genetic analysis based on the direct sequencing of single molecules of DNA and RNA.
Before we begin an in-depth look at Helicos, some background in sequencing is necessary. To begin, I am going to assume that the reader knows what sequencing is. More importantly, we can agree that as the price of personal genomics drops to a reasonable level, human biology and modern medicine will be changed profoundly. In essence, the hype surrounding the sequencing of the human genome is finally coming to fruition as the cost of sequencing an individual heads toward the $1000 mark. In addition, for those who may have been napping, I will recap the sequencing news from the last 12 months. Next generation sequencing companies are getting snatched up by larger life-science companies: in May 2006 Applied Biosystems bought Agencourt Personal Genomics for $120 million, in November Illumina purchased Solexa for a whopping $600 million, and finally in March 2007 Roche paid $155 million for 454 Life Sciences. What the three acquired companies have in common is that they employ technologies known as “next generation” sequencing. In other words, they have developed technologies that vastly increase the speed and breadth of DNA sequencing over the traditional Sanger method.
Next Generation Sequencing (NGS)
Sanger sequencing is the gold standard for very large projects. Unfortunately, it requires a large infrastructure. The current state of the art, the Applied Biosystems 3730xl Genetic Analyzer, has an average read length of 1000bp and can generate a maximum of 2.1Mbp (2,100,000 bases) of sequence per day. This machine is priced at ~$400,000, and the estimated cost of sequencing a human genome using the 3730xl is $24M.
Most importantly, sequencing a human genome on this machine with six-fold coverage (~18 Gbp) would take more than 20 years. Because of this, large-scale sequencing efforts have been carried out by genome centers, which employ many machines running in parallel.
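The single-instrument arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes a ~3 Gbp haploid human genome and the 2.1 Mbp/day maximum quoted for the 3730xl, with the machine running around the clock and no downtime; the function and constant names are purely illustrative.

```python
# Back-of-the-envelope: how long would ONE Sanger instrument need to
# produce six-fold coverage of a human genome?

GENOME_SIZE_BP = 3.0e9         # assumption: ~3 Gbp haploid human genome
COVERAGE = 6                   # six-fold coverage, as in the text
THROUGHPUT_BP_PER_DAY = 2.1e6  # 3730xl maximum quoted in the text (2.1 Mbp/day)


def years_to_sequence(genome_bp: float, coverage: float, bp_per_day: float) -> float:
    """Years needed to generate genome_bp * coverage bases at bp_per_day."""
    total_bases = genome_bp * coverage
    days = total_bases / bp_per_day
    return days / 365.0


if __name__ == "__main__":
    total = GENOME_SIZE_BP * COVERAGE
    years = years_to_sequence(GENOME_SIZE_BP, COVERAGE, THROUGHPUT_BP_PER_DAY)
    print(f"Total sequence needed: {total / 1e9:.0f} Gbp")
    print(f"Time on one instrument: {years:.1f} years")
```

With these inputs the script reports 18 Gbp and roughly 23.5 years, which is why genome centers spread the work across many machines.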
The goal of Next Generation Sequencing is to develop technologies that can produce a complete human genome in a reasonable time frame on a single sequencing machine, for $100,000 and eventually $1000. Let's have a look at the three leading products currently on the market and the methods behind them.
Illumina/Solexa 1G genome analyzer
Solexa was first out of the gate with a technology capable of generating a $100,000 genome: the 1G analyzer, which began shipping in the second quarter of 2006. Having just celebrated its 75th order, Illumina is firmly in front with the first-mover advantage in the NGS market.
Run Length: ~2-3 days
Read Length: ~25bp
Raw Base Accuracy: 99.99%
Genomic DNA is first sheared into small fragments, and adapters are ligated onto both ends of each fragment. The DNA is then added to the flow cell, where the fragment ends bind to the proprietary surface on the inside of the channels. The free adapter ends of these fragments form bridges with complementary primers attached nearby. Next, during a process known as solid-phase amplification, the fragments are thermocycled in the presence of nucleotides and polymerase, and the bridges become double stranded. After denaturation, repeated cycles of this process give rise to millions of random, dense clusters, each made up of many copies of a single DNA fragment.
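To see why repeated amplification cycles turn a single bound fragment into a dense cluster, here is an idealized growth sketch. It assumes every strand has the same chance (`efficiency`) of being copied in each cycle and ignores real surface effects such as crowding and primer depletion; the cycle counts are hypothetical, since the actual protocol's numbers are not given here.

```python
def cluster_size(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of copies grown from one bound fragment.

    In each cycle a fraction `efficiency` of the existing strands is copied,
    so the population grows by a factor of (1 + efficiency) per cycle.
    Surface effects (crowding, primer depletion) are ignored.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


# Hypothetical cycle counts, comparing perfect and 90%-efficient copying.
for n in (5, 10, 15):
    print(n, cluster_size(n), round(cluster_size(n, efficiency=0.9)))
```

The exponential form is the key point: even a modest number of cycles yields a cluster bright enough to image, while lower efficiency shrinks the result quickly.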
Solexa sequencing uses four proprietary, fluorescently labeled modified nucleotides, each a different color, to sequence the clusters on the flow cell surface. Clusters are first located via laser excitation of the fluorophores, and their individual positions are captured by a CCD detector. In each subsequent cycle one more base is sequenced: nucleotides are added, the entire slide is excited and scanned to record the color incorporated by each cluster, and the fluorescent label is then removed so the cycle can repeat for the next base. In this way, all the clusters on a slide are sequenced in a massively parallel manner.
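The cycle-by-cycle readout can be sketched as a toy base caller. It assumes each cluster's four dye-channel intensities have already been extracted from the CCD images for every cycle, and it simply calls the brightest channel as that cycle's base. Real base-calling software also has to correct for effects such as dye cross-talk and clusters falling out of step; the function name and data layout here are hypothetical.

```python
from typing import List, Sequence

# Assumption: one dye channel per base, stored in this order.
CHANNEL_BASES = "ACGT"


def call_reads(intensities: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Toy base caller for four-color, one-base-per-cycle sequencing.

    intensities[cycle][cluster] holds the four channel intensities (A, C, G, T)
    measured for one cluster in one imaging cycle. The brightest channel is
    called as the base incorporated in that cycle.
    """
    if not intensities:
        return []
    reads: List[List[str]] = [[] for _ in intensities[0]]
    for cycle in intensities:
        for cluster_idx, channels in enumerate(cycle):
            brightest = max(range(len(CHANNEL_BASES)), key=lambda i: channels[i])
            reads[cluster_idx].append(CHANNEL_BASES[brightest])
    return ["".join(bases) for bases in reads]


# Hypothetical intensities: 3 cycles x 2 clusters x 4 channels.
example = [
    [[0.9, 0.1, 0.0, 0.1], [0.1, 0.1, 0.8, 0.0]],
    [[0.0, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.9]],
    [[0.1, 0.0, 0.1, 0.8], [0.6, 0.2, 0.1, 0.0]],
]
print(call_reads(example))  # ['ACT', 'GTA']
```

Each cycle extends every read by exactly one base, which is why read length is tied to the number of imaging cycles.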
Applied Biosystems SOLiD (Supported Oligo Ligation Detection)
Applied Biosystems, the current leader in sequencing, now has its own product on the market: the SOLiD system. Initial units began shipping in June 2007.
Price: $600,000, which includes the instrument, a computing cluster, a high-capacity data storage centre and ancillary equipment for upfront sample preparation.
Mb/day: ~500 Mb for mate-pair samples (genome resequencing)
Read Length: up to 35bp
Raw Base Accuracy: 99.94%
The technology works by first amplifying DNA fragments onto polystyrene beads using a water-in-oil emulsion polymerase chain reaction (PCR) technique. When the emulsion is broken, the beads float to the top of the sample and are then placed on an array. Sequencing primers are then added along with a mixture of four different fluorescently labelled oligo probes. The oligo probes are eight bases long, and the dye on each probe identifies which of the four bases (A, T, C or G) occupies a specific position in the template. A ligase, not a polymerase as in standard Sanger sequencing, joins each matching probe to the primer. After washing and reading the fluorescence signal, the probe is cleaved between its fifth and sixth bases, removing the fluorescent dye and leaving the extended strand ready for the next ligation.
Cycles of ligation and cleavage extend the read, and the whole process is then repeated using a different sequencing primer, offset by one base, until all of the intervening positions in the sequence have been imaged. This allows millions of DNA fragments to be read simultaneously in a 'massively parallel' manner. This 'sequencing-by-ligation' technique also allows the use of probes that encode two bases rather than just one, so that errors can be recognized as mismatched signals, leading to increased base-calling accuracy.
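To make the two-base idea concrete, here is a sketch of "color-space" encoding. It assumes the commonly described SOLiD scheme in which bases are coded A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes; the helper names are hypothetical. The point it illustrates: a genuine single-base difference changes two adjacent colors, whereas a single misread color changes only one, and decoding through that color corrupts every base downstream. That difference is what lets measurement errors be told apart from real variants.

```python
from typing import List

# Assumption: 2-bit base codes; a color is the XOR of two adjacent codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> List[int]:
    """Convert a base sequence into colors, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: List[int]) -> str:
    """Recover bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


def color_mismatches(read_colors: List[int], ref_colors: List[int]) -> List[int]:
    """Positions at which two equal-length color sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(read_colors, ref_colors)) if a != b]


ref = "ATGGCA"
snp = "ATGACA"  # hypothetical single-base change at position 3
print(encode(ref))                                 # [3, 1, 0, 3, 1]
print(color_mismatches(encode(snp), encode(ref)))  # [2, 3]: two adjacent colors

bad = encode(ref)
bad[2] ^= 1  # hypothetical single color-measurement error
print(decode("A", bad))  # ATGTAC: every base after the error is wrong
```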
Roche/454 Life Sciences Genome Sequencer FLX
Roche Diagnostics began distribution of the GS FLX in November 2007. The FLX system incorporates a number of technological advances over the original GS 20 launched by 454 Life Sciences in October 2005.
Price: ~$500,000 (academic price?)
Read Length: ~250bp (depending on the organism)
Raw Base Accuracy: 99.5%
Like the 1G analyzer, the GS FLX uses sequencing by synthesis (rather than ligation), specifically a method known as pyrosequencing. DNA fragments 300-500bp in length are ligated to two short adaptors, which provide primers for both amplification and sequencing of the fragment, as well as a biotin tag that immobilises it onto a streptavidin-coated bead.
A subsequent emulsion PCR step gives rise to beads that each carry millions of copies of a single DNA fragment. These beads are then deposited into 454's PicoTiterPlate device by centrifugation. The PicoTiterPlate wells are 44µm wide and therefore fit only one bead apiece. In addition, each well has an optical fibre attached to its base; together the fibres form an array leading to a CCD camera.
The fluidics sub-system pumps nucleotides in one type at a time, in a fixed order. During each nucleotide flow all of the beads are sequenced in parallel, with the polymerase extending the sequencing strand only if the nucleotide is complementary to the template strand. Each incorporated nucleotide releases pyrophosphate, which ATP sulfurylase converts into ATP; luciferase then uses that ATP to produce a light signal, recorded by the instrument's camera, whose intensity reflects the number of nucleotides added.
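The flow-by-flow readout can be sketched as a tiny flowgram decoder. It assumes a repeating T, A, C, G flow order and clean, already-normalized signals, and it rounds each signal to the nearest whole number of incorporated bases; the names are hypothetical. Because a run of identical bases is read off as a single intensity, telling five in a row from six is harder than detecting one, which is why long homopolymers are a known weak spot for pyrosequencing.

```python
from typing import Sequence

# Assumption: nucleotides are flowed in a repeating T, A, C, G cycle.
FLOW_ORDER = "TACG"


def flowgram_to_sequence(signals: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert per-flow light intensities into a base sequence.

    Light output is treated as proportional to the number of nucleotides
    incorporated during a flow, so each signal is rounded to the nearest whole
    number to estimate how many copies of that flow's base were added.
    """
    bases = []
    for flow_idx, signal in enumerate(signals):
        nucleotide = flow_order[flow_idx % len(flow_order)]
        count = max(0, round(signal))  # negative noise counts as no incorporation
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals for two full cycles of flows (T A C G T A C G).
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 0.95, 0.08, 1.04]
print(flowgram_to_sequence(signals))  # TCCGAG
```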
I have been wanting to take a hard look at the next generation sequencing market for a little while now, and after the initial research needed for a background on Helicos, I am very glad that I put in the effort to research it thoroughly. The implications of this technology are undoubtedly going to be groundbreaking across all areas of research, right through to the clinic, and as an investor and scientist I believe it will pay to know who the players in the market are.
In summation, the sequencing market is extremely hot and very competitive. However, there may be room for multiple instruments with different advantages depending on the application. For many applications, such as sequencing everyday plasmids, the above technologies' read lengths are too short. (Assembling ~30 short reads to check the sequence of a 700bp gene is just not convenient.) For this reason, I don't expect the trusty 3730xl to disappear anytime soon.
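The short-read arithmetic behind the plasmid example is easy to make explicit. The helper below (a hypothetical name) gives only a lower bound: it assumes reads could be laid end to end with no overlap, whereas real reads land at random and must overlap to be assembled, so practical numbers are higher.

```python
import math


def min_reads_to_tile(target_bp: int, read_len_bp: int) -> int:
    """Lower bound on reads needed to span target_bp.

    Assumes reads could be laid end to end with no overlap and no gaps,
    which real, randomly placed reads never achieve.
    """
    if read_len_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_bp / read_len_bp)


# Read lengths as quoted in this post; 700bp is the example gene size.
for platform, read_len in [("1G", 25), ("SOLiD", 35), ("GS FLX", 250), ("3730xl", 1000)]:
    print(f"{platform}: at least {min_reads_to_tile(700, read_len)} reads")
```

Even this best case needs 28 reads at 25bp versus a single 3730xl read.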
Next time I'll get into Helicos and see what technology they have to throw into this frothing mix, for better or for worse.
Disclosure: I am long shares of HLCS