Wednesday, July 25, 2007

Helicos Part II

This is the second in a series about Helicos Biosciences, a next generation sequencing startup.

Helicos Part I

Having familiarized ourselves with the NGS market, we are better positioned to take a look at Helicos and evaluate its recent IPO.


In 2003, Professor Stephen Quake, then at the California Institute of Technology, published a paper in the Proceedings of the National Academy of Sciences showing that sequence information could be obtained from a single strand of DNA without amplification. Quake's group overcame the hurdle of resolving individual bases by using DNA polymerase in combination with fluorescent nucleotides to image sequential incorporations during synthesis. The paper was a breakthrough application of single molecule detection to DNA sequencing.

Dr. Quake subsequently met with Noubar Afeyan and Stanley Lapidus, then CEO and Partner at Flagship Ventures respectively, and they agreed to found a company to develop and commercialize single molecule sequencing technology. Professor Eric Lander, Director of the Broad Institute, played some sort of advisory role during the company's founding stages. The company was incorporated in May 2003 and renamed Helicos BioSciences in November of the same year.

Besides the IPO, highlights since the company's inception include a $2 million grant from the NHGRI. In addition, Helicos has assembled a rather strategic management team and established prototype collaborations with, among others, Dr. Leroy Hood of the Institute for Systems Biology. Finally, during the 2Q investor conference Mr. Lapidus stated that the company is on track for a product launch later in 2007.

Helicos Heliscope

Price: n/a

MB/run: n/a
Throughput: ~100MB/hr
MB/day: ~1000MB
Read Length: ~25bp
Raw Base Accuracy: n/a

The length of time between this blog entry and my last is largely due to the difficulty I had researching the exact processes the Heliscope uses to enable its "true Single Molecule Sequencing" (tSMS) technology. Needless to say, Helicos, despite the media fanfare, is playing the details of its technology pretty close to the vest.

Using the original PNAS article, the available patents, this chapter (written by the first two authors of the PNAS paper for the Ohio U physics department), and some rumors, I will piece together what exactly we know about the Heliscope technology. Where necessary, I will fill in the gaps with my own best guesses.

In a nutshell, the Heliscope process is to shear the DNA and polyadenylate the fragments. These fragments, which also carry a dye molecule, are then captured at random positions on the proprietary flowcell surface (via poly-T sequences on the surface). The initial attachment locations are recorded by exciting the dye with a laser and imaging it with a CCD camera mounted on a microscope. After the dye is removed, DNA polymerase and a single dye-labeled nucleotide are flowed in and then washed out. If that nucleotide is complementary to the next template base of a given fragment, it is incorporated into the growing strand, and on subsequent laser excitation the camera again records the location of the fluorescence. The dye is then removed and washed away, and sequencing proceeds by repeatedly cycling through the four nucleotides.
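
To make the cycle concrete, here is a toy Python model of the loop described above. It is a sketch under simplifying assumptions, not Helicos' actual chemistry: the flow order ACGT and the spot names are hypothetical, every flow is assumed to be perfectly efficient, and a natural polymerase is assumed to extend through every matching base in a flow.

```python
# Pairing rules: the polymerase adds the complement of each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def run_cycles(templates, flow_order="ACGT", n_cycles=4):
    """Toy model of single-colour cyclic sequencing by synthesis.

    templates maps a hypothetical spot id on the flowcell to its template
    sequence, written in the order the polymerase copies it. In each flow a
    single dye-labelled nucleotide is offered to every spot at once; the
    camera only records whether a spot lit up, not how bright it was.
    """
    position = {spot: 0 for spot in templates}   # next template base to copy
    reads = {spot: [] for spot in templates}     # bases called so far
    for _ in range(n_cycles):
        for base in flow_order:                  # flow in one nucleotide, image, wash
            for spot, template in templates.items():
                i = position[spot]
                lit = False
                # A natural polymerase keeps extending while the next base pairs.
                while i < len(template) and COMPLEMENT[template[i]] == base:
                    i += 1
                    lit = True
                position[spot] = i
                if lit:                          # spot fluoresced during this flow
                    reads[spot].append(base)
    return {spot: "".join(called) for spot, called in reads.items()}

reads = run_cycles({"spot1": "TACG", "spot2": "GGCA"})
print(reads)
```

Here spot1 reads ATGC, the complement of its template, while spot2 reads CGT instead of CCGT: both Cs were added in the same flow, but the camera only saw one lit spot. That is the homopolymer problem discussed below.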

The main advantages of sequencing single DNA molecules are improved throughput and reduced cost. Compared with the other techniques discussed in my last entry, the Heliscope workflow is considerably less complicated, leading to shorter run times: there is no PCR, no beads, no microtiter plates, etc. Of course, this also means fewer reagents and, conceivably, a lower price per run.

However, the major challenge facing single molecule sequencing is sensitivity. Imagine the difference in signal intensity between a single incorporated nucleotide on the Heliscope surface and the analogous millions of incorporated fluorescent molecules on the Illumina system's flowcell surface. Exploring how the Heliscope reportedly achieves a sensitivity six orders of magnitude beyond its competitors is therefore essential to our assessment of the company.
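
One way to see why a single dye is so much harder to detect is a shot-noise-limited signal-to-noise estimate, SNR = S / sqrt(S + B), where S is signal photons and B is background photons. The sketch below is illustrative only: the photon count per dye and the background levels are hypothetical, and "a million dyes" simply takes the comparison above at face value.

```python
import math

def shot_noise_snr(signal_photons, background_photons):
    """Shot-noise-limited SNR: signal over the Poisson noise of all detected photons."""
    return signal_photons / math.sqrt(signal_photons + background_photons)

PHOTONS_PER_DYE = 1_000                  # hypothetical photons collected per dye per exposure
single_dye = PHOTONS_PER_DYE
cluster = PHOTONS_PER_DYE * 1_000_000    # "millions" of dyes, per the comparison above

print("signal ratio (orders of magnitude):", math.log10(cluster / single_dye))
for background in (100_000, 10_000, 100):   # hypothetical background photons per spot
    print(background,
          round(shot_noise_snr(single_dye, background), 1),
          round(shot_noise_snr(cluster, background), 1))
```

The million-dye signal is essentially immune to these background levels, while the single dye's SNR improves roughly tenfold as the background drops from 100,000 to 100 photons. That is why single molecule work concentrates on cutting noise rather than boosting signal.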

In the field of single molecule imaging, the largest challenge is increasing both the resolution and the sensitivity of a given signal beyond the limits of the detecting instrument. This is approached primarily by increasing the signal-to-noise ratio. Assuming the fluorophores already emit as efficiently as possible, this is best accomplished by reducing the noise in the system. The Heliscope, apparently based on the method developed in Dr. Quake's laboratory, does this in two major ways.

First, it uses a method called Total Internal Reflection Microscopy (TIRM), in which only fluorophores within ~150nm of the flowcell surface are illuminated. This dramatically reduces the noise from the bulk fluid. It also increases the theoretical readout speed, because no scanning is involved.

However, while TIRM reduces noise from objects in solution far from the surface, it does nothing about surface-bound impurities. Dr. Quake's group overcame this second challenge, eliminating signal from non-specifically bound dye molecules, with a method known as Fluorescence Resonance Energy Transfer (FRET). In this method, not one but two fluorescent dyes are used. The requirement is that the emission spectrum of one dye (Cy3), termed the donor, overlaps the absorption spectrum of a second dye (Cy5), termed the acceptor. When the donor is close to the acceptor, usually less than 10nm away, and is excited at its own excitation wavelength, it transfers this energy to the acceptor, which in turn becomes excited and emits a photon of lower energy. Think of it as a baton pass between the donor and the acceptor in a molecular relay race. The PNAS paper describes placing the donor (Cy3) on the existing DNA strand and the acceptor (Cy5) on the incorporated nucleotide. As a result, only the Cy5-labeled nucleotides incorporated by the polymerase emit a signal, while non-specifically bound Cy5-nucleotides on the surface remain dark because they are not within 10nm of a Cy3 donor on the attached DNA. This combination of TIRM and FRET provides an unparalleled increase in the signal-to-noise ratio of single molecule detection and was indeed a groundbreaking application in DNA sequencing. Presumably this is the technology that makes the Heliscope possible, but how, if at all, has Helicos improved upon it?
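
Both noise filters come down to simple distance laws. Total internal reflection produces an evanescent field whose intensity decays exponentially with height above the surface, and FRET efficiency falls off as the sixth power of donor-acceptor distance (Förster's law). The sketch below uses the ~150nm figure from the text as the decay depth and assumes a Förster radius of about 5.4nm for Cy3/Cy5; both are illustrative values, not Heliscope specifications.

```python
import math

PENETRATION_DEPTH_NM = 150.0   # assumed 1/e decay depth of the evanescent field
R0_NM = 5.4                    # assumed Förster radius for a Cy3/Cy5 pair

def evanescent_intensity(z_nm, depth_nm=PENETRATION_DEPTH_NM):
    """Relative excitation intensity at height z above the surface (exponential decay)."""
    return math.exp(-z_nm / depth_nm)

def fret_efficiency(r_nm, r0_nm=R0_NM):
    """Fraction of donor excitations passed to an acceptor r nm away (1/r^6 law)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

# A dye diffusing 1 um (1000 nm) into the bulk fluid is barely excited.
print(evanescent_intensity(1000))
# An incorporated acceptor a few nm from the donor lights up; a stray one 20 nm away does not.
print(fret_efficiency(3.0), fret_efficiency(20.0))
```

The sixth-power law is what makes FRET such a sharp switch: efficiency is 50% at R0 and collapses to a fraction of a percent only a few nanometres further out.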

Three fundamental questions remain which I will endeavor to answer in turn. How does the Heliscope deal with long stretches of repeat nucleotides? Does the Heliscope still use FRET and where exactly is the donor? How is the dye molecule "removed" after each incorporation cycle?

The first issue with this method of DNA sequencing was accurately sequencing long runs of a repeated nucleotide, so-called homopolymer regions. The problem becomes clear if one imagines that during a cycle of a given nucleotide, say dGTP, the instrument would need to detect 1, 2, 3 or more simultaneous incorporations wherever the template has a string of cytosines. Since the signal-to-noise problems described above concern simply detecting the presence of a single fluorescent molecule, distinguishing one incorporation from two may be possible, but higher counts are out of the question. After a period of uncertainty, it seems that Helicos has solved this issue. The press release of Feb 9th 2007 states "The proprietary nucleotide analogs contained in these unique formulations control accurate base-by-base extension through chemical means." I assume this refers to their patent issued on Jan 30 2007, which simply describes using a dye-conjugated nucleotide together with DNA polymerase kinetics so that only one or two nucleotides (and a statistically insignificant number of longer extensions) are incorporated per relatively short reaction cycle.
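
To see why chemically limiting extension helps, here is a toy model in which the detector can only tell zero, one, or "two or more" dyes apart, following the point above that one versus two may be resolvable but higher counts are not. The max_per_flow parameter is a hypothetical stand-in for the controlled-extension chemistry; I idealize it to exactly one base per flow, whereas the patent describes one or two. The flow order is also an assumption.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def read_molecule(template, flow_order="ACGT", n_cycles=10,
                  max_per_flow=None, max_resolvable=2):
    """Call one molecule's read from cyclic single-nucleotide flows.

    max_per_flow: cap on incorporations per flow (None = natural polymerase).
    max_resolvable: the detector reports at most this many bases per flow.
    """
    i = 0
    called = []
    for _ in range(n_cycles):
        for base in flow_order:
            added = 0
            while (i < len(template) and COMPLEMENT[template[i]] == base
                   and (max_per_flow is None or added < max_per_flow)):
                i += 1
                added += 1
            # The detector can only tell 0, 1, ... up to max_resolvable dyes apart.
            called.append(base * min(added, max_resolvable))
    return "".join(called)

template = "GGGGCA"                                # true synthesized strand: CCCCGT
print(read_molecule(template))                    # natural polymerase: run is undercalled
print(read_molecule(template, max_per_flow=1))    # one base per flow: run recovered
```

With a natural polymerase the four Cs arrive in one flow and are reported as "at most two", so the read comes out as CCGT. Capping extension at one base per flow spreads the run over four cycles and recovers CCCCGT, at the cost of more flows.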

Does the Heliscope use FRET? I ask because the "technology" page on the website shows only one dye molecule. The short answer is that I don't know for sure, but I'm pretty sure it does. The original PNAS paper was published in 2003, which wasn't that long ago, and all the single molecule people I know still use FRET for its unsurpassed resolution. Furthermore, page 12 of this presentation (by the chairman of research core facilities and technology at the Mayo Clinic) shows FRET as part of the process, with the donor attached to the end of the polyadenylated tail. This also fits the rumored ~25bp read length: at 3.4nm and 10.5 bases per helical turn, about 30bp fall within the ~10nm range over which the donor can transfer energy to the acceptor. It should be noted that the original paper, the patents, and other sources have discussed putting the donor molecule on the DNA polymerase itself, an area of research I have no doubt Helicos is actively pursuing for its later generation sequencers.
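
The ~30bp figure can be checked with a few lines of arithmetic. This sketch assumes the donor sits at base 0 and measures straight-line distance along an idealized B-DNA axis; it ignores linker length and helix geometry, so treat it as a rough bound rather than a model of the real construct.

```python
PITCH_NM = 3.4          # helical pitch of B-DNA, as quoted in the text
BP_PER_TURN = 10.5      # base pairs per helical turn
FRET_RANGE_NM = 10.0    # rough upper distance for useful FRET

def nm_per_bp(pitch_nm=PITCH_NM, bp_per_turn=BP_PER_TURN):
    """Axial rise per base pair."""
    return pitch_nm / bp_per_turn

def bases_within(range_nm=FRET_RANGE_NM):
    """Whole base pairs that fit along a straight duplex axis starting at the donor."""
    return int(range_nm / nm_per_bp())

def distance_to_base(n):
    """Axial distance (nm) from the donor to the n-th base pair."""
    return n * nm_per_bp()

print(bases_within())                   # base pairs inside the 10 nm window
print(round(distance_to_base(25), 1))   # where the 25th base sits relative to the donor
```

At about 0.32nm per base pair, the 25th base sits roughly 8nm from the donor, comfortably inside the window, while bases beyond about 30 fall outside it.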

Finally, is the acceptor molecule removed, and if so, how? On this topic I have been unable to find any definitive information. Traditionally, the acceptor is photobleached after detection using laser illumination at its absorption wavelength. This leaves the donor relatively unharmed and able to donate to a fresh acceptor. One drawback of this technique, however, is that the bleached acceptor is not removed, so successive incorporations may be compromised by steric interactions. In addition, repeated photobleaching of the acceptors will eventually bleach the donor as well. Perhaps Helicos has developed a more robust donor and an acceptor that does not hinder further incorporation; anything is possible. Alternatively, the single molecule literature contains many examples of cleavable dyes, either chemically cleavable or photocleavable. The Helicos website seems to indicate that the dye is in fact cleaved off and removed, but I am not confident about the specific details of those slides. Cleaving and removing the dye is certainly the preferred method here, and if Helicos does not yet use it, I am sure it is another development planned for future machines.
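
The concern that repeated photobleaching eventually kills the donor can be put in rough numbers. This sketch assumes each exposure carries an independent, fixed chance of bleaching the donor; the 2% figure is purely hypothetical, chosen only to show how quickly survival decays over a ~25-base read.

```python
import math

def donor_survival(p_bleach, n_exposures):
    """Probability the donor is still intact after n exposures, assuming an
    independent, fixed chance p_bleach of bleaching it at each exposure."""
    return (1.0 - p_bleach) ** n_exposures

def median_exposures(p_bleach):
    """Number of exposures after which half of the donors have bleached."""
    return math.log(0.5) / math.log(1.0 - p_bleach)

P_BLEACH = 0.02   # hypothetical per-exposure donor bleaching probability
print(donor_survival(P_BLEACH, 25), median_exposures(P_BLEACH))
```

Under these assumptions only about 60% of donors survive 25 exposures, and half are gone after roughly 34. Anything that lowers the per-exposure bleaching risk, such as cleaving the acceptor rather than bleaching it, translates directly into longer usable reads.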

Phew! I hope you are still with me after all that. There is no doubt that it has been a challenge to disentangle the methodology from the hype regarding Helicos. However, the benefit is that we get to make an assessment based on quite a bit of science, long before S&P even touches it. At this point I still have some misgivings regarding whether or not the Heliscope will live up to expectations, but we will address these issues next time, as well as go over the financials.

Disclosure: I am long shares of HLCS

Monday, July 9, 2007

Helicos Part I

Helicos Biosciences Part I (a.k.a. Next Generation Sequencing overview)

Helicos Biosciences Corporation aims to commercialize a novel platform for “next generation” genomics. Their first product, the Heliscope, is designed to enable ultra-high-throughput genetic analysis based on the direct sequencing of single molecules of DNA and RNA.

Before we begin an in-depth look at Helicos, some background in sequencing is necessary. I am going to assume that the reader knows what sequencing is. More importantly, we can agree that as the price of personal genomics drops to a reasonable level, its impact on human biology and modern medicine will be profound. In essence, the hype around the sequencing of the human genome is finally coming to fruition as the cost of sequencing an individual approaches the $1000 mark. For those who may have been napping, here is a recap of the sequencing news from the last 12 months. Next generation sequencing companies are getting snatched up by larger companies. In May 2006 Agencourt Personal Genomics was bought by Applied Biosystems for $120 million, in November Illumina purchased Solexa for a whopping $600 million, and in March 2007 Roche paid $155 million for 454 Life Sciences. What the three acquired companies have in common is that they employ technologies known as “next generation” sequencing. In other words, they have developed technologies that vastly increase the speed and breadth of DNA sequencing over the traditional Sanger method.

Next Generation Sequencing (NGS)
Sanger sequencing is the gold standard for very large projects. Unfortunately, it requires a large infrastructure. The current state of the art, the Applied Biosystems 3730xl Genetic Analyzer, has an average read length of 1000bp and can generate a maximum of 2.1Mbp (2,100,000 bases) of sequence per day. The machine is priced at ~$400,000, and the estimated cost of sequencing a human genome on the 3730xl is $24M.

Most importantly, sequencing a human genome on this machine with six-fold coverage (~18Gb, i.e. 18 billion bases) would take more than 20 years (18Gb ÷ 2.1Mb/day ≈ 8,600 days). Because of this, large scale sequencing efforts have been carried out by genome centers that run many machines in parallel.
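
A quick way to check throughput claims like this is to divide the bases needed by the bases produced per day. The sketch below assumes a 3-billion-base genome, continuous running, and no time for sample prep or downtime; the per-day figures are the ones quoted in this series (the Heliscope figure is rumored).

```python
HUMAN_GENOME_BASES = 3.0e9   # approximate haploid human genome size (assumption)
COVERAGE = 6                 # six-fold coverage, as in the text

# Throughputs quoted in this series, in bases per day (vendor or rumored figures).
THROUGHPUT_PER_DAY = {
    "ABI 3730xl": 2.1e6,
    "Illumina/Solexa 1G": 500e6,
    "Helicos Heliscope (rumored)": 1000e6,
}

def days_to_sequence(bases_per_day, genome_bases=HUMAN_GENOME_BASES, coverage=COVERAGE):
    """Days of continuous running on one machine, ignoring prep and downtime."""
    return genome_bases * coverage / bases_per_day

for machine, rate in THROUGHPUT_PER_DAY.items():
    days = days_to_sequence(rate)
    print(f"{machine}: {days:,.0f} days ({days / 365:.1f} years)")
```

At 2.1Mb/day the 3730xl needs roughly 8,600 days, or about 23 years, while the NGS throughputs quoted here bring the same job down to weeks.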

The goal of Next Generation Sequencing is to develop technologies that can produce a complete human genome in a reasonable time frame on a single sequencing machine, for $100,000 and eventually $1000. Let's have a look at the three leading products currently on the market and their methods.

Illumina/Solexa 1G genome analyzer
Solexa was the first out of the gate with a technology capable of generating a $100,000 genome: the 1G analyzer, which began shipping in the second quarter of 2006. Having just celebrated its 75th order, Illumina is firmly in front with the first mover advantage in the NGS market.


MB/run: ~
Run Length: ~2-3 days
MB/day: ~500MB
Read Length: ~25bp
Raw Base Accuracy: 99.99%

Genomic DNA is first sheared into small fragments, and adapters are ligated onto both ends. The DNA is then added to the flow cell, where the ends bind to the proprietary surface on the inside of the channels. The free adapter ends of these fragments then form bridges to complementary primers attached nearby. Next, during a process known as solid phase amplification, the fragments are thermocycled in the presence of nucleotides and polymerase, and the bridges become double stranded. After denaturation, repeated cycles of this process give rise to random, dense clusters of identical DNA fragments containing millions of copies.

Solexa sequencing uses four proprietary, differently colored, fluorescently labeled modified nucleotides to sequence the clusters on the flow cell surface. Clusters are first located via laser excitation of the fluorophores, and their individual positions are captured by a CCD detector. In each subsequent cycle one more base is sequenced: the nucleotides are added, the entire slide is excited and scanned for the color incorporated at each cluster, and the fluorescence is then removed so the cycle can repeat for the next base. In this manner every cluster on a slide is sequenced in a massively parallel fashion.
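
With four colors, calling a base in each cycle amounts to picking the brightest of four channels at each cluster. The sketch below is a simplified illustration, not Illumina's actual base caller: the channel order and intensities are hypothetical, and the "purity" score (brightest over brightest plus runner-up) is just one simple way to flag ambiguous cycles.

```python
BASES = "ACGT"   # channel order assumed for this sketch

def call_cycle(channels):
    """Return (base, purity) for one cycle's four dye-channel intensities."""
    ranked = sorted(range(4), key=lambda k: channels[k], reverse=True)
    best, second = channels[ranked[0]], channels[ranked[1]]
    purity = best / (best + second) if best + second > 0 else 0.0
    return BASES[ranked[0]], purity

def call_cluster(cycles):
    """Call one base per cycle for a cluster from its per-cycle intensities."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(b for b, _ in calls), [round(p, 2) for _, p in calls]

seq, purity = call_cluster([
    [900, 50, 30, 20],   # cycle 1: A channel dominates
    [10, 20, 30, 800],   # cycle 2: T channel dominates
    [40, 700, 600, 50],  # cycle 3: C wins, but G is nearly as bright
])
print(seq, purity)
```

The third cycle is still called C, but its low purity marks it as a call to treat with suspicion, for example because the cluster has fallen out of phase.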

Applied Biosystems SOLiD (Supported Oligo Ligation Detection)
Applied Biosystems, the current leader in sequencing, now has its own NGS product on the market: the SOLiD system. Initial units began shipping in June 2007.

Price: $600,000 which includes the instrument, a computing cluster, a high capacity data storage centre and ancillary equipment for upfront sample preparation.

Run Length: ~2-3 days
MB/day: ~500MB for mate pair samples (genome resequencing)
Read Length: up to 35bp
Raw Base Accuracy: 99.94%

The technology works by first amplifying DNA fragments onto polystyrene beads using a water-in-oil emulsion polymerase chain reaction (PCR). When the emulsion is broken, the beads float to the top of the sample and are then placed on an array. Sequencing primers are then added along with a mixture of four different fluorescently labelled oligo probes. The oligo probes are eight bases long and bind specifically to the fifth base in the sequence to determine which of the four bases (A, T, C or G) it is. A ligase, not a polymerase as in standard Sanger sequencing, joins the matching probe to the primer, and after washing, the fluorescence signal is read. The probe is then cleaved between its fifth and sixth bases, removing the fluorescent dye from the strand of amplified DNA.

The whole process is repeated with a different sequencing primer until all of the intervening positions in the sequence have been imaged. The process allows millions of DNA fragments to be read simultaneously in a 'massively parallel' manner. This 'sequence-by-ligation' technique also allows the use of probes that encode two bases rather than one, so that errors can be recognized as mismatched signals, leading to increased base-call accuracy.
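
The error-recognition idea behind two-base probes can be sketched with the color scheme usually described for SOLiD, where each overlapping pair of bases gets one of four colors. I am assuming the commonly published mapping (A=0, C=1, G=2, T=3, color = XOR of the two codes); treat it as an illustration rather than ABI's specification.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(seq):
    """One colour (0-3) per overlapping pair of bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colours):
    """Rebuild the bases from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)

def changed_positions(a, b):
    """Indices where two colour lists disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

ref = "ACGTA"
snp = "ACTTA"   # a true G->T substitution at position 2
print(changed_positions(encode(ref), encode(snp)))   # two adjacent colours differ

# A single mis-read colour changes only one colour, and decoding it
# corrupts every base after it.
bad = encode(ref)
bad[1] ^= 1
print(decode("A", bad))
```

Because every base is read in two probes, a real single-base difference always changes two adjacent colors, while an isolated single-color change looks like a measurement error and can be flagged instead of being reported as a variant.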

Roche/454 Life Sciences Genome Sequencer FLX
Roche Diagnostics began distribution of the GS FLX in November 2007. The FLX system incorporates a number of technological advances over the original GS 20 launched by 454 Life Sciences in October 2005.

Price: ~$500,000 (academic price?)

Run Length: 7.5 hours
Read Length: ~250 bases/read (depending on the organism)
Raw Base Accuracy: 99.5%

Like the 1G analyzer, the GS FLX uses sequencing by synthesis (rather than ligation), specifically a method known as pyrosequencing. DNA fragments 300-500bp in length are ligated to two short adaptors, which provide primers for both amplification and sequencing of the fragment, as well as a biotin tag that immobilises it onto a streptavidin-coated bead.

A subsequent emulsion PCR step yields beads carrying millions of copies of each DNA fragment. These beads are then deposited into 454's PicoTiterPlate device by centrifugation. The PicoTiterPlate wells are 44µm wide and therefore fit only one bead apiece. In addition, each well sits on the end of an optical fibre, and together the fibres form an array leading to a CCD camera.

The fluidics sub-system pumps nucleotides in one at a time, in a fixed order. During each nucleotide flow all of the beads are sequenced in parallel, with the polymerase extending the sequencing strand only if the nucleotide is complementary to the template strand. The addition of one (or more) nucleotides releases pyrophosphate, which drives a sequential reaction with sulfurylase and luciferase that produces a light signal, recorded by the instrument's camera.
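
Because the light produced in a flow is roughly proportional to the number of nucleotides incorporated, a 454 read comes out as a "flowgram": one signal per flow. The sketch below builds an ideal flowgram and calls bases by rounding each signal. The flow order TACG and the noisy signal values are assumptions for illustration, not instrument output.

```python
FLOW_ORDER = "TACG"   # assumed fixed flow order for this sketch

def flowgram(synthesized, n_cycles):
    """Ideal light signal per flow: one unit per nucleotide incorporated in that flow."""
    signals, i = [], 0
    for _ in range(n_cycles):
        for base in FLOW_ORDER:
            run = 0
            while i < len(synthesized) and synthesized[i] == base:
                i += 1
                run += 1
            signals.append(float(run))
    return signals

def call_flowgram(signals):
    """Round each flow's signal to a whole number of bases of that flow's nucleotide."""
    return "".join(FLOW_ORDER[k % 4] * round(s) for k, s in enumerate(signals))

print(flowgram("TTAGGGC", n_cycles=2))
noisy = [2.1, 0.9, 0.2, 2.8, 0.1, -0.1, 1.2, 0.0]   # hypothetical measured signals
print(call_flowgram(noisy))
```

Unlike single molecule detection, the signal here scales with run length, so homopolymers are readable. However, the step from n to n+1 bases becomes a smaller fraction of the total signal as runs get longer, which is why long homopolymers remain the weak spot of pyrosequencing.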

Overview Conclusion:
I have been wanting to take a hard look at the next generation sequencing market for a while now, and after the initial research needed as background for Helicos, I am very glad I put in the effort to research it thoroughly. This technology will undoubtedly have groundbreaking implications in every area, from basic research right through to the clinic, and as an investor and scientist I believe it will pay to know who the players in the market are.

In summary, the sequencing market is extremely hot and very competitive. However, there may be room for multiple instruments with different advantages depending on the application. For many applications, such as sequencing everyday plasmids, the above technologies' read lengths are too short. (Assembling some 30 short reads to check the sequence of a 700bp gene is just not convenient.) For this reason, I don't expect the trusty 3730xl to disappear anytime soon.

Next time I'll get into Helicos and see what technology they have to throw into this frothing mix, for better or for worse.

Disclosure: I am long shares of HLCS