Illumina Demonstrates 350 Gb Run on HiSeq 2000

Date: 3/15/2010

This article, originally published March 4, has been corrected to reflect that two flow cells, not two runs, on the HiSeq instrument generated 124 and 145 gigabases of data.

Illumina said earlier this month that internally, it has generated 350 gigabases of human genome sequence data in a single run on its new HiSeq 2000 platform.

During a company workshop at the Advances in Genome Biology and Technology conference this month, Illumina representatives and a customer showed results from early projects in which they used the HiSeq 2000 for whole-genome sequencing and gene expression analysis and compared its performance to that of the GAIIx.

Illumina launched the HiSeq 2000 last month and said that the instrument, which has two flow cells and uses dual-surface imaging, will have an initial throughput of 200 gigabases per run (see In Sequence 1/12/2010).

At the company's workshop, David Bentley, Illumina's chief scientist, said that he and his colleagues have compared the HiSeq platform internally to the Genome Analyzer IIx for human genome sequencing, and found that there is "essentially the same look and feel to the data," which are comparable in quality.

Bentley showed metrics for a number of runs of human DNA samples on the HiSeq, using paired 100-base reads. Read error rates were on the order of 0.5 percent to 1 percent, and in one example, 80 percent of the first reads and 75 percent of the second reads matched perfectly to the reference genome.

The first commercial version of the HiSeq 2000 has a lower cluster density than the current GAIIx, but internally, Illumina has already "substantially increased" the cluster density to approach the highest density used on the GAIIx, Bentley said. As a result, the company has generated 350 gigabases of data in a single HiSeq run for a human sample.

Illumina has also run pairs of tumor samples and matched controls, one on each flow cell of the instrument, achieving 222 gigabases of data in one run, and 330 gigabases in another. The number of single nucleotide variants called from these genomes is at a "very comparable" level to GAIIx data, he said.

Illumina has also run samples for a handful of customers on its HiSeq machines, including one for Elliott Margulies, head of the genome informatics section at the National Human Genome Research Institute, who spoke at the company's workshop at AGBT.

Margulies said that NHGRI's intramural sequencing center currently has nine GAIIx instruments installed — as well as a number of capillary sequencers and a 454 instrument — and is waiting for a HiSeq to arrive.

Illumina sequenced the genome of a patient from the NIH's "Undiagnosed Diseases Program" for Margulies' group using the HiSeq. He and his colleagues sequenced the same library internally on a GAIIx to compare the datasets from the two platforms.

While two runs on the GAIIx — one a partial run — produced a total of 118 gigabases, two flow cells from a single HiSeq run generated 124 gigabases and 145 gigabases, respectively, with comparable error rates. Both platforms generated paired 100-base reads.
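As a rough check on the yields Margulies quoted, the short Python sketch below adds up the two HiSeq flow cells and compares the total with the combined GAIIx output. The variable names are illustrative only, and the comparison ignores that one of the GAIIx runs was partial.

```python
# Yields in gigabases, as quoted in the article.
gaiix_total_gb = 118              # two GAIIx runs, one of them partial
hiseq_flow_cells_gb = [124, 145]  # two flow cells from a single HiSeq run

# Combined output of the single HiSeq run.
hiseq_total_gb = sum(hiseq_flow_cells_gb)

# How much more data the one HiSeq run produced than the two GAIIx runs.
ratio = hiseq_total_gb / gaiix_total_gb

print(f"HiSeq run total: {hiseq_total_gb} Gb")
print(f"HiSeq run vs. two GAIIx runs: {ratio:.1f}x")
```

On these figures the single HiSeq run yields 269 gigabases, a little more than twice the output of the two GAIIx runs combined.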

"The data are really equivalent to the GAIIx, both in quality and in format," Margulies said. "While the hardware has dramatically changed, the export file is still the same, it's just five times bigger," so he had to split the data from each lane into four batches for further analysis.

Other researchers at Illumina have "put the HiSeq 2000 through its paces" for RNA-seq and other RNA-based applications, according to Gary Schroth, a senior director for R&D at Illumina.

In one experiment, he and his colleagues analyzed the transcriptome of 16 human tissues, sequencing one tissue per lane using standard mRNA-seq libraries. One run generated 2.54 billion paired 50-base reads; another 1.26 billion 75-base reads. In a third run, the researchers pooled RNA from all 16 tissues — prepared using three different protocols that differed in whether they used total RNA or selected mRNA, and whether they removed ribosomal RNA — and sequenced them, generating 1.2 billion 100-base reads.

The error rates, Schroth noted, were "very much comparable" to the GAIIx, ranging between 0.2 percent and 0.8 percent, depending on the read length, and the data are "much like we are used to on the GAIIx, just six times more."

In an example of a sequencing-based high-throughput gene expression profiling experiment, "which we have all been talking about for years," he said, Schroth's team sequenced 192 different mRNA libraries in a single run — 12 per lane — generating 1.38 billion 50-base reads, or 7.1 million reads per sample on average. The run time was 2.5 days and the cost of the run was $8,900, or $46 per sample, including cluster reagents, flow cells, and sequencing reagents, but not library prep.
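The per-sample figures for this multiplexed run follow from simple division, which the Python sketch below reproduces. It uses only the rounded numbers quoted above, and the variable names are illustrative, so the results can differ slightly from the figures Schroth reported.

```python
# Figures quoted in the article for the 192-library expression run.
libraries = 192
libraries_per_lane = 12
total_reads = 1.38e9   # 50-base reads, rounded as quoted
run_cost_usd = 8900    # cluster reagents, flow cells, sequencing reagents

# Lanes needed to hold all libraries at 12 per lane.
lanes = libraries // libraries_per_lane

# Average reads and cost per library (sample).
reads_per_sample = total_reads / libraries
cost_per_sample = run_cost_usd / libraries

print(f"lanes used: {lanes}")
print(f"reads per sample: {reads_per_sample / 1e6:.2f} million")
print(f"cost per sample: ${cost_per_sample:.2f}")
```

The 16 lanes this gives match a full run across both flow cells, assuming eight lanes per flow cell. The cost works out to about $46 per sample, as quoted. The read count comes to just under 7.2 million per sample, slightly above the 7.1 million quoted, presumably because the run total is rounded.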

Such multiplexed experiments will be further enabled by a new, automation-friendly library preparation kit that Illumina plans to launch within the next two or three months for both genomic sequencing and RNA sequencing, according to Jeremy Preston, Illumina's marketing manager for sequencing.

These kits will allow users to process up to 96 samples in parallel in pools of up to 12, reduce the number of gel-based steps by 90 percent, and eliminate about three-quarters of clean-up steps, Preston said.