This new genome map tries to capture all human genetic variation
A reference genome matters because when a new person’s genome is sequenced, that sequence is projected onto the reference in order to organize and read the new data. Yet because the current reference is just one possible genome, and lacks stretches of DNA that some people carry, some information can’t be analyzed and is usually ignored.
Researchers call this effect “reference bias” or, more simply, the streetlamp problem. You don’t see where you don’t look.
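To make the streetlamp problem concrete, here is a toy sketch in Python. It assumes a deliberately simplified model in which a read "maps" only if it appears verbatim in the reference; real alignment software tolerates mismatches and works very differently. The sequences and the `map_reads` helper are invented for illustration and do not come from the report.

```python
def map_reads(reference, reads):
    """Split reads into those found verbatim in the reference and those not."""
    mapped, unmapped = [], []
    for read in reads:
        position = reference.find(read)  # -1 when the read is absent
        if position >= 0:
            mapped.append((read, position))
        else:
            unmapped.append(read)
    return mapped, unmapped


# Hypothetical reference, and a sample carrying an extra segment the reference lacks.
reference = "ACGTACGTTAGC" + "CGATAGGCTA"
sample = "ACGTACGTTAGC" + "GGGTTTCCC" + "CGATAGGCTA"

# Four reads taken from the sample; the middle two touch the extra segment.
reads = ["ACGTACGT", "TAGCGGGT", "GGGTTTCC", "GATAGGCT"]

mapped, unmapped = map_reads(reference, reads)
print("mapped:  ", mapped)
print("unmapped:", unmapped)
```

In this toy model, the reads that overlap the extra segment have nowhere to land, so an analysis built only on mapped reads never sees them.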
“It’s hard to appreciate just how important the current reference is. We use it like a coordinate system or a map, and we refer to it constantly when we talk about genes,” says Benedict Paten, a computational biologist, also at Santa Cruz, and the senior author of the report. “But it’s both incomplete and lacks diversity. It lacks the things that make us different: in other words, the interesting bits.”
Officials with NIH said they hoped the new update to the genome map would make gene research more “equitable.” That’s because the more different your genome is from the current reference, the more information about you could be missed. The existing reference is largely the DNA of one African-American man, although it includes segments from several other people as well.
“If the genome you want to analyze has sequences that are not in that reference, they will be missed in the analysis,” says Deanna Church, a consultant with the business incubator General Inception, who previously held a key role at NIH managing the reference genome. “In reality, the notion that there is a ‘human genome’ is really the problem,” she says. “The current version is the simplest model you can make. It made sense when we started … But now we need better models.”
Piecing together the puzzle of us
The pangenome, which itself remains at draft stage, was constructed with the help of two newer technologies. One is a type of sequencing machine that reads out very long stretches of DNA in one go. Most sequencing is done by shredding DNA into tiny bits, under 200 letters long. But the new machines, made by the company Pacific Biosciences, produce continuous readouts of 10,000 letters at once.
Such “long reads,” as researchers call them, are like extra-large puzzle pieces that are much easier to arrange correctly in the actual order they’re present in a person’s genome.
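The puzzle-piece intuition can be sketched with a toy example in Python. It assumes an invented sequence containing a repeated stretch and simply counts the places where a read could fit; the `placements` helper and all sequences are hypothetical, and real assemblers use far more elaborate methods.

```python
def placements(genome, read):
    """Return every start position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions


# Hypothetical genome with the same repetitive stretch on both sides of a unique segment.
repeat = "CACACACA"
genome = "GATTACA" + repeat + "TGGCCT" + repeat + "AGTCG"

short_read = "CACA"                     # lies entirely inside the repeat
long_read = "CACA" + "TGGCCT" + "CACA"  # spans the unique segment

print(len(placements(genome, short_read)), "possible spots for the short read")
print(len(placements(genome, long_read)), "possible spot for the long read")
```

The short read fits in several places, while the long read reaches past the repeat into unique sequence and fits in only one.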
That puzzling-together process, called genome assembly, is the other area where researchers say they’ve made advances with new computational tools. Even so, organizing and comparing 47 genomes at once (each with about 6 billion pairs of DNA letters) remains a gnarly problem.