23andMe is a startup based in Mountain View, California. Founded in 2006, its core business is genetic testing for individuals. Customers also receive information about their ancestry and possible disease risks, which they can access on the company's website.
The cost of sequencing a person’s genome used to be prohibitive. However, 23andMe, with its deep pockets of venture and personal funding (co-founder Anne Wojcicki is the wife of Google co-founder Sergey Brin), was able to cut its test price from $999 in 2007 to $399 in 2008, then to $299 until the end of 2012. In December 2012, with $50 million in Series D funding, 23andMe slashed the price to $99 per person. That price is probably below the actual testing cost. Why the price cut? 23andMe states that its goal is to get 1 million people to participate.
What is driving this large expansion of the user base? The first potential payoff is disease discovery. With a large population, a disease can be linked to genome data more solidly. Suppose we find a gene mutation in one diabetes patient; that is not enough to conclude that the mutation caused her diabetes. However, if we find the same mutation in 1,000 diabetes patients, and far less often in people without diabetes, we can draw this conclusion with more confidence. Ultimately, the goal is a population sample large enough that a specific gene mutation or ancestral trait can be reliably linked to a disease.
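To make the sample-size argument concrete, here is a minimal sketch using a one-sided Fisher exact test on a 2x2 table of patients versus controls. The counts are invented for illustration (30% of patients and 10% of controls carry the mutation in both studies); they are not 23andMe data, and the function name `fisher_one_sided` is my own.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test p-value.

    a = patients with the mutation,  b = patients without it
    c = controls with the mutation,  d = controls without it
    Returns the probability of seeing at least `a` carriers among the
    patients if the mutation were unrelated to the disease.
    """
    n_patients = a + b
    n_carriers = a + c
    total = a + b + c + d
    denom = comb(total, n_patients)
    p = 0.0
    for k in range(a, min(n_patients, n_carriers) + 1):
        ways = comb(n_carriers, k) * comb(total - n_carriers, n_patients - k)
        p += ways / denom
    return p

# Hypothetical small study: 10 patients (3 carriers), 10 controls (1 carrier).
p_small = fisher_one_sided(3, 7, 1, 9)

# Hypothetical large study with the same proportions, 1000 people per group.
p_large = fisher_one_sided(300, 700, 100, 900)

print(f"small study p-value: {p_small:.3f}")
print(f"large study p-value: {p_large:.3g}")
```

With the small study the p-value is about 0.29, so the difference could easily be chance. With the same proportions at 1,000 people per group, the p-value becomes vanishingly small. Note that even a very small p-value shows association, not causation.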
By December 2012 (before the price cut), 23andMe had accumulated 180,000 individual genome profiles. So far, this is the largest dataset on human genomes that any one organization has accumulated. Combined with the self-reported health profiles of these customers, it allows studies linking diseases to gene patterns to be more conclusive.
23andMe has partnered with Genentech to study a range of topics, from Alzheimer’s disease to breast cancer and (most recently) the cancer drug Avastin. In addition, the company received a small amount of funding from the NIH to study allergy and asthma. Given such a large population of genome data, we could see some exciting discoveries.
Data mining will play a big role in these new discoveries. Not only does data mining enable pattern discovery in a large dataset covering many different diseases and personal traits, it can also create predictive models of disease onset based on a person’s genome profile. Feature selection techniques from data mining have also worked well in genome studies, where there are more than 20,000 gene features but only a few data points. Even with 1 million people in the data, the problem of too few data points could still exist when only a small group of people have similar diseases. (Thus it is important to get data from even more people, ideally tens of millions or even billions.)
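As a rough illustration of feature selection with many features and few samples, the sketch below ranks features by a simple univariate filter (the absolute Welch t-statistic between cases and controls). This is only one of many feature selection techniques, and the data are synthetic: 2,000 random features for 40 people, with three features (the hypothetical indices in `INFORMATIVE`) shifted for the cases. All names and numbers are assumptions made for the example.

```python
import random
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch t-statistic for the difference in means of two samples."""
    se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
    return (mean(xs) - mean(ys)) / se

def rank_features(rows, labels):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        cases = [r[j] for r, y in zip(rows, labels) if y == 1]
        controls = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(cases, controls)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Synthetic data: far more features than samples.
rng = random.Random(0)
N_FEATURES, N_PER_GROUP = 2000, 20
INFORMATIVE = {7, 42, 1234}  # hypothetical "disease-linked" features

labels = [1] * N_PER_GROUP + [0] * N_PER_GROUP
rows = []
for y in labels:
    row = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
    if y == 1:
        for j in INFORMATIVE:
            row[j] += 4.0  # cases differ from controls on these features
    rows.append(row)

top = rank_features(rows, labels)[:3]
print("top-ranked features:", sorted(top))
```

In this toy setup the signal is deliberately strong, so the filter recovers the planted features. Real genomic effects are usually much weaker, and with thousands of features some will look significant by chance, which is one reason larger samples matter.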
The future of genome study is closely linked to data mining. This is an exciting time to be a data miner.
 23andMe press release, “23andMe Raises More Than $50 Million in New Financing”, December 11, 2012. http://mediacenter.23andme.com/press-releases/23andme-raises-more-than-50-million-in-new-financing/