23andMe raw data is the beginning of a wonderful frontier of possibilities, starting with the contemplation of the ultimate introspection.  With 579,320 SNP genotypes, and with genome-wide association studies (GWAS) and other reports popping up in online scientific journals, I’ve got a lot of SNPs to read about and to watch for. There are two great things about 23andMe: their curated analysis and the community. 23andMe staff go to great lengths to provide references, resources, summary explanations, helpful visualizations and expert explanations about every possible way our SNP data can be interpreted. That is within reason, of course - they carefully vet which research reports they’ll describe as having meaning, and that’s understandable. The community is great, too, providing commentary from professional and hobbyist seekers.

While looking for pragmatic and structured approaches to processing the data, I wanted another angle, some way to make a set of As, Ts, Gs and Cs stand out and put more of a physical face on it. And, it’s fun.

Simply put, each base letter in the full, raw SNP data downloaded from 23andMe is assigned a color, and the result is rasterized. Once I’ve downloaded my data from 23andMe, the Visual SNP Chip reads it and represents it in visual form. Since there are a lot of bases (my SNP data contained over a million bases), I added a scaling function for the resulting image and also a way to select a subset range of the data. Visual SNP Chip can generate a PNG file so you can save it, share it, make it your background (I have mine as my phone’s background), print it or do whatever you want with it.
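
As a rough illustration of the idea (this is not the actual Visual SNP Chip code), the steps above could be sketched like this. It assumes the raw data file has "#" comment lines followed by tab-separated lines whose last field is the genotype. The names extractBases, rasterize and BASE_COLORS are made up for this example, and the yellow used for C is a guess, since only the colors for A, T and G are mentioned in this post.

```typescript
// Hypothetical palette: red A, blue T and green G follow the post; yellow C is a guess.
type RGB = [number, number, number];

const BASE_COLORS: Record<string, RGB> = {
  A: [255, 0, 0],
  T: [0, 0, 255],
  G: [0, 255, 0],
  C: [255, 255, 0],
};

// Collect the bases from raw-data text: skip blank and '#' comment lines,
// take the last tab-separated field as the genotype, keep only A/C/G/T
// (so no-calls such as "--" are dropped).
function extractBases(rawText: string): string {
  let bases = "";
  const lines = rawText.split(/\r?\n/);
  for (let n = 0; n < lines.length; n++) {
    const line = lines[n] || "";
    if (line === "" || line.charAt(0) === "#") continue;
    const fields = line.split("\t");
    const genotype = fields[fields.length - 1] || "";
    for (let i = 0; i < genotype.length; i++) {
      const ch = genotype.charAt(i);
      if (BASE_COLORS[ch]) bases += ch;
    }
  }
  return bases;
}

// Paint the bases left to right, top to bottom, each as a scale x scale
// square, into an RGBA buffer laid out the same way as canvas ImageData.
function rasterize(bases: string, columns: number, scale: number) {
  const rows = Math.ceil(bases.length / columns);
  const width = columns * scale;
  const height = rows * scale;
  // Unpainted cells stay transparent black (all zeros).
  const pixels = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < bases.length; i++) {
    const color = BASE_COLORS[bases.charAt(i)];
    if (!color) continue;
    const left = (i % columns) * scale;
    const top = Math.floor(i / columns) * scale;
    for (let y = top; y < top + scale; y++) {
      for (let x = left; x < left + scale; x++) {
        const p = (y * width + x) * 4;
        pixels[p] = color[0];
        pixels[p + 1] = color[1];
        pixels[p + 2] = color[2];
        pixels[p + 3] = 255; // fully opaque
      }
    }
  }
  return { width, height, pixels };
}
```

Selecting a subset range is then just a slice of the base string before rasterizing, for example rasterize(extractBases(text).slice(0, 29000), 200, 2). In a browser, a buffer like this could be handed to a canvas as image data and exported as a PNG from there.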
Above, 29,000 bases of my personal 23andMe SNP data; base legend; Nexus One background
Design decisions

A number of decisions affected the design of the Visual SNP Chip.  The first and foremost was providing as much comfort as possible about how the data’s being used:  nowhere in this application does the SNP data go over the wire.  It’s not sent anywhere or stored anywhere other than where you have it.  This means the web browser on your computer is the only thing that sees the data. I used HTML5’s File API in conjunction with Web Worker threads to read 14 MB+ files rapidly, and Canvas to display the visualization and to render it to a PNG.
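
This post doesn’t go into how the worker walks through a file that size. One plausible approach, assuming the file is read in slices rather than in one go, is to hold back the partial line at the end of each slice and join it to the start of the next. The sketch below shows only that bookkeeping in plain TypeScript. LineAssembler is a hypothetical helper, not part of the File API or of Visual SNP Chip, and the chunks would come from whatever does the reading.

```typescript
// Hypothetical helper: turns a sequence of text chunks into complete lines,
// so a large file can be processed slice by slice.
class LineAssembler {
  private carry = "";

  // Feed one chunk; returns the lines that this chunk completed.
  push(chunk: string): string[] {
    const parts = (this.carry + chunk).split("\n");
    // The last piece has no newline after it yet, so hold it back.
    this.carry = parts.pop() || "";
    // Drop a trailing carriage return left over from Windows line endings.
    return parts.map((line) => line.replace(/\r$/, ""));
  }

  // Call once after the final chunk to collect a last unterminated line.
  flush(): string[] {
    const rest = this.carry.replace(/\r$/, "");
    this.carry = "";
    return rest === "" ? [] : [rest];
  }
}
```

Each complete line can be parsed as soon as it comes out of push, so the whole text never has to be split in one pass, and nothing in this step needs the network.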

The accuracy of the representation wasn’t a priority - the physical visualization was.  That’s not to say it’s a random or made-up visualization, but some design choices make the result more aesthetic than objective.

One example is pairs versus individual bases. SNP genotypes come in pairs, such as AA and TG - that’s two bases per single nucleotide polymorphism.  As represented in the Visual SNP Chip, these are “flattened” - in other words, an AA is represented as two red boxes side by side, and a TG is a blue box and a green box side by side instead of, say, some combination of colors per SNP.  I experimented with creating a composite pixel - one made up of blue + green for TG, for example - but I didn’t get the appearance I wanted.  I want to revisit the concept of representing the pair rather than the individuals, since that feels more accurate.
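
To make the difference concrete, here is a small sketch of both options. It is an illustration only: averaging the two base colors is just one way a composite pixel could be built, the names PALETTE, flattenedColors and compositeColor are invented for the example, and the yellow for C is a guess.

```typescript
type Rgb = [number, number, number];

// Red A, blue T and green G follow the post; yellow C is an assumption.
const PALETTE: Record<string, Rgb> = {
  A: [255, 0, 0],
  T: [0, 0, 255],
  G: [0, 255, 0],
  C: [255, 255, 0],
};

// Flattened: one colored box per base, so "TG" gives a blue box and a green box.
function flattenedColors(genotype: string): Rgb[] {
  const boxes: Rgb[] = [];
  for (let i = 0; i < genotype.length; i++) {
    const color = PALETTE[genotype.charAt(i)];
    if (color) boxes.push(color);
  }
  return boxes;
}

// Composite: a single box per pair, here the channel-by-channel average of
// the two base colors. Returns null unless both bases are A, C, G or T.
function compositeColor(genotype: string): Rgb | null {
  if (genotype.length !== 2) return null;
  const first = PALETTE[genotype.charAt(0)];
  const second = PALETTE[genotype.charAt(1)];
  if (!first || !second) return null;
  return [
    Math.round((first[0] + second[0]) / 2),
    Math.round((first[1] + second[1]) / 2),
    Math.round((first[2] + second[2]) / 2),
  ];
}
```

With this palette, compositeColor("TG") comes out as a teal (0, 128, 128), while compositeColor("AA") stays pure red. A composite like this halves the number of boxes, but mixed pairs end up as muted in-between shades.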

And lastly, it hasn’t been thoroughly tested - if you find something that’s gone wrong or that can be done better, please let me know!  I’ve used Google’s Chrome browser as my primary target (I’m using 9.0.597.83 beta at the moment), along with Firefox (version 3.6.13 is what I have).  Since not all browsers support the same HTML5 features, I haven’t really coded it for compatibility.

There are a lot of things I’d like to do but haven’t prioritized - dragging and dropping files onto Visual SNP Chip rather than using a file dialog to search for the data, handling zipped files rather than only text files, using HTML5 storage (database) for more efficiency, auto-scaling when there are a large number of bases, mouseover info, and multiple UI tweaks to make it easier to use.

There are also a lot of things I’m sure I haven’t considered. I’m hoping this will appeal to some folks and that I’ll be able to take and incorporate their feedback.

Enjoy and please let us know what you think!

Visual SNP Chip displaying 1,987 bases (few enough that they could be displayed) and the resulting image.