Layout examples


Examples of bad layout

Leveraging Maine Middle School Laptops and Apple XGrid to Perform Scientific Computation

Glen Beane
The Jackson Laboratory, Bar Harbor, Maine 04609 USA

Abstract

In 2002, the Maine Learning Technology Initiative (MLTI) began an effort to get laptops into the hands of 7th and 8th grade students throughout the state, resulting in a deployment of over 36,000 Apple iBooks as of 2006. The Jackson Laboratory, a nonprofit genetics research institution located in Bar Harbor, Maine, has partnered with Apple Computer and the MLTI to leverage XGrid and the middle school laptops to perform scientific computations. Not only are we at The Jackson Laboratory leveraging these laptops to conduct scientific research, but we are also using them to reach out to students and directly engage them in the cutting-edge scientific research being conducted here "in their backyard." Jackson Laboratory researchers and software engineers identified a scientific problem for the pilot XGrid deployment: annotating single nucleotide polymorphism (SNP) datasets, which can consist of millions of data points. Each dataset can be split into many independent pieces, since the annotation of one SNP does not depend on any other SNP and the order of execution is unimportant. These properties make the problem ideal for XGrid. The program began with Conners Emerson School in Bar Harbor, Maine. Plans are underway to expand the program to more volunteer schools and to identify additional scientific problems.

Single Nucleotide Polymorphism

Observations

• At any given time, a large number of student laptops are unavailable.

A Single Nucleotide Polymorphism (SNP, pronounced "snip") is a DNA sequence variation in a single nucleotide between members of the same species. Dr. Graber’s lab wants to annotate large SNP datasets with additional information to help determine what the functional implications of each SNP may be. One SNP dataset of interest to the lab contains over 8 million SNPs. The annotation needs to be redone periodically, whenever the SNP dataset or the source of the annotation data changes. Computing millions of annotations is time-consuming, but each SNP can be annotated independently. SNP annotation seemed like a perfect problem for our pilot: straightforward, easily decomposed into many independent sub-problems, and served by an existing program that would require few modifications.
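The decomposition described above can be sketched in a few lines. This is only an illustration, not the lab's program: the function names, the record format, and the stand-in `annotate_one` callback are all hypothetical. It shows that when each record is annotated independently, the dataset can be cut into pieces, the pieces can be processed in any order, and the results can simply be joined afterwards.

```python
from typing import Callable, Iterator, List


def split_into_chunks(records: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive pieces of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def annotate_chunk(chunk: List[str], annotate_one: Callable[[str], str]) -> List[str]:
    """Annotate one piece; no record depends on any other record."""
    return [annotate_one(record) for record in chunk]


def annotate_dataset(records: List[str], chunk_size: int,
                     annotate_one: Callable[[str], str]) -> List[str]:
    """Annotate every piece and concatenate the per-piece results."""
    results: List[str] = []
    for chunk in split_into_chunks(records, chunk_size):
        results.extend(annotate_chunk(chunk, annotate_one))
    return results


def fake_annotate(record: str) -> str:
    """Hypothetical stand-in for the real per-SNP annotation step."""
    return record + ":annotated"


# Hypothetical SNP identifiers, used only to exercise the functions.
snps = [f"snp{i}" for i in range(10)]
chunk_sizes = [len(c) for c in split_into_chunks(snps, 4)]
annotated = annotate_dataset(snps, 4, fake_annotate)
```

In a grid setting each piece would be handed to a different agent instead of being processed in a loop, but the split and join steps stay the same.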

Annotating SNPs

• Availability depends somewhat on the school schedule
  • For example, fewer laptops are available during lunch period, in the evening, or during student vacations
• The SNP annotator proved to be well suited for XGrid
  • The dataset was split into 1667 individual input files (approximately 5000 SNPs each)
  • Splitting takes less than 5 minutes; submitting and retrieval take about 2 hours
  • Annotation takes approximately 20 minutes per input file on MLTI laptops (average, depending on load)
  • The entire genome could take over 23 days on a single MLTI laptop (much less on a high-powered workstation, usually 2 to 3 days)
  • Using small enough pieces and enough laptops, total run time could be reduced to just over two hours (approximately 6400 agents would allow us to complete the entire process in about 2 hours 10 minutes)
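The run-time figures above can be checked with a short calculation. The sketch below is one reading of how the numbers fit together, not a method stated on the poster. It assumes that annotation time scales linearly with the number of SNPs in a piece, that every agent is available at once and receives exactly one piece, and that splitting (5 minutes) and submitting and retrieval (2 hours) stay fixed. All constant and function names are hypothetical.

```python
import math

# Figures taken from the observations above.
N_FILES = 1667
SNPS_PER_FILE = 5000
MINUTES_PER_FILE = 20        # average annotation time per input file
SPLIT_MINUTES = 5            # "less than 5 minutes", used as an upper bound
TRANSFER_MINUTES = 120       # submitting and retrieval, about 2 hours

TOTAL_SNPS = N_FILES * SNPS_PER_FILE


def single_laptop_days() -> float:
    """Annotation time if one laptop processes every input file in turn."""
    return N_FILES * MINUTES_PER_FILE / (60 * 24)


def grid_minutes(n_agents: int) -> float:
    """Estimated wall-clock minutes with one equal piece per agent.

    Assumption: annotation time is proportional to SNP count, and the
    fixed split and transfer overheads do not change with piece size.
    """
    snps_per_agent = math.ceil(TOTAL_SNPS / n_agents)
    annotate_minutes = snps_per_agent * MINUTES_PER_FILE / SNPS_PER_FILE
    return SPLIT_MINUTES + TRANSFER_MINUTES + annotate_minutes


days_on_one_laptop = single_laptop_days()
minutes_with_6400_agents = grid_minutes(6400)
```

Under these assumptions a single laptop needs a little over 23 days, and 6400 agents finish in roughly 130 minutes, which agrees with the "about 2 hours 10 minutes" estimate. The fixed 2 hours of submitting and retrieval dominate the total, so adding agents beyond this point would save very little.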

• Each SNP is described as a position on the genome; annotations are also relative to the genome. The SNP annotation project integrates the two, inferring the functional implications of the SNP based on the transcript notation.

Introduction

The Maine Learning Technology Initiative (MLTI) is a one-to-one technology initiative that has equipped all 7th and 8th grade students in Maine with Apple laptops. The Jackson Laboratory (TJL) is a private non-profit genetics research laboratory located in Bar Harbor, Maine. TJL’s mission is to discover the genetic basis for preventing, treating and curing human disease, and to enable research and education for the global biomedical community.

• Determine whether the SNP falls within a gene (intragenic) or outside of a gene (intergenic).
• If intragenic, does the SNP fall in the gene’s coding region (exon) or in a non-coding region (intron)?
• If the SNP is located in an exon, does it code for the same amino acid (coding synonymous) or does it alter the amino acid sequence of the protein (coding nonsynonymous)?
• Code degeneracy (64 codons to specify 20 amino acids) means some SNPs will not change which amino acid a codon specifies.
• Other SNPs can change an amino acid specifying codon into a stop codon and truncate the amino acid sequence.
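The decision steps above can be illustrated with a minimal sketch. It is not the lab's annotator: the function names, the coordinate conventions (inclusive intervals), and the simplified gene model (one gene span plus a list of exon intervals) are assumptions made for the example. The codon table is the standard genetic code, with "*" marking a stop codon.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_location(position: int, gene_span: Tuple[int, int],
                      exons: List[Tuple[int, int]]) -> str:
    """Place a SNP relative to one gene (hypothetical, inclusive coordinates)."""
    gene_start, gene_end = gene_span
    if not gene_start <= position <= gene_end:
        return "intergenic"
    if any(start <= position <= end for start, end in exons):
        return "exon"
    return "intron"


def classify_coding_snp(codon: str, offset: int, alt_base: str) -> str:
    """Compare the amino acid before and after substituting one base."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    if alt_aa == ref_aa:
        return "coding synonymous"
    if alt_aa == "*":
        return "coding nonsynonymous (stop codon introduced)"
    return "coding nonsynonymous"


# GAA and GAG both encode glutamate, so this change is synonymous.
example = classify_coding_snp("GAA", 2, "G")
```

A real annotator would also have to handle strand orientation, reading frame, and multiple transcripts per gene, all of which are left out here.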

Apple, Inc. approached the Laboratory about using XGrid to perform scientific computations on MLTI laptops. TJL, Apple, and MLTI began a project to deploy XGrid on a number of MLTI laptops and use them to solve an existing computational problem at the Lab. Rather than just using MLTI laptops to perform the computation, we would also use the project as a way to engage students.

Future Work

• Identify additional scientific problems at TJL that would be well suited for XGrid
• Incorporate more schools and more computers at TJL (approx. 500 Macs on site at the Lab)
  • In order to be a viable tool for TJL, the XGrid will have to include many more schools
  • How far will we be able to scale the system?
  • How will our XGrid controllers cope with potentially tens of thousands of agents?
  • What kind of load will this place on our network at the Lab and at the individual schools?
• Develop a program to “profile” the XGrid
  • measure bandwidth to agents
  • track each agent’s average time online and available for jobs
  • determine optimal run times and data sizes for TJL/MLTI XGrid applications
• Educational outreach
  • Develop a web presence for the project so that students can learn about the science
    • What are we trying to learn?
    • What are the implications for human health?
  • Provide statistics so students can see how much compute time their school has provided for the project

Middle school students visiting The Jackson Laboratory

Pilot Study

The pilot study of the MLTI/Jackson Laboratory XGrid project included the identification of a well-suited scientific application. In selecting a scientific problem for the pilot study, we had a few criteria in mind:

• The problem should be computationally intensive (taking days or more on a single workstation)
• The program should require minimal modification to run in the XGrid environment
• The problem space should be easily decomposed into independent sub-problems

A suitable application was selected, and students at Conners Emerson School were chosen to participate in the pilot study. The application we selected was a SNP annotation program developed by Dr. Joel Graber’s lab at The Jackson Laboratory. Two retired MLTI XServes were provided to serve as XGrid controllers at TJL. The only cost to TJL was the labor to configure the XGrid controller.

Examples of good layout

Examples of a good grid structure

Running SNP Annotator on XGrid

• C program that originally ran on Linux (command line interface)
• minimal modifications required to read in split data
• compiled as Universal Binary
• annotation data can be downloaded at run time from a web server or preinstalled on agents
• SNP data sent with job
• results are simply concatenated after jobs are finished

Acknowledgements

At The Jackson Laboratory:
Joel Graber, PhD
Lucie Hutchins
Jon Geiger, PhD
Chuck Donnelly
Jon Mitchell

Shaun Meredith, InfoBridge
Jim Doyle, Apple
Jeff Mao, Maine Department of Education, MLTI
Rick Barter, Conners Emerson School

