Monday, December 22, 2008
BiRG Christmas Cake
Our BiRGer 'Mark' surprised me today when he walked into the BiRG lab with a big cake. Here are the photos of the cake and the BiRG lab.
Thank you very much for the cake! A very Merry Christmas and a very Happy New Year to you all.
Sunday, August 31, 2008
IU School of Informatics named to Computerworld's top IT schools to watch
BLOOMINGTON, Ind. -- Computerworld magazine announced in its August 2008 issue its "Top IT Schools to Watch 2008," and the Indiana University School of Informatics was among the 10 schools recognized in a feature article on graduate programs.
The schools, including institutions such as Carnegie Mellon University, Stanford University, the University of Pennsylvania, and the University of Virginia, were selected based on how well they were keeping pace with today's IT workplace, and the relevance of their curriculum to the ever-changing technology industry.
IU's School of Informatics was touted not only for providing students with real-world experience, but also for its interdisciplinary approach to the field and for the responsiveness of its faculty and students.
The list was compiled by a panel of more than two dozen IT executives, hiring managers, recruiters and academics who were asked to help identify the country's leading-edge schools for IT workers seeking to advance their careers. They considered graduate-level IT programs and schools that give graduates the best value in terms of salary increases or promotions vs. cost of tuition, and that best gear their curriculum to the everyday demands of today's IT workplace.
From the IT schools selected by the panel, Computerworld editors chose the innovative IT schools to profile. Finally, Computerworld partnered with Dice.com to survey alumni at the schools, asking for feedback on their satisfaction with their schools' program.
"We are honored to be part of Computerworld's list for 2008," said Bobby Schnabel, dean of the School. "It is gratifying to be in the company of schools that have long been considered at the top of the computing field, and to gain recognition for our still young set of graduate programs in informatics."
The complete story can be found in the August issue of Computerworld magazine and online at www.computerworld.com.
Founded in 2000 as the first school of its kind in the United States, the Indiana University School of Informatics is dedicated to research and teaching across a broad range of computing and information technology, with emphases on science, applications, and societal implications. The school includes the Departments of Computer Science and Informatics on the Bloomington campus and Informatics on the IUPUI campus.
The school administers a variety of bachelor's and master's degree programs in computer science and informatics, as well as Ph.D. programs in computer science and the first-ever doctorate in informatics. The School is dedicated to excellence in education and research, to partnerships that bolster economic development and entrepreneurship, and to increasing opportunities for women and underrepresented minorities in computing and technology. For more information, visit www.informatics.indiana.edu.
Thursday, August 28, 2008
Re: Speed Museum Membership
Friday, July 25, 2008
BiRG minutes: 7-18-2008
Comparing Two Sequences
Objectives:
• Get the basics about dot plots
• Know how to interpret the most common patterns in a dot plot
• Use Dotlet
• Use Lalign to extract local alignments
Why compare two?
• Database searches are useful for finding homologues
• Database searches don't provide precise comparisons
• More precise tools are needed to analyze the sequences in detail including
– Dot plots for graphic analysis
– Local or global alignments for residue/residue analysis
• The alignment of two sequences is called a pairwise alignment
Dot Plot:
• A dot plot is a graphic representation of pairwise similarity
• The simplicity of dot plots prevents artifacts
• Ideal for looking for features that may come in different orders
• Reveal complex patterns
• Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain
Choosing your two sequences:
• Making pairwise comparisons takes time
• Use BLAST to rapidly select your sequences
– More than 70% identity for DNA
– More than 25% identity for proteins
• If your sequences are too similar, comparing them yields no useful information
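A minimal sketch of how percent identity could be computed for two already-aligned sequences, to check them against the 70% (DNA) and 25% (protein) guidelines above. The function name `percent_identity` is hypothetical, and skipping gap columns in the denominator is an assumption; tools such as BLAST may count identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared (non-gap) columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


identity = percent_identity("ACGTACGT", "ACGTTCGT")
print(f"{identity:.1f}% identity, above DNA guideline: {identity > 70}")
```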
What can you analyze with Dot Plot?
• Any pair of sequences
– DNA
– Proteins
– RNA
• DNA with proteins
– Dotlet is an appropriate tool
– To compare full genomes, install the program locally
• Sequences longer than 1000 symbols are hard to analyze online
• Divergent sequences where only a segment is homologous
• Long insertions and deletions
• Tandem repeats
– The square shape of the pattern is characteristic of these repeats
Using Dotlet:
• Dotlet is one of the handiest tools for making dot plots
• Dotlet is a Java applet
• Open and download the applet at the following site:
– www.isrec.isb-sib.ch/java/dotlet
• Dotlet slides a window along each sequence
• If the windows are more similar than the threshold, Dotlet prints a dot at their intersection
• You can control the similarity threshold with the little window on the left
• Every dot has a score given by the window comparison
• When the score is
– Below threshold 1 = black dot
– Between thresholds 1 and 2 = grey dot
– Above threshold 2 = white dot
• The blue curve is the distribution of scores in the sequences
• The peak = most common score
– Most common = less informative
• Window size and stringency control the appearance of your dot plot
– Very stringent = clean dot plot, little signal
– Not stringent enough = noisy dot plot, too much signal
• Play with the threshold until a usable signal appears
• The square shape is typical of tandem repeats
• The repeats are not perfect because the sequences have diverged after their duplication
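The sliding-window idea above can be sketched in a few lines. This is only an illustration, not Dotlet's actual code: it scores each pair of windows by counting identical positions (Dotlet uses a scoring matrix and two grey-scale thresholds), and the names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=3, threshold=2):
    """Return a grid of booleans.

    grid[i][j] is True when the windows starting at seq_a[i] and seq_b[j]
    share at least `threshold` identical positions.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            score = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(score >= threshold)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# A sequence with tandem repeats, compared with itself.
print(render(dot_plot("ABCABCABC", "ABCABCABC", window=3, threshold=3)))
```

Raising `threshold` toward `window` gives a cleaner plot with less signal; lowering it gives a noisier plot, which mirrors the stringency trade-off in the notes.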
Comparing a Gene and its Product:
• Eukaryotic genes are transcribed into RNA
• The RNA is then spliced to remove the introns' sequences
• It may be necessary to compare the gene and its product
• Dotlet makes this comparative analysis easy
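As a rough illustration of why the gene and its product differ, the sketch below joins the exons of a gene and drops the introns. The function name `splice`, the toy sequence, and the exon coordinates are all made up for the example; on a dot plot of gene versus product, each exon would show up as a separate diagonal segment.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments; each (start, end) pair is 0-based, end-exclusive."""
    return "".join(gene[start:end] for start, end in sorted(exons))


# Toy gene: exons in upper case, one intron in lower case (hypothetical data).
toy_gene = "AAAgtccagTTT"
print(splice(toy_gene, [(0, 3), (9, 12)]))
```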
Aligning Sequences:
• Dotlet dot plots are a good way to provide an overview
• Dot plots don't provide residue/residue analysis
• For this analysis you need an alignment
• The most convenient tool for making precise local alignments is Lalign
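To make "local alignment" concrete, here is a minimal Smith-Waterman sketch with a linear gap penalty. It is not Lalign's implementation; the function name `local_align` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, and it returns only one best-scoring local alignment.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_align("TTTACGTTT", "GGACGGG"))
```

Unlike a dot plot, the result pairs residues one to one, which is the residue/residue analysis the notes refer to.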
Lalign and BLAST:
• Lalign is like a very precise BLAST
• It works on only two sequences at a time
• You must provide both sequences
Going Farther:
• If you need to align coding DNA with a protein, try these sites:
– www.tcoffee.org => protogene
• If you need to align very large sequences, try this site:
– www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
• If you need a precise estimate of your alignment's statistical significance, use PRSS
– The program is available at fasta.bioch.virginia.edu
– Low E-value = good alignment
BiRG Minutes : June 11, 2008
Analyzing Protein Sequences
In-silico biochemistry
Sliding-window techniques – the oldest way of looking at sequences
-used if the strand of DNA was cut in the middle
-ND THE WAS where the A was cut off
NDT HEW AS
DTH EWA S
THE WAS
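The three readings above can be reproduced by chopping the same text into groups of three from each possible starting offset. This is just a small sketch of the idea; the function name `frames` is hypothetical.

```python
def frames(text: str, size: int = 3) -> list[list[str]]:
    """Split text into chunks of `size`, once for each starting offset."""
    return [
        [text[i:i + size] for i in range(offset, len(text), size)]
        for offset in range(size)
    ]


for chunks in frames("NDTHEWAS"):
    print(" ".join(chunks))
```

Only one of the three offsets gives readable words, which is the point of trying every frame.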
-Use past experiences and what proteins have been together in the past
-Hydrophobicity is the most popular analysis – a good indicator of transmembrane segments or core regions within a protein.
Predicting transmembrane domains
ProtScale allows one to compute and represent the profile produced by any amino acid scale on a selected protein.
An amino acid scale is defined by a numerical value assigned to each type of amino acid.
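A sketch of a sliding-window profile in the spirit of ProtScale, using the Kyte-Doolittle hydropathy scale. This is not ProtScale's code: the function name `hydropathy_profile` and the window size of 9 are assumptions, and ProtScale offers options (such as window weighting) that are left out here.

```python
# Kyte-Doolittle hydropathy values, one per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Average scale value over every window of `window` residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up sequence: a hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDELLIVLAVIFLLGKKDE")
print([round(v, 2) for v in profile])
```

High averages over a long enough stretch are the kind of signal that suggests a transmembrane segment or a buried core region.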
TMHMM (Transmembrane Helix Prediction) is a method for predicting transmembrane helices based on a hidden Markov model (HMM)
HMM - a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. The extracted model parameters can then be used to perform further analysis, for example for pattern recognition applications.
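A toy example of the HMM idea, assuming a two-state model: "M" (membrane helix) and "O" (everything else). All probabilities and the hydrophobic residue grouping below are invented for illustration; the real TMHMM model has many more states and trained parameters. The Viterbi algorithm recovers the most likely hidden state path from the observed residues.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # assumption: a rough hydrophobic grouping

STATES = ("O", "M")  # O = outside the membrane, M = membrane helix
START = {"O": 0.9, "M": 0.1}                       # made-up values
TRANS = {"O": {"O": 0.9, "M": 0.1},
         "M": {"O": 0.1, "M": 0.9}}                # made-up values
EMIT = {"O": {"h": 0.3, "p": 0.7},
        "M": {"h": 0.8, "p": 0.2}}                 # made-up values


def viterbi(protein: str) -> str:
    """Most likely state path, one letter ('O' or 'M') per residue."""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in protein.upper()]
    if not obs:
        return ""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for symbol in obs[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(TRANS[prev][s])
                           + math.log(EMIT[s][symbol]))
        back.append(pointers)
        best = new_best
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("KKDEKLLIVLAVIFLLAVILKKDE"))
```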
TMHMM creates a prediction, i.e. what the structure should be
ProtScale takes parameters and shows the profile as it is
Predicting post-translational modifications w/PROSITE
Proteins get modified after they are translated
PROSITE motifs are written as patterns
– Short patterns are not very informative by themselves
– They only indicate a possibility
– Combine them with other information to draw a conclusion
NOT EVERYTHING IS IN PROSITE
Interpreting PROSITE patterns
Some patterns may suggest nonexistent protein features
Short patterns are more informative if they are conserved across homologous sequences
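A sketch of how a PROSITE-style pattern can be turned into a regular expression and scanned against a sequence. It handles only the common syntax (`x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors); the function names and the test sequence are made up. The example uses the N-glycosylation pattern N-{P}-[ST]-{P}, which is short enough to match often by chance, as the notes warn.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")    # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                      # x = any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)


def find_sites(pattern: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("N-{P}-[ST]-{P}", "AANGSAANPSA"))
```

A hit only indicates a possible site; it should be combined with other evidence, such as conservation across homologous sequences.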
A domain is defined as an "independent globular folding unit". It is a portion of a protein that can keep its shape if you remove it from the rest of the protein, and it consists of at least 50 amino acids. Domains are like the various components of a kitchen, such as the oven, the microwave, and the refrigerator. Together they constitute the complete kitchen, but they can also exist separately. You only need the microwave to make popcorn, which can be done outside the kitchen.
An average protein consists of 2 or 3 domains. Usually each domain plays a specific role in the function of the protein. It may interact with other proteins, bind an ion such as calcium or zinc, or contain an active site. It is common to have a catalytic domain associated with a binding domain and a regulatory domain. Imagine a toaster, where you have the grill [catalytic], the toast holder [binding], and the switch [regulation].
Domains are like independent functions that can be taken out of a program but still function
Researchers
A domain is a multi-sequence alignment similar to a puzzle
Using Domain collections
Scientists have been discovering and characterizing protein domains for more than 20 years
Manual collections are precise but small, because the researchers must document everything on their own
Automatic collections go out and find data in research documents, etc.
It is probable that only one of these servers will have the information to help you understand your protein
Friday, June 27, 2008
Meeting Minutes: 27th June 2008
Today's discussion was about working with a single DNA sequence. The slides are available in the Shared Folder.
David began his presentation with the statement "Not everything that can be counted counts and not everything that counts can be counted"
Topics covered in detail included
- PCR
- Cloning
- Cleaning a DNA sequence (removing contamination)
- Restriction Map
- FIRTSMARKET website (use Netscape browser)
- Primer Design (MIT Website)
- Emboss
- Gene Prediction [ prokaryotes -> GeneMark ]
- Gene Prediction [ eukaryotes -> Genomescan ]
- BLAST
- Shotgun sequencing/contigs/assemble/homology
- read pieces
- PHRAP assembly program
- CAP3 program for joining small sequences
We had a nice discussion, with Dr. Ramachandran explaining several concepts during the presentation. We also discussed gene expression and hidden Markov models.
Lastly, we will not meet next week (July 4, Independence Day).
Friday, June 13, 2008
BiRG MINUTES June 13, 2008
Types of Organisms: Prokaryotic, Eukaryotic, and Archaea
Protein Maturation
Deciphering a Swiss-Prot entry
Specialized protein databases: KEGG (the metabolic pathways database) or PDB (structure database)
2 ways to predict genetics
1. Genes to proteins or translation (genomics)
2. DNA
We must merge the two
From Gene to functional Protein
DNA > mRNA > proteins > upon maturing > transportation > destination
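The first two arrows of this chain can be sketched as follows, assuming the input is the coding strand and using the standard genetic code. The function names are hypothetical, and maturation, transport, and destination are not modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA (assumption: the coding strand is given)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading from the first base until a stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, "->", translate(mrna))
```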
Protein Maturation:
-removal of some fragments
-specific protein cleavage
-chemical modifications
-Phosphorylation (addition of phosphate that gives the protein its shape)
-addition of lipids or sugars (glycosylation)
-Proteins are often modified to make them active
-Modification can imply attaching a lipid or a sugar
-Use these resources to determine the details of the modification
Swiss-Prot Database – (British) entries describe all proteins that have known functions
TrEMBL contains the 4 million putative proteins found in GenBank
Swiss-Prot contains the subset of TrEMBL with a known function
It is redundant to create many databases using the same information
A Swiss-Prot entry: www.expasy.org/uniprot/P00533
Gen Info (accession number), References, Comments, Cross-reference, feature table, sequence
General Information: Entry Name, Primary Accession Number (PXXXX [P is for protein]), Last Modified, Protein name and synonyms, from/taxonomy fields (tells where protein came from), references section
Comments section lists all the known functions of the protein
Features Section localizes precisely every known function of your protein, each on its sequence
• TRANSMEM: Transmembrane domain (something that passes through the membrane)
• ACT_SITE: Active sites (where chemicals can bond)
• BINDING: Binding sites
• DISULFID: Disulfide bridge between cysteines
• EMBL: GenBank original DNA sequence
• PDB: Experimental structure of your protein
• DIP: Proteins interacting with your protein
• GlycoSuiteDB: Glycosylations
• MIM: List of genetic diseases involving your protein
• Ontologies: Function of your protein
• Profiles: Known protein domains in your protein
• ENSEMBL: Genomic location of your protein
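A rough sketch of reading the flat-text form of an entry, where each line starts with a two-letter code (ID, AC, DR, and so on) and cross-references such as EMBL and PDB sit on DR lines. The sample record below is entirely made up (placeholder name, accession, and identifiers), and the function names are hypothetical; real entries have more line types and details than this handles.

```python
# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
AC   P00000;
DR   EMBL; X00000; -; mRNA.
DR   PDB; 0XXX; X-ray.
//
"""


def parse_entry(text: str) -> dict[str, list[str]]:
    """Group the lines of one entry by their two-letter line code."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        if len(line) < 2:
            continue
        code, value = line[:2], line[5:].strip()
        fields.setdefault(code, []).append(value)
    return fields


def cross_references(fields: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each database named on a DR line to its identifiers."""
    refs: dict[str, list[str]] = {}
    for value in fields.get("DR", []):
        parts = [p.strip() for p in value.rstrip(".").split(";")]
        refs.setdefault(parts[0], []).append(parts[1])
    return refs


entry = parse_entry(SAMPLE)
print(entry["AC"], cross_references(entry))
```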
By alternative splicing, the protein can have MANY functions
• To find out about the function of your protein, you will need to determine
– Where your protein works
– Metabolic pathway in which the protein is involved
– The protein's 3D structure
– Which protein family it belongs to
Where do proteins work?
Part of the metabolic pathway
Chain of production linking several different proteins
Modify metabolites by passing them from one enzyme to the next
On KEGG pathway, each enzyme appears w/its EC number
– KEGG is the most extensive database of metabolic pathways
– You can use it to compare species (Japan)
– The IUBMB assigns the EC numbers used to describe an enzyme activity (UK)
– An exhaustive list of all known metabolic pathways in E. coli and other bacteria
Some important Protein Families
– Kinases control everything in us; their deregulation is the cause of many cancers
– Immunoglobulins are key elements of our natural defenses
– This site is a key resource on restriction enzymes
Predicting protein function is a central goal in biology
• Protein databases help organize knowledge
• They provide the material for
– Developing new biological experiments
– Developing new prediction algorithms
– Extrapolating experimental data to unknown sequences
Friday, June 6, 2008
June 6, 2008 MINUTES by William Apple
Distinguishing structure of eukaryotic and prokaryotic proteins
Eukaryotic DNA in nucleus, nuclear DNA
mRNA-photocopy of DNA that is carried to Ribosomes
alternative splicing-introns "comment" out segments of exons (information) to generate many types of proteins
Eukaryotes (1 gene/100 kb) are very complex compared to prokaryotes (1 gene/1 kb)
GenBank – housed by the National Center for Biotechnology Information (NCBI)
memory of biological science; many biologists send genes they find to keep the database up-to-date
Reading a Prokaryotic GenBank entry (p75)
ACCESSION is the accession number
LOCUS contains information on gene size
ORGANISM defines the organism containing the gene
REFERENCE indicates who reported the gene
FEATURES
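A small sketch of pulling these fields out of the top of a GenBank flat-file record. The sample text is made up (placeholder locus name and accession), the function name `parse_header` is hypothetical, and only the first line of each field is kept; a dedicated parser would be the safer choice for real entries.

```python
# Hypothetical, abbreviated record header used only for illustration.
SAMPLE = """\
LOCUS       EXAMPLE01               1200 bp    DNA     linear   BCT 01-JAN-2000
ACCESSION   X00000
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
REFERENCE   1  (bases 1 to 1200)
FEATURES             Location/Qualifiers
"""

WANTED = {"LOCUS", "ACCESSION", "ORGANISM", "REFERENCE"}


def parse_header(text: str) -> dict[str, str]:
    """Collect the first value of each wanted keyword, stopping at FEATURES."""
    header: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0] in WANTED:
            header.setdefault(parts[0], parts[1].strip())
    return header


header = parse_header(SAMPLE)
gene_size = int(header["LOCUS"].split()[1])  # second token of LOCUS is the length
print(header["ACCESSION"], header["ORGANISM"], gene_size)
```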
Reading a Eukaryotic GenBank entry
Gene-centric databases: pieces genes together to work with; uses GenBank data; Entrez Gene genome
ENSEMBL: visualization of human chromosomes – can click and zoom on various parts of a chromosome
TIGR Institute
DoE Joint Genome Institute
University of California – good alternative to ENSEMBL; is a mirror site
We had many good discussions. The meeting was very productive.
Minutes from 30 MAY 08
- David brought along a very informative video on Bioinformatics
- The video is a "must see" and is on the SHARED Folder of the BiRG server.
- David continued to discuss the different avenues for bioinformatics research
- William raised a few interesting questions, and Dr. Ramachandran explained the concepts of introns and exons in genes.
- John Lannon submitted the digital version of his paper titled "Investigating Alu Distribution by Family across Human Chromosome Sequences" for inclusion in the Undergraduate Research Journal.
This work won the Chancellor's Award for Interdisciplinary Achievement at this year's Celebrating Success Student Conference, held on April 17-18, 2008
23 MAY 08 - Meeting Minutes
- The BiRG lab welcomed another new member, William Apple.
- William is our new Undergraduate Research Assistant working on Flash Animations and Social Informatics.
- David continued with his presentations about Bioinformatics and Proteomics
- BiRG accounts were set up for William and David
- The Lab had coffee and chocolates again for the BiRGers
Meeting Minutes: 05/16/2008
- The BiRG lab celebrated the success of fellow BiRGer John Lannon with coffee, soda and chocolates.
- David started his tenure with the BiRG Lab with a very informative discussion on Bioinformatics. (His presentation slides are available on the BiRG Server in the Presentation Folder.)
A very interesting discussion followed, with Prof. Manwani sharing his insights on Karma and DNA.
Theresa and Shawn also kept the discussion active. At the end, Dr. Ramachandran informed the lab about available resources and the one-on-one appointment times available for scheduling.
Friday, April 25, 2008
Congratulations to John Lannon
Chancellor's Award for Interdisciplinary Achievement
2008 | Investigating Alu Distribution by Family Across Human Chromosome Sequences | John Lannon (Informatics) | Sridhar Ramachandran
Excellent work, John! You make us proud. We have free coffee in the lab today to celebrate this achievement :).
Friday, April 4, 2008
Meeting Notes from 04/04/2008
The four new machines have the following specs:
- Dell XPS 420
- Intel Core2 Quad (Q6600) CPU running @ 2.40 GHz
- 3.00 GB of RAM
- 22" wide screen displays
- Windows XP SP2
Furthermore, the BiRG server, which is maintained by computer services, can now be accessed by every member of the lab using their university-wide computer services credentials. The server can be used to store presentations, documentation and research materials.
Aside from on campus access, BiRG members can use a PPTP VPN client (via Internet Connect on Mac machines or The New Connection wizard from the Network Connections control panel on Windows XP machines) to access the server from off campus. Each BiRGer has her or his own directory on the server.
Remote Desktop (RDP) access is also available (via VPN or on campus). However, only two users can be connected simultaneously. Be sure to log off of your RDP session (via the Start Menu) when finished so that other users are not locked out.
BiRG Server:
\\se-cser-info
PPTP VPN Server:
vpn.ius.edu
By next week, the lab will likely be outfitted with several more amenities.
For next Friday's meeting, John Lannon will be presenting his fall research project. Shawn Haynes will be presenting his work on Bioethics. Both will be using this meeting to prepare for the IUS Undergraduate Research Conference, which will be held April 17-18 in The Ogle Center.
Monday, February 18, 2008
Our Very own BiRGer -> Dr.Holly in the News
http://homepages.ius.edu/Horizon/021808P05.pdf
Friday, February 15, 2008
Meeting Minutes from 02/15/2008
Announcements:
* Next week (2/22) tour of Biology facilities (by John Norman), then to Sitar
* Following week (2/29) Tom will present his lab
Prof. Manwani's presentation on Ayurveda Informatics
ayur - life
veda - knowledge
Turmeric can be an antiseptic, anti-cancer, and anti-arthritis agent
presentation (Dr. Bhushan Patwardhan, University of Pune)
Overview:
Foundation of modern drugs:
* morphine from poppy
* cocaine from coca
All of these are derived from natural sources
Modern pharmacy industry experiencing innovation deficit
* Pre-clinical and clinical time increasing. Less effective
Ayurveda: discovery engine
* 300 well-documented and widely used medicinal plants; sound philosophical, rational base
* Discovery and development process is reversed : starts with large scale trials (we all eat turmeric)
Ayusoft : Ayurvedic Data warehouse
* Composed of a drug database, disease database, treatment principles, and diet/lifestyle database
* Language issues. India has 400+ languages. So, texts are scattered and not available in one single language
* Good to know Sanskrit
AyuGenomics:
* Classifying human composition, broken down into 15 primary variables. The permutations of these variables make each person a unique entity
Thursday, January 24, 2008
Venter Keeps Himself in The News: Synthetic Genome Created
http://www.wired.com/science/discoveries/news/2008/01/synthetic_genome
Wednesday, January 23, 2008
Free tuition at Harvard
Friday, January 11, 2008
Meeting Minutes from 11 January 2008
As of now, the weekly meeting is set for Fridays at 5pm in the BiRG Lab (LF264).
On Jan. 18, Tom House will present on the nature of his work with Dr. John Doyle: using protein-generated magnetic fields in electronic applications.
On Jan. 25th, John Lannon will present the findings of his research project on Alu distribution and discuss its implications and derivative research ideas.
Starting next week, a couple of biology students will be joining the group.