BiRG Lab Blog: 2008

Monday, December 22, 2008

BiRG Christmas Cake

Our BiRGer 'Mark' surprised me today when he walked into the BiRG lab with a big cake. Here are the photos of the cake and the BiRG lab.

Thank you very much for the cake! A very Merry Christmas and a very Happy New Year to you all.

Sunday, August 31, 2008

IU School of Informatics named to Computerworld's top IT schools to watch

BLOOMINGTON, Ind. -- Computerworld magazine announced in its August 2008 issue its "Top IT Schools to Watch 2008," and the Indiana University School of Informatics was among the 10 schools recognized in a feature article on graduate programs.

The schools, including institutions such as Carnegie Mellon University, Stanford University, the University of Pennsylvania, and the University of Virginia, were selected based on how well they were keeping pace with today's IT workplace, and the relevance of their curriculum to the ever-changing technology industry.

IU's School of Informatics was touted for not only providing students with real-world experience, but for its interdisciplinary approach to the field and for the responsiveness of its faculty and the students.

The list was compiled by a panel of more than two dozen IT executives, hiring managers, recruiters and academics who were asked to help identify the country's leading-edge schools for IT workers seeking to advance their careers. They considered graduate-level IT programs and schools that give graduates the best value in terms of salary increases or promotions vs. cost of tuition, and that best gear their curriculum to the everyday demands of today's IT workplace.

From the IT schools selected by the panel, Computerworld editors chose the innovative IT schools to profile. Finally, Computerworld partnered with Dice.com to survey alumni at the schools, asking for feedback on their satisfaction with their schools' program.

"We are honored to be part of Computerworld's list for 2008," said Bobby Schnabel, dean of the School. "It is gratifying to be in the company of schools that have long been considered at the top of the computing field, and to gain recognition for our still young set of graduate programs in informatics."

The complete story can be found in the August issue of Computerworld magazine and online at www.computerworld.com.

Founded in 2000 as the first school of its kind in the United States, the Indiana University School of Informatics is dedicated to research and teaching across a broad range of computing and information technology, with emphases on science, applications, and societal implications. The school includes the Departments of Computer Science and Informatics on the Bloomington campus and Informatics on the IUPUI campus.

The school administers a variety of bachelor's and master's degree programs in computer science and informatics, as well as Ph.D. programs in computer science and the first-ever doctorate in informatics. The School is dedicated to excellence in education and research, to partnerships that bolster economic development and entrepreneurship, and to increasing opportunities for women and underrepresented minorities in computing and technology. For more information, visit www.informatics.indiana.edu.

Thursday, August 28, 2008

Re: Speed Museum Membership

Note that IU Southeast is an institutional member of the Speed Art Museum in Louisville for 2008. This membership entitles faculty members, staff, and students to free admission to the museum's permanent art collection, special traveling exhibitions, AfterHours events, and selected lectures and concerts. You must show your IU Southeast ID card to take advantage of this membership.

Friday, July 25, 2008

BiRG minutes: 7-18-2008

Comparing Two Sequences

Objectives:

• Get the basics about dot plots

• Know how to interpret the most common patterns in a dot plot

• Use Dotlet

• Use Lalign to extract local alignments

Why compare two?

• Database searches are useful for finding homologues

• Database searches don't provide precise comparisons

• More precise tools are needed to analyze the sequences in detail including

– Dot plots for graphic analysis

– Local or global alignments for residue/residue analysis

• The alignment of two sequences is called a pairwise alignment

Dot Plot:

• A dot plot is a graphic representation of pairwise similarity

• The simplicity of dot plots prevents artifacts

• Ideal for looking for features that may come in different orders

• Reveal complex patterns

• Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain

Choosing your two sequences:

• Making pairwise comparisons takes time

• Use BLAST to rapidly select your sequences

– More than 70% identity for DNA

– More than 25% identity for proteins

• If your sequences are too similar, comparing them yields no useful information
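
The identity cut-offs above can be checked with a few lines of Python. This is a minimal sketch, assuming the two sequences are already aligned to the same length (for example, copied from a BLAST hit); the function names `percent_identity` and `worth_comparing` are made up for illustration and are not part of BLAST.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions where a and b carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    # Gap characters ('-') never count as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def worth_comparing(a: str, b: str, is_dna: bool) -> bool:
    """Rule of thumb from the notes: >70% identity for DNA, >25% for proteins."""
    threshold = 70.0 if is_dna else 25.0
    return percent_identity(a, b) > threshold
```

For example, `percent_identity("ACGT", "ACGA")` gives 75.0, which passes the DNA cut-off.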

What can you analyze with Dot Plot?

• Any pair of sequences

– DNA

– Proteins

– RNA

• DNA with proteins

– Dotlet is an appropriate tool

– To compare full genomes, install the program locally

• Sequences longer than 1000 symbols are hard to analyze online

• Divergent sequences where only a segment is homologous

• Long insertions and deletions

• Tandem repeats

The square shape of the pattern is characteristic of these repeats

Using Dotlet:

• Dotlet is one of the handiest tools for making dot plots

• Dotlet is a Java applet

• Open and download the applet at the following site:

– www.isrec.isb-sib.ch/java/dotlet

• Dotlet slides a window along each sequence

• If the windows are more similar than the threshold, Dotlet prints a dot at their intersection

• You can control the similarity threshold with the little window on the left

• Every dot has a score given by the window comparison

• When the score is

– Below threshold 1 => black dot

– Between thresholds 1 and 2 => grey dot

– Above threshold 2 => white dot

• The blue curve is the distribution of scores in the sequences

• The peak => most common score

– Most common => less informative

• Window size and the stringency control the aspect of your dot plot

– Very stringent = clean dot plot, little signal

– Not stringent enough = noisy dot plot, too much signal

• Play with the threshold until a usable signal appears

• The square shape is typical of tandem repeats

• The repeats are not perfect because the sequences have diverged after their duplication
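
The sliding-window idea described above can be sketched in Python. This is only an illustration, not Dotlet's code: Dotlet scores windows with a scoring matrix and shades dots by score, while the sketch below simply counts identical positions and uses a single threshold. The function names and the default window and threshold values are assumptions.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3, threshold: int = 3):
    """Return the set of (i, j) window start positions that earn a dot.

    A dot is placed when the window starting at i in seq_a and the window
    starting at j in seq_b agree at `threshold` or more positions.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if score >= threshold:
                dots.add((i, j))
    return dots


def render(dots, len_a: int, len_b: int, window: int) -> str:
    """Draw the plot as text: '*' for a dot, '.' for no dot."""
    rows = []
    for i in range(len_a - window + 1):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len_b - window + 1)))
    return "\n".join(rows)
```

Comparing a sequence with itself always gives the main diagonal. A tandem repeat such as "ABCABC" adds off-diagonal dots at (0, 3) and (3, 0), which is the start of the square pattern mentioned in the notes. Lowering the threshold below the window size makes the plot noisier, as described above.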

Comparing a Gene and its Product:

• Eukaryotic genes are transcribed into RNA

• The RNA is then spliced to remove the introns' sequences

• It may be necessary to compare the gene and its product

• Dotlet makes this comparative analysis easy

Aligning Sequences:

• Dotlet dot plots are a good way to provide an overview

• Dot plots don't provide residue/residue analysis

• For this analysis you need an alignment

• The most convenient tool for making precise local alignments is Lalign

Lalign and BLAST:

• Lalign is like a very precise BLAST

• It works on only two sequences at a time

• You must provide both sequences
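
To show what a local alignment of two sequences involves, here is a minimal Smith-Waterman sketch in Python. It is a stand-in for illustration only: it is not Lalign's own algorithm, it reports just one best local alignment, and the match, mismatch, and gap scores are arbitrary assumptions rather than a real scoring matrix.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero (that is what makes it local).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, `smith_waterman("TTTACGTTT", "GGGACGGGG")` picks out the shared segment "ACG" and ignores the unrelated flanks, which is the residue/residue detail a dot plot cannot give.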

Going Farther:

• If you need to align coding DNA with a protein, try these sites:

– www.tcoffee.org => protogene

– coot.embl.de/pal2nal

• If you need to align very large sequences, try this site:

– www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

• If you need a precise estimate of your alignment's statistical significance, use PRSS

– The program is available at fasta.bioch.virginia.edu

– Low E-value => good alignment

BiRG Minutes : June 11, 2008

Analyzing Protein Sequences

In-silico biochemistry

Sliding-window techniques – the oldest way of looking at sequences

-used if the strand of DNA was cut in the middle

-ND THE WAS where the A was cut off

NDT HEW AS

DTH EWA S

THE WAS

-Use past experiences and what proteins have been together in the past

-Hydrophobicity is the most popular analysis – a good indicator of transmembrane segments or core regions within a protein.

Predicting transmembrane domains

ProtScale allows one to compute and represent the profile produced by any amino acid scale on a selected protein.

An amino acid scale is defined by a numerical value assigned to each type of amino acid.
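
A sliding-window hydrophobicity profile of the kind described above can be sketched in Python. This is a simplified illustration and not ProtScale's implementation: it uses the commonly cited Kyte-Doolittle hydropathy values as the amino acid scale and takes a plain average over each window. The function name and the default window size of 9 are assumptions.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein: str, window: int = 9) -> list:
    """Average the scale over every window of `window` residues.

    Returns one value per window position; an empty list if the
    protein is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

A long run of high values in the profile is the kind of signal that hints at a transmembrane segment or a buried core region.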

TMHMM (Transmembrane Helix Prediction) is a method for predicting transmembrane helices based on a hidden Markov model (HMM)

HMM - a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. The extracted model parameters can then be used to perform further analysis, for example for pattern recognition applications.

TMHMM creates a prediction, or what it should have been

ProtScale has parameters and shows what it is
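
To make the HMM idea concrete, here is a toy two-state model decoded with the Viterbi algorithm in Python. It is a sketch under heavy assumptions: the two states, the set of "hydrophobic" residues, and every probability below are invented for illustration, and the real transmembrane predictor uses a far richer model.

```python
import math

# All values below are made-up illustration parameters, not trained ones.
HYDROPHOBIC = set("AILMFVWC")
STATES = ("membrane", "other")
START = {"membrane": 0.2, "other": 0.8}
TRANS = {
    "membrane": {"membrane": 0.9, "other": 0.1},
    "other": {"membrane": 0.1, "other": 0.9},
}
EMIT = {
    "membrane": {"hydrophobic": 0.8, "polar": 0.2},
    "other": {"hydrophobic": 0.3, "polar": 0.7},
}


def viterbi(protein: str) -> list:
    """Return the most probable hidden state for each residue."""
    obs = ["hydrophobic" if aa in HYDROPHOBIC else "polar" for aa in protein.upper()]
    if not obs:
        return []
    # best[t][s] = log-probability of the best path that ends in state s at position t
    best = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for symbol in obs[1:]:
        row, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            row[s] = best[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            pointers[s] = prev
        best.append(row)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))
```

The observable parameters here are the residues; the hidden ones are the membrane/other labels. With these toy numbers, a stretch of ten leucines flanked by lysines is labelled "membrane" and the flanks "other".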

Predicting post-translational modifications w/PROSITE

Proteins get modified after they are translated, before they reach their final functional form

PROSITE motifs are written as patterns

– Short patterns are not very informative by themselves

– They only indicate a possibility

– Combine them with other information to draw a conclusion

NOT EVERYTHING IS IN PROSITE

Interpreting PROSITE patterns

Some patterns may suggest nonexistent protein features

Short patterns are more informative if they are conserved across homologous sequences
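
Because PROSITE motifs are written as patterns, they can be turned into regular expressions and searched for. The Python sketch below handles the basic pattern syntax (`x` for any residue, `[ ]` for allowed residues, `{ }` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends). The function names are made up for illustration, and rarer syntax such as `>` inside square brackets is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                                # repetition: x(3) or x(2,4)
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                       # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                   # single residue or [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) match."""
    regex = "(?=" + prosite_to_regex(pattern) + ")"
    return [m.start() for m in re.finditer(regex, sequence)]
```

For example, the N-glycosylation pattern `N-{P}-[ST]-{P}` becomes `N[^P][ST][^P]`. As the notes say, a hit from such a short pattern only indicates a possibility.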

A domain is defined as an "independent globular folding unit". It is a portion of a protein that can keep its shape if you remove it from the rest of the protein. It consists of at least 50 amino acids. Domains are like the various components of a kitchen, such as the oven, the microwave, and the refrigerator. Together they constitute the complete kitchen, but they can also exist separately. You only need the microwave to make popcorn, which can be done outside the kitchen.

An average protein consists of 2 or 3 domains. Usually each domain plays a specific role in the function of the protein. It may interact with other proteins, bind an ion such as calcium or zinc, or contain an active site. It is common to have a catalytic domain associated with a binding domain and a regulatory domain. Imagine a toaster, where you have the grill [catalytic], the toast holder [binding], and the switch [regulation].

Domains are like independent functions that can be taken out of a program but still function

Researchers

A domain is a multi-sequence alignment similar to a puzzle

Using Domain collections

Scientists have been discovering and characterizing protein domains for more than 20 years

Manual collections are precise but small, because the researchers must document everything on their own

Automatic collections go out and find data in research documents, etc.

It is probable that only one of these servers will have the information to help you understand your protein

Friday, June 27, 2008

Meeting Minutes: 27th June 2008

We welcomed our newest BiRG member, "Kimberly Holmes".
The discussion today was about working with a single DNA sequence; the slides are available in the Shared Folder.
David began his presentation with the statement "Not everything that can be counted counts and not everything that counts can be counted".

Topics covered in detail included

- PCR
- Cloning
- Cleaning a DNA sequence (removing contamination)
- Restriction Map
- FIRTSMARKET website (use Netscape browser)
- Primer Design (MIT Website)
- Emboss
- Gene Prediction [ prokaryotes -> GeneMark ]
- Gene Prediction [ eukaryotes -> Genomescan ]
- BLAST
- Shotgun sequencing/contigs/assemble/homology
- read pieces
- PHRAP assembly program
- CAP3 program for joining small sequences

We had a nice discussion, with Dr. Ramachandran explaining several concepts during the presentation. We also discussed gene expression and hidden Markov models.

Lastly, we do not meet next week - Independence Day - July 4

Friday, June 13, 2008

BiRG MINUTES June 13, 2008

Protein and specialized Sequence Databases

Types of Organisms: Prokaryotic, Eukaryotic, and Archaea

Protein Maturation

Deciphering a Swiss-Prot entry

Specialized protein databases: KEGG (the metabolic pathways database) or PDB (structure database)

2 ways to predict genetics

1. Genes to proteins or translation (genomics)

2. DNA

We must merge the two

From Gene to functional Protein

DNA > mRNA > proteins > upon maturing > transportation > destination
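
The first two steps of this chain (DNA to mRNA to protein) can be sketched in Python with the standard genetic code. This is a simplified illustration: the input is assumed to be the coding strand read from its first base, translation stops at the first stop codon, and maturation and transport are not modelled. The function names are made up for this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA, codon by codon, until the first stop codon."""
    dna_like = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna_like) - 2, 3):
        aa = CODON_TABLE[dna_like[i:i + 3]]
        if aa == "*":   # stop codon
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCTAA"))` gives "MA": ATG codes for methionine, GCC for alanine, and TAA is a stop codon.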

Protein Maturation:

-removal of some fragments

-specific protein cleavage

-chemical modifications

-Phosphorylation (addition of phosphate that gives the protein its shape)

-addition of lipids or sugars (glycosylation)

-Proteins are often modified to make them active

www.ebi.ac.uk/RESID

-Modification can imply attaching a lipid or a sugar

www.glycosuite.com

-Use these resources to determine the details of the modification

www.lipidbank.jp

Swiss-Prot Database – (British) entries describe all proteins that have known functions

TrEMBL contains the 4 million putative proteins found in GenBank

Swiss-Prot contains the subset of TrEMBL with a known function

It is redundant to create many databases using the same information

A Swiss-Prot entry: www.expasy.org/uniprot/P00533

General Information (accession number), References, Comments, Cross-references, feature table, sequence

General Information: Entry Name, Primary Accession Number (PXXXX [P is for protein]), Last Modified, Protein name and synonyms, from/taxonomy fields (tells where protein came from), references section

Comments section lists all the known functions of the protein

The Features section precisely locates every known functional element of your protein on its sequence

• TRANSMEM: Transmembrane domain (something that passes through the membrane)

• ACT_SITE: Active sites (where chemicals can bond)

• BINDING: Binding sites

• DISULFID: Disulfide bridge between cysteines

• EMBL: GenBank original DNA sequence

• PDB: Experimental structure of your protein

• DIP: Proteins interacting with your protein

• GlycoSuiteDB: Glycosylations

• MIM: List of genetic diseases involving your protein

• Ontologies: Function of your protein

• Profiles: Known protein domains in your protein

• ENSEMBL: Genomic location of your protein

By alternative splicing, the protein can have MANY functions

• To find out about the function of your protein, you will need to determine

– Where your protein works

– Metabolic pathway in which the protein is involved

– The protein's 3D structure

– Which protein family it belongs to

Where do proteins work?

Part of the metabolic pathway

Chain of production linking several different proteins

Modify metabolites by passing them from one enzyme to the next

On KEGG pathway, each enzyme appears w/its EC number

• www.genome.ad.jp/kegg

– KEGG is the most extensive database of metabolic pathways

– You can use it to compare species (Japan)

• www.chem.qmul.ac.uk/iubmb

– The IUBMB assigns the EC numbers used to describe an enzyme activity (UK)

• www.ecocy.org

– An exhaustive list of all known metabolic pathways in E. coli and other bacteria

Some important Protein Families

• www.kinasenet.org

– Kinases control everything in us; their deregulation is the cause of many cancers

• imgt.cines.fr

– Immunoglobulins are key elements of our natural defenses

• rebase.neb.com

– This site is a key resource on restriction enzymes

Predicting protein function is a central goal in biology

• Protein databases help organize knowledge

• They provide the material for

– Developing new biological experiments

– Developing new prediction algorithms

– Extrapolating experimental data to unknown sequences

Friday, June 6, 2008

June 6, 2008 MINUTES by William Apple

We discussed Nucleotide Sequence Databases

Distinguishing structure of eukaryotic and prokaryotic proteins
Eukaryotic DNA in nucleus, nuclear DNA
mRNA-photocopy of DNA that is carried to Ribosomes
alternative splicing-introns "comment" out segments of exons (information) to generate many types of proteins
Eukaryotes (about 1 gene/100Kb in humans) are very complex compared to Prokaryotes (about 1 gene/1Kb)
GenBank – housed by the National Center for Biotechnology Information (NCBI)
memory of biological science; many biologists send genes they find to keep the database up-to-date
Reading a Prokaryotic GenBank entry (p75)
    ACCESSION is the accession number
    LOCUS contains information on gene size
    ORGANISM defines the organism containing the gene
    REFERENCE indicates who published the gene sequence
    FEATURES
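
The header fields listed above can be pulled out of a GenBank flat file with plain Python. This is a minimal sketch: the sample record is invented for illustration and is not a real entry, the parser assumes the usual layout in which the keyword occupies the first 12 columns, it stops at FEATURES, and it keeps only the first occurrence of a repeated keyword. The name `parse_header` is made up.

```python
# Invented sample record, laid out like a GenBank header (not a real entry).
SAMPLE = """\
LOCUS       EXAMPLE01               1200 bp    DNA     linear   BCT 01-JAN-2008
DEFINITION  Hypothetical example record, not a real entry.
ACCESSION   X00000
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Doe,J.
FEATURES             Location/Qualifiers
     gene            1..1200
ORIGIN
//
"""


def parse_header(text: str) -> dict:
    """Map each header keyword (LOCUS, ACCESSION, ...) to its text."""
    fields = {}
    last_key = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break                      # the feature table needs its own parser
        key, value = line[:12].strip(), line[12:].strip()
        if key:
            if key in fields:
                last_key = None        # keep only the first occurrence
            else:
                fields[key] = value
                last_key = key
        elif last_key is not None and value:
            fields[last_key] += " " + value   # continuation line
    return fields
```

With the sample above, `parse_header(SAMPLE)["ACCESSION"]` is "X00000", and the gene size can be read from the second item of the LOCUS line.
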
Reading a Eukaryotic GenBank entry
Gene-centric databases: pieces genes together to work with; uses GenBank data; Entrez Gene genome
Ensembl: visualization of human chromosomes – can click and zoom on various parts of a chromosome
TIGR Institute
DoE Joint Genome Institute
University of California – good alternative to ENSEMBL; is a mirror site

We had many good discussions. The meeting was very productive.

Minutes from 30 MAY 08

- David brought along a very informative video on Bioinformatics
- The video is a "must see" and is on the SHARED Folder of the BiRG server.
- David continued to discuss the different directions in Bioinformatics research
- William raised a few interesting questions, and Dr. Ramachandran explained the concepts of introns and exons in genes.

- John Lannon submitted his digital version of the paper titled "Investigating Alu Distribution by Family across Human Chromosome Sequences" for inclusion in the Undergraduate Research Journal.
This work won the Chancellor's Award for Interdisciplinary Achievement at this year's Celebrating Success Student Conference, held on April 17-18, 2008.

23 MAY 08 - Meeting Minutes

- The BiRG lab welcomed another new member William Apple.
- William is our new Undergraduate Research Assistant working on Flash Animations and Social Informatics.
- David continued with his presentations about Bioinformatics and Proteomics
- BiRG accounts were set-up for William and David
- The Lab had coffee and chocolates again for the BiRGers

Meeting Minutes: 05/16/2008

- The BiRG lab welcomed our new member David Olayemi. David was awarded the Undergraduate Research Fellowship Award for Summer 2008.
- The BiRG lab celebrated the success of fellow BiRGer John Lannon with coffee, soda and chocolates.
- David started his tenure with the BiRG Lab with a very informative discussion on Bioinformatics. (His presentation slides are available on the BiRG Server in the Presentation Folder.)

A very interesting discussion followed, with Prof. Manwani sharing with us his insights on Karma and DNA.
Theresa and Shawn also kept the discussion active. At the end, Dr. Ramachandran informed the lab about available resources and the one-on-one appointment times available for scheduling.

Friday, April 25, 2008

Congratulations to John Lannon

Please join me in congratulating our very own BiRGer John Lannon on being awarded the Chancellor's Award for Interdisciplinary Achievement at Indiana University Southeast's conference "Celebrating Achievement".

Chancellor's Award for Interdisciplinary achievement

2008

Investigating Alu Distribution by Family Across Human Chromosome Sequences

John Lannon

Informatics

Sridhar Ramachandran

Excellent work John!! You make us proud. We have free coffee in the lab today in celebrating this achievement :).

Friday, April 4, 2008

Meeting Notes from 04/04/2008

In today's BiRG lab meeting, Shawn introduced the lab to the newly available machines and computer services.

The four new machines have the following specs:

Dell XPS 420
Intel Core2 Quad (Q6600) CPU running @ 2.40 GHz
3.00 GB of RAM
22" wide screen displays
Windows XP SP2

Furthermore, the BiRG server, which is maintained by computer services, can now be accessed by every member of the lab using their university-wide computer services credentials. The server can be used to store presentations, documentation and research materials.

Aside from on campus access, BiRG members can use a PPTP VPN client (via Internet Connect on Mac machines or The New Connection wizard from the Network Connections control panel on Windows XP machines) to access the server from off campus. Each BiRGer has her or his own directory on the server.

Remote Desktop (RDP) access is also available (via VPN or on campus). However, only two users can be connected simultaneously. Be sure to log off (via the Start Menu) from your RDP session when finished so that other users are not locked out.

BiRG Server:
\\se-cser-info

PPTP VPN Server:
vpn.ius.edu

By next week, the lab will likely be outfitted with several more amenities.

For next Friday's meeting, John Lannon will be presenting his fall research project. Shawn Haynes will be presenting his work on Bioethics. Both will be using this meeting to prepare for the IUS Undergraduate Research Conference, which will be held April 17-18 in The Ogle Center.

Monday, February 18, 2008

Our Very own BiRGer -> Dr.Holly in the News

Check out the link below (Scroll down on the page to read about Dr. Hollingsworth)

http://homepages.ius.edu/Horizon/021808P05.pdf

Friday, February 15, 2008

Meeting Minutes from 02/15/2008

Feb 15 BiRG Meeting:

Announcements:
* Next week (2/22) tour of Biology facilities (by John Norman), then to Sitar
* Following week (2/29) Tom will present his lab

Prof. Manwani's presentation on Ayurveda Informatics
ayur - life
veda - knowledge
Turmeric can be anti-septic, anti-cancer, anti-arthritis agent

presentation (Dr. Bhushan Patwardhan, University of Pune)
Overview:
Foundation of modern drugs:
* morphine from poppy
* cocaine from coca
All of these are derived from natural sources

Modern pharmacy industry experiencing innovation deficit
* Pre-clinical and clinical time increasing. Less effective

Ayurveda: discovery engine
* 300 well documented and used medicinal plants, sound philosophical, rational base
* Discovery and development process is reversed : starts with large scale trials (we all eat turmeric)

Ayusoft : Ayurvedic Data warehouse
* Composed of drug database, disease database, treatment principles, diet/lifestyle db
* Language issues. India has 400+ languages. So, texts are scattered and not available in one single language
* Good to know Sanskrit

AyuGenomics:
* Classifying human composition, broken down into 15 primary variables. The permutations of these variables make each person a unique entity

Thursday, January 24, 2008

Venter Keeps Himself in The News: Synthetic Genome Created

Wired article detailing the Venter Institute's creation of a long string of synthetic DNA. Next step: synthetic organisms!

http://www.wired.com/science/discoveries/news/2008/01/synthetic_genome

Wednesday, January 23, 2008

Free tuition at Harvard

FYI BiRGers

Harvard University announced over the weekend that from now on undergraduate students from low-income families will pay no tuition. In making the announcement, Harvard's president Lawrence H. Summers said, 'When only 10 percent of the students in Elite higher education come from families in lower half of the income distribution, we are not doing enough. We are not doing enough in bringing elite higher education to the lower half of the income distribution.'

If you know of a family earning less than $60,000 a year with an honor student graduating from high school soon, Harvard University wants to pay the tuition. The prestigious university recently announced that from now on undergraduate students from low income families can go to Harvard for free...no tuition and no student loans!

To find out more about Harvard offering free tuition for families making less than $60,000 a year visit Harvard's financial aid website at: http://www.fao.fas.harvard.edu or call the school's financial aid office at (617) 495-1581.

Friday, January 11, 2008

Meeting Minutes from 11 January 2008

For this semester, each BiRGer will select and present a research article in her or his informatics area of interest.

As of now, the weekly meeting is set for Fridays at 5pm in the BiRG Lab (LF264).

On Jan. 18, Tom House will present on the nature of his work with Dr. John Doyle: using protein-generated magnetic fields in electronic applications.

On Jan. 25th, John Lannon will present the findings of his research project on Alu distribution and discuss its implications and derivative research ideas.

Starting next week, a couple of biology students will be joining the group.
