considerations - hints - solutions
2 Protein sequence homology search
Hints:
-
The protein sequence is shown in the annotation of the EMBL entry. You may
use the EMBL entry as a starting point for editing a protein sequence file,
which can then be reformatted into GCG format (the sequence is available in
GCG format and FASTA format).
-
Dotplot the query sequence against itself and against a high-scoring BLAST
match in order to reveal its internal repeat structure.
-
Split the query sequence into repeat regions and unique parts. Recommended
partitioning: repeat region 1: pos. 1-313; repeat region 2: pos. 314-502;
unique region: pos. 643-1433. You may use the GCG program ASSEMBLE to transfer
these sequence regions to individual files.
-
When using the Bioccelerator server, note that the substitution matrices
are scaled differently. For BLOSUM45, gap opening = 15 and gap extension = 2
may be reasonable choices.
-
The only way to perform Smith-Waterman searches locally is to first construct
a profile from a single sequence and then run a profile search against
the protein database. The GCG recipe follows:
profilemake -dat=genmoredata:blosum45.cmp -stringent <your query sequence> -outf=query.prf
tprofilesearch -nosixframe -noave -normalize -minscore=5.0 -list=100 -gap=3.0 -len=0.3 -batch query.prf SW:*
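To make the role of the gap parameters from the hints concrete (gap opening = 15, gap extension = 2), here is a minimal Python sketch of a Smith-Waterman local alignment score with affine gaps. It is not the Bioccelerator or GCG implementation. The names `sw_score` and `toy_score` are hypothetical, the match/mismatch values stand in for a real BLOSUM45 matrix, and the convention that a gap of length k costs gap_open + k * gap_extend is an assumption.

```python
def toy_score(x, y):
    # Placeholder scoring (assumption): a real search would look up BLOSUM45.
    return 5 if x == y else -4


def sw_score(a, b, score=toy_score, gap_open=15, gap_extend=2):
    """Best local alignment score of a and b with affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)        # best scores for the previous row
    f_prev = [neg_inf] * (m + 1)  # previous row, alignments ending in a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # current row, alignments ending in a gap in a
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

With a high opening penalty and a low extension penalty, a single long gap is preferred over several short ones, which matters when aligning repeat-containing sequences.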
Answer:
The protein has the following internal repeat structure:
      200     400     600     800     1000    1200    1400
       '       '       '       '       '       '       '
[-R1-][-R1-][R2][R2][-R1-][--------unique region---------]
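The partitioning recommended in the hints (repeat region 1: 1-313; repeat region 2: 314-502; unique region: 643-1433) can also be applied outside GCG. The following Python sketch assumes 1-based inclusive positions and a query sequence already held as a plain string. The names `REGIONS`, `extract_regions` and `to_fasta` are hypothetical, and positions 503-642 are left out because the recommended partitioning does not list them.

```python
# Recommended partitioning from the hints (1-based, inclusive positions).
REGIONS = {
    "repeat_region_1": (1, 313),
    "repeat_region_2": (314, 502),
    "unique_region": (643, 1433),
}


def extract_regions(sequence, regions=REGIONS):
    """Return {name: subsequence} for 1-based inclusive coordinate ranges."""
    parts = {}
    for name, (start, end) in regions.items():
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(
                f"{name}: {start}-{end} lies outside a sequence "
                f"of length {len(sequence)}"
            )
        # Convert to Python's 0-based, end-exclusive slicing.
        parts[name] = sequence[start - 1:end]
    return parts


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Each extracted part can then be written to its own file with `to_fasta` and used as a separate query.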
Answers to questions:
-
Only the fibronectin type 3 repeats. The N-terminal repeats may be homologous
to a substrate-binding domain in SW:NANH_MICVI (Z-score=7).
-
At least three different types of domains: type 1 repeats, type 2 repeats
(=FN3 domains), and a C-terminal unique region.
-
The protein is probably not a chitinase homolog because the similarity
to chitinase is confined to the two FN3 domains, an extracellular module
found in many different types of eukaryotic and prokaryotic proteins.
3 DNA sequence comparisons
Hints:
-
Use LOOKUP to find EMBL sequences. Compare promoter regions against REPBASE:*
in order to find matches to common repetitive elements.
-
Use dotplots to identify multiple copies of the same type of repetitive
elements.
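As a rough illustration of what a dotplot computes, the following Python sketch records every pair of window start positions at which two sequences share at least a given number of identical characters. It is a simplified stand-in, not the GCG implementation. The names `dotplot` and `repeat_offsets`, the window and threshold values, and the toy sequence in the usage note are assumptions. In a self-comparison, multiple copies of the same element appear as runs of hits off the main diagonal.

```python
from collections import Counter


def dotplot(seq_a, seq_b, window=10, min_matches=8):
    """Return the set of (i, j) 0-based window starts that count as a dot."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def repeat_offsets(seq, window=10, min_matches=8):
    """Count off-diagonal hits of a self-comparison by their offset j - i.

    A frequently occurring offset suggests two copies of an element
    separated by that distance.
    """
    hits = dotplot(seq, seq, window, min_matches)
    return Counter(j - i for i, j in hits if j > i)
```

For example, `repeat_offsets("ACGTTGCAGGGACGTTGCA", window=6, min_matches=6)` reports hits only at offset 11, the distance between the two copies of `ACGTTGCA` in this toy sequence. The nested loops are quadratic in sequence length, so this sketch is only practical for short regions such as individual promoters.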
Answer:
See output files.
The longest promoter regions are in EMBL entries HSENHREG1 (Z70243)
and MMIL2R5 (M16398).
The mouse IL2Ra promoter region contains B1 and MT elements. The human
IL2Ra promoter region contains Alu and LINE-1 elements.
Answers to questions:
-
In the mouse promoter: -350 to the transcription start, and -1800 to -1000.
-
Perhaps 25%.
-
The human Alu and mouse B1 elements are remotely related.
-
Shorter upstream promoter regions from rat and cattle are available in
EMBL.