- Foreground data set - Use the file upload button to select a file containing the sequences, or use
the text box to paste your sequences for motif extraction. For whole protein motif extraction, the sequences should be in FASTA format.
If the sequences are pre-aligned on a central residue, or are tryptic peptides from mass spectrometry, put each entry on its own line.
- Foreground format - Select the format of the foreground data set. FASTA refers to whole protein analysis of sequences in FASTA format.
In FASTA mode, motif-x will center the analysis on the central residue you specify. In pre-aligned mode, motif-x assumes that the sequences have
already been aligned on a central residue. Text refers to a linguistic data set. In text mode, motif-x will remove
all spaces and punctuation between words. MS/MS refers to short peptides (usually tryptic) that have been generated by
tandem mass spectrometry. Note that in MS/MS mode, motif-x will extend the peptides (based on the selected organism) by
an appropriate amount so that a complete analysis at the given width can be performed.
- Central character - Because motif-x creates a pseudo-alignment of the entered data, the algorithm requires a central residue on which to align
the data. For example, for phosphorylation motif analysis of MS/MS data, center on phosphorylated S, T or Y residues. A central residue
may consist of two characters; for example, S* may be used to denote modified serine residues. Note that this field is case-sensitive.
- Width - The width is the total number of characters in the motif (the sum of wildcard and fixed characters). It should be an odd number
between 3 and 35.
- Occurrences - The occurrence threshold refers to the minimum number of times you wish each of your extracted motifs to occur in the data
set. An occurrence threshold of 20 is usually appropriate, although this parameter may be adjusted to yield more specific or less specific motifs.
- Significance - The significance refers to the P-value threshold for the binomial probability. This is used for the selection of
significant residue/position pairs in the motif. We suggest a threshold of 0.000001 to maintain a low false positive rate in standard protein
- Background - The background simply refers to the organism from which the data set was taken. This is important for accurate statistical
analysis. You may choose one of our predefined organism backgrounds, use the unaligned motif data set, or upload your own in the advanced options section.
- Upload background - Use this option if you wish to upload your own background to be used for statistical calculations.
- Background format - Used to indicate the format of the background data set.
- Extend organism - If you have uploaded MS/MS based data as a background, use this option to suggest an organism from which to extend the
- Background central residue - The residue on which to center the background data set (should almost always be the same as the foreground
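
To make the central character, width, and significance settings described above more concrete, here is a minimal Python sketch. It is not motif-x's actual code. The function names `extract_windows` and `binomial_tail` are hypothetical, the sketch handles only single-character central residues (not two-character ones such as S*), and it assumes that sites too close to a sequence end are skipped rather than extended. The use of a background frequency as the binomial success probability is also an assumption about how the P-value is formed.

```python
from math import comb


def extract_windows(sequence, central, width):
    """Return every window of length `width` centered on `central`.

    Hypothetical helper: assumes a single-character, case-sensitive
    central residue and skips sites too close to either end.
    """
    if width % 2 == 0 or not 3 <= width <= 35:
        raise ValueError("width must be an odd number between 3 and 35")
    flank = width // 2
    windows = []
    for i, residue in enumerate(sequence):
        # Keep the site only if a full flank fits on both sides.
        if residue == central and i - flank >= 0 and i + flank < len(sequence):
            windows.append(sequence[i - flank : i + flank + 1])
    return windows


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Assumption: n is the number of foreground windows, k is how many of
    them carry a given residue at a given position, and p is that
    residue/position frequency in the background.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    print(extract_windows("AAPSPKKSPAA", "S", 5))  # ['APSPK', 'KKSPA']
    print(binomial_tail(2, 2, 0.5))  # 0.25
```

Under these assumptions, a residue/position pair would count as significant when `binomial_tail` falls below the chosen significance threshold and the pair occurs at least as often as the occurrence threshold.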