megamerger
The sequences can be very long. The program finds all exact word matches of size 20 (by default) between the two sequences. It then reduces these to a minimal set of non-overlapping matches: the matches are sorted by size, largest first, and for each match in turn any smaller matches that overlap it are removed. The result is a set of the longest ungapped alignments between the two sequences that do not overlap with each other. If the two sequences are identical in their region of overlap then there will be one region of match and no mismatches.
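The word-match and overlap-removal steps described above can be illustrated with a minimal Python sketch. This is not megamerger's actual implementation: the function names are hypothetical, matches are represented as `(start_a, start_b, length)` tuples with 0-based starts, and an overlapping smaller match is simply dropped whole, which is an assumption about how "removes" should be read.

```python
from collections import defaultdict


def maximal_word_matches(a, b, wordsize=20):
    """Find ungapped exact matches of at least `wordsize` letters.

    Returns (start_a, start_b, length) tuples with 0-based starts.
    """
    # Index every word of sequence a by its start position.
    index = defaultdict(list)
    for i in range(len(a) - wordsize + 1):
        index[a[i:i + wordsize]].append(i)

    matches = []
    for j in range(len(b) - wordsize + 1):
        for i in index.get(b[j:j + wordsize], ()):
            # A seed whose preceding letters also match is only the
            # continuation of a match already reported on this diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right while the letters agree.
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


def overlaps(m, n):
    """True if two matches share positions on either sequence."""
    (ia, ib, la), (ja, jb, lb) = m, n
    on_a = ia < ja + lb and ja < ia + la
    on_b = ib < jb + lb and jb < ib + la
    return on_a or on_b


def longest_nonoverlapping(matches):
    """Keep the largest matches, discarding smaller ones that overlap them."""
    kept = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept)
```

For two sequences that are identical across their overlap, `longest_nonoverlapping(maximal_word_matches(a, b))` keeps the single long match covering that region, plus any shorter matches that do not overlap it on either sequence.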
It should be possible to merge sequences that are megabytes long. Compare this with the program merger, which does a more accurate alignment of more divergent sequences using the Needleman and Wunsch algorithm but uses much more memory.
The sequences should ideally be identical in their region of overlap. If there are any mismatches between the two sequences then megamerger will still attempt to create a merged sequence, but you should check that the result is what you require.
A report of the actions of megamerger is written out. Any action that requires a choice between regions of the two sequences where they have a mismatch is marked with the word WARNING!. The sequence in these regions is written out in uppercase. All other regions of the output sequence are written in lowercase.
Where there is a mismatch, the sequence chosen to supply the mismatch region in the final merged sequence is the one whose mismatch region is furthest from the start or end of that sequence.
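The selection rule for mismatch regions can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, coordinates are taken to be 1-based and inclusive, and the handling of a tie (falling back to the second sequence) is an assumption, since the text does not say what happens when both distances are equal.

```python
def distance_to_nearest_end(start, end, seq_length):
    """Distance from a region (1-based, inclusive) to the nearer sequence end."""
    return min(start - 1, seq_length - end)


def choose_source(mismatch_a, len_a, mismatch_b, len_b):
    """Return 'a' or 'b': the sequence whose mismatch region lies
    furthest from its own ends. Ties go to 'b' (an assumption)."""
    da = distance_to_nearest_end(mismatch_a[0], mismatch_a[1], len_a)
    db = distance_to_nearest_end(mismatch_b[0], mismatch_b[1], len_b)
    return "a" if da > db else "b"


# A mismatch at 847-847 in a 100000-base sequence is 846 bases from the
# nearest end; one at 6882-6882 in a 184666-base sequence is 6881 bases
# from the nearest end, so the second sequence is chosen.
print(choose_source((847, 847), 100000, (6882, 6882), 184666))
```

The intuition is that the ends of a sequence are usually its least reliable part, so the copy in which the mismatch sits deeper inside the sequence is preferred.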
% megamerger embl:ap000504 embl:af129756
Merge two large overlapping nucleic acid sequences
Word size [20]:
Output sequence [ap000504.merged]:
Output file [ap000504.megamerger]:
Mandatory qualifiers:
  [-seqa]              sequence   Sequence USA
  [-seqb]              sequence   Sequence USA
   -wordsize           integer    Word size
  [-outseq]            seqout     Output sequence USA
  [-report]            outfile    Output report

Optional qualifiers: (none)
Advanced qualifiers: (none)
General qualifiers:
   -help               boolean    Report command line options. More information on associated and general qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-seqa] (Parameter 1) | Sequence USA | Readable sequence | Required |
[-seqb] (Parameter 2) | Sequence USA | Readable sequence | Required |
-wordsize | Word size | Integer 2 or more | 20 |
[-outseq] (Parameter 3) | Output sequence USA | Writeable sequence | <sequence>.format |
[-report] (Parameter 4) | Output report | Output file | <sequence>.megamerger |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
(none) | | | |
Where there has been a mismatch between the two sequences, the merged sequence is written out in uppercase and the sequence whose mismatch region is furthest from the edges of the sequence is used in the merged sequence.
The name and description of the first input sequence are used for the name and description of the output sequence. A report of the merger is written out.
A typical report where there are many mismatches, taken from the example above, follows:
# Report of megamerger of: AP000504 and AF129756

AP000504 overlap starts at 1
AF129756 overlap starts at 6036

Using AF129756 1-6035 as the initial sequence

Matching region AP000504 1-846 : AF129756 6036-6881
Length of match: 846

WARNING! Mismatch region found:
Mismatch AP000504 847-847
Mismatch AF129756 6882-6882
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 848-1794 : AF129756 6883-7829
Length of match: 947

WARNING! Mismatch region found:
Mismatch AP000504 1795-1795
Mismatch AF129756 7830-7830
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 1796-2272 : AF129756 7831-8307
Length of match: 477

[many lines removed for brevity]

Matching region AP000504 97717-97826 : AF129756 103745-103854
Length of match: 110

WARNING! Mismatch region found:
Mismatch AP000504 97827-97827
Mismatch AF129756 103855-103855
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 97828-100000 : AF129756 103856-106028
Length of match: 2173

AP000504 overlap ends at 100000
AF129756 overlap ends at 106028

Using AF129756 106029-184666 as the final sequence
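Because a report can contain many WARNING! entries, it may help to summarise it before checking the merged sequence by hand. The following Python sketch is not part of EMBOSS: it assumes the report wording shown in the example above, and the function name `summarize_report` is hypothetical. It searches the whole text with regular expressions, so it does not depend on how the report is broken into lines.

```python
import re

# Patterns based on the wording of the example report (an assumption
# about the report format in general).
MATCH_RE = re.compile(r"Matching region (\S+) (\d+)-(\d+) : (\S+) (\d+)-(\d+)")


def summarize_report(text):
    """Count matching regions, matched bases and WARNING! entries."""
    regions = MATCH_RE.findall(text)
    # Length of each match, from the first sequence's 1-based inclusive range.
    matched = sum(int(r[2]) - int(r[1]) + 1 for r in regions)
    return {
        "matching_regions": len(regions),
        "matched_bases": matched,
        "warnings": text.count("WARNING!"),
    }


example = (
    "Matching region AP000504 1-846 : AF129756 6036-6881 "
    "Length of match: 846 "
    "WARNING! Mismatch region found: "
    "Mismatch AP000504 847-847 Mismatch AF129756 6882-6882 "
    "Matching region AP000504 848-1794 : AF129756 6883-7829 "
    "Length of match: 947"
)
print(summarize_report(example))
```

A high warning count relative to the number of matching regions suggests the sequences are too divergent for megamerger, in which case merger may be the better choice.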
Program name | Description |
---|---|
cons | Creates a consensus from multiple alignments |
merger | Merge two overlapping nucleic acid sequences |
A graphical dotplot of the matches used in this merge can be displayed using the program dotpath.