yank

 

Function

Reads a sequence range, appends the full USA to a list file

Description

yank is a simple utility to add a specified sequence name to a list file.

In fact, it writes out not just the name of the sequence, but also the start and end positions of a region within that sequence and, if the sequence is nucleic, it can specify whether the reverse complement is to be used.

Many EMBOSS programs can read in a set of sequences (emma and union are two examples). There are many ways of specifying these sequences, including wildcarded sequence file names, wildcarded database entry names and list files. List files (files of file names) are the most flexible. yank is a utility to add sequences to a list file.

Instead of containing the sequences themselves, a list file contains "references" to sequences. For example, you might include database entries, the names of files containing sequences, or even the names of other list files. Here is a valid list file, called seq.list:

unix % more seq.list 

opsd_abyko.fasta
sw:opsd_xenla
sw:opsd_c*
@another_list

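This looks a bit odd, but it's really very straightforward; the file contains:

opsd_abyko.fasta - the name of a file containing a sequence
sw:opsd_xenla - a reference to a single entry in the sw database
sw:opsd_c* - all entries in the sw database whose names start with opsd_c
another_list - the name of a second list file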

Notice the @ in front of the last entry. This is the way you tell EMBOSS that this file is a list file, not a regular sequence file.
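As a rough illustration of how the kinds of entry in seq.list differ, the following Python sketch classifies each line of a list file. The function name classify_list_entry and the category labels are hypothetical, invented for this example. It is not how EMBOSS itself reads list files, and it ignores other USA forms such as format prefixes.

```python
def classify_list_entry(line):
    """Return (kind, value) for one line of a list file.

    Hypothetical helper for illustration only; real USA syntax has
    more forms (for example format prefixes) than are handled here.
    """
    entry = line.strip()
    if not entry:
        return ("blank", "")
    if entry.startswith("@"):
        # A leading @ marks a reference to another list file.
        return ("list", entry[1:])
    if ":" in entry:
        # Assumed to be database:entry; wildcards may match many entries.
        name = entry.split(":", 1)[1]
        kind = "wildcard" if ("*" in name or "?" in name) else "entry"
        return (kind, entry)
    # Anything else is treated as a sequence file name.
    return ("file", entry)


# The four lines of seq.list shown above.
seq_list = ["opsd_abyko.fasta", "sw:opsd_xenla", "sw:opsd_c*", "@another_list"]
kinds = [classify_list_entry(line)[0] for line in seq_list]
```

For seq.list, kinds is file, entry, wildcard, list, in that order.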

Without the program yank you would need to use a text editor such as pico to create the appropriate list files. yank makes this process easy.

Usage

Here is a sample session with yank, adding the regions making up the coding sequence of embl:hsfau1 to the list 'cds.list':

% yank
Reads a range from a sequence, appends the full USA to a list file
Input sequence: em:hsfau1
     Begin at position [start]: 782
       End at position [end]: 856
        Reverse strand [N]:
Output file [hsfau1.yank]: cds.list

% yank
Reads a range from a sequence, appends the full USA to a list file
Input sequence: em:hsfau1
     Begin at position [start]: 951
       End at position [end]: 1095
        Reverse strand [N]:
Output file [hsfau1.yank]: cds.list

% yank
Reads a range from a sequence, appends the full USA to a list file
Input sequence: em:hsfau1
     Begin at position [start]: 1557
       End at position [end]: 1612
        Reverse strand [N]:
Output file [hsfau1.yank]: cds.list

% yank
Reads a range from a sequence, appends the full USA to a list file
Input sequence: em:hsfau1
     Begin at position [start]: 1787
       End at position [end]: 1912
        Reverse strand [N]:
Output file [hsfau1.yank]: cds.list
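The four runs differ only in their start and end positions, so they lend themselves to scripting. The Python sketch below only builds the command lines; it does not run them. It assumes that the standard EMBOSS associated qualifiers -sbegin1 and -send1 set the region and that the general qualifier -auto suppresses prompting. None of these is listed in this document, so check them with 'yank -help -verbose' before relying on them. The function name yank_command is hypothetical.

```python
def yank_command(usa, start, end, outfile):
    """Build an argument list for one non-interactive yank run.

    -sequence and -outfile come from the qualifier list in this
    document; -sbegin1, -send1 and -auto are assumptions.
    """
    return [
        "yank",
        "-sequence", usa,
        "-sbegin1", str(start),
        "-send1", str(end),
        "-outfile", outfile,
        "-auto",
    ]


# Regions of em:hsfau1 used in the sample session.
exons = [(782, 856), (951, 1095), (1557, 1612), (1787, 1912)]
commands = [yank_command("em:hsfau1", s, e, "cds.list") for s, e in exons]
```

On a system with EMBOSS installed, each list in commands could then be passed to subprocess.run(cmd, check=True).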

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -newfile            boolean    Overwrite existing output file

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier                   Description                      Allowed values      Default

Mandatory qualifiers
[-sequence] (Parameter 1)   Sequence USA                     Readable sequence   Required
[-outfile] (Parameter 2)    Output file name                 Output file         <sequence>.yank

Optional qualifiers
(none)

Advanced qualifiers
-newfile                    Overwrite existing output file   Yes/No              No

Input file format

Any valid sequence can be read in.

You will be prompted for the start and end positions you wish to use.

If the sequence is nucleic, you will be prompted whether you wish to use the reverse complement of the sequence.

Output file format

The result file 'cds.list' of the example above is:

em-id:HSFAU1[782:856]
em-id:HSFAU1[951:1095]
em-id:HSFAU1[1557:1612]
em-id:HSFAU1[1787:1912]     

This can now be read in by a program such as union by specifying the list file as '@cds.list' when union prompts for input.
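To check a list such as cds.list before passing it on, the region lines can be parsed with a few lines of Python. This is a minimal sketch: the function name parse_region and the regular expression are assumptions made for this example, and the optional ':r' suffix for a reversed region is assumed from general USA syntax rather than shown in the output above.

```python
import re

# USA followed by [start:end], with an optional :r suffix (assumed syntax).
REGION = re.compile(r"^(?P<usa>.+?)\[(?P<start>\d+):(?P<end>\d+)(?P<rev>:r)?\]$")


def parse_region(line):
    """Split one list-file line into (usa, start, end, reverse)."""
    match = REGION.match(line.strip())
    if match is None:
        raise ValueError(f"not a USA with a region: {line!r}")
    return (
        match["usa"],
        int(match["start"]),
        int(match["end"]),
        match["rev"] is not None,
    )


# The contents of cds.list from the example above.
cds_lines = [
    "em-id:HSFAU1[782:856]",
    "em-id:HSFAU1[951:1095]",
    "em-id:HSFAU1[1557:1612]",
    "em-id:HSFAU1[1787:1912]",
]
regions = [parse_region(line) for line in cds_lines]

# Positions are inclusive, so each region spans end - start + 1 bases.
total_length = sum(end - start + 1 for _, start, end, _ in regions)
```

For the four regions above, total_length is 402 bases, a multiple of three, as expected for a complete coding sequence.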

Data files

None.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
biosed         Replace or delete sequence sections
cutseq         Removes a specified section from a sequence
degapseq       Removes gap characters from sequences
descseq        Alter the name or description of a sequence
entret         Reads and writes (returns) flatfile entries
extractfeat    Extract features from a sequence
extractseq     Extract regions from a sequence
listor         Writes a list file of the logical OR of two sets of sequences
maskfeat       Mask off features of a sequence
maskseq        Mask off regions of a sequence
newseq         Type in a short new sequence
noreturn       Removes carriage return from ASCII files
notseq         Excludes a set of sequences and writes out the remaining ones
nthseq         Writes one sequence from a multiple set of sequences
pasteseq       Insert one sequence into another
revseq         Reverse and complement a sequence
seqret         Reads and writes (returns) sequences
seqretsplit    Reads and writes (returns) sequences in individual files
skipseq        Reads and writes (returns) sequences, skipping the first few
splitter       Split a sequence into (overlapping) smaller sequences
swissparse     Retrieves sequences from swissprot using keyword search
trimest        Trim poly-A tails off EST sequences
trimseq        Trim ambiguous bits off the ends of sequences
union          Reads sequence fragments and builds one sequence
vectorstrip    Strips out DNA between a pair of vector sequences

The program extractseq does not make list files; instead, it creates a new sequence from sub-regions of a single other sequence.

Author(s)

This application was written by Peter Rice (peter.rice@uk.lionbioscience.com)

History

Written (March 2002) - Peter Rice.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments