Package translate :: Package lang :: Module common :: Class Common

Class Common

This class is the common parent class for all language classes.

Instance Methods
 
__deepcopy__(self, memo={})
 
__repr__(self)
Give a simple string representation without address information to be able to store it in text for comparison later.
 
length_difference(cls, len)
Returns an estimate of the likely change in length relative to an English string of length len.
 
alter_length(cls, text)
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Methods
 
punctranslate(cls, text)
Converts the punctuation in a string according to the rules of the language.
 
character_iter(cls, text)
Returns an iterator over the characters in text.
 
characters(cls, text)
Returns a list of characters in text.
 
word_iter(cls, text)
Returns an iterator over the words in text.
 
words(cls, text)
Returns a list of words in text.
 
sentence_iter(cls, text, strip=True)
Returns an iterator over the sentences in text.
 
sentences(cls, text, strip=True)
Returns a list of sentences in text.
 
capsstart(cls, text)
Determines whether the text starts with a capital letter.
Static Methods
__new__(cls, code)
This returns the language class for the given code, following a singleton-like approach (only one object per language).
Class Variables
  code = ''
The ISO 639 language code, possibly with a country specifier or other modifier.
  fullname = ''
The full (English) name of this language.
  nplurals = 0
The number of plural forms of this language.
  pluralequation = '0'
The plural equation for selection of plural forms.
  listseperator = u', '
This string is used to separate lists of textual elements.
  commonpunc = u'.,;:!?-@#$%^*_()[]{}/\'`"<>'
These punctuation marks are common in English and most languages that use Latin script.
  quotes = u'‘’‛“”„‟′″‴‵‶‷‹›«»'
These are different quotation marks used by various languages.
  invertedpunc = u'¿¡'
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
  rtlpunc = u'،؟؛÷'
These punctuation marks are used by Arabic and Persian, for example.
  CJKpunc = u'。、,;!?「」『』【】'
These punctuation marks are used in certain circumstances with CJK languages.
  indicpunc = u'।॥॰'
These punctuation marks are used by several Indic languages.
  ethiopicpunc = u'።፤፣'
These punctuation marks are used by several Ethiopic languages.
  miscpunc = u'…±°¹²³·©®×£¥€'
The middle dot (·) is used by Greek and Georgian.
  punctuation = u'.,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡...
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation.
  sentenceend = u'.!?…։؟।。!?።'
These marks can indicate a sentence end.
  sentencere = re.compile(r'(?sx).*?[\.!\?\u2026\u0589\u061f\u09...
  puncdict = {}
A dictionary of punctuation transformation rules that can be used by punctranslate().
  ignoretests = []
List of pofilter tests for this language that must be ignored.
  checker = None
A language specific checker (see filters.checks).
  _languages = {}
  validaccel = None
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator.
  validdoublewords = []
Some languages allow double words in certain cases.
Properties

Inherited from object: __class__

Method Details

__new__(cls, code)
Static Method

 

This returns the language class for the given code, following a singleton-like approach (only one object per language).

Returns: a new object with type S, a subtype of T
Overrides: object.__new__
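
The singleton-like behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual implementation: the class name Language is hypothetical, and only the _languages cache name is taken from the class variables listed on this page.

```python
class Language(object):
    """Hypothetical stand-in for Common: one object per language code."""

    _languages = {}  # cache of instances, keyed by language code
    code = ''

    def __new__(cls, code):
        # Create the object only the first time a code is requested,
        # then hand back the cached object on every later request.
        if code not in cls._languages:
            instance = object.__new__(cls)
            instance.code = code
            cls._languages[code] = instance
        return cls._languages[code]


first = Language('pt_BR')
second = Language('pt_BR')
other = Language('km')
```

Here first and second are the same object, while other is a distinct object for a different code.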

__repr__(self)
(Representation operator)

Give a simple string representation without address information to be able to store it in text for comparison later.

Overrides: object.__repr__

Class Variable Details

code

The ISO 639 language code, possibly with a country specifier or other 
modifier.

Examples:
    km
    pt_BR
    sr_YU@Latn

Value:
''
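
To illustrate how the three example codes are structured, the sketch below splits a code into its language, country, and modifier parts. The helper split_code is hypothetical and not part of the toolkit; it assumes the country follows an underscore and the modifier follows an at sign, as in the examples.

```python
def split_code(code):
    """Split a code such as 'sr_YU@Latn' into (language, country, modifier).

    Hypothetical helper: missing parts are returned as empty strings.
    """
    base, _, modifier = code.partition('@')
    language, _, country = base.partition('_')
    return language, country, modifier


parts_km = split_code('km')
parts_pt = split_code('pt_BR')
parts_sr = split_code('sr_YU@Latn')
```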

fullname

The full (English) name of this language.

Names, including those of dialects, should take a form such as:
  Khmer
  Portuguese (Brazil)
  #TODO: sr_YU@Latn?

Value:
''

nplurals

The number of plural forms of this language.

0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6). Also see data.py.

Value:
0

pluralequation

The plural equation for selection of plural forms.

This is used to fill in the header of PO files. See http://www.gnu.org/software/gettext/manual/html_node/gettext_150.html. Also see data.py.

Value:
'0'
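
As a rough sketch of how nplurals and pluralequation could be combined into a PO header field, the hypothetical helper below formats a gettext Plural-Forms line. The function name is an assumption, and the sample values (2 and '(n != 1)') are the usual gettext values for English, not values taken from this class.

```python
def plural_forms_header(nplurals, pluralequation):
    """Format a gettext Plural-Forms header line (hypothetical helper)."""
    return "Plural-Forms: nplurals=%d; plural=%s;" % (nplurals, pluralequation)


header = plural_forms_header(2, '(n != 1)')
```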

listseperator

This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.

Value:
u', '
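
A subclass could override the separator along the following lines. Both class names and the join_list helper are hypothetical, and the Arabic comma value is an assumption chosen for illustration rather than a value documented on this page.

```python
class BaseLanguage(object):
    """Hypothetical stand-in for Common with the default separator."""
    listseperator = u', '


class ArabicLike(BaseLanguage):
    """Hypothetical override using the Arabic comma (U+060C) and a space."""
    listseperator = u'\u060c '


def join_list(lang, items):
    """Join textual elements with the separator of the given language class."""
    return lang.listseperator.join(items)


default_joined = join_list(BaseLanguage, [u'a', u'b', u'c'])
arabic_joined = join_list(ArabicLike, [u'a', u'b'])
```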

punctuation

We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won't need to override this.

Value:
u'.,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'
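
Since the string is only meant for membership tests, a check might look like the sketch below. SAMPLE_PUNCTUATION is a deliberately shortened, hypothetical subset of the value above, and is_punctuation is an illustrative helper, not a toolkit function.

```python
# Shortened sample: ASCII marks plus the inverted question mark (U+00BF),
# inverted exclamation mark (U+00A1) and horizontal ellipsis (U+2026).
SAMPLE_PUNCTUATION = u'.,;:!?\u00bf\u00a1\u2026'


def is_punctuation(text):
    """Return True if text is non-empty and consists only of punctuation."""
    return bool(text) and all(ch in SAMPLE_PUNCTUATION for ch in text)


only_marks = is_punctuation(u'?!')
mixed = is_punctuation(u'ok?')
inverted = is_punctuation(u'\u00bf')
```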

sentenceend

These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won't need to override this.

Value:
u'.!?…։؟।。!?።'

sentencere

Value:
re.compile(r'(?sx).*?[\.!\?\u2026\u0589\u061f\u0964\u3002\uff01\uff1f\u1362]\s+(?=[^a-z\d])')
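
The sketch below shows one way a pattern like this could drive sentence splitting. The pattern is the Value above with its line wrap removed; split_sentences is a hypothetical helper and should not be read as the implementation of sentences() or sentence_iter(). Each match ends at a sentence-final mark followed by whitespace, provided the next character is not a lowercase ASCII letter or a digit, and whatever follows the last match is treated as the final sentence.

```python
import re

# Pattern taken from the Value above (line wrap removed).
sentencere = re.compile(
    r'(?sx).*?[\.!\?\u2026\u0589\u061f\u0964\u3002\uff01\uff1f\u1362]\s+(?=[^a-z\d])'
)


def split_sentences(text, strip=True):
    """Return a list of sentences found in text (illustrative sketch)."""
    sentences = []
    last_end = 0
    for match in sentencere.finditer(text):
        last_end = match.end()
        sentence = match.group()
        if strip:
            sentence = sentence.strip()
        if sentence:
            sentences.append(sentence)
    # The pattern needs trailing whitespace, so the last sentence is
    # picked up here as the remainder after the final match.
    remainder = text[last_end:]
    if strip:
        remainder = remainder.strip()
    if remainder:
        sentences.append(remainder)
    return sentences


simple = split_sentences(u'Hello there. How are you? I am fine.')
no_split = split_sentences(u'It costs approx. 5 dollars. Really!')
```

In the second call the full stop after "approx" does not end a sentence, because the lookahead rejects a following digit.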

checker

A language specific checker (see filters.checks).

This doesn't need to be supplied, but will be used if it exists.

Value:
None

validaccel

Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user's keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lowercase and uppercase, are included in the list.

Value:
None

validdoublewords

Some languages allow double words in certain cases. This is a list of such words.

Value:
[]