This class is the common parent class for all language classes.
|
|
|
__repr__(self)
Returns a simple string representation without address information, so that
it can be stored as text and compared later.
|
|
|
length_difference(cls, len)
Returns an estimate of the likely change in length relative to an
English string of length len.
|
|
|
alter_length(cls, text)
Converts the given string by adding or removing characters to
approximate the length of a translation (English is assumed to be the
source language).
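The relationship between length_difference() and alter_length() can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the standalone functions, the percent and constant parameters, and the "x" filler are all assumptions made for the example.

```python
def length_difference(length, percent=30, constant=2):
    """Estimate how many characters a translation gains (or loses).

    percent and constant are hypothetical tuning values; they are not
    taken from the documented class.
    """
    return constant + (length * percent) // 100


def alter_length(text, percent=30, constant=2):
    """Pad or truncate text to the estimated translated length."""
    diff = length_difference(len(text), percent, constant)
    if diff >= 0:
        # Hypothetical padding strategy: append filler characters.
        return text + "x" * diff
    # A negative estimate shortens the text, but never below empty.
    return text[:max(len(text) + diff, 0)]
```

With these assumed defaults, a 10-character English string grows by 5 characters; a negative percent models a language that tends to be shorter than English.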
|
|
Inherited from object: __delattr__, __getattribute__, __hash__, __init__,
__reduce__, __reduce_ex__, __setattr__, __str__
|
|
punctranslate(cls, text)
Converts the punctuation in a string according to the rules of the
language.
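A punctuation conversion driven by a dictionary of rules (compare the puncdict attribute) might look like the sketch below. The standalone function and the sample rules are hypothetical and only illustrate the idea of a per-character mapping; the real method may handle context, spacing, or multi-character rules differently.

```python
def punctranslate(text, puncdict):
    """Replace every character that has a rule in puncdict.

    Characters without a rule are copied unchanged.
    """
    if not puncdict:
        return text
    return "".join(puncdict.get(ch, ch) for ch in text)


# Hypothetical rules, loosely modelled on Greek punctuation, where ";"
# serves as the question mark and the raised dot as the semicolon.
sample_rules = {"?": ";", ";": "\u00b7"}
converted = punctranslate("Why? Wait; go", sample_rules)
```

Because each source character is looked up exactly once, a rule's output is never fed back through the table, so "?" becoming ";" does not then become a raised dot.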
|
|
|
character_iter(cls, text)
Returns an iterator over the characters in text.
|
|
|
characters(cls, text)
Returns a list of characters in text.
|
|
|
word_iter(cls, text)
Returns an iterator over the words in text.
|
|
|
words(cls, text)
Returns a list of words in text.
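The iterator/list pairing used by the character, word, and sentence methods can be illustrated for words as below. This is a sketch under simple assumptions (whitespace splitting, a small hypothetical punctuation set); it is not the class's own segmentation logic, which may differ per language.

```python
def word_iter(text, punctuation=".,;:!?"):
    """Yield words lazily, with surrounding punctuation stripped."""
    for chunk in text.split():
        word = chunk.strip(punctuation)
        if word:
            yield word


def words(text):
    """Return the same words as word_iter(), collected into a list."""
    return list(word_iter(text))
```

The list variant is a thin wrapper around the iterator, so a subclass that needs different segmentation only has to override the iterator.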
|
|
|
sentence_iter(cls, text, strip=True)
Returns an iterator over the sentences in text.
|
|
|
sentences(cls, text, strip=True)
Returns a list of sentences in text.
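Sentence splitting with a strip option could be sketched with a regular expression built from a set of sentence-ending marks. The pattern and the reduced SENTENCE_END set below are assumptions for illustration; they are not the sentencere pattern documented for this class, and they will split naively on text such as decimal numbers.

```python
import re

# Assumed subset of sentence-ending marks for this example.
SENTENCE_END = ".!?\u2026"
_END = re.escape(SENTENCE_END)
# A run of non-terminators, then terminators (or end of text), then spaces.
_SENTENCE_RE = re.compile(r"[^%s]+(?:[%s]+|$)\s*" % (_END, _END))


def sentence_iter(text, strip=True):
    """Yield sentences; with strip=True, surrounding whitespace is removed."""
    for match in _SENTENCE_RE.finditer(text):
        sentence = match.group()
        if strip:
            sentence = sentence.strip()
        if sentence:
            yield sentence


def sentences(text, strip=True):
    """Return the sentences of text as a list."""
    return list(sentence_iter(text, strip=strip))
```

Passing strip=False keeps the trailing whitespace of each sentence, which is useful when the pieces must be joined back into the original text.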
|
|
|
capsstart(cls, text)
Determines whether the text starts with a capital letter.
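One possible reading of this check is sketched below. Skipping leading punctuation and whitespace before testing the first character is an assumption of the example, as is the small punctuation set; the documented method may behave differently.

```python
def capsstart(text, punctuation="\"'([{\u00bf\u00a1 "):
    """Return True if the first significant character is uppercase.

    Leading characters from the (hypothetical) punctuation set are skipped.
    """
    stripped = text.lstrip(punctuation)
    return bool(stripped) and stripped[0].isupper()
```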
|
|
|
code = ''
The ISO 639 language code, possibly with a country specifier or other
modifier.
|
|
fullname = ''
The full (English) name of this language.
|
|
nplurals = 0
The number of plural forms of this language.
|
|
pluralequation = '0'
The plural equation for selection of plural forms.
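How nplurals and a plural equation work together can be sketched as follows. The table and the plural_index helper are hypothetical; the rules shown are the usual gettext plural rules for these languages, rewritten as Python callables rather than as the equation strings stored on the class.

```python
# Hypothetical table: language code -> (nplurals, plural rule as a callable).
PLURAL_RULES = {
    "en": (2, lambda n: int(n != 1)),  # gettext: plural=(n != 1)
    "fr": (2, lambda n: int(n > 1)),   # gettext: plural=(n > 1)
    "ja": (1, lambda n: 0),            # gettext: plural=0
}


def plural_index(code, n):
    """Return the index of the plural form to use for the count n."""
    nplurals, rule = PLURAL_RULES[code]
    index = rule(n)
    if not 0 <= index < nplurals:
        raise ValueError("plural rule returned an out-of-range index")
    return index
```

The result is always an index into a list of nplurals translated forms, so a language with a single form (equation 0) returns 0 for every count.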
|
|
listseperator = u', '
This string is used to separate the items in lists of textual elements.
|
|
commonpunc = u' .,;:!?-@#$%^*_()[]{}/\'`"<> '
These punctuation marks are common in English and in most languages that
use the Latin script.
|
|
quotes = u' ‘’‛“”„‟′″‴‵‶‷‹›«» '
These are different quotation marks used by various languages.
|
|
invertedpunc = u' ¿¡ '
Inverted punctuation, sometimes used at the beginning of sentences in
Spanish, Asturian, Galician, and Catalan.
|
|
rtlpunc = u' ،؟؛÷ '
These punctuation marks are used by Arabic and Persian, for example.
|
|
CJKpunc = u' 。、,;!?「」『』【】 '
These punctuation marks are used in certain circumstances with CJK
languages.
|
|
indicpunc = u' ।॥॰ '
These punctuation marks are used by several Indic languages.
|
|
ethiopicpunc = u' ።፤፣ '
These punctuation marks are used by several Ethiopic languages.
|
|
miscpunc = u' …±°¹²³·©®×£¥€ '
The middle dot (·) is used by Greek and Georgian.
|
|
punctuation = u' .,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡ ...
Many types of punctuation are included here, since this is only
meant to determine whether something is punctuation.
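A membership test against such a combined string might be sketched as below. Only two of the documented groups are combined here, and the is_punctuation helper is hypothetical; it merely shows why a broad, over-inclusive set is acceptable for this purpose.

```python
COMMONPUNC = ".,;:!?-@#$%^*_()[]{}/'`\"<>"
INVERTEDPUNC = "\u00bf\u00a1"
# A set gives per-character membership tests without substring semantics.
PUNCTUATION = frozenset(COMMONPUNC + INVERTEDPUNC)


def is_punctuation(text):
    """Return True if text is non-empty and made up only of punctuation."""
    return bool(text) and all(ch in PUNCTUATION for ch in text)
```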
|
|
sentenceend = u' .!?…։؟।。!?። '
These marks can indicate a sentence end.
|
|
sentencere = re.compile(r'(?sx) .*? [ \.!\?\u2026\u0589\u061f\u09...
|
|
puncdict = {}
A dictionary of punctuation transformation rules that can be used by
punctranslate().
|
|
ignoretests = []
List of pofilter tests for this language that must be ignored.
|
|
checker = None
A language-specific checker (see filters.checks).
|
|
_languages = {}
|
|
validaccel = None
Characters that can be used as accelerators (access keys).
|
|
validdoublewords = []
Some languages allow double words in certain cases.
|