c++-gtk-utils
Cgu::Utf8 Namespace Reference

This namespace contains utilities relevant to the use of UTF-8 in programs.

Classes

class  ConversionError
class  Iterator
 A class which will iterate through a std::string object by reference to unicode characters rather than by bytes.
class  ReverseIterator
 A class which will iterate in reverse through a std::string object by reference to unicode characters rather than by bytes.
class  Reassembler
 A class for reassembling UTF-8 strings sent over pipes and sockets so they form complete valid UTF-8 characters.

Functions

std::wstring uniwide_from_utf8 (const std::string &input)
std::string uniwide_to_utf8 (const std::wstring &input)
std::u32string utf32_from_utf8 (const std::string &input)
std::string utf32_to_utf8 (const std::u32string &input)
std::u16string utf16_from_utf8 (const std::string &input)
std::string utf16_to_utf8 (const std::u16string &input)
std::wstring wide_from_utf8 (const std::string &input)
std::string wide_to_utf8 (const std::wstring &input)
std::string filename_from_utf8 (const std::string &input)
std::string filename_to_utf8 (const std::string &input)
std::string locale_from_utf8 (const std::string &input)
std::string locale_to_utf8 (const std::string &input)
bool validate (const std::string &text)
bool operator== (const Iterator &iter1, const Iterator &iter2)
bool operator!= (const Iterator &iter1, const Iterator &iter2)
bool operator< (const Iterator &iter1, const Iterator &iter2)
bool operator<= (const Iterator &iter1, const Iterator &iter2)
bool operator> (const Iterator &iter1, const Iterator &iter2)
bool operator>= (const Iterator &iter1, const Iterator &iter2)
bool operator== (const ReverseIterator &iter1, const ReverseIterator &iter2)
bool operator!= (const ReverseIterator &iter1, const ReverseIterator &iter2)
bool operator< (const ReverseIterator &iter1, const ReverseIterator &iter2)
bool operator<= (const ReverseIterator &iter1, const ReverseIterator &iter2)
bool operator> (const ReverseIterator &iter1, const ReverseIterator &iter2)
bool operator>= (const ReverseIterator &iter1, const ReverseIterator &iter2)

Detailed Description

This namespace contains utilities relevant to the use of UTF-8 in programs.

#include <c++-gtk-utils/convert.h> (for conversion and validation functions)

#include <c++-gtk-utils/reassembler.h> (for Reassembler class)

See also:
convert.h reassembler.h

For these functions to work, the program will generally need to have set the locale with either std::locale::global(std::locale("")) (from the C++ standard library) or setlocale(LC_ALL, "") (from the C standard library).
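The locale setup described above could be done once at program start, along the lines of the following sketch. The helper name init_user_locale() is hypothetical and not part of c++-gtk-utils; the fallback to the "C" locale is an assumption about what a program might want when the environment names a locale the system does not know.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not part of c++-gtk-utils): adopt the locale named
// by the user's environment before any conversion function is called.
// Returns true if the environment locale was installed, false on fallback.
bool init_user_locale() {
    try {
        // Installing a named locale globally also calls
        // std::setlocale(LC_ALL, ...) with that name.
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        // std::locale("") throws if the environment names an unknown locale.
        // Assumption: falling back to the "C" locale is acceptable.
        std::setlocale(LC_ALL, "C");
        return false;
    }
}
```

A program would typically call init_user_locale() at the top of main(), before converting any text.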


Function Documentation

std::string Cgu::Utf8::filename_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's filename encoding.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to filename encoding.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to filename encoding (e.g. because the input characters cannot be represented by that encoding).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
Note:
glib takes the system's filename encoding from the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. If G_BROKEN_FILENAMES is set to 1 and G_FILENAME_ENCODING is not set, it will be assumed that the filename encoding is the same as the locale encoding. If G_FILENAME_ENCODING is set, then G_BROKEN_FILENAMES is ignored, and filename encoding is taken from the value held by G_FILENAME_ENCODING.
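The precedence between the two environment variables can be sketched as a small decision function. This is an illustration of the rule stated in the note, not glib's code: the names FilenameEncodingSource and classify_filename_encoding() are hypothetical, G_BROKEN_FILENAMES is treated as set whenever it is present, and the UTF-8 fallback when neither variable is set is an assumption taken from glib's own documentation rather than from the note above.

```cpp
#include <cstdlib>

// Hypothetical names, for illustration only.
enum class FilenameEncodingSource {
    FromVariable,  // value of G_FILENAME_ENCODING is used
    Locale,        // same as the locale encoding
    Utf8Default    // assumption: glib's documented fallback
};

// Pure function so the rule can be checked without touching the environment.
// Each argument is the variable's value, or nullptr if it is unset.
FilenameEncodingSource classify_filename_encoding(const char* filename_encoding,
                                                  const char* broken_filenames) {
    if (filename_encoding) return FilenameEncodingSource::FromVariable;  // wins outright
    if (broken_filenames) return FilenameEncodingSource::Locale;
    return FilenameEncodingSource::Utf8Default;
}

// Applies the rule to the current process environment.
FilenameEncodingSource current_filename_encoding_source() {
    return classify_filename_encoding(std::getenv("G_FILENAME_ENCODING"),
                                      std::getenv("G_BROKEN_FILENAMES"));
}
```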
std::string Cgu::Utf8::filename_to_utf8 ( const std::string &  input)

Converts text from the system's filename encoding to UTF-8.

Parameters:
input: Text in valid filename encoding.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid filename encoding.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
Note:
glib takes the system's filename encoding from the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. If G_BROKEN_FILENAMES is set to 1 and G_FILENAME_ENCODING is not set, it will be assumed that the filename encoding is the same as the locale encoding. If G_FILENAME_ENCODING is set, then G_BROKEN_FILENAMES is ignored, and filename encoding is taken from the value held by G_FILENAME_ENCODING.
std::string Cgu::Utf8::locale_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's locale encoding.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to locale encoding.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to locale encoding (e.g. because the input characters cannot be represented by that encoding).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::string Cgu::Utf8::locale_to_utf8 ( const std::string &  input)

Converts text from the system's locale encoding to UTF-8.

Parameters:
input: Text in valid locale encoding.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid locale encoding.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
bool Cgu::Utf8::operator!= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator!= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator< ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator< ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().
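The ordering convention described above is the same one the standard library uses for its own reverse iterators, which makes it easy to demonstrate without the library. The sketch below uses std::string::const_reverse_iterator rather than Cgu::Utf8::ReverseIterator, and the function name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: reports whether the reverse-begin position orders
// before the reverse-end position, using the standard library's reverse
// iterators as a stand-in for Cgu::Utf8::ReverseIterator.
bool rbegin_orders_before_rend(const std::string& s) {
    std::string::const_reverse_iterator first = s.rbegin();  // last character
    std::string::const_reverse_iterator last = s.rend();     // before the first
    // Ordering follows the direction of iteration, not memory addresses.
    return first < last;
}
```

For any non-empty string the function returns true; for an empty string the two positions are equal, so it returns false.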

bool Cgu::Utf8::operator<= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator<= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

bool Cgu::Utf8::operator== ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator== ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator> ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator> ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

bool Cgu::Utf8::operator>= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator>= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

std::wstring Cgu::Utf8::uniwide_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's Unicode wide character representation, which will be UTF-32/UCS-4 for systems with a wide character size of 4 (almost all unix-like systems), and UTF-16 for systems with a wide character size of 2.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to UTF-32 or UTF-16.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
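Which of the two encodings applies on a given system follows from the size of wchar_t, as described above. A minimal sketch of that rule, using a hypothetical helper name that is not part of c++-gtk-utils:

```cpp
#include <string>

// Hypothetical helper: names the Unicode encoding that the description
// above associates with this platform's wchar_t size.
std::string uniwide_encoding_name() {
    if (sizeof(wchar_t) == 4) return "UTF-32";  // almost all unix-like systems
    if (sizeof(wchar_t) == 2) return "UTF-16";
    return "unknown";  // assumption: other sizes are not covered by the text
}
```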
std::string Cgu::Utf8::uniwide_to_utf8 ( const std::wstring &  input)

Converts text from the system's Unicode wide character representation, which will be UTF-32/UCS-4 for systems with a wide character size of 4 (almost all unix-like systems) and UTF-16 for systems with a wide character size of 2, to narrow character UTF-8 format.

Parameters:
input: Text in valid UTF-32 or UTF-16 format.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-32/UCS-4 or UTF-16 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::u16string Cgu::Utf8::utf16_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to UTF-16.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to UTF-16.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
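One practical consequence of a UTF-16 result is that characters outside the Basic Multilingual Plane occupy two code units (a surrogate pair), so the length of the returned std::u16string need not equal the number of characters. The following sketch shows the standard UTF-16 encoding of a single code point; it is independent of how the library performs the conversion, the function name is hypothetical, and the caller is assumed to pass a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: UTF-16 code units for one code point.
// Assumes cp is a valid Unicode scalar value (no range checks here).
std::u16string utf16_units_for(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);  // one code unit
    } else {
        cp -= 0x10000;                     // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    }
    return out;
}
```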
std::string Cgu::Utf8::utf16_to_utf8 ( const std::u16string &  input)

Converts text from UTF-16 to narrow character UTF-8 format.

Parameters:
input: Text in valid UTF-16 format.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-16 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::u32string Cgu::Utf8::utf32_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to UTF-32/UCS-4.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to UTF-32.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::string Cgu::Utf8::utf32_to_utf8 ( const std::u32string &  input)

Converts text from UTF-32/UCS-4 to narrow character UTF-8 format.

Parameters:
input: Text in valid UTF-32 format.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-32/UCS-4 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
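To make the UTF-32 to UTF-8 mapping concrete, here is a standard-library-only sketch of the encoding rules. It is not the library's implementation: the name utf32_to_utf8_sketch() is hypothetical, and it throws std::invalid_argument on invalid input where the library function documented above throws Cgu::Utf8::ConversionError.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in showing the UTF-8 encoding rules; the real
// function reports invalid input with Cgu::Utf8::ConversionError.
std::string utf32_to_utf8_sketch(const std::u32string& input) {
    std::string out;
    for (char32_t cp : input) {
        // Surrogates and values above U+10FFFF are not valid UTF-32.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```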
bool Cgu::Utf8::validate ( const std::string &  text)
inline

Indicates whether the input text comprises valid UTF-8.

Parameters:
text: The text to be tested.
Returns:
true if the input text is in valid UTF-8 format, otherwise false.
Exceptions:
std::bad_alloc: This function might throw std::bad_alloc if std::string::data() throws when memory is exhausted.
Note:
#include <c++-gtk-utils/convert.h> for this function.
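For readers who want to see what "valid UTF-8" involves, the following is a generic well-formedness check written against the standard library only. It is a sketch with a hypothetical name, not the code behind validate(), and it may differ from the library in edge cases (for example, this sketch accepts embedded nul bytes).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in: true if `text` is well-formed UTF-8. Rejects
// truncated sequences, stray continuation bytes, overlong forms,
// surrogates, and code points above U+10FFFF.
bool is_valid_utf8_sketch(const std::string& text) {
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t extra;  // continuation bytes expected after the lead byte
        char32_t cp;        // code point being assembled
        char32_t lowest;    // smallest code point allowed at this length
        if (lead < 0x80)                { extra = 0; cp = lead;        lowest = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        if (n - i - 1 < extra) return false;  // sequence truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(text[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < lowest) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```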
std::wstring Cgu::Utf8::wide_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's wide character locale representation. For this function to work correctly, the system's installed iconv() must support conversion to a generic wchar_t target, but in POSIX whether it does so is implementation defined (GNU's C library implementation does). For most unix-like systems the wide character representation will be Unicode (UCS-4/UTF-32 or UTF-16), and where that is the case use the uniwide_from_utf8() function instead, which will not rely on the generic target being available.

Parameters:
input: Text in valid UTF-8 format.
Returns:
The input text converted to the system's wide character locale representation.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to the system's wide character locale representation (e.g. because the input characters cannot be represented by that encoding, or the system's installed iconv() function does not support conversion to a generic wchar_t target).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
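Whether the installed iconv() offers a generic wchar_t target can be probed directly with the POSIX iconv_open() call. The sketch below assumes a POSIX system with <iconv.h> and assumes the encoding name "WCHAR_T", which is the spelling GNU iconv implementations use; other implementations may use a different name or none. The helper name is hypothetical.

```cpp
#include <iconv.h>

// Hypothetical probe: true if iconv can convert from UTF-8 to the generic
// wchar_t encoding. "WCHAR_T" is an assumed name (GNU spelling).
bool iconv_supports_wchar_t_from_utf8() {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");  // arguments: tocode, fromcode
    if (cd == (iconv_t)-1) return false;          // POSIX failure value
    iconv_close(cd);
    return true;
}
```

If the probe returns false, the uniwide functions are the better choice on systems whose wide characters are Unicode.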
std::string Cgu::Utf8::wide_to_utf8 ( const std::wstring &  input)

Converts text from the system's wide character locale representation to UTF-8. For this function to work correctly, the system's installed iconv() must support conversion from a generic wchar_t target, but in POSIX whether it does so is implementation defined (GNU's C library implementation does). For most unix-like systems the wide character representation will be Unicode (UCS-4/UTF-32 or UTF-16), and where that is the case use the uniwide_to_utf8() function instead, which will not rely on the generic target being available.

Parameters:
input: Text in a valid wide character locale format.
Returns:
The input text converted to UTF-8.
Exceptions:
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in a valid wide character locale format, or cannot be converted to UTF-8 (e.g. because the system's installed iconv() function does not support conversion from a generic wchar_t target).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.