c++-gtk-utils
A class for reassembling UTF-8 strings sent over pipes and sockets so they form complete valid UTF-8 characters.
#include <c++-gtk-utils/reassembler.h>
Public Member Functions
Cgu::SharedHandle<char*> operator() (const char *input, size_t size)
size_t get_stored () const
void reset ()
Reassembler ()
Utf8::Reassembler is a functor class which takes in a partially formed UTF-8 string and returns a nul-terminated string comprising as much of the input as forms complete UTF-8 characters. Any partially formed UTF-8 character left at the end of the input passed in previous calls to the functor is inserted at the beginning, and any partial character at the end of the current input is stored for the next call to the functor. If the input string, after adding any stored partial character and disregarding any partially formed character at its end, contains invalid UTF-8, then operator() will return a null Cgu::SharedHandle<char*> object (that is, Cgu::SharedHandle<char*>::get() will return 0). Input will not be treated as invalid if it consists only of a single partly formed UTF-8 character which could be valid if further bytes were received and added to it. In that case the returned Cgu::SharedHandle<char*> object will contain an allocated string of zero length, comprising only a terminating \0 character, rather than a NULL pointer.
This enables UTF-8 strings to be sent over pipes, sockets, etc. and displayed in a GTK+ object at the receiving end.
Note that for efficiency reasons the memory held in the returned Cgu::SharedHandle<char*> object may be greater than the length of the nul-terminated string that is contained in that memory: just let the Cgu::SharedHandle<char*> object manage the memory, and use the contents like any other nul-terminated string.
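The contract described above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: SketchReassembler and its helpers are hypothetical names, it returns std::unique_ptr<char[]> instead of Cgu::SharedHandle<char*>, it discards any stored bytes when it reports invalid input (a choice made for this sketch), and it checks only lead and continuation bytes (it does not reject overlong or out-of-range encodings). It is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for illustration; not the library's implementation.
class SketchReassembler {
public:
  // Returns a nul-terminated buffer holding the complete characters, or a
  // null pointer if the bytes seen so far cannot be valid UTF-8.
  std::unique_ptr<char[]> operator()(const char* input, std::size_t size) {
    assert(pending_.size() < 4);        // a partial character is 1 to 3 bytes
    std::string all(pending_);          // stored partial character goes first
    all.append(input, size);

    std::size_t pos = 0;                // end of the last complete character
    while (pos < all.size()) {
      const std::size_t need = sequence_length(all[pos]);
      if (need == 0) return invalid();  // byte cannot start a character
      const std::size_t left = all.size() - pos;
      const std::size_t have = left < need ? left : need;
      for (std::size_t i = 1; i < have; ++i)
        if (!is_continuation(all[pos + i])) return invalid();
      if (have < need) break;           // partial character at the end
      pos += need;
    }

    pending_.assign(all, pos, std::string::npos);  // keep the tail for next call
    std::unique_ptr<char[]> out(new char[pos + 1]);
    std::memcpy(out.get(), all.data(), pos);
    out[pos] = '\0';
    return out;
  }

  // Number of bytes of a partial character held for the next call.
  std::size_t get_stored() const { return pending_.size(); }

  // Discards any stored partial character.
  void reset() { pending_.clear(); }

private:
  // Length of the sequence introduced by lead byte c, or 0 if c is no lead byte.
  static std::size_t sequence_length(char c) {
    const unsigned char b = static_cast<unsigned char>(c);
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
  }
  static bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
  }
  // Sketch-only policy: drop stored bytes and signal invalid input.
  std::unique_ptr<char[]> invalid() { pending_.clear(); return nullptr; }

  std::string pending_;
};
```

Feeding this sketch the two bytes "h\xC3" returns "h" and stores one byte; a following call with "\xA9!" returns the complete two-byte character followed by "!". As with the real class, the result is used like any other nul-terminated string (for example with std::strlen()).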
This class is not needed if std::getline(), with its default '\n' delimiter, is used to read UTF-8 characters using, say, Cgu::fdistream, because a whole '\n' delimited line of UTF-8 characters will always be complete.
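The reason is that the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so splitting valid UTF-8 at '\n' cannot cut a character in half. A minimal sketch of line-based reading follows; read_utf8_lines is a hypothetical helper name, and any std::istream (such as a Cgu::fdistream) could be passed where the test below would use a std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects '\n'-delimited lines from a stream.
// If the stream carries valid UTF-8, every returned line holds only
// whole characters, so no reassembly step is required.
std::vector<std::string> read_utf8_lines(std::istream& in) {
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(in, line)) {
    lines.push_back(line);
  }
  return lines;
}
```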
This is an example of its use, reading from a pipe until it is closed by the writer and putting the received text in a GtkTextBuffer object:
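A minimal sketch of such a read loop is given here. It is written as a template so that it compiles without GTK+ or this library: pump_utf8 is a hypothetical name, Reassemble stands for any functor with the call signature of Utf8::Reassembler whose result has a get() member returning the string, and Sink stands for whatever consumes the text. Unlike a loop that merely reports invalid input and carries on, this sketch stops at the first invalid chunk.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>   // POSIX read()

// Hypothetical helper: reads fd until the writer closes it, passes each
// chunk through `reassemble`, and hands every non-empty result to `sink`.
// Returns false on invalid UTF-8 or on a read error other than EINTR.
template <class Reassemble, class Sink>
bool pump_utf8(int fd, Reassemble& reassemble, Sink sink) {
  char buffer[1024];
  for (;;) {
    const ssize_t res = ::read(fd, buffer, sizeof buffer);
    if (res == 0) return true;            // end of file: writer closed the pipe
    if (res < 0) {
      if (errno == EINTR) continue;       // interrupted by a signal: retry
      return false;
    }
    auto text = reassemble(buffer, static_cast<std::size_t>(res));
    if (!text.get()) return false;        // null result: invalid UTF-8
    if (*text.get() != '\0') {            // skip empty strings (partial character only)
      sink(text.get(), std::strlen(text.get()));
    }
  }
}
```

With the real class one would presumably pass a Cgu::Utf8::Reassembler object as the functor and a sink that calls gtk_text_buffer_insert() on the GtkTextBuffer with the string and its length.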
This class maintains an array as a data member, containing partly formed characters from previous calls to operator(), and should not be copied. There should be no reason to do so, but unfortunately enforcing this by explicitly precluding copy construction and copy assignment was overlooked when this class was first provided. At the next API break, the copy constructor will be explicitly deleted and only moving will be allowed. Where a Reassembler object is to be moved, use std::move and the code will be safe against this change in the future.
inline Cgu::Utf8::Reassembler::Reassembler ()
The constructor will not throw.
inline size_t Cgu::Utf8::Reassembler::get_stored () const
Gets the number of bytes of a partially formed UTF-8 character stored for the next call to operator()(). It will not throw.
Cgu::SharedHandle<char*> Cgu::Utf8::Reassembler::operator() (const char *input, size_t size)
Takes a byte array of wholly or partly formed UTF-8 characters to be converted (after taking account of previous calls to the method) to a valid string of wholly formed characters.
Parameters:
input: The input array.
size: The number of bytes in the input (not the number of UTF-8 characters).
Exceptions:
std::bad_alloc: The method might throw std::bad_alloc if memory is exhausted and the system throws in that case. It will not throw any other exception.
inline void Cgu::Utf8::Reassembler::reset ()
Resets the Reassembler, by discarding any partially formed UTF-8 character from previous calls to operator()(). It will not throw.