Roll our own qUtf8Printable()?

Thiago Macieira thiago at macieira.org
Sun Feb 25 17:13:11 PST 2018


On domingo, 25 de fevereiro de 2018 15:00:38 PST Lubomir I. Ivanov wrote:
> <bstoeger at mail.tuwien.ac.at> wrote:
> > Dear all,
> > 
> > In many places, QStrings are converted to C-style strings with an
> > expression of the kind
> > 
> >  s.toUf8().data()
> > 
> > I have a patch replacing all of them with
> > 
> >  qUtf8Printable(s)
> > 
> > The idea is that - according to the documentation - this is equivalent to
> > 
> >  s.toUtf8().constData()
> > 
> > The latter should produce less machine code, because it doesn't have to
> > check whether the data of the temporary QByteArray is shared. Owing to
> > the many indirections in Qt's code, the compiler is not smart enough to
> > understand that in this case data() = constData().

It couldn't and shouldn't. toUtf8() returns a non-const QByteArray, so .data() 
calls the non-const version which will in turn try to detach. However, the 
data isn't shared, which means this is just a useless branch-and-compare. The 
performance impact should be negligible, except in tight loops, and most 
definitely dwarfed by the memory allocation that involved creating that UTF-8 
string in the first place.

I've got some code that will try to gift the memory from the QString to the 
QByteArray, but it can't be done in Qt 5 without breaking binary 
compatibility.

> > Much to my surprise, qUtf8Printable() produced larger code. The reason is
> > that it is defined as a macro in <qglobal.h>:
> > 
> > #ifndef qUtf8Printable
> > #  define qUtf8Printable(string) QString(string).toUtf8().constData()
> > #endif
> > 
> > It generates a temporary copy of the string! This looks like a defect to
> > me.

Not really, but we may be able to improve it.

It's done that way because you want this to work:

	qUtf8Printable(someString + ":" + otherString)

When the fast operator+ is active, the expression in the parentheses isn't a 
QString, but a QStringBuilder<QStringBuilder<QString, char[2]>, QString>, 
which does a single memory allocation with no temporaries, instead of the 
expected two allocations. So we need to get it to convert to QString first 
before we can call .toUtf8() on it.

This also works if the macro parameter is anything else that can be implicitly 
converted to QString, like QLatin1String. So if you wrote:

	qUtf8Printable(QLatin1String("\xe9"))

You'd get the expected "\xc3\xa9" output.

But, like I said, there may be a way to improve that. It's not what you 
suggested here:

> > So what do you think about rolling our own helper function:
> > 
> > inline const char *qstring2c(const QString &s)
> > {
> > 
> >         return s.toUtf8().constData();
> > 
> > }

The above is just wrong. The pointer that is returned is dangling, because the 
above code actually does:

	QByteArray temp1 = s.toUtf8();
	const char *temp2 = temp1.constData();
	temp1.~QByteArray();
	return temp2;

Since the temporary QByteArray owns the pointer, it will free the pointer 
before the function returns. Any and all use of that pointer is strictly UB.

That's also why both qPrintable and qUtf8Printable are macros, instead of 
inline functions. Many a novice contributor to Qt has made such a proposal to 
convert macros to inline functions and stumbled upon this mistake, myself 
included (though over 10 years ago).

> hi, Thiago.
> 
> if you have a moment, what do you think about the size differences
> between qUtf8Printable() and the proposed home brew qstring2c()
> replacement.
> Berhold is saying that qUtf8Printable() always creates a copy of the
> string, does that seem right?

No, it doesn't.

But there's another option, which is to avoid a copy of the QString if the 
input is already a QString. Let me try a few variations of possible solutions 
and compile Qt to see what happens.

I'll BRB.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center





More information about the subsurface mailing list