[RFC PATCH] gas model: use virial cubic polynomial form
torvalds at linux-foundation.org
Fri Mar 4 11:56:31 PST 2016
On Fri, Mar 4, 2016 at 1:04 AM, Robert Helling <helling at atdotde.de> wrote:
> But if you really wanted to tweet things, there are two things that could be
> factored out: Don’t divide o2 and he by 1000.0, rather do that only once in
> the end.
Yes. And make it a "multiply by 0.001" instead - that not only avoids
the divide, it makes it possible for the compiler to do more madd
instructions (not that we'll see those on x86 any time soon due to
compatibility issues, even though they are starting to exist)
> And instead of doing the weighted average of the Z factors, you
> could equivalently do the weighted average of the coefficients of the
> polynomial. That you would need to do only once per gas and save those
> values as we compute Z for many pressures for the same gas.
The testing overhead for "is it the same gas" would actually overwhelm
the calculations. Floating point isn't really *that* expensive in the
Now, the whole thing is actually amenable to be written in vector
form, but that's just too much.
Anyway, having done a release build (in order to get reasonable
optimizations), the code generated is actually quite good.. It ends up
being just 52 instructions, of which 41 are floating point
instructions (multiply, add, convert from integer, load fp constant).
The remaining 11 instructions are just the trivial "load o2 and
helium, check for air, subtract 1000-o2-he".
It would be even better if we had multiply-add instructions, but it's
really not bad at all.
Even vectorizing it might actually just make things worse: turning on
the vector units can be much more expensive than the couple of
instructions we'd win.
I'm attaching the trivial patch to get to that state here, but I'm not
even sure it's worth it: with optimizations, gcc actually turns our
*current* code into fine code.
Yeah, without this patch there's a couple of "divsd" instructions, but
they don't dominate, so it's actually likely that they schedule fine.
The latency of a divsd is only 3-4 times the latency of a multiply on
modern cores - we're talking fewer cycles than a cache miss.
-------------- next part --------------
subsurface-core/gas-model.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/subsurface-core/gas-model.c b/subsurface-core/gas-model.c
index 81765e003002..4612d99c86d8 100644
@@ -40,22 +40,25 @@ double gas_compressibility_factor(struct gasmix *gas, double bar)
- double o2, he;
+ int o2, he;
double x1, x2, x3;
- o2 = get_o2(gas) / 1000.0;
- he = get_he(gas) / 1000.0;
+ o2 = get_o2(gas);
+ he = get_he(gas);
x1 = bar; x2 = x1*x1; x3 = x2*x1;
Z = virial_m1(o2_coefficients, x1, x2, x3) * o2 +
virial_m1(he_coefficients, x1, x2, x3) * he +
- virial_m1(n2_coefficients, x1, x2, x3) * (1.0 - o2 - he);
+ virial_m1(n2_coefficients, x1, x2, x3) * (1000 - o2 - he);
* We add the 1.0 at the very end - the linear mixing of the
* three 1.0 terms is still 1.0 regardless of the gas mix.
+ * The "*0.001" is because we multiplied in the gas fractions
+ * as whole permille.
- return Z + 1.0;
+ return Z * 0.001 + 1.0;
More information about the subsurface