Unknown W. Brackets
d962fb35d3
softgpu: Centralize more prim drawing state.
2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d06f17d27b
softgpu: Move tex filter setting check to state.
2022-01-11 00:07:24 -08:00
Unknown W. Brackets
75ff3e44e6
softgpu: Move texture addresses to prim state.
2022-01-11 00:00:03 -08:00
Unknown W. Brackets
d5c5e9478e
softgpu: Prepare more state per prim call.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
9ec7d65c49
softgpu: Use func IDs instead of gstate more.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
d7a82ab7b8
softgpu: Compute func IDs once per batch of verts.
...
This saves a decent chunk of time, especially when many verts are being
drawn.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
e57730a97d
softgpu: Output normals to GE debugger.
2022-01-09 21:33:45 -08:00
Unknown W. Brackets
b915a82c41
softgpu: Correct decal doubling without alpha.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
72aa4be879
samplerjit: Skip processing alpha if unused.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
fe0b3dbd01
samplerjit: Fix alpha for 565 in linear lookup.
2022-01-09 11:08:46 -08:00
Henrik Rydgård
2d7a7fd34e
Merge pull request #15288 from unknownbrackets/softgpu-self
...
softgpu: Draw top left of rectangles first
2022-01-09 08:33:28 +01:00
Unknown W. Brackets
88ef2d1ac1
softgpu: Skip threading when rendering to self.
...
This will probably always be a problem to thread.
2022-01-08 21:05:08 -08:00
Unknown W. Brackets
6367d5dc8f
softgpu: Draw top left of rectangles first.
...
This helps when things do self-rendering, since this way we won't read
from things we've just written to when scaling down. See #11623.
2022-01-08 20:53:01 -08:00
Unknown W. Brackets
8a00c2d233
GPU: Allow gcc/clang/icc runtime SSE4 usage.
...
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Henrik Rydgård
eee62849fe
Merge pull request #15284 from unknownbrackets/softgpu-opt
...
Improve softgpu lighting accuracy and speed
2022-01-08 22:05:06 +01:00
Unknown W. Brackets
c7fc448869
softgpu: Use some SSE4 in triangle interpolation.
2022-01-08 11:38:07 -08:00
Unknown W. Brackets
3b1cc0d3b8
softgpu: Limit minX/maxX per line.
...
Only helps when single-threaded, though.
2022-01-08 10:04:52 -08:00
Unknown W. Brackets
9458610d96
softgpu: Avoid rsqrt path for normals.
...
In LittleBigPlanet, it's noticeable that the lighting is very off due to
the slight loss of accuracy - possibly due to cutoff or similar.
2022-01-07 23:22:57 -08:00
Unknown W. Brackets
ce8a49b1c1
softgpu: Retain floats in diffuse/specular.
...
This seems to be a bit more accurate. Color blending seems correct now,
but the factors and especially pow results are off.
Also, normalize normal to 0, 0, 1, which seems to match results better.
2022-01-06 21:52:31 -08:00
Unknown W. Brackets
bd354164bc
softgpu: Cleanup -NAN and diffuse factor.
2022-01-06 21:52:27 -08:00
Unknown W. Brackets
537e357741
softgpu: Correct NAN spotlight exponent/direction.
2022-01-06 21:19:48 -08:00
Unknown W. Brackets
b86bdc9456
softgpu: Correct handling of NAN attenuation.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
fa80c448ee
softgpu: More closely match PSP light rounding.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
079b67e7ed
softgpu: Use common SIMD matrix multiplies.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
cba2374abd
softgpu: Separate calculation of S/T.
...
We could probably reuse, but we're not right now and it complicates the
logic.
2022-01-06 21:19:47 -08:00
Henrik Rydgård
683289402c
Merge pull request #15279 from unknownbrackets/samplerjit-fastpath
...
softgpu: Correct mirroring in fastpath+nearest
2022-01-05 09:43:20 +01:00
Henrik Rydgård
f82f24a9bb
Merge pull request #15280 from unknownbrackets/samplerjit-dxt
...
Correct some recent regressions in samplerjit
2022-01-05 09:42:30 +01:00
Unknown W. Brackets
0993771104
samplerjit: Fix standard bufw check.
...
Oops, bufw could be intentionally higher while w is 16 bytes.
2022-01-05 00:11:34 -08:00
Unknown W. Brackets
741a9b0a4d
samplerjit: Fix DXT compilation.
2022-01-05 00:00:03 -08:00
Unknown W. Brackets
19998976c7
samplerjit: Correct linear compile failure.
...
It was resetting to nullptr, because `nearest` was nullptr.
2022-01-04 23:58:07 -08:00
Unknown W. Brackets
e2f8cf8bf2
softgpu: Correct mirroring in fastpath+nearest.
2022-01-04 23:42:31 -08:00
Unknown W. Brackets
d98e5bfc97
softgpu: Improve usage of SSE for lighting.
...
Gives about a 2% improvement in many places.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
2aa57679fa
softjit: Keep mip S/T calc in SIMD.
...
This is only a tiny bit faster, though.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
a309ed791b
softjit: Use RIP access in color/depth off.
...
Seems to help, though it's small.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
612cc0ab5c
softjit: Optimize depth range checks.
...
This was higher than I expected on the profile. Not a huge improvement,
but a bit faster.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
961cfcd75c
softjit: Add describes here too.
...
Helpful to aggregate when there are multiple rasterizers.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
26e7768a67
samplerjit: Remove old linear nearest paths.
...
We only use it for DXT now, so let's not keep the dead code around.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
5e3bef7e14
samplerjit: Avoid gather if overread could crash.
...
This should be rare, but a game could easily shove a CLUT4 texture at the
end of VRAM, and then accessing the last index would segfault.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
7806dfddea
samplerjit: Use VPGATHERDD for all types.
2022-01-02 17:19:18 -08:00
Unknown W. Brackets
ce6ea8da11
samplerjit: Apply gather lookup to all CLUT4.
2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828
samplerjit: Use VPGATHERDD for simple CLUT4 loads.
...
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5
samplerjit: Avoid a couple more copies in AVX.
...
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Unknown W. Brackets
7594187538
softgpu: Skip sample lookup if masked.
...
Was hoping making other things faster would make this unnecessary or
worse, but it hasn't seemed to. This gives a pretty decent improvement in
most places (~4%).
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf
softgpu: Stop specializing on miplevels.
...
Now that samplerjit is processing mips, it no longer helps. Just
complexity now.
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
e4673a5fa4
softgpu: Separately profile verts and lighting.
2022-01-02 13:46:11 -08:00
Henrik Rydgård
d3f0af7458
Merge pull request #15273 from unknownbrackets/softjit-bloom
...
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d
Merge pull request #15272 from unknownbrackets/softgpu-meminfo
...
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063
Merge pull request #15271 from unknownbrackets/samplerjit-color16
...
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Unknown W. Brackets
a259761262
samplerjit: Use nearest func in fast path too.
...
This uses the more optimal tex funcs.
2022-01-02 08:48:16 -08:00
Unknown W. Brackets
ba17f538d6
softjit: Avoid const temp registers.
...
Was trying to make sure register allocation was okay in the worst case.
2022-01-02 08:47:04 -08:00