Rewrite SetShader for performance.
Rewrote the routine, hopefully without changing its behavior, though it has a lot of complicated features.

This removes all but one of the EnterCriticalSection calls. The remaining one guards only the spot where a map is actually modified, and nothing else.
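As a rough illustration of narrowing the lock to the one mutation, here is a minimal sketch. It is not the 3Dmigoto code: `Globals`, `NoteShader`, and the use of `std::mutex` in place of the Win32 critical section are all assumptions made to keep the example portable. It also assumes the `registered` map is not being modified by another thread while it is read.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <unordered_map>

// Hypothetical stand-in for the global state; std::mutex replaces the
// Win32 CRITICAL_SECTION used by the real code.
struct Globals {
    std::mutex lock;
    bool hunting = false;
    std::set<std::uint64_t> visited;
};

// Returns the hash registered for `handle`, or 0 if the handle is unknown.
// The read-only find() runs without the lock (assumes no concurrent writer);
// only the insert into `visited` is guarded.
std::uint64_t NoteShader(Globals &g,
                         const std::unordered_map<const void *, std::uint64_t> &registered,
                         const void *handle)
{
    assert(handle != nullptr);
    auto it = registered.find(handle);
    if (it == registered.end())
        return 0;
    if (g.hunting) {
        std::lock_guard<std::mutex> guard(g.lock);  // held only for the mutation
        g.visited.insert(it->second);
    }
    return it->second;
}
```

With hunting off, the function never touches the lock at all, which is the common case the commit is optimizing for.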

Reorders the sub-pieces so that each one just modifies the shader passed in, and that shader is then set at a single spot in the routine for all of them.
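The single-exit shape can be sketched as follows. The names `Shader`, `ReplaceMap`, and `ResolveShader` are hypothetical and the stages are reduced to two, so treat this as an outline of the pattern rather than the actual routine.

```cpp
#include <cassert>
#include <unordered_map>

using Shader = const void *;  // hypothetical opaque shader handle
using ReplaceMap = std::unordered_map<Shader, Shader>;

// Each stage may swap `shader` for a replacement; no stage returns early.
// The caller binds the returned handle exactly once.
Shader ResolveShader(Shader shader,
                     const ReplaceMap &reloaded,
                     const ReplaceMap &original,
                     bool useOriginal)
{
    assert(shader != nullptr);

    // Stage 1: live-reloaded replacement, if one exists and is non-null.
    auto r = reloaded.find(shader);
    if (r != reloaded.end() && r->second != nullptr)
        shader = r->second;

    // Stage 2: optionally map to the original shader.
    if (useOriginal) {
        auto o = original.find(shader);
        if (o != original.end())
            shader = o->second;
    }
    return shader;
}
```

One consequence of this arrangement, at least in the sketch, is that a later stage looks up whatever handle an earlier stage selected, not the handle that was originally passed in.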

Adds an optimization that checks whether mShaderOverrideMap is empty and, if it is, skips the lookups, which take measurable CPU time when profiled.

Added several other .empty() checks at other hot spots in the code, so that unless we are using one of the esoteric features of 3Dmigoto (specifically TextureOverrides, ShaderOverrides, or BufferOverrides), we don't pay any CPU price.
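The empty-map guard described above follows a simple pattern, sketched here with hypothetical names (`Override`, `OverrideMap`, `FindOverride`); the real maps and value types in 3Dmigoto differ.

```cpp
#include <cstdint>
#include <unordered_map>

struct Override {
    float separation;  // hypothetical payload
};
using OverrideMap = std::unordered_map<std::uint64_t, Override>;

// Returns the override for `hash`, or nullptr if none applies.
// empty() is a constant-time check, so when the feature is unused the
// hashing and bucket walk done by find() are skipped entirely.
const Override *FindOverride(const OverrideMap &overrides, std::uint64_t hash)
{
    if (overrides.empty())
        return nullptr;
    auto it = overrides.find(hash);
    return it == overrides.end() ? nullptr : &it->second;
}
```

The result is the same with or without the guard. It only pays off on hot paths where the map is usually empty, such as per-draw or per-SetShader calls.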

After this change, the profile shows that we recovered all that CPU time: from 1.5% CPU down to 0.8% CPU now.

I do not believe the remaining items can be trimmed further.
bo3b committed Jul 30, 2015
1 parent 9360da0 commit 267b225
Showing 2 changed files with 67 additions and 54 deletions.
107 changes: 53 additions & 54 deletions DirectX11/HackerContext.cpp
@@ -804,13 +804,15 @@ DrawContext HackerContext::BeforeDraw()

// Override settings?
// TODO: Process other types of shaders
ShaderOverrideMap::iterator iVertex = G->mShaderOverrideMap.find(mCurrentVertexShader);
ShaderOverrideMap::iterator iPixel = G->mShaderOverrideMap.find(mCurrentPixelShader);

if (iVertex != G->mShaderOverrideMap.end())
ProcessShaderOverride(&iVertex->second, false, &data, &separationValue, &convergenceValue);
if (iPixel != G->mShaderOverrideMap.end())
ProcessShaderOverride(&iPixel->second, true, &data, &separationValue, &convergenceValue);
if (!G->mShaderOverrideMap.empty()) {
ShaderOverrideMap::iterator iVertex = G->mShaderOverrideMap.find(mCurrentVertexShader);
ShaderOverrideMap::iterator iPixel = G->mShaderOverrideMap.find(mCurrentPixelShader);

if (iVertex != G->mShaderOverrideMap.end())
ProcessShaderOverride(&iVertex->second, false, &data, &separationValue, &convergenceValue);
if (iPixel != G->mShaderOverrideMap.end())
ProcessShaderOverride(&iPixel->second, true, &data, &separationValue, &convergenceValue);
}

if (data.override) {
HackerDevice *device = mHackerDevice;
@@ -1077,6 +1079,9 @@ HRESULT HackerContext::MapDenyCPURead(
if (dim != D3D11_RESOURCE_DIMENSION_TEXTURE2D)
return E_FAIL;

if (G->mTextureOverrideMap.empty())
return E_FAIL;

tex->GetDesc(&desc);
hash = GetTexture2DHash(tex, false, NULL);

@@ -1122,6 +1127,9 @@ void HackerContext::FreeDeniedMapping(ID3D11Resource *pResource, UINT Subresourc
if (Subresource != 0)
return;

if (G->mTextureOverrideMap.empty())
return;

DeniedMap::iterator i;
i = mDeniedMaps.find(pResource);
if (i == mDeniedMaps.end())
@@ -1159,7 +1167,7 @@ STDMETHODIMP_(void) HackerContext::Unmap(THIS_
__in UINT Subresource)
{
FreeDeniedMapping(pResource, Subresource);
mOrigContext->Unmap(pResource, Subresource);
mOrigContext->Unmap(pResource, Subresource);
}

STDMETHODIMP_(void) HackerContext::PSSetConstantBuffers(THIS_
@@ -1768,7 +1776,7 @@ STDMETHODIMP_(void) HackerContext::SetShader(THIS_
/* [annotation] */
__in_ecount_opt(NumClassInstances) ID3D11ClassInstance *const *ppClassInstances,
UINT NumClassInstances,
std::unordered_map<ID3D11Shader *, UINT64> *shaders,
std::unordered_map<ID3D11Shader *, UINT64> *registered,
std::unordered_map<ID3D11Shader *, ID3D11Shader *> *originalShaders,
std::unordered_map<ID3D11Shader *, ID3D11Shader *> *zeroShaders,
std::set<UINT64> *visitedShaders,
@@ -1778,67 +1786,58 @@ STDMETHODIMP_(void) HackerContext::SetShader(THIS_
{
if (pShader) {
// Store as current shader. Need to do this even while
// not hunting for ShaderOverride sections.
if (G->ENABLE_CRITICAL_SECTION) EnterCriticalSection(&G->mCriticalSection);

std::unordered_map<ID3D11Shader *, UINT64>::iterator i = shaders->find(pShader);
if (i != shaders->end()) {
// not hunting for ShaderOverride section in BeforeDraw
// As an optimization, we can skip the lookup if there are no ShaderOverride
// The lookup/find takes measurable amounts of CPU time.
if (!G->mShaderOverrideMap.empty() || (G->hunting == HUNTING_MODE_ENABLED)) {
std::unordered_map<ID3D11Shader *, UINT64>::iterator i = registered->find(pShader);
if (i != registered->end()) {
*currentShaderHash = i->second;
*currentShaderHandle = pShader;
LogDebug(" shader found: handle = %p, hash = %016I64x\n", pShader, *currentShaderHash);
LogDebug(" shader found: handle = %p, hash = %016I64x\n", *currentShaderHandle, *currentShaderHash);

if ((G->hunting == HUNTING_MODE_ENABLED) && visitedShaders) {
// Add to visited shaders.
if (G->ENABLE_CRITICAL_SECTION) EnterCriticalSection(&G->mCriticalSection);
visitedShaders->insert(i->second);
if (G->ENABLE_CRITICAL_SECTION) LeaveCriticalSection(&G->mCriticalSection);
}

// second try to hide index buffer.
// if (mCurrentIndexBuffer == mSelectedIndexBuffer)
// pIndexBuffer = 0;
} else
}
else
LogDebug(" shader %p not found\n", pShader);
}

if (G->hunting == HUNTING_MODE_ENABLED) {
// Replacement map.
if (G->marking_mode == MARKING_MODE_ORIGINAL || !G->fix_enabled) {
std::unordered_map<ID3D11Shader *, ID3D11Shader *>::iterator j = originalShaders->find(pShader);
if ((selectedShader == *currentShaderHash || !G->fix_enabled) && j != originalShaders->end()) {
ID3D11Shader *shader = j->second;
if (G->ENABLE_CRITICAL_SECTION) LeaveCriticalSection(&G->mCriticalSection);
(mOrigContext->*OrigSetShader)(shader, ppClassInstances, NumClassInstances);
return;
}
}
if (G->marking_mode == MARKING_MODE_ZERO) {
std::unordered_map<ID3D11Shader *, ID3D11Shader *>::iterator j = zeroShaders->find(pShader);
if (selectedShader == *currentShaderHash && j != zeroShaders->end()) {
ID3D11Shader *shader = j->second;
if (G->ENABLE_CRITICAL_SECTION) LeaveCriticalSection(&G->mCriticalSection);
(mOrigContext->*OrigSetShader)(shader, ppClassInstances, NumClassInstances);
return;
}
}
}
// If the shader has been live reloaded from ShaderFixes, use the new one
// No longer conditional on G->hunting now that hunting may be soft enabled via key binding
ShaderReloadMap::iterator it = G->mReloadedShaders.find(pShader);
if (it != G->mReloadedShaders.end() && it->second.replacement != NULL) {
LogDebug(" shader replaced by: %p\n", it->second.replacement);

// If the shader has been live reloaded from ShaderFixes, use the new one
// No longer conditional on G->hunting now that hunting may be soft enabled via key binding
ShaderReloadMap::iterator it = G->mReloadedShaders.find(pShader);
if (it != G->mReloadedShaders.end() && it->second.replacement != NULL) {
LogDebug(" shader replaced by: %p\n", it->second.replacement);
// Todo: It might make sense to Release() the original shader, to recover memory on GPU
pShader = (ID3D11Shader*)it->second.replacement;
}

// Todo: It might make sense to Release() the original shader, to recover memory on GPU
ID3D11Shader *shader = (ID3D11Shader*)it->second.replacement;
if (G->ENABLE_CRITICAL_SECTION) LeaveCriticalSection(&G->mCriticalSection);
(mOrigContext->*OrigSetShader)(shader, ppClassInstances, NumClassInstances);
return;
if (G->hunting == HUNTING_MODE_ENABLED) {
// Replacement map.
if (G->marking_mode == MARKING_MODE_ORIGINAL || !G->fix_enabled) {
std::unordered_map<ID3D11Shader *, ID3D11Shader *>::iterator j = originalShaders->find(pShader);
if ((selectedShader == *currentShaderHash || !G->fix_enabled) && j != originalShaders->end()) {
pShader = j->second;
}
}
if (G->marking_mode == MARKING_MODE_ZERO) {
std::unordered_map<ID3D11Shader *, ID3D11Shader *>::iterator j = zeroShaders->find(pShader);
if (selectedShader == *currentShaderHash && j != zeroShaders->end()) {
pShader = j->second;
}
}
}

if (G->ENABLE_CRITICAL_SECTION) LeaveCriticalSection(&G->mCriticalSection);
} else {
*currentShaderHash = 0;
*currentShaderHandle = NULL;
}

// Call through to original XXSetShader, but pShader may have been replaced.
(mOrigContext->*OrigSetShader)(pShader, ppClassInstances, NumClassInstances);
}

@@ -2512,7 +2511,7 @@ STDMETHODIMP_(void) HackerContext::IASetIndexBuffer(THIS_
{
LogDebug("HackerContext::IASetIndexBuffer called\n");

if (pIndexBuffer) {
if (pIndexBuffer && !G->mDataBuffers.empty()) {
// Store as current index buffer.
if (G->ENABLE_CRITICAL_SECTION) EnterCriticalSection(&G->mCriticalSection);
DataBufferMap::iterator i = G->mDataBuffers.find(pIndexBuffer);
14 changes: 14 additions & 0 deletions DirectX11/HackerDevice.cpp
@@ -1601,6 +1601,20 @@ STDMETHODIMP HackerDevice::CreateTexture2D(THIS_
}
}

// In the case where there is in fact nothing to be done with a texture hash,
// no texture overrides, let's not calculate it, because it can use measurable
// amounts of CPU time. In GTA5 I measured this as avg frame rates 55 vs. 48.
//
// If we are hunting mode, we need all the hashes for ShaderUsages.
if (G->mTextureOverrideMap.empty() && !G->hunting && (G->gSurfaceSquareCreateMode == -1))
{
HRESULT hr = mOrigDevice->CreateTexture2D(pDesc, pInitialData, ppTexture2D);
if (ppTexture2D) LogDebug(" returns result = %x, handle = %p\n", hr, *ppTexture2D);

return hr;
}


DarkStarSword Aug 23, 2015

Collaborator

This should probably go after the depth/stencil resolution check - the result of that check can be used in a ShaderOverride section without needing any TextureOverrides. Currently the only games that use it (Lichdom, Crysis 3) all have TextureOverride sections, so it will still work for them, but it would be better if this did not surprise someone later on.

bo3b Aug 24, 2015

Author Owner

Ah, good to know. I'd say go ahead and delete that section altogether.

That early exit was a first attempt at fixing performance problems in GTA5, which led to the whole hash rewrite/change. Now that it takes no measurable CPU to do the hash, even in the target case of GTA5, I'd say there is no longer a need to exit early.

My test cases were with hunting=2, so that the full hash calculation would happen.

// Rectangular depth stencil textures of at least 640x480 may indicate
// the game's resolution, for games that upscale to their swap chains:
if (pDesc && (pDesc->BindFlags & D3D11_BIND_DEPTH_STENCIL) &&