大型场景下的Vulkan CascadeShadowMap功能开发
亚马逊开源的Bistro场景大小超过了300m x 300m。先前单纯ShadowMap算法已经无法满足如此大的场景阴影表现需求,就算单张ShadowMap拉到8192 * 8192,也没有办法得到一个质量还看得过去的阴影。
简单的解决方案就是接入Cascade Shadow Map系统,由于GPU硬件的发展,CSM流派也渐渐划分为两种,一种是传统型,一种是GPU Driven的Virtual Shadow Map型。
传统型的Cascade Shadow Map系统非常简单,原理就是将相机的视锥划分为近、中、远等几个子视锥,然后根据这几个子视锥的顶点分别计算出从灯源方向的Fit To View正交投影矩阵,然后再分别为每个Cascade渲染一次深度Pass即可。
流程总结如下:
- 按照用户输入的级联数和相机参数划分每级级联的划分深度。
- 根据1中的每级级联划分深度和灯源方向计算每级级联的正交投影矩阵。
- 为每级级联都建立一次Renderpass,单独渲染到一张深度图上。
- 正常Shading时,在片元着色器根据片元的z深度和1中计算出来的每级级联的划分深度做比较,确定当前片元位于第几级级联,然后采样该级级联的深度图做常规的ShadowMap算法即可。
对于步骤3,为每级级联建立一次Renderpass时,必然会根据该级级联的RenderBounds做一次剔除操作,并且哪怕进行了一次物体级别的剔除,最终进入DrawRecord的物体数还是很多,虽然Vulkan可以多线程Record CommandBuffer,但是这样CPU压力还是很高。
因此,在育碧的Virtual Shadow Map做法(2015年的Siggraph分享)中,他们用GPU来进行剔除,并且剔除完后直接在GPU端用Compute Shader软件光栅化输出深度到Storage Image中,这样,Light Shading时直接采样这几张Storage Image即可,速度非常的快。
想要实现Virtual Shadow Map,首先要有一个GPU Culling && DrawIndirect 的渲染管线,但我目前还没有进行到这一步,因此,我还是用的传统型CascadeShadowMap。
目前的4级Cascade如下,还没做软阴影,同时性能有待优化:
实现的细节如下:
static AutoCVarFloat cVarCascadeSplitLambda(
"r.Shadow.CascadeSplitLambda",
"Cascade shadow split lambda.",
"Shadow",
0.95f,
CVarFlags::ReadAndWrite
);
static AutoCVarUint32 cVarCascadeNum(
"r.Shadow.CascadeNum",
"Cascade shadow num.",
"Shadow",
4,
CVarFlags::ReadOnly | CVarFlags::InitOnce // In future we will set it changable maybe.
);
struct Cascade
{
float splitDepth;
glm::mat4 viewProj;
static void SetupCascades(std::vector<Cascade>& inout,
float nearClip,float farClip,const mat4& cameraViewProj,const vec3& lightDir);
};
void Cascade::SetupCascades(std::vector<Cascade>& inout,float nearClip,float farClip,const mat4& cameraViewProj,const vec3& inLightDir)
{
inout.clear();
inout.resize(cVarCascadeNum.Get());
std::vector<float> cascadeSplits{};
cascadeSplits.resize(cVarCascadeNum.Get());
float clipRange = farClip - nearClip;
float minZ = nearClip;
float maxZ = nearClip + clipRange;
float range = maxZ - minZ;
float ratio = maxZ / minZ;
// 根据相机视锥和级联数目计算每级级联划分深度。
// See: https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch10.html
for(uint32 i = 0; i < cVarCascadeNum.Get(); i++)
{
float p = (i + 1) / static_cast<float>(cVarCascadeNum.Get());
float log = minZ * std::pow(ratio, p);
float uniform = minZ + range * p;
float d = cVarCascadeSplitLambda.Get() * (log-uniform) + uniform;
cascadeSplits[i] = (d-nearClip) / clipRange;
}
/////////// 为每级级联计算正交投影矩阵 ////////////
float lastSplitDist = 0.0;
for(uint32 i = 0; i < cVarCascadeNum.Get(); i++)
{
// 根据划分距离计算当前Cascade的新视锥
float splitDist = cascadeSplits[i];
glm::vec3 frustumCorners[8] =
{
glm::vec3(-1.0f, 1.0f, -1.0f),
glm::vec3(1.0f, 1.0f, -1.0f),
glm::vec3(1.0f, -1.0f, -1.0f),
glm::vec3(-1.0f, -1.0f, -1.0f),
glm::vec3(-1.0f, 1.0f, 1.0f),
glm::vec3(1.0f, 1.0f, 1.0f),
glm::vec3(1.0f, -1.0f, 1.0f),
glm::vec3(-1.0f, -1.0f, 1.0f),
};
// 将ClipSpace下的相机视锥转回世界空间
glm::mat4 invCam = glm::inverse(cameraViewProj);
for(uint32_t i = 0; i<8; i++)
{
glm::vec4 invCorner = invCam * glm::vec4(frustumCorners[i] , 1.0f);
frustumCorners[i] = invCorner / invCorner.w;
}
for(uint32 i = 0; i < cVarCascadeNum.Get(); i++)
{
glm::vec3 dist = frustumCorners[i + 4] - frustumCorners[i];
frustumCorners[i + 4] = frustumCorners[i] + (dist * splitDist);
frustumCorners[i] = frustumCorners[i] + (dist * lastSplitDist);
}
// 计算视锥中心
glm::vec3 frustumCenter = glm::vec3(0.0f);
for(uint32 i = 0; i<8; i++)
{
frustumCenter += frustumCorners[i];
}
frustumCenter /= 8.0f;
// 计算视锥半径
float radius = 0.0f;
for(uint32 i = 0; i < 8; i++)
{
float distance = glm::length(frustumCorners[i]-frustumCenter);
radius = glm::max(radius,distance);
}
radius = std::ceil(radius * 16.0f) / 16.0f;
// 计算灯光投影范围
glm::vec3 maxExtents = glm::vec3(radius);
glm::vec3 minExtents = -maxExtents;
glm::vec3 lightDir = normalize(inLightDir);
glm::mat4 lightViewMatrix = glm::lookAt(
frustumCenter - lightDir * -minExtents.z,
frustumCenter,
glm::vec3(0.0f,1.0f,0.0f)
);
glm::mat4 lightOrthoMatrix = glm::ortho(
minExtents.x,
maxExtents.x,
minExtents.y,
maxExtents.y,
0.0f,
maxExtents.z - minExtents.z
);
inout[i].splitDepth = (nearClip + splitDist * clipRange) * -1.0f;
inout[i].viewProj = lightOrthoMatrix * lightViewMatrix;
lastSplitDist = cascadeSplits[i];
}
}
我这里用了类似虚幻的AutoConsoleValue的系统来配置cVarCascadeNum和cVarCascadeSplitLambda。
计算完级联的分配后,在渲染部分,我们需要为每一级级联都准备对应的Framebuffer:
class CascadeShadowDepthPass final: public GraphicsPassInterface
{
// ...
private:
std::vector<std::vector<VkFramebuffer>> mCascadeFrameBuffers;
};
void CascadeShadowDepthPass::CreateFramebuffer()
{
uint32 cascadeNum = 4;
const uint32 swapchainImagecount = (uint32)VulkanRHI::Get()->GetSwapchainImages().size();
mCascadeFrameBuffers.resize(swapchainImagecount);
for(uint32 i = 0; i < swapchainImagecount; i++)
{
auto& cascadeFrameBuffers = mCascadeFrameBuffers[i];
cascadeFrameBuffers.resize(cascadeNum);
for(uint32 j = 0; j < cascadeNum; j++)
{
VulkanFrameBufferFactory vfbf {};
vfbf.SetRenderpass(mRenderpass)
.AddArrayAttachment(SceneTexture::Get()->GetCascadeShadowDepthMapArray(), j);
cascadeFrameBuffers[j] = vfbf.Create(VulkanRHI::Get()->GetDevice());
}
}
}
在SceneTexture类中,我申请了一组 2048 * 2048 * cVarCascadeNum 的 Texture2DArray(可以省点带宽,同时方便写forloop下标选择),用它们来存储CascadeShadowMap的信息。
在每帧渲染前的Update部分,需要先更新全局UBO的Cascde信息(我把Cascade信息写在了全局的UBO中):
// PerframeData.glsl
layout(set = 0, binding = 1) uniform SceneData{
vec4 cascadeSplits;
mat4[4] cascadeViewProjMatrix;
vec4 sunlightDirection;
} sceneData;
// RenderScene.h
struct GPUSceneData
{
vec4 cascadeSplits;
glm::mat4 cascadeViewProjMatrix[MAX_CASCADE_NUM];
glm::vec4 sunlightDirection;
};
// SimpleMeshDrawRenderer.cpp
void SimpleMeshDrawRenderer::UpdateDynamicMeshDrawCommand(RenderScene* inScene,CameraComponent* cam,uint32 backBuffer)
{
mShadowDepthPass.DynamicRecord(inScene,cam,backBuffer);
mSimpleMeshDrawpass.DynamicRecord(inScene,cam,backBuffer);
}
// SimpleMeshDrawRenderer.cpp
void SimpleMeshDrawRenderer::Render(RenderScene* inScene,PackRenderContext* pc)
{
// ...
GPUSceneData sceneData = {};
if(pc->directionalLight != nullptr)
{
sceneData.sunlightDirection = vec4(pc->directionalLight->GetLightDirection(),0.0f);
std::vector<Cascade> cascadeInfos{};
Cascade::SetupCascades(
cascadeInfos,
pc->camera->GetZNear(),
pc->camera->GetZFar(),
camData.viewProj,
pc->directionalLight->GetLightDirection()
);
for(size_t i = 0; i < cascadeInfos.size(); i++)
{
sceneData.cascadeViewProjMatrix[i] = cascadeInfos[i].viewProj;
sceneData.cascadeSplits[i] = cascadeInfos[i].splitDepth;
}
}
// ...
UpdateDynamicMeshDrawCommand(inScene,pc->camera,backBufferIndex);
// ...
}
接下来是Cascade Shadow Pass DrawCommandRecord部分,需要先对当前CascadeBounds内的物体做一次Culling在渲染:
void CascadeShadowDepthPass::DynamicRecord(RenderScene* inScene,CameraComponent* camera,uint32 backBufferIndex)
{
// Get self keep commandbuffer.
VkCommandBuffer cmd = GetDynamicVkCommandBuffer(backBufferIndex);
VulkanCheck(vkResetCommandBuffer(cmd,0));
const auto extent = SceneTexture::Get()->GetCascadeShadowDepthMapArray()->GetExtent();
VkRect2D scissor{};
scissor.offset = {0,0};
scissor.extent = { extent.width,extent.height};
VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = (float)SceneTexture::Get()->GetCascadeShadowDepthMapArray()->GetExtent().width;
viewport.height = (float)SceneTexture::Get()->GetCascadeShadowDepthMapArray()->GetExtent().height;
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
static const float depthBiasConstant = 0.0f;// 1.25f;
static const float depthBiasSlope = 0.0f;//1.75f;
std::array<VkClearValue,1> clearValues;
clearValues[0].depthStencil = { 1.0f, 0 };
VkCommandBufferBeginInfo cmdBeginInfo = VulkanCommandbufferBeginInfo(VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT);
VulkanCheck(vkBeginCommandBuffer(cmd,&cmdBeginInfo));
vkCmdSetScissor(cmd,0,1,&scissor);
vkCmdSetViewport(cmd,0,1,&viewport);
vkCmdSetDepthBias(cmd,depthBiasConstant,0.0f,depthBiasSlope);
auto& simpleMeshDC = inScene->GetMeshpassDrawCalls().GetSimpleMeshDC();
{
VkRenderPassBeginInfo rpInfo = VulkanRenderpassBeginInfo(mRenderpass,{extent.width,extent.height},VK_NULL_HANDLE);
rpInfo.clearValueCount = static_cast<uint32_t>(clearValues.size());
rpInfo.pClearValues = clearValues.data();
for(uint32 index = 0; index < cVarCascadeNum.Get(); index++)
{
rpInfo.framebuffer = mCascadeFrameBuffers[backBufferIndex][index];
vkCmdBeginRenderPass(cmd,&rpInfo,VK_SUBPASS_CONTENTS_INLINE);
VkPipelineLayout activeLayout = VK_NULL_HANDLE;
VkPipeline activePipeline = VK_NULL_HANDLE;
for(RenderableMesh* cacheDCp : simpleMeshDC.staticDrawCall)
{
RenderableMesh& cacheDC = *cacheDCp;
cacheDC.meshReference->vertexBuffer->Bind(cmd);
for(auto& subMesh:cacheDC.meshReference->subMeshes)
{
auto& loopLayout = subMesh.materialCollection.shadowDepthReference->original->passShaders[MeshpassType::ShadowDepth]->layout;
auto& loopPipeline = subMesh.materialCollection.shadowDepthReference->original->passShaders[MeshpassType::ShadowDepth]->pipeline;
// 检查是否需要切换pipeline
if(activePipeline != loopPipeline)
{
activePipeline = loopPipeline;
activeLayout = loopLayout;
inScene->GetFrameData(backBufferIndex).BindDescriptorSet(cmd,activeLayout,MeshpassType::ShadowDepth);
inScene->GetCacheStaticMeshSSBO().BindDescriptorSet(cmd,activeLayout);
vkCmdBindPipeline(cmd,VK_PIPELINE_BIND_POINT_GRAPHICS,activePipeline);
}
subMesh.indexBuffer->Bind(cmd);
GPUCascadeIndex constants;
constants.index = index;
vkCmdPushConstants(cmd, activeLayout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof(GPUCascadeIndex), &constants);
if(CullTest(subMesh)) // Culling 测试。
{
// 渲染
vkCmdDrawIndexed(cmd,subMesh.indicesCount,1,0,0,0);
}
}
}
vkCmdEndRenderPass(cmd);
}
}
VulkanCheck(vkEndCommandBuffer(cmd));
}
受到虚幻的影响,这里整体的渲染架构和虚幻的非常的像。
Shader部分,着色时,需要传入Viespace的片元坐标,并将它和先前划分的SplitDepth做深度比较,以此确定当前像素在哪个Cascade级别:
uint cascadeIndex = 0;
cascadeIndex += inViewSpacePos.z < sceneData.cascadeSplits.x ? 1 : 0;
cascadeIndex += inViewSpacePos.z < sceneData.cascadeSplits.y ? 1 : 0;
cascadeIndex += inViewSpacePos.z < sceneData.cascadeSplits.z ? 1 : 0;
我这里暂时HardCode了4级的CascadeSplitDepth在cascadeSplits的x、y、z、w分量中。
剩下的就是正常的ShadowMap流程了。