The Forge Shading Language (FSL)
The purpose of FSL is to provide a single shader syntax from which hlsl/pssl/vk-glsl/metal shader code can be generated. The syntax is largely identical to hlsl, with differences in the shader entry and resource declarations.
Whenever possible we make use of simple macros. For more complex modifications, a python script is used (Common_3/Tools/ForgeShadingLanguage/fsl.py).
Therefore python 3.6 is necessary to generate the shaders.
We include a no-install python3.6 in Tools/python-3.6.0-embed-amd64.
In the vs fsl.target custom target we prepend that path to PATH, such that the build system uses that binary.
The syntax is generally similar to hlsl, with some modifications intended to make it simpler to expand the code as necessary. For development we recommend setting up and using as many target compilers as possible.
FSL supports vertex, pixel, compute and tessellation shaders (control and evaluation stages). Entry functions are declared using the VS_MAIN, PS_MAIN, CS_MAIN, TC_MAIN, TE_MAIN keywords and should span a single line.
The first statement in the main function body should be:
INIT_MAIN;
This statement is expanded differently for each target language. To return from the main function, the RETURN keyword is used:
RETURN(); // for void main function
float4 Out = (...);
RETURN(Out); // for main function with return type
Here is a sample fsl pixel shader using shader IO and global resources:
STRUCT(VSOutput)
{
DATA(float4, Position, SV_Position);
DATA(float2, TexCoord, TEXCOORD);
};
float4 PS_MAIN( VSOutput In )
{
INIT_MAIN;
float4 color = SampleTex2D(Get(uTexture0), Get(uSampler0), In.TexCoord);
RETURN(color);
}
Shaders are built offline and only the binaries are loaded by the application.
The unit tests have a "Reload Shaders" button which when triggered will reload the shader binaries.
See the ReloadServer section for more details.
Binary declarations map to individual output binaries and use the following syntax:
#frag myshader.frag
// src here
#end
Binary declarations need to be declared in the top-level fsl src file, but can be located anywhere in that file.
The generator will collect these and preprocess them roughly as follows:
#ifdef myshader_frag
// src here
#endif
This makes it possible to keep all binary declarations in a single file using includes, or to write something like the following:
float4 PS_MAIN( )
{
#frag myshader_red.frag
float4 color = float4(1,0,0,1);
#end
#frag myshader_green.frag
float4 color = float4(0,1,0,1);
#end
RETURN(color);
}
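The preprocessing step described above can be sketched in a few lines of Python. This is only an illustration of the "roughly as follows" rewrite, not the actual code in fsl.py. The function name `rewrite_binary_declarations` is hypothetical, and the `#vert` and `#comp` directives are assumptions (only `#frag` appears in the text above).

```python
import re

# Stage directives. Only "#frag" is shown in the text; "vert" and "comp" are assumed.
STAGE_DIRECTIVES = ("vert", "frag", "comp")

_DECLARATION = re.compile(r"^\s*#(%s)\b(.*)$" % "|".join(STAGE_DIRECTIVES))


def rewrite_binary_declarations(source: str) -> str:
    """Rewrite '#frag name.frag ... #end' blocks into '#ifdef name_frag ... #endif'."""
    out = []
    for line in source.splitlines():
        match = _DECLARATION.match(line)
        tokens = match.group(2).split() if match else []
        if tokens:
            # The binary name is the last token; anything before it is a feature flag.
            out.append("#ifdef " + re.sub(r"\W", "_", tokens[-1]))
        elif line.strip() == "#end":
            out.append("#endif")
        else:
            out.append(line)
    return "\n".join(out)
```

Compiling one binary then amounts to defining the matching macro (for example `myshader_red_frag`) before running a regular C-style preprocessor over the rewritten source.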
Feature flags can be added to binary declarations to enable the following features:
FT_PRIM_ID // necessary for use of SV_PrimitiveID
FT_RAYTRACING // enables ray query extensions/headers and necessary msl/spirv/hlsl targets
FT_VRS // Variable Rate Shading
FT_MULTIVIEW // necessary for multiview rendering for VR
They are inserted into the declaration as follows:
#frag FT_PRIM_ID FT_MULTIVIEW myshader.frag
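As a rough illustration of how such a declaration line breaks down, the Python sketch below splits it into stage, feature flags and output file name. The helper `parse_binary_declaration` and the `BinaryDeclaration` type are hypothetical and not part of fsl.py. The set of flags is taken from the list above.

```python
from typing import List, NamedTuple

# Feature flags listed in this document.
KNOWN_FEATURE_FLAGS = {"FT_PRIM_ID", "FT_RAYTRACING", "FT_VRS", "FT_MULTIVIEW"}


class BinaryDeclaration(NamedTuple):
    stage: str        # e.g. "frag"
    flags: List[str]  # feature flags, in declaration order
    filename: str     # output binary name, e.g. "myshader.frag"


def parse_binary_declaration(line: str) -> BinaryDeclaration:
    """Split '#frag FLAG... name.frag' into its parts and validate the flags."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[0].startswith("#"):
        raise ValueError("not a binary declaration: " + line)
    stage = tokens[0][1:]
    *flags, filename = tokens[1:]
    unknown = [flag for flag in flags if flag not in KNOWN_FEATURE_FLAGS]
    if unknown:
        raise ValueError("unknown feature flag(s): " + ", ".join(unknown))
    return BinaryDeclaration(stage, flags, filename)
```

Validating against the known set means a misspelled flag is reported immediately instead of being silently treated as part of the declaration.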
Specialization constants (function constants on Metal) get baked into the microcode at pipeline creation time, so performance is identical to using a macro without the downsides of macros (too many shader variations increasing the size of the build).
A good read on specialization constants (the same applies to function constants on Metal): https://arm-software.github.io/vulkan_best_practice_for_mobile_developers/samples/performance/specialization_constants/specialization_constants_tutorial.html
They are declared at global scope using the SHADER_CONSTANT macro and can be used like any regular variable after declaration.
Macro arguments:
#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)
Example usage:
SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;
void main()
{
// Can be used like regular variables in shader code
if (gRenderMode == 1)
{
//
}
}
NOTE: Unlike Vulkan, Metal does not provide a way to initialize function constants to a default value, so all required function constants need to be passed through ShaderLoadDesc/BinaryShaderDesc when creating the shader.
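The per-target expansion listed in the comments of the example can be mimicked with a small Python function. This is a sketch that reproduces those three output strings. The name `expand_shader_constant` and the target strings are assumptions for illustration, not the generator's actual implementation.

```python
def expand_shader_constant(index: int, type_name: str, name: str, value, target: str) -> str:
    """Return the declaration SHADER_CONSTANT would expand to for a given target."""
    if target == "VULKAN":
        # Vulkan specialization constant with a default value.
        return f"layout (constant_id = {index}) const {type_name} {name} = {value};"
    if target == "METAL":
        # Metal function constants carry no default value.
        return f"constant {type_name} {name} [[function_constant({index})]];"
    # All other targets fall back to a plain constant.
    return f"const {type_name} {name} = {value};"
```

The Metal branch drops `value` entirely, which is why the value has to be supplied at shader creation time on that platform.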
Function parameters can be annotated using in/out/inout:
void fn(
in(float) param_in,
out(float) param_out,
inout(float) param_inout
) {}
FSL matrices are column major.
Matrices declared inside cbuffer, pushconstants, or structure buffers are initialized from memory in column major order.
Explicit constructors and accessors are provided:
// this initializes a matrix with 3 rows and 2 columns from three 2-component rows.
f3x2 M = f3x2Rows(r0, r1, r2);
setElem(M, 0, 1, 42.0f); // sets the element at col 0, row 1 to 42
float3 col0 = getCol0(M);
float2 row1 = getRow1(M);
// create a matrix from scalars, provided in row-major order
f2x2 M = f2x2RowElems(
0, 1,
2, 3);
float2 col1 = getCol1(M); // (1,3)
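To make the relation between row-major constructor arguments and column-major storage concrete, here is a small Python model. It is not FSL code: the `Matrix` class and its methods are hypothetical stand-ins for the `f2x2RowElems`, `getCol#` and `getRow#` helpers, and zero-based column and row indices are assumed.

```python
class Matrix:
    """A num_cols x num_rows matrix stored as a flat list in column-major order."""

    def __init__(self, num_cols, num_rows, column_major):
        self.num_cols = num_cols
        self.num_rows = num_rows
        self.data = list(column_major)

    @classmethod
    def from_row_elems(cls, num_cols, num_rows, *elems):
        # elems arrive row by row; reorder them so each column is contiguous.
        column_major = [elems[r * num_cols + c]
                        for c in range(num_cols)
                        for r in range(num_rows)]
        return cls(num_cols, num_rows, column_major)

    def get_col(self, c):
        start = c * self.num_rows
        return tuple(self.data[start:start + self.num_rows])

    def get_row(self, r):
        return tuple(self.data[c * self.num_rows + r] for c in range(self.num_cols))


m = Matrix.from_row_elems(2, 2, 0, 1,
                                2, 3)
print(m.data)        # [0, 2, 1, 3]
print(m.get_col(0))  # (0, 2)
print(m.get_col(1))  # (1, 3)
```

In this model the first column of the matrix written as rows (0,1) and (2,3) is (0,2) and the second column is (1,3).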
We also provide overloaded Identity constructors and helpers to initialize vectors with identical components:
f4x4 id = Identity();
float4 v = f4(1); // float4(1,1,1,1)
The-Forge resources are grouped into four update frequencies:
UPDATE_FREQ_NONE
UPDATE_FREQ_PER_FRAME
UPDATE_FREQ_PER_BATCH
UPDATE_FREQ_PER_DRAW
These generally map to resource tables in the rootsignature.
Since a range of platforms require identical resource declarations per stage, we recommend placing these into resource headers which get included by each stage source file (the declarations are not necessary if the stage uses no resources).
Resources are declared using the CBUFFER(...), PUSH_CONSTANT(...) and RES(...) syntax.
Resources, CBuffer and push constant elements are made available in a global resource namespace
which can be accessed from any function.
For explicit resource placement, hlsl registers and glsl bindings need to be declared.
To access a resource, the syntax Get(resource) is used. Texture and Buffer resources can be declared as arrays by appending the dimension to the identifier. For Metal, argument buffers are generated for an update frequency whenever a single resource is declared as an array:
RES(Buffer(uint), myBuffers[2], UPDATE_FREQ_NONE, b0, binding=0);
If any such resource declaration is active in a shader, all resources declared with the same update frequency get placed inside the argument buffer.
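The Metal argument buffer rule above can be stated as a short Python sketch. The `Resource` type and both function names are hypothetical, and the resource names in the usage example are made up. The sketch only models the rule as described in the text, not the generator's code.

```python
from typing import Iterable, List, NamedTuple, Set


class Resource(NamedTuple):
    name: str
    update_freq: str
    array_size: int = 0  # 0 means the resource is not declared as an array


def argument_buffer_frequencies(resources: Iterable[Resource]) -> Set[str]:
    """Update frequencies for which an argument buffer is generated."""
    return {res.update_freq for res in resources if res.array_size > 0}


def names_in_argument_buffers(resources: Iterable[Resource]) -> List[str]:
    """All resources that end up inside an argument buffer."""
    resources = list(resources)
    frequencies = argument_buffer_frequencies(resources)
    return [res.name for res in resources if res.update_freq in frequencies]


resources = [
    Resource("myBuffers", "UPDATE_FREQ_NONE", 2),
    Resource("uTexture0", "UPDATE_FREQ_NONE"),
    Resource("perFrameData", "UPDATE_FREQ_PER_FRAME"),
]
print(names_in_argument_buffers(resources))  # ['myBuffers', 'uTexture0']
```

Here `uTexture0` is pulled into the argument buffer only because it shares an update frequency with the array `myBuffers`.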
The following syntax declares a CBuffer:
CBUFFER(Uniforms, UPDATE_FREQ_NONE, b0, binding=0)
{
DATA(f4x4, mvp, None);
};
The following syntax declares a PushConstant:
PUSH_CONSTANT(PushConstants, b0)
{
DATA(uint, index, None);
};
The following types of buffers are supported:
Buffer, WBuffer, RWBuffer, ByteBuffer, and RWByteBuffer
RES(RWBuffer(MyType), myArray, UPDATE_FREQ_NONE, b0, binding=0);
The following atomic functions are supported:
// atomic add of value 42 at location 0, previous value is written to last argument
AtomicAdd(Get(uRWBuffer)[0], 0, 42, pre_val);
// atomic load & store of value 42 at location 0
val = AtomicLoad(Get(uRWBuffer)[0]);
AtomicStore(Get(uRWBuffer)[0], 42);
AtomicMin(Get(uRWBuffer)[0], 42);
AtomicMax(Get(uRWBuffer)[0], 42);
FSL textures are split between read-only types for sampling:
- Tex#D, Tex#DArray, Tex2DMS, TexCube, Depth2D, Depth2DMS
And read-write types:
- RTex#D (readonly), WTex#D (writeonly), RWTex#D (read-write)
Sampling types map to hlsl Texture#D types, glsl texture#D types and metal texture#d<T, access::sample> types.
Read-write types map to hlsl RWTexture#D types, glsl image#D types and metal texture#d<T, access::read_write> types.
Sampling is performed using SampleTex# functions.
Load access is performed using LoadTex# functions for sampling types, and LoadRWTex# for read-write types.
Writing is performed using Write#D functions.
An example, sampling from a cube texture and writing the result to an RW texture2D array:
RES(TexCube(float4), srcTexture, UPDATE_FREQ_NONE, t0, binding = 0);
RES(RWTex2DArray(float4), dstTexture, UPDATE_FREQ_NONE, u2, binding = 2);
RES(SamplerState, skyboxSampler, UPDATE_FREQ_NONE, s3, binding = 3);
(...)
float4 value = SampleLvlTexCube(Get(srcTexture), Get(skyboxSampler), float3(1,0,0), 0);
Write3D(Get(dstTexture), int3(0,0,0), value); // write to texel (0,0) of slice 0.
For loading functions, the sampler argument can also be NO_SAMPLER, though for Vulkan GL_EXT_samplerless_texture_functions is necessary (it gets automatically enabled).
Texture dimensions can be retrieved using:
int2 size = GetDimensions(Get(uTexture), Get(uSampler));
// samplerless alternative
int2 size2 = GetDimensions(Get(uTexture), NO_SAMPLER);
Shader input and output structs are declared using the following syntax:
STRUCT(VSInput)
{
DATA(float4, position, SV_Position);
};
Such declared datatypes are then normally passed to the main function:
VSOutput(Out) VS_MAIN(VSInput In)
The shader return variable is automatically created in the INIT_MAIN expansion, and is automatically returned on a call to RETURN.
The following semantics are supported:
SV_Position
SV_VertexID
SV_InstanceID
SV_GroupID
SV_DispatchThreadID
SV_GroupThreadID
SV_GroupIndex
SV_SampleIndex
SV_PrimitiveID
SV_DomainLocation
For regular main inputs, the semantic is used as a case-insensitive decoration around the variable type:
void CS_MAIN(SV_GroupIndex(uint) groupIndex)
{...}
For accessing elements of resource arrays, special syntax is necessary when the index is divergent:
uint index = (...);
float4 texColor = f4(0);
BeginNonUniformResourceIndex(index, 256); // 256 is the max possible index
texColor = SampleLvlTex2D(Get(textureMaps)[index], Get(smp), uv, 0);
EndNonUniformResourceIndex();
For Vulkan, the enclosed block gets replaced based on the availability of the following extensions:
- VK_EXT_DESCRIPTOR_INDEXING_EXTENSION: wraps the index inside the block with nonuniformEXT(...)
- VK_FEATURE_TEXTURE_ARRAY_DYNAMIC_INDEXING: code inside the block is left untouched
- if no extension is available, a switch construct is used
For other platforms, a loop with lane masking is used as necessary.
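The "loop with lane masking" strategy can be illustrated with a CPU-side Python simulation. This is a conceptual model of the general technique, not the code FSL generates. The function name `masked_resource_loop` and the `sample` callback are hypothetical.

```python
def masked_resource_loop(indices, sample):
    """Simulate resolving a divergent resource index across the lanes of one wave.

    indices: resource index requested by each lane.
    sample:  callable(index, lane) standing in for the enclosed shader code.
    Returns the per-lane results and the number of loop iterations.
    """
    results = [None] * len(indices)
    active = list(range(len(indices)))
    iterations = 0
    while active:
        # Take the index of the first remaining lane; it is uniform for this pass.
        current = indices[active[0]]
        for lane in active:
            if indices[lane] == current:
                results[lane] = sample(current, lane)
        # Mask off every lane that has been served.
        active = [lane for lane in active if indices[lane] != current]
        iterations += 1
    return results, iterations


results, iterations = masked_resource_loop([3, 1, 3, 1, 7], lambda index, lane: index * 10)
print(results)     # [30, 10, 30, 10, 70]
print(iterations)  # 3
```

The loop runs once per distinct index in the wave, so the cost grows with how divergent the index actually is rather than with the size of the resource array.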
For Tessellation, the following syntax is provided:
TESS_VS_SHADER("shader.vert.fsl") // the vs which will be part of the pipeline
PATCH_CONSTANT_FUNC("ConstantHS") // name of the pcf
// declare domain, partitioning and output topology
// required in TC and TE stages
TESS_LAYOUT("quad", "integer", "triangle_ccw")
OUTPUT_CONTROL_POINTS(1)
MAX_TESS_FACTOR(10.0f)
For metal, each TC shader gets transformed into a compute shader which:
- calls the VS main function
- runs the TC main code
- calls the pcf function
- writes the results to a buffer
To enable Wave Intrinsics, the keyword ENABLE_WAVEOPS needs to be inserted into the shader code. Its location is not relevant.
The following intrinsics are supported:
ballot_t vote = WaveActiveBallot(expr);
uint numActiveLanes = CountBallot(activeLaneMask);
if (WaveIsFirstLane())
{...}
if (WaveGetLaneIndex() == WaveGetMaxActiveIndex())
{...}
val = WaveReadLaneFirst(val);
val = WaveActiveSum(val);
val = QuadReadAcrossX(i);
val = QuadReadAcrossX(j);
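For readers unfamiliar with wave operations, the Python sketch below models three of the intrinsics above on a list of per-lane values. It is a simplified CPU model: the ballot is represented as a plain integer bitmask, which is an assumption (the real `ballot_t` layout depends on the target), and the function names are only Python stand-ins.

```python
def wave_active_ballot(predicates):
    """Bitmask with bit i set when the expression is true on lane i."""
    mask = 0
    for lane, predicate in enumerate(predicates):
        if predicate:
            mask |= 1 << lane
    return mask


def count_ballot(mask):
    """Number of lanes set in a ballot mask."""
    return bin(mask).count("1")


def wave_active_sum(values):
    """Every lane receives the sum over all active lanes."""
    total = sum(values)
    return [total] * len(values)


vote = wave_active_ballot([True, False, True, True])
print(bin(vote))                      # 0b1101
print(count_ballot(vote))             # 3
print(wave_active_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]
```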
FSL is integrated into our Visual Studio, XCode and CodeLite projects. The generator tool can also be called directly:
usage: fsl.py [-h] -d DESTINATION -b BINARYDESTINATION [-i INTERMEDIATEDESTINATION]
[-l {DIRECT3D11,DIRECT3D12,METAL,ORBIS,PROSPERO,SCARLETT,VULKAN,XBOX,GLES} [{DIRECT3D11,DIRECT3D12,METAL,ORBIS,PROSPERO,SCARLETT,VULKAN,XBOX,GLES} ...]]
[--verbose] [--compile] [--rootSignature ROOTSIGNATURE] [--cache=args] [--shaderServerPort PORT]
fsl_input
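When driving the generator from a custom build script, the command line can be assembled as in the sketch below. It uses only options from the usage text above. The helper `build_fsl_command` and all file paths in the example are hypothetical. Placing `-l` before the other options assumes the script parses its arguments with argparse, where a multi-value option would otherwise swallow the trailing input file.

```python
import sys

SUPPORTED_LANGUAGES = {
    "DIRECT3D11", "DIRECT3D12", "METAL", "ORBIS", "PROSPERO",
    "SCARLETT", "VULKAN", "XBOX", "GLES",
}


def build_fsl_command(fsl_input, destination, binary_destination, languages,
                      compile_shaders=False, verbose=False,
                      fsl_script="Common_3/Tools/ForgeShadingLanguage/fsl.py"):
    """Build an argv list for fsl.py; pass the result to subprocess.run()."""
    unknown = [lang for lang in languages if lang not in SUPPORTED_LANGUAGES]
    if unknown:
        raise ValueError("unsupported language(s): " + ", ".join(unknown))
    command = [sys.executable, fsl_script]
    if languages:
        # Keep the multi-value option away from the positional input file.
        command += ["-l", *languages]
    command += ["-d", destination, "-b", binary_destination]
    if verbose:
        command.append("--verbose")
    if compile_shaders:
        command.append("--compile")
    command.append(fsl_input)
    return command


# Hypothetical paths, for illustration only.
print(build_fsl_command("Shaders/basic.frag.fsl", "out/src", "out/bin",
                        ["VULKAN"], compile_shaders=True))
```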
If compilation is requested, the tool will attempt to locate appropriate compilers using env variables:
DIRECT3D11: $(FSL_COMPILER_FXC)
(if not set, will default to "C:/Program Files (x86)/Windows Kits/8.1/bin/x64/fxc.exe")
DIRECT3D12: $(FSL_COMPILER_DXC)
(if not set, will default to "The-Forge/ThirdParty/OpenSource/DirectXShaderCompiler/bin/x64/dxc.exe")
METAL: $(FSL_COMPILER_METAL)
(if not set, will default to "'C:/Program Files/METAL Developer Tools/macos/bin/metal.exe'")
ORBIS: $(SCE_ORBIS_SDK_DIR)/host_tools/bin/orbis-wave-psslc.exe
PROSPERO: $(SCE_PROSPERO_SDK_DIR)/host_tools/bin/prospero-wave-psslc.exe
VULKAN: $(VULKAN_SDK)/Bin/glslangValidator.exe
XBOX: $(GXDKLATEST)/bin/XboxOne/dxc.exe
SCARLETT: $(GXDKLATEST)/bin/Scarlett/dxc.exe
GLES: (Can only be compiled during runtime)
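The lookup rules in the table above can be summarized in a short Python sketch. The variable names and default paths are copied from the table. The function `resolve_compiler` is hypothetical, and returning `None` for an unset SDK variable is an assumption, since the table does not say what the tool does in that case.

```python
import os

# Targets with a dedicated override variable and a default path.
OVERRIDABLE = {
    "DIRECT3D11": ("FSL_COMPILER_FXC",
                   "C:/Program Files (x86)/Windows Kits/8.1/bin/x64/fxc.exe"),
    "DIRECT3D12": ("FSL_COMPILER_DXC",
                   "The-Forge/ThirdParty/OpenSource/DirectXShaderCompiler/bin/x64/dxc.exe"),
    "METAL": ("FSL_COMPILER_METAL",
              "'C:/Program Files/METAL Developer Tools/macos/bin/metal.exe'"),
}

# Targets whose compiler lives at a fixed location inside an SDK directory.
SDK_RELATIVE = {
    "ORBIS": ("SCE_ORBIS_SDK_DIR", "host_tools/bin/orbis-wave-psslc.exe"),
    "PROSPERO": ("SCE_PROSPERO_SDK_DIR", "host_tools/bin/prospero-wave-psslc.exe"),
    "VULKAN": ("VULKAN_SDK", "Bin/glslangValidator.exe"),
    "XBOX": ("GXDKLATEST", "bin/XboxOne/dxc.exe"),
    "SCARLETT": ("GXDKLATEST", "bin/Scarlett/dxc.exe"),
}


def resolve_compiler(target, env=None):
    """Return the compiler path for a target, or None if it cannot be located."""
    env = os.environ if env is None else env
    if target in OVERRIDABLE:
        variable, default = OVERRIDABLE[target]
        return env.get(variable) or default
    if target in SDK_RELATIVE:
        variable, relative = SDK_RELATIVE[target]
        root = env.get(variable)
        return f"{root.rstrip('/')}/{relative}" if root else None
    return None  # GLES is only compiled at runtime
```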
A custom build dependency is defined in Common_3/Tools/ForgeShadingLanguage/VS/fsl.target.
Once added to a project, any added *.fsl is assigned the <FSLShader> item type.
To add the build customization right-click the project in VS and choose "Build Dependencies" -> "Build Customizations..." -> "Find Existing..." and choose the fsl.target file. The customization can then be enabled per-project from the same menu.
For XCode, we use a custom build rule for *.fsl resources and directly generate the metal shaders into the target package. You can find this in a shell script in the Build Phases section of the XCode project settings.
For CodeLite we use custom makefile additions. You can find this in the Customize -> Custom Makefile Rules section of the CodeLite project settings.
For further examples, please consult our Unit Test shader code.
We aimed to handle includes on our own as much as possible to reduce the need for compiler include handlers. A notable case was dxc, where our generated shaders would compile and run just fine, but hlsl::Exceptions were being thrown from IDxcCompiler::Compile() which originated from the clang ast parse.
ReloadServer allows you to dynamically recompile FSL shaders at runtime by clicking the Reload shaders button in the Debug UI. It works by running a socket server on the host PC that waits for the device to send a shader recompile request.
Upon receiving this request, ReloadServer recompiles only the shaders that have been modified for the requested project, and sends them back to the device where they are reloaded after being received. In the case of a compilation/connection error, the message is printed to the device logs so that the issue can quickly be inspected.
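One common way to recompile only what changed is to compare file modification times between requests. The Python sketch below shows that idea. It is an assumption about how such a check could work, not a description of ReloadServer's actual logic, and `find_modified_shaders` is a hypothetical name.

```python
import os


def find_modified_shaders(paths, seen_mtimes):
    """Return the files whose modification time changed since the last call.

    seen_mtimes maps path -> last observed mtime and is updated in place,
    so a file is reported once per modification.
    """
    modified = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if seen_mtimes.get(path) != mtime:
            modified.append(path)
            seen_mtimes[path] = mtime
    return modified
```

The first call reports every file because nothing has been seen yet. Later calls return only files saved since the previous request.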
ReloadServer is intended to work automatically, and will at most require the user to run a script once per session in order to use it. It is already integrated into all of our projects on all platforms, so no setup is required. See Manually running ReloadServer for details on how to integrate ReloadServer into a new project.
ReloadServer is run automatically on PC during App init, and killed during App exit. There is no input required from the user, it works completely automatically.
For Console/Mobile projects, ReloadServer must be run manually on the host PC in order to allow dynamic recompilation of shaders on the device. See Manually running ReloadServer for details on how to run the server manually.
The basic workflow is as follows:
- (Non-PC only) Run Common_3/Tools/ReloadServer/ReloadServer.sh or ReloadServer.bat in a terminal
- Run App
- Modify FSL file
- Click Reload shaders button
- (if success) Observe updated shaders in App
  (if failure) Error is printed in App logs
ReloadServer has only one configuration option - the server port. The default port is 6543. The port can also be configured in the following ways:
You can configure ReloadServer port using the DevicePort option in the FSLShader IDE configuration panel of your project settings. If DevicePort is empty, then the default port is used.
On XCode/CodeLite, ReloadServer can be configured via the invocation of fsl.py in the build script located in the project settings. If no port is provided, then the default port is used. See fsl.py integration for details.
ReloadServer on Android uses adb reverse tcp:PORT tcp:PORT in order to forward recompile requests from the device to the host PC via the USB cable, which avoids requiring a network connection. This is done automatically and requires no user input.
ReloadServer on iOS requires the device to be connected to the same network as the host PC on which the ReloadServer daemon is running.
ReloadServer on Switch requires the device to be connected to the same network as the host PC on which the ReloadServer daemon is running.
See PS4/ReloadServer.md.
See Xbox/ReloadServer.md.
ReloadServer can easily be run manually by using the platform-specific batch/shell script located at Common_3/Tools/ReloadServer.
.\Common_3\Tools\ReloadServer\ReloadServer.bat
./Common_3/Tools/ReloadServer/ReloadServer.sh
The ReloadServer python script can be run from any directory, and is located at Common_3/Tools/ReloadServer/ReloadServer.py.
usage: ReloadServer.py [--port PORT] [--daemon] [--kill]
- --port PORT - Choose port used by ReloadServer
- --kill - Kill currently running ReloadServer (--daemon is ignored if this is passed)
- --daemon - Run ReloadServer as a daemon process instead of directly in the terminal
Only one server will ever be run on a given port (regardless of whether it runs as a daemon process or not). If there is already a server running on the given port, the server script will print a message and exit instead of running another server on that port. This can be useful when debugging potential issues.
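A simple way to detect that a port is already taken is to try binding to it. The Python sketch below shows that check. It illustrates the general technique under the assumption of a TCP server on localhost and is not taken from ReloadServer.py. `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if another socket is already bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


if port_in_use(6543):
    print("A server is already running on port 6543, exiting.")
```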
When an error occurs during shader recompilation, the error message is sent to 3 different locations:
- Printed to stdout in ReloadServer.py script - useful for debugging when running directly from terminal
- Written to server-log.txt next to ReloadServer.py - useful for debugging the daemon process
- Sent to App and printed to device/IDE logs - useful for debugging errors in shader code
This error message can be one of two things:
- Generic error message returned by ReloadServer.py (e.g. the path sent by the device does not exist).
- Shader compile error returned by fsl.py. In this case the entire stdout output is the error message.
ReloadServer is designed to be as fast and responsive as possible, so errors in shader compilation do not cause App to stop running.
The reasoning is that compiling/running App might take very long, whereas fixing ReloadServer issues can be done very quickly (possibly several times before App could restart even once).
On every failure, ReloadServer prints a detailed message to the App logs about what might have gone wrong and how to fix it, which the developer can use to fix the issue (often in much less time than it takes to restart App). The following most common issues can all quickly be fixed by glancing at device logs:
- User programming error in shader code (most commonly a typo)
- ReloadServer is not running when requesting shader recompile (usually only on XCode/Codelite where it needs to be run manually)