Direct light sampling #37

Closed
DU-jdto opened this issue Jul 16, 2019 · 11 comments

@DU-jdto

DU-jdto commented Jul 16, 2019

A couple of things with the direct light calculations in get_direct_illumination and get_sunlight in path_tracer_rgen.h:

  1. In both functions, the diffuse term is (correctly) multiplied by NdotL. However, since this is done before multiplication by the specular term, the latter ends up multiplied by that cosine as well, which I don't think is correct. I realized this after noticing that specular highlights on surfaces with roughness <= pt_direct_roughness_threshold - 0.02 were disproportionately brighter than those on surfaces with roughness >= pt_direct_roughness_threshold + 0.02. Changing the code so NdotL only affects diffuse but not specular corrects this inconsistency.

  2. I'm less sure about this, but shouldn't the diffuse (but not the specular of course) term in each function be divided by pi (lambertian scale factor)?
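To make the two points concrete, here is a minimal C sketch of a Lambertian direct-light term. The names `lambert_diffuse`, `direct_light` and `SKETCH_PI` are hypothetical and not taken from path_tracer_rgen.h. The sketch only shows the diffuse BRDF written as albedo / pi, with the cosine applied to the diffuse term alone, as proposed in point 1.

```c
#include <assert.h>

#define SKETCH_PI 3.14159265358979f

/* Hypothetical helper, not from path_tracer_rgen.h.
 * Lambertian diffuse response for one color channel:
 *   albedo / pi * max(NdotL, 0) * incoming light. */
static float lambert_diffuse(float albedo, float n_dot_l, float light)
{
    assert(albedo >= 0.0f && albedo <= 1.0f);
    if (n_dot_l < 0.0f)
        n_dot_l = 0.0f;   /* light from below the surface contributes nothing */
    return albedo / SKETCH_PI * n_dot_l * light;
}

/* Combination as proposed in point 1: the cosine is folded into the
 * diffuse term only, and the specular value is passed in already
 * weighted however the specular code chooses. */
static float direct_light(float albedo, float n_dot_l, float light, float specular)
{
    return lambert_diffuse(albedo, n_dot_l, light) + specular;
}
```

With a white surface lit head-on by a light of strength pi, the diffuse term comes out as 1, which is the normalization the 1/pi factor is there to provide.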

apanteleev added a commit that referenced this issue Jul 17, 2019
- Random number generator now produces enough different sequences to cover 500 frames of reference mode accumulation;
- Hemisphere sampling for indirect diffuse tuned to make the results better match the cosine-weighted sampling in reference mode;
- Direct diffuse lighting is now correctly divided by PI and matches same lighting computed by random sampling of emissive surfaces;
- Spotlight term removed from sky polygons and added to other non-analytic emissive surfaces;
- Suspicious 0.5 term removed from sky polygons.
Also the `pt_direct_polygon_lights` cvar has a new meaning when set to -1: all polygon lights are sampled through the indirect lighting shader, for comparison purposes.
Inspired by #37
@apanteleev
Collaborator

Thank you for reporting this issue!
I started digging into it and fixed a few other things as well, see the commits mentioned above.
Does anything else look wrong now?

@DU-jdto
Author

DU-jdto commented Jul 18, 2019

On line 180 of indirect_lighting.rgen, is_analytic_light shouldn't have the (spec_bounce_index == 0) condition. Without that condition, the global_ubo.pt_direct_polygon_lights condition should instead check global_ubo.pt_indirect_polygon_lights on the second bounce. Also, the final condition should be global_ubo.pt_xxxx_polygon_lights > 0, not >= 0.

With the asvgf_seed_rng.comp change, frame x checkerboard 1 will have the same rng as frame (x+1) checkerboard 0, which I think will produce duplicate samples when the scene is stationary. To avoid this issue, it should be frame_num * 2 + checkerboard, which is equivalent to the old code.
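The collision can be illustrated with a small C sketch. Both functions are hypothetical stand-ins for the per-frame sequence index; the actual asvgf_seed_rng.comp code is not quoted in this thread, and `frame_num + checkerboard` is only an assumption about what the changed version computes.

```c
/* Hypothetical stand-ins for the sequence index in asvgf_seed_rng.comp. */

/* Old scheme: every (frame, field) pair gets its own index. */
static unsigned seq_index_old(unsigned frame_num, unsigned checkerboard)
{
    return frame_num * 2u + checkerboard;
}

/* Assumed changed scheme: field 1 of one frame and field 0 of the
 * next frame end up with the same index. */
static unsigned seq_index_new(unsigned frame_num, unsigned checkerboard)
{
    return frame_num + checkerboard;
}
```

For example, `seq_index_new(7, 1)` and `seq_index_new(8, 0)` are both 8, while the old scheme gives 15 and 16.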

apanteleev added a commit that referenced this issue Jul 18, 2019
…on-default settings for `pt_direct_polygon_lights` and `pt_indirect_polygon_lights`.

#37
@apanteleev
Collaborator

Makes sense - fixed but with slightly different interpretation of pt_xxxx_polygon_lights.

The RNG thing is not really a problem because although one checkerboard field will use the same sequence as the other on the previous frame, they process different sets of pixels, so overall there is no undersampling (I think). I don't see any image quality difference in real-time rendering mode, and this new behavior is definitely better for the reference accumulation mode because it generates 512 unique sequences over time instead of just 256.

@SkacikPL

I'm not 100% sure here, but I've been implementing the recent changes in my fork and I think I see a definite increase in temporal ghosting in indoor areas:

https://streamable.com/qfymi

Of course I might have broken something on my side, so it's worth checking on your end.

@apanteleev
Collaborator

Yes, with the new changes materials behave slightly differently, and you see more specular reflections. And the specular denoiser is not great here - just a temporal filter. The ghosting was not so visible before because the signal was just too dim compared to diffuse lighting. I'll try to come up with a solution for the ghosting / noise, and some materials need roughness tweaking too. You should probably not merge these changes until it's resolved.
(you can verify it's the specular channel by setting flt_scale_spec to 0)

@SkacikPL

(you can verify it's the specular channel by setting flt_scale_spec to 0)

Yeah that seems to be the case.

@DU-jdto
Author

DU-jdto commented Jul 18, 2019

Makes sense - fixed but with slightly different interpretation of pt_xxxx_polygon_lights.

The RNG thing is not really a problem because although one checkerboard field will use the same sequence as the other on the previous frame, they process different sets of pixels, so overall there is no undersampling (I think).

Are you sure? The two checkerboard fields alternate between odd and even pixels on a frame by frame basis, so I would think there would be overlap. For instance, if my understanding is correct frame 0 checkerboard 0 position (0, 0) would correspond to pixel (0, 0) in the full frame while frame 0 checkerboard 1 position (0, 0) would correspond to pixel (1, 0), but frame 1 checkerboard 0 position (0, 0) would correspond to pixel (1, 0) and frame 1 checkerboard 1 position (0, 0) would correspond to pixel (0, 0).
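The four cases above can be written as a small C sketch. The function `full_frame_x` is hypothetical and not taken from the renderer. The parity term reproduces the four listed cases; including `pos_y` in it is an assumption about how the checkerboard pattern varies between rows.

```c
/* Hypothetical mapping from a checkerboard field position to the
 * full-frame pixel column. The (frame_num + field) part follows the
 * four cases listed above; adding pos_y is an assumption. */
static int full_frame_x(int frame_num, int field, int pos_x, int pos_y)
{
    int parity = (frame_num + field + pos_y) & 1;
    return pos_x * 2 + parity;
}
```

Under this mapping, frame 0 field 1 and frame 1 field 0 both resolve position (0, 0) to pixel column 1, which is the overlap described above.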

EDIT: In my local fork, I fixed the issue of getting unique rng by keeping lines 55 and 56 unchanged but changing lines 53 and 54 to:

	rng_seed |= (uint(ipos.x + frame_num / (NUM_BLUE_NOISE_TEX / 2)) % BLUE_NOISE_RES) <<  0u;
	rng_seed |= (uint(ipos.y + frame_num / (NUM_BLUE_NOISE_TEX / 2)) % BLUE_NOISE_RES) << 10u;

For a while now I've actually had the reference mode accumulating 8192 samples, and it seems to work fine.
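As a rough illustration of what the modified lines do, here is a CPU-side C port of one seed coordinate. The constant values are assumptions: the thread only implies that BLUE_NOISE_RES is 256, and 128 for NUM_BLUE_NOISE_TEX is a guess. `seed_coord` is a hypothetical name.

```c
/* Assumed values; only BLUE_NOISE_RES == 256 is implied by the thread. */
#define BLUE_NOISE_RES     256
#define NUM_BLUE_NOISE_TEX 128

/* Port of the modified X (or Y) component: the pixel coordinate is
 * shifted by one texel every NUM_BLUE_NOISE_TEX / 2 frames, then
 * wrapped to the blue noise resolution. */
static unsigned seed_coord(int pos, int frame_num)
{
    return (unsigned)(pos + frame_num / (NUM_BLUE_NOISE_TEX / 2)) % BLUE_NOISE_RES;
}
```

With these assumed constants the coordinate stays put for frames 0 to 63 and moves by one texel at frame 64. Because X and Y receive the same offset, the shift runs along the diagonal.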

apanteleev added a commit that referenced this issue Jul 20, 2019
- Added cvar `pt_accumulation_rendering_framenum` to control how many frames to accumulate;
- Added offsets to the RNG seed X and Y components to avoid repeating the random sequence after 512 frames.
Inspired by SkacikPL@63bcfe2 and #37
@apanteleev
Collaborator

Are you sure? The two checkerboard fields alternate between odd and even pixels...

Yes they alternate, but the RNG seed was computed without the alternation...
In any case, the alternation is now removed from the real-time rendering mode because the result looks better overall, less noisy and sharper - see 2700c59.

For accumulation rendering, I implemented a version of your coordinate adjustment but with a change to remove the obvious diagonal noise patterns that appeared after a couple thousand frames.

Thanks again!

@apanteleev
Collaborator

@SkacikPL regarding the specular noise/ghosting, see commit 96d70a6: it's still a half measure, but that particular area looks much better now.

@SkacikPL

SkacikPL commented Jul 20, 2019

Yeah, I just checked it out and it is better. Not exactly "perfectly playable", but also not entirely unplayable like it used to be.

I'm also glad some of my ideas made it on board.

On a semi-related note, I was experimenting with automating accumulation rendering for demos to get a prerendering feature.
https://youtu.be/1ZIgTOwng3U

I'm not sure how useful it would be to the general user base, but the concept isn't hard to implement.
I basically added a cl_renderdemo cvar, which determines whether a demo should be rendered on playback, and a cl_renderdemo_fps cvar, which sets the timestep between frames. Then at the bottom of CL_UpdateFrameTimes I added a separate sync type:

	if (cl_renderdemo->integer && cls.demo.playback)
	{
		main_msec = fps_to_msec(cl_renderdemo_fps->integer);
		sync_mode = SYNC_FULL;
	}

Then in CL_Frame(), directly under the sync switches, I added

	if (cls.demo.playback && cl_renderdemo->integer && cl_paused->integer != 2)
		main_extra = main_msec;

And set client time to tick only when unpaused

    if (!sv_paused->integer && !(cls.demo.playback && cl_renderdemo->integer && cl_paused->integer == 2)) {
        cl.time += main_extra;

Lastly, inside if (phys_frame), I added

	if (cls.demo.playback && cl_renderdemo->integer && cl_paused->integer != 2)
	{
		Cvar_Set("cl_paused", "2");
		CL_CheckForPause();
	}

This ensures a fixed time step when a demo is played with cl_renderdemo 1, with playback paused on each frame.

In vkpt\main.c I added

void stbi_writex(void *context, void *data, int size)
{
	FS_Write(data, size, (qhandle_t)(size_t)context);
}

#define IMG_SAVE(x) \
    static qerror_t IMG_Save##x(qhandle_t f, const char *filename, \
        byte *pic, int width, int height, int row_stride, int param)

IMG_SAVE(PNG)
{
	stbi_flip_vertically_on_write(1);
	int ret = stbi_write_png_to_func(stbi_writex, (void*)(size_t)f, width, height, 3, pic, row_stride);

	if (ret)
		return Q_ERR_SUCCESS;

	return Q_ERR_LIBRARY_ERROR;
}

static qhandle_t create_framedump(char *buffer, size_t size,
	const char *name, const char *ext)
{
	qhandle_t f;
	qerror_t ret;
	int i;

	if (name && *name) {
		// save to user supplied name
		return FS_EasyOpenFile(buffer, size, FS_MODE_WRITE,
			"screenshots/", name, ext);
	}

	// find a file name to save it to
	for (i = 0; i < 1000000; i++) {
		Q_snprintf(buffer, size, "screenshots/%s_%03d%s", cls.demo.file_name, i, ext);
		ret = FS_FOpenFile(buffer, &f, FS_MODE_WRITE | FS_FLAG_EXCL);
		if (f) {
			return f;
		}
		if (ret != Q_ERR_EXIST) {
			Com_EPrintf("Couldn't exclusively open %s for writing: %s\n",
				buffer, Q_ErrorString(ret));
			return 0;
		}
	}

	Com_EPrintf("Ran out of frame indexes!\n");
	return 0;
}

static qboolean make_framedump(const char *name, const char *ext,
	qerror_t(*save)(qhandle_t, const char *, byte *, int, int, int, int),
	int param)
{
	char        buffer[MAX_OSPATH];
	byte        *pixels;
	qerror_t    ret;
	qhandle_t   f;
	int         w, h, rowbytes;

	f = create_framedump(buffer, sizeof(buffer), name, ext);
	if (!f) {
		// the function returns qboolean, so report failure explicitly
		return qfalse;
	}

	pixels = IMG_ReadPixels(&w, &h, &rowbytes);
	ret = save(f, buffer, pixels, w, h, rowbytes, param);
	FS_FreeTempMem(pixels);

	FS_FCloseFile(f);

	if (ret < 0) {
		Com_EPrintf("Couldn't write %s: %s\n", buffer, Q_ErrorString(ret));
		return qfalse;
	}
	else {
		return qtrue;
	}
}

And my entire evaluate_reference_mode looks like this:

static void
evaluate_reference_mode(reference_mode_t* ref_mode)
{
	if (cl_paused->integer == 2 && cvar_pt_accumulation_rendering->integer > 0)
	{
		num_accumulated_frames++;

		const int num_warmup_frames = 5;
		const int num_frames_to_accumulate = cvar_pt_accumulation_rendering_framenum->integer;

		ref_mode->enable_accumulation = qtrue;
		ref_mode->enable_denoiser = qfalse;
		ref_mode->num_bounce_rays = 2;
		ref_mode->temporal_blend_factor = 1.0f / min(max(1, num_accumulated_frames - num_warmup_frames), num_frames_to_accumulate);

		switch (cvar_pt_accumulation_rendering->integer)
		{
		case 1: {
			float percentage = powf(max(0.f, (num_accumulated_frames - num_warmup_frames) / (float)num_frames_to_accumulate), 0.5f);
			if (percentage < 1.0f)
			{
				if (!cls.demo.playback)
				{
					char text[MAX_QPATH];
					Q_snprintf(text, sizeof(text), "Reference path tracing mode: accumulating samples... %d%%(%i)", (int)(min(1.f, percentage) * 100.f), num_accumulated_frames);

					int x = r_config.width / 4;
					int y = r_config.height / 4 - 50;
					R_SetScale(0.5f);
					R_SetColor(0xff000000u);
					SCR_DrawStringEx(x + 1, y + 1, UI_CENTER, MAX_QPATH, text, SCR_GetFont());
					R_SetColor(~0u);
					SCR_DrawStringEx(x, y, UI_CENTER, MAX_QPATH, text, SCR_GetFont());
					R_SetAlphaScale(1.f);
				}
			}
			else
			{
				SCR_SetHudAlpha(0.f);

				if (cl_renderdemo->integer)
				{
					qboolean result = make_framedump("", ".png", IMG_SavePNG, 0);

					if (result)
					{
						Cvar_Set("cl_paused", "0");
						CL_CheckForPause();

						num_accumulated_frames = 0;

						ref_mode->enable_accumulation = qfalse;
						ref_mode->enable_denoiser = !!cvar_flt_enable->integer;
						if (cvar_pt_num_bounce_rays->value == 0.5f)
							ref_mode->num_bounce_rays = 0.5f;
						else
							ref_mode->num_bounce_rays = max(0, min(2, round(cvar_pt_num_bounce_rays->value)));
						ref_mode->temporal_blend_factor = 0.f;
					}
					else
						CL_Disconnect(ERR_DISCONNECT);
				}
				break;
			}
		}
		case 2:
			SCR_SetHudAlpha(0.f);
			break;
		}
	}
	else
	{
		num_accumulated_frames = 0;

		ref_mode->enable_accumulation = qfalse;
		ref_mode->enable_denoiser = !!cvar_flt_enable->integer;
		if (cvar_pt_num_bounce_rays->value == 0.5f)
			ref_mode->num_bounce_rays = 0.5f;
		else
			ref_mode->num_bounce_rays = max(0, min(2, round(cvar_pt_num_bounce_rays->value)));
		ref_mode->temporal_blend_factor = 0.f;
	}
}
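The `temporal_blend_factor` expression in the function above is what turns the accumulation into a running average. A minimal C sketch of the idea follows. `blend_factor` mirrors the clamp used in the function, while `accumulate` is an assumption about how the shader side applies the factor (a plain linear blend); that code is not shown in this thread.

```c
/* Same clamp as in evaluate_reference_mode: 1 / number of frames
 * accumulated so far, ignoring warm-up and capped at the target count. */
static float blend_factor(int accumulated, int warmup, int target)
{
    int n = accumulated - warmup;
    if (n < 1)
        n = 1;
    if (n > target)
        n = target;
    return 1.0f / (float)n;
}

/* Assumed accumulation step: move the previous average toward the new
 * sample by the given factor. With factor 1/n on the n-th sample this
 * yields the arithmetic mean of all samples seen so far. */
static float accumulate(float average, float sample, float factor)
{
    return average + (sample - average) * factor;
}
```

Feeding the samples 2, 4, 6, 8 with factors 1, 1/2, 1/3, 1/4 leaves an average of 5, which is the plain mean. During the warm-up frames the factor stays at 1, so each new frame simply replaces the previous result.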

About half of that is dirty hacks that could probably be done much better, but it gets the job done.
I managed to render a sample of ~9 seconds of 3840x2160 video at 60 fps in about 5 hours on my 2070, where each output frame accumulated 300 frames' worth of data.
Audio has to be captured separately during a normal demo run that holds the target framerate, but it's not too much hassle.

Aside from that, it's pretty straightforward: record any demo, play it back with cl_renderdemo set to 1, and it will dump frames as demoname_XXX in the screenshots folder.
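For a sense of scale, the render-time figures above can be worked through with a short C sketch. The helper names are hypothetical, the inputs are the approximate numbers quoted in this comment, and warm-up frames are ignored.

```c
/* Number of path-traced frames behind a rendered clip. */
static long traced_frames(long seconds, long fps, long samples_per_frame)
{
    return seconds * fps * samples_per_frame;
}

/* Average wall-clock seconds spent on each output frame. */
static double seconds_per_output_frame(double render_hours, long seconds, long fps)
{
    return render_hours * 3600.0 / (double)(seconds * fps);
}
```

With 9 seconds at 60 fps and 300 accumulated frames each, that is 540 output frames and 162,000 traced frames. Spread over 5 hours, it works out to roughly 33 seconds per output frame, or about 9 traced frames per second.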

@DU-jdto
Author

DU-jdto commented Jul 21, 2019

Yes they alternate, but the RNG seed was computed without the alternation...

That's true, but it only matters for resolutions for which (resX / 2) % 256 != 0, due to the % BLUE_NOISE_RES. Consider again the case of position (0, 0) checkerboard 0 vs position (0, 0) checkerboard 1. In the rng seed texture, the former maps to position (0, 0) while the latter maps to position (resX / 2, 0). Say the resolution is 2560x1440. Then (ipos.x % BLUE_NOISE_RES) will be 0 for the former case, but also 0 for the latter case (2560/2 % 256 == 0).
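The condition can be checked for a few common widths with a small C sketch. `fields_share_seed_column` is a hypothetical helper, and BLUE_NOISE_RES is set to 256 as implied by the `% 256` above.

```c
#define BLUE_NOISE_RES 256  /* value implied by the "% 256" above */

/* Returns 1 when both checkerboard fields read the same seed column,
 * i.e. when the field offset resX / 2 is a multiple of BLUE_NOISE_RES. */
static int fields_share_seed_column(int res_x)
{
    return (res_x / 2) % BLUE_NOISE_RES == 0;
}
```

For 2560 the offset is 1280, a multiple of 256, so the two fields collide. For 1920 the offset is 960 (remainder 192) and for 3840 it is 1920 (remainder 128), so those widths are not affected.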
