How to pass arguments to HTTP handler (httpz.Action) #89

Closed · lucasrod16 opened this issue Jan 20, 2025 · 5 comments

lucasrod16 commented Jan 20, 2025

First, I want to say thanks for creating this library. I'm new to Zig and this library has made it much easier to create an HTTP server in Zig.

I have an HTTP handler that needs an instance of a struct passed to it so that it can call a method on that struct.

In this particular code, I need to pass the cache instance to the getRepos handler:

const std = @import("std");
const httpz = @import("httpz");
const handler = @import("handler.zig");
const Cache = @import("cache.zig").Cache;

var server: httpz.ServerCtx(void, void) = undefined;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    var cache = Cache.init(allocator);
    defer cache.deinit();
    try cache.fetchRepoData();

    server = try httpz.Server().init(allocator, .{ .port = 8080 });
    defer server.deinit();

    try std.posix.sigaction(std.posix.SIG.INT, &.{
        .handler = .{ .handler = shutdown },
        .mask = std.posix.empty_sigset,
        .flags = 0,
    }, null);

    try std.posix.sigaction(std.posix.SIG.TERM, &.{
        .handler = .{ .handler = shutdown },
        .mask = std.posix.empty_sigset,
        .flags = 0,
    }, null);

    server.notFound(notFound);

    var router = server.router();
    router.get("/repos", handler.getRepos);

    std.log.info("API server listening on port 8080\n", .{});
    try server.listen();
}

fn notFound(_: *httpz.Request, res: *httpz.Response) !void {
    res.status = 404;
    res.body = "Not found\n";
}

fn shutdown(_: c_int) callconv(.C) void {
    server.stop();
    std.log.info("\nServer gracefully shutdown\n", .{});
}

so that the handler can call cache.get():

const std = @import("std");
const httpz = @import("httpz");
const Cache = @import("cache.zig").Cache;

pub fn getRepos(cache: Cache, req: *httpz.Request, res: *httpz.Response) !void {
    if (req.method != .GET and req.method != .HEAD) {
        res.status = 405;
        return;
    }

    const data = cache.get().data;
    const timestamp = cache.get().timestamp;

    if (data.len == 0) {
        res.status = 500;
        return;
    }

    res.content_type = httpz.ContentType.JSON;
    res.header("Last-Modified", timestamp);

    if (req.method == .HEAD) {
        res.status = 200;
        return;
    }

    res.status = 200;
    res.body = data;
}

I tried using a function closure approach:

const std = @import("std");
const httpz = @import("httpz");
const Cache = @import("cache.zig").Cache;

pub fn getRepos(cache: Cache) httpz.Action(void) {
    return struct {
        fn handle(req: *httpz.Request, res: *httpz.Response) !void {
            if (req.method != .GET and req.method != .HEAD) {
                res.status = 405;
                return;
            }

            const data = cache.get().data;
            const timestamp = cache.get().timestamp;

            if (data.len == 0) {
                res.status = 500;
                return;
            }

            res.setHeader("Content-Type", "application/json");
            res.setHeader("Last-Modified", timestamp);

            if (req.method == .HEAD) {
                res.status = 200;
                return;
            }

            res.status = 200;
            res.body = data;
        }
    }.handle;
}

but it's my understanding that Zig does not support closures.

If there are any recommendations on how to achieve this it would be much appreciated.

ziglang/zig#229

karlseguin (Owner) commented

I assume you're using the 0.13 branch.

var server = try httpz.ServerApp(*Cache).init(allocator, .{}, &cache);
// same as before

...

fn getRepos(cache: *Cache, _: *httpz.Request, res: *httpz.Response) !void {
  // same as before, use cache
}

"Cache" is very specific. You might want to create a more generic wrapper, like an "App" or a "Context" which contains your cache, so that if/when you have more things you want your actions to have access to (like a DB connection pool)...but that's up to you.

The API has changed a bit in 0.14/master. It's the same idea, but there's no longer a separate httpz.Server() and httpz.ServerApp() (and httpz.ServerCtx())...there's just an httpz.Server(T). Plus, I think the readme is better for the newer versions/API.

lucasrod16 (Author) commented

@karlseguin that's super helpful, thanks! I updated to use 0.14/master for both httpz and zig and changed to:

var server = try httpz.Server(*Cache).init(allocator, .{ .port = 8080 }, &cache);

Works great 👍

Cache" is very specific. You might want to create a more generic wrapper, like an "App" or a "Context" which contains your cache, so that if/when you have more things you want your actions to have access to (like a DB connection pool)...but that's up to you.

Great point—thank you!

I appreciate the help and feedback

karlseguin (Owner) commented

One more note. In your example above, res.body = data seems dangerous.

I don't know how your cache is implemented, but data has to remain valid until after getRepos returns. If your cache is multi-threaded, and if it's possible for another thread to invalidate that data, you're going to have problems.

One solution is to call try res.write(); within the action. This will write res.body to the socket immediately. But that only solves the lifetime issue; if there's any concurrency issue (like some background task that shrinks the cache when it's full), you'll still have problems.
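
As a rough sketch of that approach, reusing the handler shape from the earlier snippets:

fn getRepos(cache: *Cache, _: *httpz.Request, res: *httpz.Response) !void {
    const entry = cache.get();
    res.content_type = httpz.ContentType.JSON;
    res.header("Last-Modified", entry.timestamp);
    res.body = entry.data;
    // Flush the response now, while entry.data is still known to be valid,
    // instead of letting it be written after the action returns.
    try res.write();
}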

Even if you took the extreme option of cloning the data using res.arena, you could still have concurrency issues if some other thread can change / delete that cache entry's payload (and I wouldn't recommend that approach anyway).

In cache.zig I use an ARC to protect entries, which is why the entry you get from cache.get() has to be "released" when you're done with it.
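
A rough sketch of that usage pattern (the method and field names here are assumptions for illustration, not copied from cache.zig):

if (cache.get("repos")) |entry| {
    // Hold the reference until the response bytes have actually been written.
    defer entry.release();
    res.body = entry.value;
    try res.write();
}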

lucasrod16 (Author) commented

Calling my implementation a "cache" is likely not the most accurate term. It is simply a struct with a data: []const u8 field that I use to store JSON data in. The "cache" (the data field) is written once on server startup and then updated/overwritten with new JSON data on a schedule by a background job. The JSON data is fetched from a Google Cloud Storage bucket. The goal is for the backend to be able to serve the data to clients quickly without having to reach out over the network to fetch it.

const std = @import("std");
const zdt = @import("zig-datetime");
const Datetime = zdt.datetime.Datetime;

pub const Cache = struct {
    data: []const u8,
    timestamp: []const u8,
    allocator: std.mem.Allocator,
    // response_body is used for the sole purpose of freeing memory in deinit().
    response_body: []const u8,

    pub fn init(allocator: std.mem.Allocator) Cache {
        return Cache{
            .data = &[_]u8{},
            .timestamp = &[_]u8{},
            .allocator = allocator,
            .response_body = &[_]u8{},
        };
    }

    pub fn deinit(self: *Cache) void {
        self.allocator.free(self.response_body);
        self.allocator.free(self.timestamp);
    }

    pub fn get(self: Cache) Cache {
        return Cache{
            .data = self.data,
            .timestamp = self.timestamp,
            .allocator = self.allocator,
            .response_body = self.response_body,
        };
    }

    pub fn set(self: *Cache, data: []const u8) !void {
        self.data = data;
        self.timestamp = try Datetime.now().shiftTimezone(&zdt.timezones.GMT).formatHttp(self.allocator);
    }

    pub fn fetchRepoData(self: *Cache) !void {
        var client = std.http.Client{
            .allocator = self.allocator,
        };
        defer client.deinit();

        const uri = try std.Uri.parse("https://storage.googleapis.com/storage/v1/b/lucasrod16-github-data/o/data.json?alt=media");
        const buf = try self.allocator.alloc(u8, 1024 * 1024 * 4);
        defer self.allocator.free(buf);
        var req = try client.open(.GET, uri, .{
            .server_header_buffer = buf,
        });
        defer req.deinit();

        try req.send();
        try req.finish();
        try req.wait();

        std.debug.assert(req.response.status == std.http.Status.ok);

        var rdr = req.reader();
        const body = try rdr.readAllAlloc(self.allocator, 1024 * 1024 * 4);
        // Don't free the response body immediately.
        // Store it in the cache and cleanup in deinit().
        self.response_body = body;

        try self.set(body);
        std.log.info("Successfully loaded GitHub data from GCS bucket into the in-memory cache.", .{});
    }
};

I was experiencing an issue where I was freeing the response body in fetchRepoData, which caused the server to crash when that data was accessed with cache.get() because it had already been freed. So I added a response_body field to the Cache struct, store the body there, and clean it up later in cache.deinit(). It works for now, but it definitely feels a bit hacky and confusing. I am brand new to managing memory myself and to Zig, so I would be more than happy to receive feedback on how to write more robust and idiomatic programs.

karlseguin (Owner) commented

It's not clear why you have both data and response_body; as far as I can tell, they reference the same thing. This also leaks memory. You free the current value in deinit, but if fetchRepoData gets called more than once, you're only ever freeing the last allocation. Each time fetchRepoData runs, you overwrite data and response_body without freeing the previous allocation, losing any reference to it (which is what makes it a leak: as soon as self.data = data; is called, you've lost all references to that memory and will never be able to free it).

Also, the code isn't thread-safe. fetchRepoData is called in one thread and mutates self (it writes to data, timestamp and response_body) while another thread could be reading those fields in getRepos. This isn't specific to manual memory management; the code would be considered thread-unsafe in any threaded language (Go, Java, ...).

If it were just the memory leak, I'd say deal with that (i.e. free the previous response before setting the new one). Or, if it were just the thread-safety issue, I'd say deal with that (i.e. use a mutex). But you're dealing with the intersection of the two: you can't just free the previous response_body, because it might be being used to write the HTTP response in another thread. As a first step, to get things correct, I'd add a std.Thread.Mutex to the Cache, lock it in both fetchRepoData and getRepos (lock the entire functions, for now), and then fix the memory leak - you have to free self.data before you overwrite it with a new value.
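
A minimal sketch of that first step, reusing the field names from the code above (the locked set here takes a pre-formatted timestamp just to keep the sketch short):

const std = @import("std");

pub const Cache = struct {
    mutex: std.Thread.Mutex = .{},
    data: []const u8 = &[_]u8{},
    timestamp: []const u8 = &[_]u8{},
    allocator: std.mem.Allocator,

    pub fn set(self: *Cache, data: []const u8, timestamp: []const u8) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        // Free the previous buffers before overwriting the references to them.
        self.allocator.free(self.data);
        self.allocator.free(self.timestamp);
        self.data = data;
        self.timestamp = timestamp;
    }
};

// getRepos would take the same broad lock (cache.mutex.lock() / defer cache.mutex.unlock())
// around everything that touches cache.data and cache.timestamp.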

This should be "correct". But the broad lock isn't great for performance/concurrency. You can shrink the lock in fetchRepoData to only cover the part after you've made the request and read the body into a local - at least then you wouldn't be holding the lock across the HTTP call. After that, though, the options get more complicated. I think the most common solution is some form of automatic reference counting (ARC); for example, std::sync::Arc is something you see often in concurrent Rust code. I've written a little about it before.
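
Roughly, the tail of fetchRepoData could then become something like this (building on the mutex sketch above, and assuming getRepos writes the response with try res.write() while it holds the lock, per the earlier comment about res.body lifetimes):

// The HTTP request and body read happen outside the lock ...
const body = try rdr.readAllAlloc(self.allocator, 1024 * 1024 * 4);
const timestamp = try Datetime.now().shiftTimezone(&zdt.timezones.GMT).formatHttp(self.allocator);

// ... only the swap, and the free of the old buffers, happens under it.
self.mutex.lock();
defer self.mutex.unlock();
self.allocator.free(self.data);
self.allocator.free(self.timestamp);
self.data = body;
self.timestamp = timestamp;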

As an aside, I'm not sure it makes sense to assert on a server response status. It's a bit unusual to assert something you have no control over: Google is free to return a 500. If you want to exit the program on a non-OK result, it's better to be explicit about it than to rely on an assert, which is removed in ReleaseSmall/ReleaseFast builds, so your code would behave quite differently in the face of an API error depending on how it's built.
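
For example, something along these lines in place of the assert:

if (req.response.status != std.http.Status.ok) {
    std.log.err("unexpected response status fetching repo data: {}", .{req.response.status});
    return error.UnexpectedResponseStatus;
}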
