feat: migrate to ESM #12

CMCDragonkai · 2023-08-13T06:19:14Z

Description

Migrating to ESM.

We're switching to the internal node implementation for worker threads. This will also address #16.

Issues Fixed

Related: Migrate to ESM (and get dynamic await) TypeScript-Demo-Lib#32
Fixes: Replace threads with internal implementation #16
REF ENG-241

Tasks

1. Cut out threads and replace it with node:worker_threads.
2. Implement a simple WorkerPool for managing workers and scheduling tasks.
3. Implement utilities for enforcing types on calling worker functions and setting up a worker.
4. Try to work out a way to fully inline a worker script when starting a worker without having to use the script's file path.
5. Complete conversion to ESM.

Final checklist

CMCDragonkai · 2023-08-13T06:27:12Z

Not sure if we need to fork this: andywer/threads.js#470

CMCDragonkai · 2023-08-13T06:30:34Z

Might be able to use:

Use a Package Alias: Some package managers allow aliasing a package to a local directory or version. You could then modify the local copy to your needs.

To bypass it.

Now I think this would require a special "imports" alias and combined with tsconfig paths to hack around the incorrect "exports" key. Better than forking and maintaining it. Although if it could be done, that would be great.

CMCDragonkai · 2023-08-14T06:41:46Z

Actually I think I need to do a git submodule. That would be the easiest.

CMCDragonkai · 2023-08-14T06:52:21Z

Ok so trying to use a git submodule can be complicated because the dependencies would not be acquired under src/threads.js, and it potentially requires a different set of compilation tools.

Going back to attempting with an import path.

CMCDragonkai · 2023-08-14T07:07:02Z

Actually even subpath imports do not work because the node_modules wouldn't exist at the relevant location.

The only solution now is to either entirely fork the project or just provide overrides on the types, meaning we type out what ModuleMethods is likely to be.

The problem is none of the types work anymore.

QueuedTask
ModuleThread
ModuleMethods

Because of errors in how threads.js exposes the types.

So we have to define all these types.

CMCDragonkai · 2023-08-14T07:45:31Z

Ok, after overriding the types to any, we still end up with a problem. Attempting to create a worker from a file written in TS requires ts-node. So threads.js basically needs ts-node to execute the file.

type ModuleMethods = {
  [methodName: string]: (...args: any) => any;
};
type ModuleThread<Methods = any> = any;
type QueuedTask<ThreadType, Return> = any;

export type {
  ModuleMethods,
  ModuleThread,
  QueuedTask
};

Of course if I change to just using regular js, it can load the .js file without any transpilation thus avoiding the ts-node requirement.

But then it's not possible to do import threads from 'threads';... probably because it's now running them like CJS code? I'm not entirely sure.

import * as threads from 'threads';

const { Transfer, isWorkerRuntime } = threads;

This is necessary to actually get the required constructs, but the workers no longer have any corresponding types.

I think though one could use annotations.

But generally speaking, it's just not a good idea to use typescript based workers atm since we shouldn't be tied to ts-node anyway (even if it is only during development), because after compilation it would all be JS files anyway.

For some reason isWorkerRuntime no longer exists either.

All in all, I don't think js-workers can be converted to ESM as of now, simply because threads.js is just not properly exporting its things. And needing to convert to .ts workers is not great either, since that necessitates ts-node.

I might be forced to keep js-workers as CJS, and just import CJS into ESM by importing the default and then destructuring what we need out of it.

CMCDragonkai · 2023-08-14T07:45:56Z

Going to try keeping js-workers as CJS, and long term look into removing threads.js in favour of something in our flavour.

Can see https://github.com/piscinajs/piscina for inspiration.

CMCDragonkai · 2023-09-13T04:02:31Z

I think we just remove browser support for the moment, and focus on nodejs worker threads, similar to our project in js-ws, and then slowly add back in webworker (browser support) afterwards. This can radically simplify this project and give us ESM support too. This could be assigned to @addievo.

CMCDragonkai · 2023-09-26T13:09:01Z

Going over the https://nodejs.org/api/worker_threads.html shows that the worker threads implementation will be quite complex. Here's a brief overview of things that need to be considered:

How MessagePort works - this is basically the communication mechanism between the parent thread and all the worker threads. You have to use it to communicate which functions you want to execute, as well as all the results of execution. Remember that the worker threads are like mini-servers, receiving messages asynchronously and handling them. Because execution is potentially asynchronous, you also have to manage the results asynchronously and send them back. In short, you have a message-passing API between the main thread and the worker threads.
The creation of a worker involves using the new Worker that is provided by node:worker_threads. This call creates a thread with an existing nodejs runtime. Worker threads are real threads so they do share memory, but access is transferred either by copying or ownership. There's also a SharedArrayBuffer which is a truly mutable multithreaded buffer, but this is no longer easily used in browsers anyway, so transferable ArrayBuffers are easier to work with. (Note that in the case of js-quic, if we were using node threads, shared array buffers would work, or we would at the very least need to be able to transfer to a worker and transfer back out).
The code of a worker thread is ESM based with ESM nodejs. So you are passing a file path or a URL, and it's possible that node understands the file path to be ESM native, or understands the URL to actually embed the worker code. There's no native support for TS; any TS should be precompiled to JS, but this does impact the new Worker() file path, which might need to load the .js version. It's possible to use some interface types to expose typesafe functionality.
We should be able to take advantage of the latest nodejs capabilities... but also have the common denominator with WebWorker.
There's also a broadcast system that can enable one to many communication.
There may need to be asynchronous initialisation on the worker threads. Generally they can start immediately receiving messages on the message port, however we may need to do any async setup in the worker first. One could imagine a "worker" script hooks like how threads.js has done it, and enable the ability to pass in some async setup code that needs to be done.
Since worker threads are just nodejs runtimes, you can just run arbitrary code, but it is easier to understand how to do this if instead the workers expose a flat record of functions to call. The problem with allowing arbitrary function calls is the problem of serialising closures, and this is not a solved problem atm, so instead of trying to do this (I know this was complicated in Haskell), we just say that workers must expose a fixed set of operations, and instead data can be transferred over, and you'd have to mark certain things as transferable, otherwise by default things get copied over (when serialised).
There are a lot of edge cases that threads.js covers right now, with webpack bundling, and even electron usage where things are bundled into an .asar file.
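
To make the message-passing and transfer points above concrete, here is a minimal sketch. It assumes a hypothetical Request/Response shape and a hypothetical operations record (neither is from this PR). Both ends run in a single thread over a MessageChannel, and receiveMessageOnPort is used to read synchronously purely to keep the example short; a real worker would listen for 'message' events instead.

```typescript
import { MessageChannel, receiveMessageOnPort } from 'node:worker_threads';

// Hypothetical message shapes, for illustration only.
type Request = { id: number; method: string; data: unknown };
type Response = { id: number; result: unknown };

// The "worker" exposes a fixed, flat record of operations instead of arbitrary closures.
const operations: Record<string, (data: any) => unknown> = {
  byteLength: (data: { buffer: ArrayBuffer }) => data.buffer.byteLength,
};

function roundTrip(): { result: unknown; lengthAfterTransfer: number } {
  const { port1, port2 } = new MessageChannel();
  const buffer = new ArrayBuffer(8);
  const request: Request = { id: 1, method: 'byteLength', data: { buffer } };
  // Listing the buffer in the transfer list moves ownership instead of copying.
  port1.postMessage(request, [buffer]);
  // "Worker" side: read the request and look up the requested operation.
  const received = receiveMessageOnPort(port2)?.message as Request;
  const operation = operations[received.method];
  if (operation == null) throw new Error(`Unknown method: ${received.method}`);
  const response: Response = { id: received.id, result: operation(received.data) };
  port2.postMessage(response);
  // "Main" side: read the response; the id lets us match it to the request.
  const reply = receiveMessageOnPort(port1)?.message as Response;
  port1.close();
  port2.close();
  return { result: reply.result, lengthAfterTransfer: buffer.byteLength };
}
```

Calling roundTrip() should report a result of 8 from the receiving side, while the sender's buffer reports a byteLength of 0 afterwards because it was transferred rather than copied.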

Point is, fixing up this worker ecosystem is extremely complicated. The threads.js code is actually complex and difficult to untangle. The fastest solution right now is for upstream to fix their type exports so we can just continue using it... Without which ESM migration won't really work for us. Unless we just switch to using piscina.

This would be a significant undertaking - estimated work would have to be 2 - 4 months to build a robust worker system that abides by the rest of PK's principles (I'm comparing it to how complicated js-quic became, but it should be simpler). Will need to schedule this for later after testnet 7.

CMCDragonkai · 2025-02-24T23:09:54Z

As per #16, we're not going to immediately migrate to ESM. Instead, we need to work on #16 to build out our own thread pool implementation.

This is performance sensitive. So I vote for 2 entry points - Rust/C++ and JS level.

Rust based entrypoint would be more flexible as we are moving towards all native libraries being written in Rust, and it would be easier to integrate into js-quic and js-db.

JS level entrypoint means the threadpool is also usable by any parallel processing required by JS.

This would also mean that our threadpool doesn't abide by Web Workers. However we can follow the spec of Web Workers (in terms of API) and satisfy the interface type-wise, even if implementation wouldn't be using node's own worker threads.

This would also mean our worker threads are outside libuv threading (which is traditionally used by the IO system in NodeJS), but that's also ok. There are some limitations in that libuv threadpool anyway and it was designed for IO specifically, whereas ours should work for compute parallelism too.

Some testing would be important to understand whether js-quic should use libuv threading or integrate into this rust threadpool.

Make use of benchmarks here early in order to get continuous benching.

CMCDragonkai · 2025-02-24T23:11:36Z

@tegefaulkes Take over this PR and update spec to target #16 and MatrixAI/TypeScript-Demo-Lib/issues/32.

tegefaulkes · 2025-02-26T01:14:27Z

I've re-based on staging.

tegefaulkes · 2025-02-26T01:39:03Z

After doing some prototyping with node:worker_threads I can see a clear path forward. The following need to be done to complete this PR.

Cut out threads and replace it with node:worker_threads.
Implement a simple WorkerPool for managing workers and scheduling tasks.
Implement utilities for enforcing types on calling worker functions and setting up a worker.
Try to work out a way to fully inline a worker script when starting a worker without having to use the script's file path.
Complete conversion to ESM.

The node implementation of workers is pretty simple. You start a worker using new Worker(scriptPath) and the worker communicates with the main thread by listening to messagePort events. It sends data back to the main thread by using the same message port. So the workers can be pretty basic or complex since we're given a fair amount of freedom there. worker_threads doesn't provide an equivalent worker pool, but I've already created a simple implementation for it.
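
The pool implementation itself isn't shown in this thread. As a rough sketch of the scheduling idea only, assuming a hypothetical PoolWorker interface that would wrap a node:worker_threads Worker, tasks can be queued and handed to whichever worker is idle:

```typescript
// Hypothetical stand-in for a wrapper around a node:worker_threads Worker.
interface PoolWorker<T, R> {
  run(task: T): Promise<R>;
}

type Job<T, R> = {
  task: T;
  resolve: (result: R) => void;
  reject: (error: unknown) => void;
};

// Hypothetical minimal pool: a FIFO queue of jobs plus a list of idle workers.
class SimplePool<T, R> {
  protected idle: Array<PoolWorker<T, R>>;
  protected queue: Array<Job<T, R>> = [];

  constructor(workers: Array<PoolWorker<T, R>>) {
    this.idle = [...workers];
  }

  // Number of tasks waiting for a free worker.
  get queued(): number {
    return this.queue.length;
  }

  submit(task: T): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.dispatch();
    });
  }

  // Pair queued jobs with idle workers until one of the two runs out.
  protected dispatch(): void {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop()!;
      const job = this.queue.shift()!;
      const release = () => {
        this.idle.push(worker);
        this.dispatch();
      };
      worker.run(job.task).then(
        (result) => {
          job.resolve(result);
          release();
        },
        (error) => {
          job.reject(error);
          release();
        },
      );
    }
  }
}
```

With two workers and three submitted tasks, the third task stays queued until one of the first two settles. A real pool would also need worker startup, termination and error recovery, which this sketch leaves out.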

As for enforcing types on calls to the workers, the problem here is pretty similar to how the RPC handles things. We have an interface that serialises data. Across this transition we lose the type enforcement so we need to re-apply types to the returned values. I think we can apply a similar solution here by providing a worker manifest, which is an object of all the functions that can be called through a worker. We can then use this manifest as the worker code by calling an expose(manifest) utility within the worker, but also apply the types to the WorkerManager by deriving them from the manifest.
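
A minimal sketch of deriving the caller types from a manifest might look like the following. The names Caller, Transport and createCaller are hypothetical and are not the utilities from this PR; the transport stands in for whatever sends the call over the message port.

```typescript
// Hypothetical manifest shape: a flat record of async functions taking one argument.
type WorkerManifest = Record<string, (data: any) => Promise<unknown>>;

// Derive a typed caller from the manifest: same keys, same parameter and return types.
type Caller<M extends WorkerManifest> = {
  [K in keyof M]: (data: Parameters<M[K]>[0]) => ReturnType<M[K]>;
};

// Hypothetical untyped transport, e.g. postMessage plus waiting for the reply.
type Transport = (method: string, data: unknown) => Promise<unknown>;

function createCaller<M extends WorkerManifest>(
  methods: Array<keyof M & string>,
  transport: Transport,
): Caller<M> {
  const caller: Record<string, (data: unknown) => Promise<unknown>> = {};
  for (const method of methods) {
    caller[method] = (data) => transport(method, data);
  }
  // The types are re-applied here, after the untyped boundary.
  return caller as unknown as Caller<M>;
}

// Usage with an in-process transport standing in for the message port.
const manifest = {
  add: async (data: { a: number; b: number }): Promise<number> => data.a + data.b,
} satisfies WorkerManifest;

const sent: Array<string> = [];
const caller = createCaller<typeof manifest>(['add'], async (method, data) => {
  sent.push(method);
  return manifest[method as keyof typeof manifest](data as { a: number; b: number });
});

// The argument and result types come from the manifest, not from the transport.
const sum: Promise<number> = caller.add({ a: 1, b: 2 });
```

The cast inside createCaller is the one place where type safety is asserted rather than checked, which mirrors the situation described above where types are lost across serialisation and then re-applied.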

tegefaulkes · 2025-02-26T23:09:09Z

It's possible to inline the script as a string using new Worker('script code', {eval: true});. However the usefulness of this without using a bundler is kinda lacking.

To properly enforce types I need to construct an object as typescript code and import it as a type when creating the WorkerManager. This still needs to be solved but it should be possible to properly enforce the types this way. It will only work if we can import it but also load it as raw code.

This is where the bundler comes in. We can import the types directly for type enforcement, but then we can use a raw loader to import the same code as raw code, and provide it to the worker as an evaluated string. This should fix all our problems with bundling and import paths since everything will be imported the normal way.

For example, the worker script will look something like this.

// This is an example worker script

import type { WorkerManifest } from '#types.js';
import { expose } from './expose.js';

const worker = {
  test: async (data: void) => {
    return 'hello world!';
  },
  add: async (data: { a: number; b: number }): Promise<number> => {
    console.log(data);
    return data.a + data.b;
  },
  sub: async (data: { a: number; b: number }): Promise<number> => {
    return data.a - data.b;
  },
  fac: async (data: number): Promise<number> => {
    let acc = 1;
    // Include data itself in the product, otherwise this computes (data - 1)!
    for (let i = 1; i <= data; i++) {
      acc = acc * i;
    }
    return acc;
  },
} satisfies WorkerManifest;

expose(worker);

export default worker;

We can then create a worker factory with the following, assuming that this file will still be compiled the normal way for us. There may still be some things to solve here.

// Import with the raw-loader
import { Worker } from 'node:worker_threads';
import script from 'raw-loader!./script.ts';

const workerFactory: WorkerFactory = () => new Worker(script, { eval: true });

linear · 2025-03-06T00:46:01Z

ENG-241

tegefaulkes · 2025-03-07T03:44:45Z

This is pretty much done, I just need to do final review and clean up.

As for inlining the import of the worker script and creating the worker that way, it should be possible, but doing it properly requires the use of a bundler and custom loader. I can send a worker script by passing the script as a string and specifying eval: true, however unless I want to read the script with fs.readFile() that isn't very useful on its own.

The best solution is to have a bundler with a custom loader. Then we can import the worker directly to get its manifest and types, but also import it as raw so we can create the worker with the raw code.

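import { Worker } from 'node:worker_threads';
// Normal import, for the manifest and its types
import workerManifest from './worker.ts';
// Raw import of the same file (assuming the raw-loader style from the earlier example)
import workerManifestRaw from 'raw-loader!./worker.ts';

const workerFunction = () => new Worker(workerManifestRaw, { eval: true });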

tegefaulkes · 2025-03-07T03:45:52Z

Misclicked and closed the PR. I've re-opened it.

tegefaulkes · 2025-03-11T00:23:05Z

All done, merging.

tegefaulkes · 2025-03-12T00:55:38Z

Some notes for @shafiqihtsham based on what I had to fix up.

When you have a large switch(type) block that has a lot of code going on, it's better to break each case up into a function that handles it.

switch (type) {
	case 'one': {
		// some code here
		break;
	}
	case 'two': {
		// some code here
		break;
	}
}

Should be broken up like

switch (type) {
	case 'one': return this.processOne();
	case 'two': return this.processTwo();
}

// and the protected functions are defined within the class.
protected processOne(): void { /* do one */};
protected processTwo(): void { /* do two */};

Furthermore, when checking whether something exists or is undefined, always use thing == null or thing != null. Usually we want to use === when comparing things, but this is the one case where you'd use == or != to check, since it matches both null and undefined.
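
As a small illustration of why the loose check is used here (isNil is a hypothetical helper name, not something from the codebase): == null is true for exactly null and undefined, and false for other falsy values such as 0, '' and false.

```typescript
// Hypothetical helper: narrows to null | undefined using the loose equality check.
function isNil(value: unknown): value is null | undefined {
  return value == null;
}

// Only null and undefined match; other falsy values are treated as existing.
const results = {
  forNull: isNil(null),
  forUndefined: isNil(undefined),
  forZero: isNil(0),
  forEmptyString: isNil(''),
  forFalse: isNil(false),
};
```

Here forNull and forUndefined are true and the remaining three are false, which is why this check is safer than a plain truthiness test like if (!thing).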

When accessing properties of an object, it can be useful to assign them to a named variable if they are used many times. For example:

const thing = object.otherThing.thing;
// do something with thing multiple times.

However, there isn't much point to it if you use thing only once. Here you are doing the following.

const data = row.data;  
const { gestaltId } = data;

// However data is only used once here and immediately destructured. So it's kind of a pointless variable here. You should streamline it.

const { gestaltId } = row.data;

One of the advantages of breaking down the processing code into sub functions is that you can streamline some branching logic. Take the following example.

protected processEventNodeUpdated(row: EventNodeUpdated): void {  
  const { nodeId, gestaltId } = row.data;  
  const gestaltMap = this.gestaltMapState.get(gestaltId);  
  if (gestaltMap != null) {  
    const node = gestaltMap.nodes.get(nodeId);  
    if (node != null) {  
      node.gestaltId = gestaltId;  
      this.gestaltMap$.next(this.gestaltMapState);  
      this.nodesService.editNode$.next(node);  
    }  
  }  
};

Since this is contained within its own function with a very clear goal, we can simplify the branching here and make it easier to read. See how we flatten the logic and make it a lot clearer what's happening?

protected processEventNodeUpdated(row: EventNodeUpdated): void {  
  const { nodeId, gestaltId } = row.data;  
  const gestaltMap = this.gestaltMapState.get(gestaltId);  
  if (gestaltMap == null) return;  
  const node = gestaltMap.nodes.get(nodeId);  
  if (node == null) return;  
  node.gestaltId = gestaltId;  
  this.gestaltMap$.next(this.gestaltMapState);  
  this.nodesService.editNode$.next(node);  
};

Something like this has too much happening on one line. It's best to break something like this up.

const edgeExists =  
  gestalt?.edges.whereRows(['from', 'to'], [nodeIdLow, nodeIdHigh])  
    .length > 0;

// Should become
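const edges =
  gestalt?.edges.whereRows(['from', 'to'], [nodeIdLow, nodeIdHigh]);

// edges is undefined when gestalt is, so check for that before reading length
if (edges != null && edges.length > 0) { /* do thing */ }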

All instances of arrays should be defined as Array<T> and not T[].

CMCDragonkai force-pushed the feature-esm branch from 40abc82 to db6d789 Compare August 13, 2023 06:20

CMCDragonkai mentioned this pull request Aug 13, 2023

Migrate to ESM (and get dynamic await) MatrixAI/TypeScript-Demo-Lib#32

Open

CMCDragonkai self-assigned this Aug 13, 2023

CMCDragonkai force-pushed the feature-esm branch from db6d789 to 303b926 Compare August 13, 2023 06:36

CMCDragonkai mentioned this pull request Aug 14, 2023

ESM Migration MatrixAI/js-db#68

Merged

9 tasks

CMCDragonkai mentioned this pull request Oct 17, 2023

Replace threads with internal implementation #16

Closed

CMCDragonkai mentioned this pull request May 12, 2024

Statically Linked version of PK CLI MatrixAI/Polykey-CLI#84

Open

CMCDragonkai assigned tegefaulkes and unassigned CMCDragonkai Feb 24, 2025

feat: migrate to ESM

b7f733a

tegefaulkes force-pushed the feature-esm branch from 09ee60c to 91161db Compare February 26, 2025 01:10

tegefaulkes marked this pull request as draft February 26, 2025 02:05

tegefaulkes closed this Mar 7, 2025

tegefaulkes reopened this Mar 7, 2025

tegefaulkes force-pushed the feature-esm branch from 6c78704 to df05712 Compare March 10, 2025 05:23

feat: conversion to using internal node:worker_threads implementation to enable support for ESM

d5bf19d

tegefaulkes force-pushed the feature-esm branch from 2cf5fd6 to d5bf19d Compare March 11, 2025 00:14

tegefaulkes marked this pull request as ready for review March 11, 2025 00:22

tegefaulkes merged commit f70493e into staging Mar 11, 2025
5 checks passed
