red-knot: improve internal documentation in module.rs #11638

Merged on May 31, 2024 (3 commits; changes shown from 2 commits)
crates/red_knot/src/module.rs: 34 changes (32 additions, 2 deletions)
@@ -13,31 +13,48 @@ use crate::symbols::Dependency;
use crate::FxDashMap;

/// ID uniquely identifying a module.
///
/// The advantage of using this newtype to identify modules over using paths
/// is that instances are cheap and easily copied, avoiding many potential
/// problems that could be caused by lifetimes.
Member:

Another reason is that it is a specific concept that is worth its own name, and it also stores additional data that isn't available on a path (unless you mean `ModulePath`).

Member Author:

Right. At first I thought this struct was really a `ModuleID` struct, because it seemed to be a thin wrapper around a primitive type. But now I see that the primitive type really is the ID for the module and that the struct is correctly named, which means this comment is not quite accurate!

Member Author:

Hmm, but I see that isn't quite right either: the `modules` field on the `ModuleResolver` struct maps `Module` objects to `ModuleData` objects, and it's the `ModuleData` objects that seem to hold the interesting information about the module.

In general, I'm slightly confused about whether this is meant to represent:

  • a struct that can be queried directly to obtain interesting information about a module (this is what the name `Module` implies to me); or,
  • a cheap ID that can be used to easily retrieve some other object that can be queried directly to obtain interesting information about a module (similar to `FileId` elsewhere in red-knot, or to `BindingId` in ruff_python_semantic/binding.rs).

Currently, it feels like it sits somewhat awkwardly between the two concepts.

Member Author:

I think I'll just remove this comment and try to refactor this into something I like more ;)

Member:

The way `Module` is defined is very intentional: not necessarily the fields it stores or its name, but the fact that it is a thin wrapper around a `u32` with methods to query its fields. From an API perspective, this is the module, and an outside client doesn't need to be concerned with how the data is stored internally.

The reason we shouldn't change this representation significantly is that it maps exactly onto Salsa. If you look at my PR, you'll find something very similar:

  • `ResolvedModule` is a newtype wrapper around a `u32`. This is important for fast cache lookups and for avoiding awkward lifetimes. It also means we can store a `ResolvedModule` very cheaply: it's just 4 bytes.
  • It has accessor functions that take `db` as an argument and return the field data.

We can't change this without breaking Salsa compatibility. Ideally, keep the API roughly unchanged (or use something similar to the Salsa PR), but feel free to restructure the internal data structures however you want.
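To make the pattern described above concrete, here is a minimal, self-contained sketch of an ID newtype whose accessors take the database as an argument, in the spirit of the `ResolvedModule` description. `Db`, `intern`, and the fields on `ModuleData` are hypothetical stand-ins, not red-knot's or Salsa's actual API; the real code goes through `SemanticDb` and returns `QueryResult`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ModuleData`: the fields live here, not in the handle.
#[derive(Debug, Clone)]
struct ModuleData {
    name: String,
    is_package: bool,
}

/// Hypothetical stand-in for the database that owns all module data.
#[derive(Default)]
struct Db {
    modules: HashMap<Module, ModuleData>,
    next_id: u32,
}

impl Db {
    /// Allocate a fresh ID and store the module's data behind it.
    fn intern(&mut self, data: ModuleData) -> Module {
        let module = Module(self.next_id);
        self.next_id += 1;
        self.modules.insert(module, data);
        module
    }
}

/// The public handle: a 4-byte, `Copy` ID with no lifetime attached.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
struct Module(u32);

impl Module {
    /// Accessors take the database as an argument and return the field data.
    fn name(self, db: &Db) -> &str {
        &db.modules[&self].name
    }

    fn is_package(self, db: &Db) -> bool {
        db.modules[&self].is_package
    }
}

fn main() {
    let mut db = Db::default();
    let foo = db.intern(ModuleData { name: "foo".to_string(), is_package: true });
    let handle = foo; // copying the handle is as cheap as copying a u32
    assert_eq!(std::mem::size_of::<Module>(), 4);
    println!("{} (package: {})", handle.name(&db), handle.is_package(&db));
}
```

Callers only ever hold the `Copy` handle; where and how the data is stored stays an internal detail of the database.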

Member Author (AlexWaygood), May 31, 2024:

Thanks. I was confused by variable names such as these in module.rs, which seemed to imply that `Module` instances did not in fact represent modules: that they were only module IDs that could be used to look up modules, and that it was really `ModuleData` instances that represented modules. Here, the `id` variable has type `Module`, and the `module` variable has type `ModuleData`:

    fn remove_module_by_id(&mut self, id: Module) -> Arc<ModuleData> {
        let (_, module) = self.modules.remove(&id).unwrap();
        self.by_name.remove(&module.name).unwrap();
        // It's possible that multiple paths map to the same id. Search all other paths referencing the same module id.
        self.by_file.retain(|_, current_id| *current_id != id);
        module
    }

I'll create another PR to try to clarify things further in the internal documentation.
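A simplified sketch of the same removal logic, with the variables renamed so that `module` is the `Module` handle and `data` is the `ModuleData` it identifies. It assumes plain `std::collections::HashMap`s in place of `FxDashMap` (so `remove` returns the value rather than a key/value pair) and uses hypothetical, stripped-down types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, stripped-down versions of the types in module.rs.
type FileId = u32;

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
struct Module(u32);

#[derive(Debug)]
struct ModuleData {
    name: String,
}

#[derive(Default)]
struct ModuleResolver {
    /// Module handle -> the data it identifies
    modules: HashMap<Module, Arc<ModuleData>>,
    /// Absolute module name -> module handle
    by_name: HashMap<String, Module>,
    /// File -> module handle (several files may map to one module)
    by_file: HashMap<FileId, Module>,
}

impl ModuleResolver {
    /// Remove `module` from every index and return its data.
    fn remove_module_by_id(&mut self, module: Module) -> Arc<ModuleData> {
        let data = self.modules.remove(&module).expect("module is not in the cache");
        self.by_name.remove(&data.name);
        // Several paths may map to the same module, so drop every entry that refers to it.
        self.by_file.retain(|_, current| *current != module);
        data
    }
}

fn main() {
    let mut resolver = ModuleResolver::default();
    let foo = Module(0);
    resolver.modules.insert(foo, Arc::new(ModuleData { name: "foo".to_string() }));
    resolver.by_name.insert("foo".to_string(), foo);
    resolver.by_file.insert(1, foo);
    resolver.by_file.insert(2, foo); // a second file resolving to the same module
    let data = resolver.remove_module_by_id(foo);
    println!("removed {}", data.name);
}
```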

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct Module(u32);

impl Module {
    /// Return the absolute name of the module (e.g. `foo.bar`)
    pub fn name(&self, db: &dyn SemanticDb) -> QueryResult<ModuleName> {
        let jar: &SemanticJar = db.jar()?;
        let modules = &jar.module_resolver;

        Ok(modules.modules.get(self).unwrap().name.clone())
    }

    /// Return the path to the source code that defines this module
    pub fn path(&self, db: &dyn SemanticDb) -> QueryResult<ModulePath> {
        let jar: &SemanticJar = db.jar()?;
        let modules = &jar.module_resolver;

        Ok(modules.modules.get(self).unwrap().path.clone())
    }

    /// Determine whether this module is a single-file module or a package
    pub fn kind(&self, db: &dyn SemanticDb) -> QueryResult<ModuleKind> {
        let jar: &SemanticJar = db.jar()?;
        let modules = &jar.module_resolver;

        Ok(modules.modules.get(self).unwrap().kind)
    }

    /// Attempt to resolve a dependency of this module to an absolute [`ModuleName`].
    ///
    /// A dependency could be either absolute (e.g. the `foo` dependency implied by `from foo import bar`)
    /// or relative to this module (e.g. the `.foo` dependency implied by `from .foo import bar`)
    ///
    /// - Returns an error if the query failed.
    /// - Returns `Ok(None)` if the query succeeded,
    ///   but the dependency refers to a module that does not exist.
    /// - Returns `Ok(Some(ModuleName))` if the query succeeded,
    ///   and the dependency refers to a module that exists.
    pub fn resolve_dependency(
        &self,
        db: &dyn SemanticDb,
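The doc comment on `resolve_dependency` distinguishes absolute from relative dependencies. A rough sketch of how a relative dependency could be turned into an absolute name, following Python's relative-import rules; the free function, its string-based parameters, and the `level` argument (the number of leading dots) are assumptions for illustration, not the signature used in module.rs.

```rust
/// Hypothetical helper: resolve a dependency of `importer` to an absolute module name.
///
/// * `importer` - absolute name of the importing module, e.g. `foo.bar`
/// * `importer_is_package` - whether `importer` is a package (`foo/bar/__init__.py`)
/// * `level` - number of leading dots (0 for an absolute dependency)
/// * `module` - the part after the dots; empty for `from . import x`
fn resolve_dependency(
    importer: &str,
    importer_is_package: bool,
    level: u32,
    module: &str,
) -> Option<String> {
    if level == 0 {
        // Absolute dependency: `from foo import bar` depends on `foo`.
        return if module.is_empty() { None } else { Some(module.to_string()) };
    }

    let mut parts: Vec<&str> = importer.split('.').collect();
    // One dot means "the package containing this module". A package's
    // `__init__` file already stands for that package, so it strips one component fewer.
    let to_strip = if importer_is_package { level - 1 } else { level };
    for _ in 0..to_strip {
        parts.pop()?;
    }
    // Relative imports cannot climb above the top-level package.
    if parts.is_empty() {
        return None;
    }
    if !module.is_empty() {
        parts.push(module);
    }
    Some(parts.join("."))
}

fn main() {
    // `from .baz import x` inside `foo/bar.py`
    println!("{:?}", resolve_dependency("foo.bar", false, 1, "baz"));
    // `from ..baz import x` inside `foo/bar/qux.py`
    println!("{:?}", resolve_dependency("foo.bar.qux", false, 2, "baz"));
}
```

The `None` cases loosely mirror the `Ok(None)` outcome in the doc comment: the dependency does not name a module that can exist.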
@@ -124,10 +141,13 @@ impl ModuleName {
        Some(Self(name))
    }

    /// An iterator over the components of the module name:
    /// `foo.bar.baz` -> `foo`, `bar`, `baz`
    pub fn components(&self) -> impl DoubleEndedIterator<Item = &str> {
        self.0.split('.')
    }

    /// The name of this module's immediate parent, if it has a parent
    pub fn parent(&self) -> Option<ModuleName> {
        let (_, parent) = self.0.rsplit_once('.')?;

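A standalone sketch of the two `ModuleName` helpers documented above, using a hypothetical bare `ModuleName(String)`. Since `str::rsplit_once` returns `(before, after)` for the last `.`, the immediate parent is the first element of that pair.

```rust
/// Hypothetical stand-in for red-knot's `ModuleName`.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
struct ModuleName(String);

impl ModuleName {
    /// `foo.bar.baz` -> `foo`, `bar`, `baz`
    fn components(&self) -> impl DoubleEndedIterator<Item = &str> {
        self.0.split('.')
    }

    /// `foo.bar.baz` -> `foo.bar`; a top-level name such as `foo` has no parent.
    fn parent(&self) -> Option<ModuleName> {
        // `rsplit_once` yields (text before the last '.', text after it).
        let (parent, _) = self.0.rsplit_once('.')?;
        Some(Self(parent.to_string()))
    }
}

fn main() {
    let name = ModuleName("foo.bar.baz".to_string());
    let components: Vec<&str> = name.components().collect();
    // DoubleEndedIterator also lets callers walk from the last component.
    let last = name.components().next_back();
    println!("{:?} {:?} {:?}", components, last, name.parent());
}
```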
@@ -159,9 +179,10 @@ impl std::fmt::Display for ModuleName {

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub enum ModuleKind {
    /// A single-file module (e.g. `foo.py` or `foo.pyi`)
    Module,

-    /// A python package (a `__init__.py` or `__init__.pyi` file)
+    /// A python package (`foo/__init__.py` or `foo/__init__.pyi`)
    Package,
}

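The updated `ModuleKind` doc comments distinguish `foo.py`/`foo.pyi` from `foo/__init__.py`/`foo/__init__.pyi`. A hypothetical helper that applies that rule to a resolved file path might look like this; `kind_of` is an illustration and not part of module.rs.

```rust
use std::path::Path;

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
enum ModuleKind {
    /// A single-file module (e.g. `foo.py` or `foo.pyi`)
    Module,
    /// A Python package (`foo/__init__.py` or `foo/__init__.pyi`)
    Package,
}

/// Hypothetical helper: classify a resolved source file, or return `None`
/// if it is not a Python source or stub file.
fn kind_of(file: &Path) -> Option<ModuleKind> {
    match file.extension()?.to_str()? {
        "py" | "pyi" => {}
        _ => return None,
    }
    if file.file_stem()? == "__init__" {
        Some(ModuleKind::Package)
    } else {
        Some(ModuleKind::Module)
    }
}

fn main() {
    for file in ["foo.py", "foo.pyi", "foo/__init__.py", "foo/__init__.pyi"] {
        println!("{}: {:?}", file, kind_of(Path::new(file)));
    }
}
```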
@@ -181,10 +202,12 @@ impl ModuleSearchPath {
        }
    }

    /// Determine whether this is a first-party, third-party or standard-library search path
    pub fn kind(&self) -> ModuleSearchPathKind {
        self.inner.kind
    }

    /// Return the location of the search path on the file system
    pub fn path(&self) -> &Path {
        &self.inner.path
    }
@@ -459,6 +482,7 @@ impl ModuleResolver {
        }
    }

    /// Remove a module from the inner cache
    pub(crate) fn remove_module(&mut self, file_id: FileId) {
        // No locking is required because we're holding a mutable reference to `self`.
        let Some((_, id)) = self.by_file.remove(&file_id) else {
@@ -505,15 +529,19 @@ impl ModulePath {
        Self { root, file_id }
    }

    /// The search path that was used to locate the module
    pub fn root(&self) -> &ModuleSearchPath {
        &self.root
    }

    /// The file containing the source code for the module
    pub fn file(&self) -> FileId {
        self.file_id
    }
}

/// Given a module name and a list of search paths in which to lookup modules,
/// attempt to resolve the module name
fn resolve_name(
    name: &ModuleName,
    search_paths: &[ModuleSearchPath],
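`resolve_name` tries the given search paths to find a module. The following is a simplified sketch under stated assumptions: search paths are tried in order and the first hit wins, the file-existence check is injected as a closure so the example runs without touching the file system, and the candidate order (package before module, stub before source) is an assumption rather than a description of red-knot's actual precedence. It also ignores namespace packages and does not check that intermediate directories are packages.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: return the first file that defines `name`, trying each
/// search path in order. `exists` stands in for a real file-system check.
fn resolve_name(
    name: &str,
    search_paths: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    for root in search_paths {
        // `foo.bar` -> `<root>/foo/bar`
        let mut base = root.clone();
        for component in name.split('.') {
            base.push(component);
        }
        // Assumed precedence: package before module, stub before source.
        let candidates = [
            base.join("__init__.pyi"),
            base.join("__init__.py"),
            base.with_extension("pyi"),
            base.with_extension("py"),
        ];
        if let Some(found) = candidates.into_iter().find(|path| exists(path.as_path())) {
            return Some(found);
        }
    }
    None
}

fn main() {
    let search_paths = [PathBuf::from("/project"), PathBuf::from("/site-packages")];
    let files: HashSet<PathBuf> = [PathBuf::from("/site-packages/foo/bar.py")].into_iter().collect();
    let resolved = resolve_name("foo.bar", &search_paths, |path| files.contains(path));
    println!("{:?}", resolved);
}
```

Because the search paths are scanned in order, an earlier (for example first-party) path shadows a module of the same name in a later one.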
@@ -635,7 +663,9 @@ enum PackageKind {
    /// A root package or module. E.g. `foo` in `foo.bar.baz` or just `foo`.
    Root,

-    /// A regular sub-package where the parent contains an `__init__.py`. For example `bar` in `foo.bar` when the `foo` directory contains an `__init__.py`.
+    /// A regular sub-package where the parent contains an `__init__.py`.
+    ///
+    /// For example, `bar` in `foo.bar` when the `foo` directory contains an `__init__.py`.
    Regular,

    /// A sub-package in a namespace package. A namespace package is a package without an `__init__.py`.
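The `PackageKind` variants can be illustrated by classifying every component of a dotted name. In this hypothetical sketch, `dirs_with_init` is the set of (dotted) package names whose directory contains an `__init__.py`; the `classify` function is an illustration and not part of module.rs.

```rust
use std::collections::HashSet;

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum PackageKind {
    /// The first component of a dotted name, e.g. `foo` in `foo.bar.baz`.
    Root,
    /// A sub-package whose parent directory contains an `__init__.py`.
    Regular,
    /// A sub-package whose parent directory has no `__init__.py`.
    Namespace,
}

/// Hypothetical helper: classify each component of `name`. `dirs_with_init`
/// holds the dotted names of packages whose directory contains an `__init__.py`.
fn classify<'a>(name: &'a str, dirs_with_init: &HashSet<&str>) -> Vec<(&'a str, PackageKind)> {
    let mut result = Vec::new();
    let mut parent = String::new(); // dotted name of the components seen so far
    for component in name.split('.') {
        let kind = if parent.is_empty() {
            PackageKind::Root
        } else if dirs_with_init.contains(parent.as_str()) {
            PackageKind::Regular
        } else {
            PackageKind::Namespace
        };
        result.push((component, kind));
        if !parent.is_empty() {
            parent.push('.');
        }
        parent.push_str(component);
    }
    result
}

fn main() {
    // `foo/` has an `__init__.py`, but `foo/bar/` does not.
    let dirs_with_init: HashSet<&str> = ["foo"].into_iter().collect();
    println!("{:?}", classify("foo.bar.baz", &dirs_with_init));
}
```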