-
Notifications
You must be signed in to change notification settings - Fork 19
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs: include dependency graph use cases (#2684)
* feat(docs): add dependencies tutorial * refactor(docs): merge funding tutorials and add tabs * chore: update sidebar and indices * fix(docs): move to tabs in collection view * fix(docs): implement tabs on project tutorial
- Loading branch information
Showing
8 changed files
with
1,244 additions
and
367 deletions.
There are no files selected for viewing
337 changes: 186 additions & 151 deletions
337
apps/docs/docs/tutorials/collection-view.md → apps/docs/docs/tutorials/collection-view.mdx
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,396 @@ | ||
--- | ||
title: Map the Software Supply Chain | ||
sidebar_position: 3 | ||
--- | ||
|
||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
Trace the dependencies in a software bill of materials (SBOM) for a given repository and assign weights or other metrics to each node. New to OSO? Check out our [Getting Started guide](../get-started/index.md) to set up your BigQuery or API access. | ||
|
||
 | ||
|
||
## Getting Started | ||
|
||
Before running any analysis, you'll need to set up your environment: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
If you haven't already, subscribe to OSO public datasets in BigQuery by clicking the "Subscribe" button on our [Datasets page](../integrate/datasets/#oso-production-data-pipeline). | ||
|
||
You can run all queries in this guide directly in the [BigQuery console](https://console.cloud.google.com/bigquery). | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
Start your Python notebook with the following: | ||
|
||
```python | ||
from google.cloud import bigquery | ||
import pandas as pd | ||
import os | ||
|
||
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = # PATH TO YOUR CREDENTIALS JSON | ||
GCP_PROJECT = # YOUR GCP PROJECT NAME | ||
|
||
client = bigquery.Client(GCP_PROJECT) | ||
``` | ||
|
||
For more details on setting up Python notebooks, see our guide on [writing Python notebooks](../integrate/python-notebooks.md). | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
## Identify Repositories and Packages | ||
|
||
### Repository Metadata | ||
|
||
Get metadata and basic stats about a repository using OSO's indexed data: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select * | ||
from `oso_production.repositories_v0` | ||
where artifact_url = 'https://github.com/ethereum/go-ethereum' | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select * | ||
from `oso_production.repositories_v0` | ||
where artifact_url = 'https://github.com/ethereum/go-ethereum' | ||
""" | ||
df = client.query(query).to_dataframe() | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
### SBOMs (Package Dependencies) | ||
|
||
OSO uses GitHub's Software Bill of Materials (SBOMs) dataset to identify package dependencies. Note that this data doesn't differentiate between direct and indirect dependencies, but provides a good starting point for mapping the software supply chain: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select * | ||
from `oso_production.sboms_v0` | ||
where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI=' | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select * | ||
from `oso_production.sboms_v0` | ||
where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI=' | ||
""" | ||
df = client.query(query).to_dataframe() | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
### Package Maintainers | ||
|
||
OSO leverages [Open Source Insights (deps.dev)](https://deps.dev) data to identify the repo that maintains a given package. This covers approximately 90% of packages based on our testing: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select | ||
package_artifact_source, | ||
package_artifact_name, | ||
package_owner_project_id, | ||
package_owner_artifact_namespace, | ||
package_owner_artifact_name | ||
from `oso_production.package_owners_v0` | ||
where package_artifact_name = '@libp2p/echo' | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select | ||
package_artifact_source, | ||
package_artifact_name, | ||
package_owner_project_id, | ||
package_owner_artifact_namespace, | ||
package_owner_artifact_name | ||
from `oso_production.package_owners_v0` | ||
where package_artifact_name = '@libp2p/echo' | ||
""" | ||
df = client.query(query).to_dataframe() | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
### Build a Deep Funding Graph | ||
|
||
This example demonstrates how to create a dependency graph for a group of related repositories, such as the one used by [Deep Funding](https://deepfunding.org). The analysis maps relationships between key Ethereum repositories and their package dependencies: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select distinct | ||
sboms.from_artifact_namespace as seed_repo_owner, | ||
sboms.from_artifact_name as seed_repo_name, | ||
sboms.to_package_artifact_name as package_name, | ||
package_owners.package_owner_artifact_namespace as package_repo_owner, | ||
package_owners.package_owner_artifact_name as package_repo_name, | ||
sboms.to_package_artifact_source as package_source | ||
from `oso_production.sboms_v0` sboms | ||
join `oso_production.package_owners_v0` package_owners | ||
on | ||
sboms.to_package_artifact_name = package_owners.package_artifact_name | ||
and sboms.to_package_artifact_source = package_owners.package_artifact_source | ||
where | ||
sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP') | ||
and package_owners.package_owner_artifact_namespace is not null | ||
and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name) | ||
in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2', | ||
'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum', | ||
'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon', | ||
'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project', | ||
'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm', | ||
'eth-infinitism/account-abstraction','safe-global/safe-smart-account', | ||
'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo') | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select distinct | ||
sboms.from_artifact_namespace as seed_repo_owner, | ||
sboms.from_artifact_name as seed_repo_name, | ||
sboms.to_package_artifact_name as package_name, | ||
package_owners.package_owner_artifact_namespace as package_repo_owner, | ||
package_owners.package_owner_artifact_name as package_repo_name, | ||
sboms.to_package_artifact_source as package_source | ||
from `oso_production.sboms_v0` sboms | ||
join `oso_production.package_owners_v0` package_owners | ||
on | ||
sboms.to_package_artifact_name = package_owners.package_artifact_name | ||
and sboms.to_package_artifact_source = package_owners.package_artifact_source | ||
where | ||
sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP') | ||
and package_owners.package_owner_artifact_namespace is not null | ||
and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name) | ||
in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2', | ||
'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum', | ||
'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon', | ||
'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project', | ||
'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm', | ||
'eth-infinitism/account-abstraction','safe-global/safe-smart-account', | ||
'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo') | ||
""" | ||
df = client.query(query).to_dataframe() | ||
``` | ||
|
||
We can also go further and create a network graph from the data we've just fetched: | ||
|
||
```python | ||
import networkx as nx | ||
|
||
# turn each node into a GitHub URL | ||
gh = 'https://github.com/' | ||
df['seed_repo_url'] = df.apply(lambda x: f"{gh}{x['seed_repo_owner']}/{x['seed_repo_name']}", axis=1) | ||
df['package_repo_url'] = df.apply(lambda x: f"{gh}{x['package_repo_owner']}/{x['package_repo_name']}", axis=1) | ||
|
||
# Store in a Network Graph | ||
G = nx.DiGraph() | ||
|
||
for repo_url in df['seed_repo_url'].unique(): | ||
G.add_node(repo_url, level=1) | ||
|
||
for repo_url in df['package_repo_url'].unique(): | ||
if repo_url not in G.nodes: | ||
G.add_node(repo_url, level=2) | ||
|
||
for _, row in df.iterrows(): | ||
G.add_edge( | ||
row['seed_repo_url'], | ||
row['package_repo_url'], | ||
relation=row['package_source'] | ||
) | ||
|
||
# Placeholder for adding weights to the graph | ||
global_weight = 0 | ||
for u, v in G.edges: | ||
G[u][v]['weight'] = global_weight | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
For more examples of dependency analysis, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph). | ||
|
||
## Weight Nodes and Edges | ||
|
||
### Most Used Dependencies | ||
|
||
Find the most commonly used dependencies across all projects in OSO. This query joins package ownership data with SBOM data to count how many projects depend on each package: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select | ||
p.project_id, | ||
pkgs.package_artifact_source, | ||
pkgs.package_artifact_name, | ||
count(distinct sboms.from_project_id) as num_dependents | ||
from `oso_production.package_owners_v0` pkgs | ||
join `oso_production.sboms_v0` sboms | ||
on pkgs.package_artifact_name = sboms.to_package_artifact_name | ||
and pkgs.package_artifact_source = sboms.to_package_artifact_source | ||
join `oso_production.projects_v1` p | ||
on pkgs.package_owner_project_id = p.project_id | ||
where pkgs.package_owner_project_id is not null | ||
group by 1,2,3 | ||
order by 4 desc | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select | ||
p.project_id, | ||
pkgs.package_artifact_source, | ||
pkgs.package_artifact_name, | ||
count(distinct sboms.from_project_id) as num_dependents | ||
from `oso_production.package_owners_v0` pkgs | ||
join `oso_production.sboms_v0` sboms | ||
on pkgs.package_artifact_name = sboms.to_package_artifact_name | ||
and pkgs.package_artifact_source = sboms.to_package_artifact_source | ||
join `oso_production.projects_v1` p | ||
on pkgs.package_owner_project_id = p.project_id | ||
where pkgs.package_owner_project_id is not null | ||
group by 1,2,3 | ||
order by 4 desc | ||
""" | ||
df = client.query(query).to_dataframe() | ||
|
||
# Optional: Display top dependencies | ||
print("Top 10 most used dependencies:") | ||
print(df.head(10)) | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
### Downstream Impact | ||
|
||
This is an example of a more advanced analysis that demonstrates how to analyze relationships between onchain projects and their development dependencies: | ||
|
||
<Tabs> | ||
<TabItem value="sql" label="SQL"> | ||
|
||
```sql | ||
select | ||
onchain_projects.project_name as `onchain_builder`, | ||
onchain_metrics.event_source as `network`, | ||
onchain_metrics.address_count_90_days, | ||
onchain_metrics.gas_fees_sum_6_months, | ||
onchain_metrics.transaction_count_6_months as transactions_6_months, | ||
code_metrics.project_name as `dev_tool_maintainer`, | ||
package_owners.package_artifact_source as `package_source`, | ||
code_metrics.active_developer_count_6_months, | ||
code_metrics.contributor_count_6_months, | ||
code_metrics.commit_count_6_months, | ||
code_metrics.opened_issue_count_6_months, | ||
code_metrics.opened_pull_request_count_6_months, | ||
code_metrics.fork_count, | ||
code_metrics.star_count, | ||
code_metrics.last_updated_at_date | ||
from `oso_production.sboms_v0` sboms | ||
join `oso_production.projects_v1` onchain_projects | ||
on sboms.from_project_id = onchain_projects.project_id | ||
join `oso_production.projects_by_collection_v1` projects_by_collection | ||
on onchain_projects.project_id = projects_by_collection.project_id | ||
join `oso_production.onchain_metrics_by_project_v1` onchain_metrics | ||
on onchain_projects.project_id = onchain_metrics.project_id | ||
join `oso_production.package_owners_v0` package_owners | ||
on sboms.to_package_artifact_name = package_owners.package_artifact_name | ||
join `oso_production.code_metrics_by_project_v1` code_metrics | ||
on package_owners.package_owner_project_id = code_metrics.project_id | ||
where | ||
projects_by_collection.collection_name = 'op-retrofunding-4' | ||
and transaction_count_6_months >= 1000 | ||
and address_count_90_days >= 420 | ||
``` | ||
|
||
</TabItem> | ||
<TabItem value="python" label="Python"> | ||
|
||
```python | ||
query = """ | ||
select | ||
onchain_projects.project_name as `onchain_builder`, | ||
onchain_metrics.event_source as `network`, | ||
onchain_metrics.address_count_90_days, | ||
onchain_metrics.gas_fees_sum_6_months, | ||
onchain_metrics.transaction_count_6_months as transactions_6_months, | ||
code_metrics.project_name as `dev_tool_maintainer`, | ||
package_owners.package_artifact_source as `package_source`, | ||
code_metrics.active_developer_count_6_months, | ||
code_metrics.contributor_count_6_months, | ||
code_metrics.commit_count_6_months, | ||
code_metrics.opened_issue_count_6_months, | ||
code_metrics.opened_pull_request_count_6_months, | ||
code_metrics.fork_count, | ||
code_metrics.star_count, | ||
code_metrics.last_updated_at_date | ||
from `oso_production.sboms_v0` sboms | ||
join `oso_production.projects_v1` onchain_projects | ||
on sboms.from_project_id = onchain_projects.project_id | ||
join `oso_production.projects_by_collection_v1` projects_by_collection | ||
on onchain_projects.project_id = projects_by_collection.project_id | ||
join `oso_production.onchain_metrics_by_project_v1` onchain_metrics | ||
on onchain_projects.project_id = onchain_metrics.project_id | ||
join `oso_production.package_owners_v0` package_owners | ||
on sboms.to_package_artifact_name = package_owners.package_artifact_name | ||
join `oso_production.code_metrics_by_project_v1` code_metrics | ||
on package_owners.package_owner_project_id = code_metrics.project_id | ||
where | ||
projects_by_collection.collection_name = 'op-retrofunding-4' | ||
and transaction_count_6_months >= 1000 | ||
and address_count_90_days >= 420 | ||
""" | ||
df = client.query(query).to_dataframe() | ||
|
||
# Optional: Add visualization code | ||
import plotly.express as px | ||
|
||
# Example visualization | ||
fig = px.scatter(df, | ||
x='address_count_90_days', | ||
y='transactions_6_months', | ||
size='gas_fees_sum_6_months', | ||
hover_data=['onchain_builder', 'dev_tool_maintainer'] | ||
) | ||
fig.show() | ||
``` | ||
|
||
</TabItem> | ||
</Tabs> | ||
|
||
You can go even further in your analysis by joining on other OSO datasets. For more examples, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph). |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.