opensource-observer · ccerv1 · Dec 28, 2024 · Dec 28, 2024 · Dec 28, 2024 · Dec 28, 2024
diff --git a/apps/docs/docs/tutorials/collection-view.md → apps/docs/docs/tutorials/collection-view.mdx b/apps/docs/docs/tutorials/collection-view.md → apps/docs/docs/tutorials/collection-view.mdx
diff --git a/apps/docs/docs/tutorials/dependencies.mdx b/apps/docs/docs/tutorials/dependencies.mdx
@@ -0,0 +1,396 @@
+---
+title: Map the Software Supply Chain
+sidebar_position: 3
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+Trace the dependencies in a software bill of materials (SBOM) for a given repository and assign weights or other metrics to each node. New to OSO? Check out our [Getting Started guide](../get-started/index.md) to set up your BigQuery or API access.
+
+![Dependency Graph](dependency-graph.png)
+
+## Getting Started
+
+Before running any analysis, you'll need to set up your environment:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+If you haven't already, subscribe to OSO public datasets in BigQuery by clicking the "Subscribe" button on our [Datasets page](../integrate/datasets/#oso-production-data-pipeline).
+
+You can run all queries in this guide directly in the [BigQuery console](https://console.cloud.google.com/bigquery).
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+Start your Python notebook with the following:
+
+```python
+from google.cloud import bigquery
+import pandas as pd
+import os
+
+os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = # PATH TO YOUR CREDENTIALS JSON
+GCP_PROJECT = # YOUR GCP PROJECT NAME
+
+client = bigquery.Client(GCP_PROJECT)
+```
+
+For more details on setting up Python notebooks, see our guide on [writing Python notebooks](../integrate/python-notebooks.md).
+
+</TabItem>
+</Tabs>
+
+## Identify Repositories and Packages
+
+### Repository Metadata
+
+Get metadata and basic stats about a repository using OSO's indexed data:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select *
+from `oso_production.repositories_v0`
+where artifact_url = 'https://github.com/ethereum/go-ethereum'
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select *
+  from `oso_production.repositories_v0`
+  where artifact_url = 'https://github.com/ethereum/go-ethereum'
+"""
+df = client.query(query).to_dataframe()
+```
+
+</TabItem>
+</Tabs>
+
+### SBOMs (Package Dependencies)
+
+OSO uses GitHub's Software Bill of Materials (SBOMs) dataset to identify package dependencies. Note that this data doesn't differentiate between direct and indirect dependencies, but provides a good starting point for mapping the software supply chain:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select *
+from `oso_production.sboms_v0`
+where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI='
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select *
+  from `oso_production.sboms_v0`
+  where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI='
+"""
+df = client.query(query).to_dataframe()
+```
+
+</TabItem>
+</Tabs>
+
+### Package Maintainers
+
+OSO leverages [Open Source Insights (deps.dev)](https://deps.dev) data to identify the repo that maintains a given package. This covers approximately 90% of packages based on our testing:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select
+  package_artifact_source,
+  package_artifact_name,
+  package_owner_project_id,
+  package_owner_artifact_namespace,
+  package_owner_artifact_name
+from `oso_production.package_owners_v0`
+where package_artifact_name = '@libp2p/echo'
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select
+    package_artifact_source,
+    package_artifact_name,
+    package_owner_project_id,
+    package_owner_artifact_namespace,
+    package_owner_artifact_name
+  from `oso_production.package_owners_v0`
+  where package_artifact_name = '@libp2p/echo'
+"""
+df = client.query(query).to_dataframe()
+```
+
+</TabItem>
+</Tabs>
+
+### Build a Deep Funding Graph
+
+This example demonstrates how to create a dependency graph for a group of related repositories, such as the one used by [Deep Funding](https://deepfunding.org). The analysis maps relationships between key Ethereum repositories and their package dependencies:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select distinct
+    sboms.from_artifact_namespace as seed_repo_owner,
+    sboms.from_artifact_name as seed_repo_name,
+    sboms.to_package_artifact_name as package_name,
+    package_owners.package_owner_artifact_namespace as package_repo_owner,
+    package_owners.package_owner_artifact_name as package_repo_name,
+    sboms.to_package_artifact_source as package_source
+  from `oso_production.sboms_v0` sboms
+  join `oso_production.package_owners_v0` package_owners
+    on
+      sboms.to_package_artifact_name = package_owners.package_artifact_name
+      and sboms.to_package_artifact_source = package_owners.package_artifact_source
+  where
+    sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP')
+    and package_owners.package_owner_artifact_namespace is not null
+    and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name)
+      in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2',
+          'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum',
+          'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon',
+          'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project',
+          'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm',
+          'eth-infinitism/account-abstraction','safe-global/safe-smart-account',
+          'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo')
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select distinct
+      sboms.from_artifact_namespace as seed_repo_owner,
+      sboms.from_artifact_name as seed_repo_name,
+      sboms.to_package_artifact_name as package_name,
+      package_owners.package_owner_artifact_namespace as package_repo_owner,
+      package_owners.package_owner_artifact_name as package_repo_name,
+      sboms.to_package_artifact_source as package_source
+    from `oso_production.sboms_v0` sboms
+    join `oso_production.package_owners_v0` package_owners
+      on
+        sboms.to_package_artifact_name = package_owners.package_artifact_name
+        and sboms.to_package_artifact_source = package_owners.package_artifact_source
+    where
+      sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP')
+      and package_owners.package_owner_artifact_namespace is not null
+      and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name)
+        in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2',
+            'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum',
+            'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon',
+            'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project',
+            'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm',
+            'eth-infinitism/account-abstraction','safe-global/safe-smart-account',
+            'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo')
+"""
+df = client.query(query).to_dataframe()
+```
+
+We can also go further and create a network graph from the data we've just fetched:
+
+```python
+import networkx as nx
+
+# turn each node into a GitHub URL
+gh = 'https://github.com/'
+df['seed_repo_url'] = df.apply(lambda x: f"{gh}{x['seed_repo_owner']}/{x['seed_repo_name']}", axis=1)
+df['package_repo_url'] = df.apply(lambda x: f"{gh}{x['package_repo_owner']}/{x['package_repo_name']}", axis=1)
+
+# Store in a Network Graph
+G = nx.DiGraph()
+
+for repo_url in df['seed_repo_url'].unique():
+    G.add_node(repo_url, level=1)
+
+for repo_url in df['package_repo_url'].unique():
+    if repo_url not in G.nodes:
+        G.add_node(repo_url, level=2)
+
+for _, row in df.iterrows():
+    G.add_edge(
+        row['seed_repo_url'],
+        row['package_repo_url'],
+        relation=row['package_source']
+    )
+
+# Placeholder for adding weights to the graph
+global_weight = 0
+for u, v in G.edges:
+    G[u][v]['weight'] = global_weight    
+```
+
+</TabItem>
+</Tabs>
+
+For more examples of dependency analysis, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph).
+
+## Weight Nodes and Edges
+
+### Most Used Dependencies
+
+Find the most commonly used dependencies across all projects in OSO. This query joins package ownership data with SBOM data to count how many projects depend on each package:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select
+  p.project_id,
+  pkgs.package_artifact_source,
+  pkgs.package_artifact_name,
+  count(distinct sboms.from_project_id) as num_dependents
+from `oso_production.package_owners_v0` pkgs
+join `oso_production.sboms_v0` sboms
+  on pkgs.package_artifact_name = sboms.to_package_artifact_name
+  and pkgs.package_artifact_source = sboms.to_package_artifact_source
+join `oso_production.projects_v1` p
+  on pkgs.package_owner_project_id = p.project_id
+where pkgs.package_owner_project_id is not null
+group by 1,2,3
+order by 4 desc
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select
+    p.project_id,
+    pkgs.package_artifact_source,
+    pkgs.package_artifact_name,
+    count(distinct sboms.from_project_id) as num_dependents
+  from `oso_production.package_owners_v0` pkgs
+  join `oso_production.sboms_v0` sboms
+    on pkgs.package_artifact_name = sboms.to_package_artifact_name
+    and pkgs.package_artifact_source = sboms.to_package_artifact_source
+  join `oso_production.projects_v1` p
+    on pkgs.package_owner_project_id = p.project_id
+  where pkgs.package_owner_project_id is not null
+  group by 1,2,3
+  order by 4 desc
+"""
+df = client.query(query).to_dataframe()
+
+# Optional: Display top dependencies
+print("Top 10 most used dependencies:")
+print(df.head(10))
+```
+
+</TabItem>
+</Tabs>
+
+### Downstream Impact
+
+This is an example of a more advanced analysis that demonstrates how to analyze relationships between onchain projects and their development dependencies:
+
+<Tabs>
+<TabItem value="sql" label="SQL">
+
+```sql
+select
+  onchain_projects.project_name as `onchain_builder`,
+  onchain_metrics.event_source as `network`,
+  onchain_metrics.address_count_90_days,
+  onchain_metrics.gas_fees_sum_6_months,
+  onchain_metrics.transaction_count_6_months as transactions_6_months,
+  code_metrics.project_name as `dev_tool_maintainer`,
+  package_owners.package_artifact_source as `package_source`,
+  code_metrics.active_developer_count_6_months,
+  code_metrics.contributor_count_6_months,
+  code_metrics.commit_count_6_months,
+  code_metrics.opened_issue_count_6_months,
+  code_metrics.opened_pull_request_count_6_months,
+  code_metrics.fork_count,
+  code_metrics.star_count,
+  code_metrics.last_updated_at_date
+from `oso_production.sboms_v0` sboms
+join `oso_production.projects_v1` onchain_projects
+  on sboms.from_project_id = onchain_projects.project_id
+join `oso_production.projects_by_collection_v1` projects_by_collection
+  on onchain_projects.project_id = projects_by_collection.project_id
+join `oso_production.onchain_metrics_by_project_v1` onchain_metrics
+  on onchain_projects.project_id = onchain_metrics.project_id
+join `oso_production.package_owners_v0` package_owners
+  on sboms.to_package_artifact_name = package_owners.package_artifact_name
+join `oso_production.code_metrics_by_project_v1` code_metrics
+  on package_owners.package_owner_project_id = code_metrics.project_id
+where
+  projects_by_collection.collection_name = 'op-retrofunding-4'
+  and transaction_count_6_months >= 1000
+  and address_count_90_days >= 420
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+query = """
+  select
+    onchain_projects.project_name as `onchain_builder`,
+    onchain_metrics.event_source as `network`,
+    onchain_metrics.address_count_90_days,
+    onchain_metrics.gas_fees_sum_6_months,
+    onchain_metrics.transaction_count_6_months as transactions_6_months,
+    code_metrics.project_name as `dev_tool_maintainer`,
+    package_owners.package_artifact_source as `package_source`,
+    code_metrics.active_developer_count_6_months,
+    code_metrics.contributor_count_6_months,
+    code_metrics.commit_count_6_months,
+    code_metrics.opened_issue_count_6_months,
+    code_metrics.opened_pull_request_count_6_months,
+    code_metrics.fork_count,
+    code_metrics.star_count,
+    code_metrics.last_updated_at_date
+  from `oso_production.sboms_v0` sboms
+  join `oso_production.projects_v1` onchain_projects
+    on sboms.from_project_id = onchain_projects.project_id
+  join `oso_production.projects_by_collection_v1` projects_by_collection
+    on onchain_projects.project_id = projects_by_collection.project_id
+  join `oso_production.onchain_metrics_by_project_v1` onchain_metrics
+    on onchain_projects.project_id = onchain_metrics.project_id
+  join `oso_production.package_owners_v0` package_owners
+    on sboms.to_package_artifact_name = package_owners.package_artifact_name
+  join `oso_production.code_metrics_by_project_v1` code_metrics
+    on package_owners.package_owner_project_id = code_metrics.project_id
+  where
+    projects_by_collection.collection_name = 'op-retrofunding-4'
+    and transaction_count_6_months >= 1000
+    and address_count_90_days >= 420
+"""
+df = client.query(query).to_dataframe()
+
+# Optional: Add visualization code
+import plotly.express as px
+
+# Example visualization
+fig = px.scatter(df, 
+    x='address_count_90_days', 
+    y='transactions_6_months',
+    size='gas_fees_sum_6_months',
+    hover_data=['onchain_builder', 'dev_tool_maintainer']
+)
+fig.show()
+```
+
+</TabItem>
+</Tabs>
+
+You can go even further in your analysis by joining on other OSO datasets. For more examples, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph).
diff --git a/apps/docs/docs/tutorials/dependency-graph.png b/apps/docs/docs/tutorials/dependency-graph.png