Welcome to the Crucible Labs technical interview challenge! This repository contains:
- `README.md` (this file): Explains the Datacomp Subnet exercise, what we’re asking you to build, and how to submit your work.
- `reference_solution.py`: A sample reference implementation (for your review after you’ve given it a shot or if you get stuck).
## Table of Contents

- Context
- Challenge Outline
- Implementation Requirements
- Deliverables
- Time Expectation
- Submission Instructions
- Evaluation Rubric (High Level)
- Questions
- License
## Context

At Crucible Labs, we’re exploring new Bittensor subnets for decentralized AI. One of our proposals is the Datacomp Subnet, a network where:
- Miners submit chain-of-thought data (reasoning steps + final answers) for large language models.
- Validators fine-tune the model with submitted data, measure improvement via a “forward pass + scoring” approach, then allocate emissions accordingly.
Rather than have you build an entire subnet, we want to home in on the key ML aspect: implementing a forward pass + scoring mechanism that demonstrates how chain-of-thought data might be validated and used to improve a model’s performance.
## Challenge Outline

- Study the Datacomp Subnet context (if provided) or the summary in this README.
- Implement in Python:
  - A `forward_pass` function that processes text prompts (optionally with chain-of-thought).
  - A `validate_and_score` function that simulates a mini fine-tuning step and returns a numeric score.
- Discuss briefly how you might handle exploit attempts (e.g., repeated data, “train-on-test” cheating).
This exercise helps us see how you approach ML pipeline design, coding style, and problem-solving.
## Implementation Requirements

Below is a suggested approach. You can rename functions or structures as you see fit, but please keep the core elements described below.
Assume you have a small dataset of JSON-like objects, for example:
```json
{
  "prompt": "Explain why the sky is blue:",
  "chain_of_thought": "Consider Rayleigh scattering...",
  "final_answer": "It's because molecules in the air scatter shorter wavelengths more strongly."
}
```
You might store 3–5 such items in a list or JSON file for demonstration.
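For illustration, such a list might look like the following (only the first entry comes from the example above; the other two are placeholders invented purely for the demo):

```python
# Tiny demo dataset of chain-of-thought submissions.
# Only the first entry comes from the README example; the rest are placeholders.
SAMPLE_DATA = [
    {
        "prompt": "Explain why the sky is blue:",
        "chain_of_thought": "Consider Rayleigh scattering...",
        "final_answer": "It's because molecules in the air scatter shorter wavelengths more strongly.",
    },
    {
        "prompt": "What is 17 * 3?",
        "chain_of_thought": "17 * 3 = (10 * 3) + (7 * 3) = 30 + 21 = 51.",
        "final_answer": "51",
    },
    {
        "prompt": "Name the largest planet in the solar system.",
        "chain_of_thought": "Among the gas giants, Jupiter has the greatest mass and radius.",
        "final_answer": "Jupiter",
    },
]
```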
You can:
- Use a real Hugging Face model (e.g. Llama 3.1), or
- Write a mock model that returns dummy outputs (if you prefer to keep it simple).
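If you take the mock route, one possible shape is sketched below; the class and method names are placeholders chosen for this demo, not a required API:

```python
# A minimal mock model/tokenizer pair that keeps the same high-level flow
# (tokenize -> generate -> decode) without loading any real weights.
class MockTokenizer:
    def encode(self, text: str) -> list[str]:
        # "Tokenize" by splitting on whitespace.
        return text.split()

    def decode(self, tokens: list[str]) -> str:
        return " ".join(tokens)


class MockModel:
    def generate(self, tokens: list[str]) -> list[str]:
        # Echo the prompt back with a canned continuation.
        return tokens + ["(mock", "answer)"]
```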
Create a function like:
def forward_pass(model, tokenizer, prompts):
"""
Runs inference on each prompt, returns generated text outputs.
"""
# ...
return list_of_strings
Key points:
- Tokenize the input text.
- Generate or produce an output (could be a mock).
- Return the final strings.
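As one way the stub above might be filled in with a real Hugging Face model (a sketch only, not the reference implementation; the `max_new_tokens` default and the `gpt2` checkpoint in the usage comment are arbitrary choices to keep the demo light):

```python
import torch


def forward_pass(model, tokenizer, prompts, max_new_tokens=64):
    """Run inference on each prompt and return only the generated continuations."""
    outputs = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Strip the prompt tokens so only the new text is returned.
        new_tokens = generated[0][inputs["input_ids"].shape[1]:]
        outputs.append(tokenizer.decode(new_tokens, skip_special_tokens=True))
    return outputs


# Example usage with any small causal LM from the Hub, e.g.:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("gpt2")
#   model = AutoModelForCausalLM.from_pretrained("gpt2")
#   print(forward_pass(model, tokenizer, ["Explain why the sky is blue:"]))
```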
Create a `validate_and_score` function to:
- Optionally “fine-tune” or simulate training with the chain-of-thought data.
- Generate new outputs for each prompt.
- Compare them to `final_answer` with a simple metric (BLEU, ROUGE, or naive string overlap).
- Return an aggregate numeric score (e.g., average similarity).
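A minimal sketch of that flow, skipping real fine-tuning and using naive word overlap as the metric (the `_overlap_score` helper is hypothetical, and this assumes a `forward_pass` along the lines of the sketch above):

```python
def _overlap_score(generated: str, reference: str) -> float:
    """Naive metric: fraction of the reference's words that appear in the output."""
    ref_words = set(reference.lower().split())
    gen_words = set(generated.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & gen_words) / len(ref_words)


def validate_and_score(model, tokenizer, dataset):
    """
    Simulate a validation round: (optionally fine-tune on chain_of_thought,
    omitted here), generate an answer per prompt, and return the average
    similarity against each item's final_answer.
    """
    prompts = [item["prompt"] for item in dataset]
    generations = forward_pass(model, tokenizer, prompts)
    scores = [
        _overlap_score(gen, item["final_answer"])
        for gen, item in zip(generations, dataset)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```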
In a paragraph or two, mention how you’d handle:
- Repeated data or trivial modifications to game the system.
- Train-on-test or data contamination.
- Any other relevant concerns you foresee in a decentralized environment.
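As one concrete illustration of the repeated-data point, a validator could fingerprint normalized submissions and drop duplicates before scoring. A rough sketch (the helper names here are illustrative only):

```python
import hashlib


def _fingerprint(item: dict) -> str:
    """Hash a normalized (lowercased, whitespace-collapsed) submission."""
    text = " | ".join(
        " ".join(item.get(key, "").lower().split())
        for key in ("prompt", "chain_of_thought", "final_answer")
    )
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(submissions: list[dict]) -> list[dict]:
    """Drop exact (post-normalization) duplicate submissions before scoring."""
    seen, unique = set(), []
    for item in submissions:
        fp = _fingerprint(item)
        if fp not in seen:
            seen.add(fp)
            unique.append(item)
    return unique
```

This only catches exact repeats; near-duplicates and train-on-test leakage call for fuzzier checks (e.g., n-gram or embedding similarity against held-out evaluation prompts).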
## Deliverables

- Python Script or Notebook
  - Show your `forward_pass` and `validate_and_score` functions in action.
  - Demonstrate on a small sample dataset (3–5 entries).
- Short README or Explanations
  - Summarize your approach (why you chose a certain model or metric).
  - State any assumptions (e.g., “We skip real GPU training due to time constraints.”).
- Optional: If you have time, add any extra features (e.g., logging, packaging, etc.).
We value clarity over complexity.
## Time Expectation

We anticipate 2–3 hours total. Please don’t spend more unless you really want to add polish; our main goal is to see how you structure your code, reason about chain-of-thought data, and implement a basic ML forward pass.
## Submission Instructions

- Clone or Fork this repo.
- Create your solution in a folder, or directly in the root if you prefer.
- Push your changes to your own GitHub and share the link with us.
- Alternatively, zip up your code and send it via email.
- Include a short note or README explaining how we can run your solution (e.g. `pip install -r requirements.txt && python run.py`).
## Evaluation Rubric (High Level)

| Category | Description |
| --- | --- |
| Architecture & Organization | Clarity of structure, naming, function breakdown |
| Code Quality & Correctness | Pythonic style, minimal bugs, test coverage or demos |
| ML/Forward Pass Implementation | Appropriateness of model usage (or mock), correctness of outputs |
| Scoring / Validation | Metric choice, clarity, demonstration of improvement or analysis |
| Security & Exploits | Awareness of potential manipulations or repeated data |
| Overall Problem-Solving | Big-picture thinking, Bittensor context, overall approach |
For more details (and a sample solution), see `reference_solution.py`.
## Questions

Need clarifications?
- Feel free to open an issue on this repo or email [email protected].
## License

MIT License – You can adapt this code or approach as you see fit.
Thank you for taking the time to tackle this challenge. We’re excited to see your approach!