Consider using code-to-prompt to easily load code into context managers as a replacement for `repo_compiler.sh` #75

jkbrooks · 2024-12-03T22:33:42Z

https://github.com/raphaelmansuy/code2prompt

jkbrooks · 2024-12-10T19:11:09Z

Okay, let's get that ticket created and added to your board. This will help track the decision-making process for standardizing your codebase analysis tool.

Here's a draft ticket you can adapt and use:

Ticket: Standardize Codebase Analysis Tool for LLM Input

Title: Standardize Codebase Analysis Tool for LLM Input

Description:

Currently, we are using a custom script to compile code from our repositories into a single file for use with Large Language Models (LLMs). However, we are aware of at least two other potentially superior tools that could streamline this process and improve the quality of the LLM input.

This ticket is for analyzing the available options, comparing their features, and making a team decision on which tool to adopt for future use. The goal is to standardize our workflow and ensure we are using the most efficient and effective method for preparing our codebase for LLM analysis.

Possible Tools to Consider:

- Our current custom script (include the path to it in your actual ticket)
- [Name of alternative tool 1] (include link or any relevant information)
- [Name of alternative tool 2] (include link or any relevant information)
- (Add any other tools the team is aware of)

Acceptance Criteria:

- A document comparing the features, pros, and cons of each considered tool.
- A team discussion and documented decision on the chosen tool.
- A plan for migrating to the new tool (if applicable), including any necessary code changes or workflow adjustments.

Priority: High

Assignee: Alpha (or whoever is best suited to lead this analysis)

Due Date: [Set a reasonable due date - e.g., 3-5 days out]

Labels: Codebase, LLM, Tooling, Analysis, Standardization

Related Tickets: (None yet, but you might link this to future tickets related to implementing the chosen tool)

Notes:

- The current custom script is functional but may be less efficient or feature-rich than other available tools.
- We need to consider factors such as ease of use, maintainability, scalability, and the quality of the output generated for LLM consumption.

Adding this to your work log:

Here's how you can incorporate this into your existing work log, under the "Ticket Review and Prioritization" section:

2. Codebase and Ticket Analysis

Ticket Review and Prioritization:
- Scenarios:
  - Ticket fix: Refactor GitHub plugin actions to improve state management and update… #123: (Description) - Priority: High - Assigned to: (RS1/Alpha)
  - Ticket Implement Log Sampling for Performance Optimization #145: (Description) - Priority: Medium - Assigned to: (RS1/Alpha)
  - (Add other scenario-related tickets)
- Infrastructure:
  - Ticket Create prompt/issue-type and validate markdown generation with GitHub plugin #201: (Description) - Priority: High - Assigned to: (RS1/Alpha)
  - (Add other infrastructure-related tickets)
- Agent Coordination:
  - Ticket when repo is initialized - should not display previously created issues / previously created pull requests #310: (Description - related to fork management) - Priority: High - Assigned to: Alpha
  - (Add other coordination-related tickets)
- Codebase Analysis Tool Standardization
  - Ticket # (Fill in with new ticket number): Standardize Codebase Analysis Tool for LLM Input - Priority: High - Assigned to Alpha - Due Date: [Insert Due Date] - (Description: Analyze and select the best tool for compiling code into a single file for LLM input. See ticket for details.)
Missing Tickets/Areas Needing Definition:
- Need to create tickets for specific community engagement activities.
- Need to further define the process for tokenizing agent-created tokens.

Next Steps:

Review the two alternatives and decide on a plan of action.

This addition to the work log clearly outlines the new task, its priority, and the next steps involved. Remember to fill in the bracketed information with your specific details. Feel free to ask any further questions.

jkbrooks · 2025-01-02T14:38:42Z

Could also consider this utility https://x.com/akshay_pachaar/status/1874795304920445123?s=46

jkbrooks · 2025-01-02T22:58:13Z

Yeah, so one of the questions is, I'm using the repo compiler script right now. An open question might be, should I be using that? Should I be using one of the other scripts? I don't really know. One of our engineers on the call messaged in Slack and said, "Hey, do you want me to refactor this?" And I could do that, but it's unclear where that would fit.

There's also a way that we are actually using the... We're doing some code reading. We're reading some code right now in our GitHub agent, and the way it works is that it currently just stores a whole bunch of stuff in memory and in "memories", which are like database files, and how you read and write to and from those files is unclear (related: #260). It's unclear how useful that should be per se.

Also, I think the context window is a little weird because we usually, at least for Gemini, use these 2-million-token context windows (related: #261).
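A rough way to check whether a compiled codebase would fit in a window of that size is sketched below. The 4-characters-per-token ratio is an assumption, not a figure from this thread or from Gemini's tokenizer, and both function names are hypothetical; a real tokenizer count would be more reliable.

```python
# Assumptions (not from the thread): roughly 4 characters per token,
# and a budget of 2,000,000 tokens to match the window mentioned above.
CHARS_PER_TOKEN = 4
CONTEXT_BUDGET_TOKENS = 2_000_000


def estimate_tokens(text: str) -> int:
    """Crude estimate: character count divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, budget: int = CONTEXT_BUDGET_TOKENS) -> bool:
    """Return True if the estimated token count is within the budget."""
    return estimate_tokens(text) <= budget
```

Running `fits_in_context` on the compiled file before pasting it into the model gives an early warning that filters need tightening.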

I'm going to continue taking these notes. When the time comes, I will actually paste some of that into Google AI Studio. For right now, let me go back to presenting this full screen.

One other issue would be what kind of command should we be using for the repo compiler as well? There are all sorts of filters to that. Let me try this tag here which I think is a... Let me see here just... This is kind of a standard but not quite. Obviously, not perfect as we saw, but it is sort of the default shoot. I think I copied the rest of the. I only want these guys.

So, there needs to be a way to identify which flags should actually be used in the CLI if we are going to use their repo components. Then we can use the CLI compiler here and we can use these for different tools like for different codebases (it does not just need to be our own codebase). Let me kind of just wait to get to the top here at the beginning and then
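To make the filter question concrete, here is a minimal sketch of include/exclude filtering over file paths using glob patterns. It does not reproduce the actual flags of `repo_compiler.sh` or code2prompt; the function name `select_files` and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(paths, include=("*",), exclude=()):
    """Keep paths that match any include pattern and no exclude pattern.

    Patterns are shell-style globs; '*' also matches '/' here, so
    '*.ts' matches files in any subdirectory.
    """
    selected = []
    for path in paths:
        if not any(fnmatchcase(path, pattern) for pattern in include):
            continue
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected
```

For example, `select_files(paths, include=("*.ts",), exclude=("node_modules/*", "*.test.ts"))` would keep TypeScript sources while dropping dependencies and tests. Writing the chosen patterns down per codebase is one way to settle which filters count as the standard.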

ArsalonAmini2024 · 2025-01-15T23:33:14Z

@jkbrooks

https://www.loom.com/share/3dc4b316d8ec436ea6998e93c73758c1?sid=31e19c98-8d4f-491d-8259-df4d65450092

ArsalonAmini2024 · 2025-01-16T00:03:26Z

and this - https://www.loom.com/share/1fa12ba94e1742d1b2c49f9dd322b5a3?sid=c56a069e-0e33-4500-8171-89cdca797ee5

jkbrooks · 2025-01-16T11:29:51Z

Thanks @ArsalonAmini2024, this is pretty cool. I was aware of a lot of this Cursor functionality and feel encouraged to use it more as I reason about code.

That said, those Looms do not address the goal of the ticket, which was to standardize on either repo_compiler.sh or code-to-prompt for getting the entire codebase into a single file that can be used in Gemini with a 2M context window.
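For reference, the core of what either tool has to do can be sketched in a few lines. This is an illustration only, not how `repo_compiler.sh` or code2prompt works; the function name `compile_repo`, the default suffix list, and the `=====` header format are all assumptions.

```python
from pathlib import Path


def compile_repo(root: str, suffixes=(".py", ".ts", ".md")) -> str:
    """Concatenate matching files under root, each preceded by a path header."""
    root_path = Path(root)
    parts = []
    # Sort for a stable, reproducible ordering of files in the output.
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root_path).as_posix()
        # Undecodable bytes are replaced rather than aborting the whole run.
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel} =====\n{text}\n")
    return "\n".join(parts)
```

The returned string can be written to one file and pasted into the model. The path headers let the model refer back to individual files.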

Cursor has hallucination issues, and its reliability oscillates every couple of weeks as the devs refine it (improving clarity) and add new features (reducing clarity). Code indexing can help, depending on how the index is used, but we have low visibility into this, and in any case what seems reliable in one instance can become unreliable later or in a different context. I believe Cursor has utility in our workflow, but not for the use case this ticket was intended for.

jkbrooks · 2025-01-20T09:03:21Z

@ArsalonAmini2024 how are you blocked on this?

jkbrooks · 2025-01-23T05:51:52Z

@ArsalonAmini2024 I've replaced this ticket with these two and will ask @VisionOra to take a look

jkbrooks assigned ArsalonAmini2024 Jan 15, 2025

jkbrooks closed this as completed Jan 23, 2025
