Consider using code-to-prompt to easily load code into context managers as a replacement for repo_compiler.sh #75

Closed
jkbrooks opened this issue Dec 3, 2024 · 8 comments

@jkbrooks

jkbrooks commented Dec 3, 2024

https://github.com/raphaelmansuy/code2prompt

@jkbrooks
Author

Okay, let's get that ticket created and added to your board. This will help track the decision-making process for standardizing your codebase analysis tool.

Here's a draft ticket you can adapt and use:

Ticket: Standardize Codebase Analysis Tool for LLM Input

Title: Standardize Codebase Analysis Tool for LLM Input

Description:

Currently, we are using a custom script to compile code from our repositories into a single file for use with Large Language Models (LLMs). However, we are aware of at least two other potentially superior tools that could streamline this process and improve the quality of the LLM input.

This ticket is for analyzing the available options, comparing their features, and making a team decision on which tool to adopt for future use. The goal is to standardize our workflow and ensure we are using the most efficient and effective method for preparing our codebase for LLM analysis.

Possible Tools to Consider:

  • Our current custom script (include the path to it in your actual ticket)
  • [Name of alternative tool 1] (include link or any relevant information)
  • [Name of alternative tool 2] (include link or any relevant information)
  • (Add any other tools the team is aware of)

Acceptance Criteria:

  • A document comparing the features, pros, and cons of each considered tool.
  • A team discussion and documented decision on the chosen tool.
  • A plan for migrating to the new tool (if applicable), including any necessary code changes or workflow adjustments.

Priority: High

Assignee: Alpha (or whoever is best suited to lead this analysis)

Due Date: [Set a reasonable due date - e.g., 3-5 days out]

Labels: Codebase, LLM, Tooling, Analysis, Standardization

Related Tickets: (None yet, but you might link this to future tickets related to implementing the chosen tool)

Notes:

  • The current custom script is functional but may be less efficient or feature-rich than other available tools.
  • We need to consider factors such as ease of use, maintainability, scalability, and the quality of the output generated for LLM consumption.

Adding this to your work log:

Here's how you can incorporate this into your existing work log, under the "Ticket Review and Prioritization" section:

2. Codebase and Ticket Analysis

Next Steps:

Review the two alternatives and decide on a plan of action

This addition to the work log clearly outlines the new task, its priority, and the next steps involved. Remember to fill in the bracketed information with your specific details. Feel free to ask any further questions.

@jkbrooks
Author

jkbrooks commented Jan 2, 2025

Could also consider this utility https://x.com/akshay_pachaar/status/1874795304920445123?s=46

@jkbrooks
Author

jkbrooks commented Jan 2, 2025

One of the questions is this: I'm using the repo compiler script right now. Should I be using that, or should I be using one of the other scripts? I don't really know. One of our engineers on the call messaged in Slack and said, "Hey, do you want me to refactor this?" I could do that, but it's unclear where that would fit.

We're also doing some code reading right now in our GitHub agent. It currently stores a whole bunch of material in memory and in "memories", which are database files, and how you read from and write to those files is unclear (related: #260). It's also unclear how useful that should be.

Also, I think the context window is a little weird, because for Gemini at least we usually use 2-million-token context windows (related: #261).
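
To make the 2-million-token figure concrete, here is a rough sketch of a budget check for a compiled repo file. It is only an estimate: the 4-characters-per-token ratio is an assumed rule of thumb rather than Gemini's actual tokenizer, and the names `estimate_tokens`, `fits_in_context`, and the 200,000-token reserve are hypothetical choices for illustration.

```python
import math

CONTEXT_TOKENS = 2_000_000  # context window size mentioned in this thread
CHARS_PER_TOKEN = 4         # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_tokens: int = 200_000) -> bool:
    """Check whether the text leaves `reserve_tokens` free for the prompt and reply.

    The reserve size is a hypothetical default, not a recommendation.
    """
    return estimate_tokens(text) <= CONTEXT_TOKENS - reserve_tokens
```

For a real decision, the count should come from the model provider's own token-counting facility rather than this heuristic.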

I'm going to continue taking these notes. When the time comes, I will paste some of that into Google AI Studio. For now, let me go back to presenting this full screen.

Another open question is which command we should be using for the repo compiler. There are all sorts of filters for it. Let me try this tag here, which I think is a... Let me see here... This is kind of a standard, but not quite. Obviously it's not perfect, as we saw, but it is sort of the default shoot. I think I copied the rest of the... I only want these guys.

So there needs to be a way to identify which flags should actually be used in the CLI if we are going to use their repo components. Then we can use the CLI compiler for different tools and for different codebases (it does not need to be just our own codebase). Let me kind of just wait to get to the top here at the beginning and then...

@ArsalonAmini2024
Collaborator

@jkbrooks
Author

Thanks @ArsalonAmini2024, this is pretty cool. I was aware of a lot of this Cursor functionality and feel encouraged to use it more as I reason about code.

That said, those looms do not address the goal of the ticket, which was to standardize the use of repo_compiler.sh or code-to-prompt to get the entire codebase into a single file that can be used in Gemini with a 2M context window.
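
For reference, the single-file goal can be sketched in a few lines of Python. This is not how repo_compiler.sh or code2prompt work internally; it is a minimal illustration under assumptions, and the function name `compile_repo`, the default glob patterns, the excluded directory names, and the `=====` header format are all hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def compile_repo(root, include=("*.py", "*.md"), exclude_dirs=(".git", "node_modules")):
    """Concatenate matching files under `root` into one string.

    Each file is preceded by a header line with its repo-relative path so the
    model can tell files apart. Patterns and header format are illustrative.
    """
    root = Path(root)
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude_dirs)
        for name in sorted(filenames):
            # Keep only files whose name matches one of the include patterns.
            if not any(fnmatch.fnmatch(name, pattern) for pattern in include):
                continue
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== {rel} =====\n{text}\n")
    return "\n".join(parts)
```

The `include` and `exclude_dirs` arguments stand in for the kind of filter flags discussed earlier in this thread; whichever tool is chosen, the agreed set of filters would be recorded in one place like this.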

Cursor has hallucination issues, and its reliability oscillates every couple of weeks as the devs refine it (improving clarity) and add new features (reducing clarity). Code indexing can help, depending on how the index is used, but we have low visibility into this, and in any case what seems reliable in one instance can become unreliable later or in a different context. I believe Cursor has utility in our workflow, but not for the use case this ticket was intended for.

@jkbrooks
Author

@ArsalonAmini2024 how are you blocked on this?

@jkbrooks
Author

@ArsalonAmini2024 I've replaced this ticket with these two and will ask @VisionOra to take a look

2 participants