extended, because longer context windows don't matter for our tasks
free and auto, because these are just "aliases" for existing models
Exclude special-purpose models
Vision models
Roleplay and creative writing models
Classification models
Models with internet access (usually denoted by -online suffix)
Models with extended context windows (usually denoted by -1234K suffix)
Always prefer fine-tuned (-instruct, -chat) models over a plain base model
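The name-based exclusion rules in this issue could be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the function names, the example model IDs, and the assumption that a variant shows up either as a ":<variant>" suffix or as the model name itself are all hypothetical. Special-purpose models (vision, roleplay, classification) cannot be detected from the name alone and are not covered here.

```python
import re

# Variants to skip; the ":<variant>" notation is an assumption of this sketch.
EXCLUDED_VARIANTS = ("nitro", "extended", "free", "auto")
# Suffixes named in the rules: "-online" and "-1234K" style context sizes.
ONLINE_SUFFIX = re.compile(r"-online$")
CONTEXT_SUFFIX = re.compile(r"-\d+k$", re.IGNORECASE)
FINE_TUNED_SUFFIXES = ("-instruct", "-chat")


def is_excluded(model_id: str) -> bool:
    """Return True if the model ID matches one of the name-based exclusions."""
    name, _, variant = model_id.partition(":")
    short_name = name.rsplit("/", 1)[-1]
    if variant in EXCLUDED_VARIANTS or short_name in EXCLUDED_VARIANTS:
        return True
    return bool(ONLINE_SUFFIX.search(name) or CONTEXT_SUFFIX.search(name))


def select_models(model_ids):
    """Drop excluded models, then drop base models that have a fine-tuned sibling."""
    candidates = [m for m in model_ids if not is_excluded(m)]
    bases_with_fine_tune = {
        m[: -len(suffix)]
        for m in candidates
        for suffix in FINE_TUNED_SUFFIXES
        if m.endswith(suffix)
    }
    return [m for m in candidates if m not in bases_with_fine_tune]


if __name__ == "__main__":
    # Hypothetical model IDs, for illustration only.
    models = [
        "acme/coder",
        "acme/coder-instruct",
        "acme/coder-instruct:free",
        "acme/search-online",
        "acme/long-32K",
        "router/auto",
    ]
    print(select_models(models))  # ['acme/coder-instruct']
```

Keeping the suffix patterns in one place makes it easy to adjust the list when new variants appear.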
Tag version (tag can be moved in case important merges happen afterwards)
For all issues of the current milestone, one by one, add them to the roadmap tasks (it is ok if a task has multiple issues) with the users that worked on it
Fixed bugs should always be sorted into respective relevant categories and not in a generic "Bugs" category!
For all PRs of the current milestone, one by one, add them to the roadmap tasks (it is ok if a task has multiple issues) with the users that worked on it
Fixed bugs should always be sorted into respective relevant categories and not in a generic "Bugs" category!
Search all issues for ...
Unassigned issues that are closed, and assign them to someone
Issues without a milestone, and assign them a milestone
Issues without a label, and assign them at least one label
Write the release notes:
Use the tasks that are already there for the release note outline
Add highlighted features based on the done tasks, sort by how many users would use the feature
Do the release
With the release notes
Set as latest release
Prepare the next roadmap
Create a milestone for the next release
Create a new roadmap issue for the next release
Move all open tasks/TODOs from this roadmap issue to the next roadmap issue.
Move every comment of this roadmap issue as a TODO to the next roadmap issue. Mark when done with a 🚀 emoji.
Blog post containing evaluation results, new features and learnings
Update README with blog post link and new header image
Tasks/Goals:
main revision docker tag by default by @Munsio: Use "main" image if no image was specified #249, Docker runtime is using the wrong container image #242
symflower fix rules #250
result-path parameter #308
symflower test generation by @ruiAzevedo19: Fixed timeouts for symflower unit-tests and symflower test #167, Add timeout to symflower test #185, Follow-up: Apply "symflower fix" to a "write-test" result of a model when it errors, so model responses can possibly be fixed #232, https://github.com/symflower/eval-dev-quality/issues/, fix, Handle inconsistent timout error on Windows #277, fix, Define a timeout for the "symflower unit-tests" command, so ensure the execution does not take too much time #267, fix, Define a timeout for the "symflower test" command, so ensure the execution does not take too much time #188
perplexity online models because they have a "per request" cost: Write openrouter models to CSV and reject models that we want to ignore automatically #288 (automatically excluded as online models)
report subcommand for postprocessing report data
report subcommand to compare multiple evaluations into one by @ruiAzevedo19: Tool/command to combine multiple evaluations into one #205, Introduce the "report" command to combine multiple evaluations into a single file #271
report command also combine markdown reports by @ruiAzevedo19: Let the "report" command also generate a markdown report for the combined evaluations #258
symflower fix auto-repair of common LLM mistakes
symflower fix into evaluation by @ruiAzevedo19, @bauersimon: Apply symflower fix to a "write-test" result of a model #213, Apply "symflower fix" to a "write-test" result of a model when it errors, so model responses can possibly be fixed #229
symflower fix when there is a timeout of the LLM by @ruiAzevedo19: Follow-up: Apply "symflower fix" to a "write-test" result of a model when it errors, so model responses can possibly be fixed #232, Do not run "symflower fix" if the original response failed with a timeout, so the model and the fix assessments are consistent #236
symflower to latest version to benefit from improved Go test package repairs by @bauersimon, @Munsio: Update to latest Symflower version for improved static code repair #294, Bump symflower version to stay on the latest version possible #303
Release version of this roadmap issue:
nitro, because they are just faster
Leftover TODOs were moved to #301.