[lamini] Add lamini text2sql memory tuning tutorial #573

Merged (2 commits) on Jun 24, 2024
4 changes: 4 additions & 0 deletions .github/scripts/spellcheck_conf/wordlist.txt
@@ -1387,3 +1387,7 @@ LlamaChat
chatbot's
ConversationBufferWindowMemory
chatbot's
Lamini
lamini
nba
sqlite
26 changes: 26 additions & 0 deletions recipes/3p_integrations/lamini/text2sql_memory_tuning/README.md
@@ -0,0 +1,26 @@
# Tune Llama 3 for text-to-SQL and improve accuracy from 30% to 95%

This repo and notebook `meta-lamini.ipynb` demonstrate how to tune Llama 3 to generate valid SQL queries and improve accuracy from 30% to 95%.

In this notebook we'll be using Lamini, and more specifically, Lamini Memory Tuning.

Lamini is an integrated platform for enterprise LLM inference and tuning. Lamini Memory Tuning is a new tool for embedding facts into LLMs, which improves factual accuracy and reduces hallucinations. Inspired by information retrieval, this method has set a new standard of accuracy for LLMs with less developer effort.

Learn more about Lamini Memory Tuning: https://www.lamini.ai/blog/lamini-memory-tuning

Get your free API key at https://app.lamini.ai/account.

To authenticate, write the following to the file `~/.lamini/configure.yaml`:

```
production:
    key: <YOUR-LAMINI-API-KEY>
```
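
If you prefer to create that file from a script, the following is a minimal sketch. The helper name `write_lamini_config` is hypothetical and not part of the Lamini client, and the sketch assumes `key` is nested under `production` with a four-space indent. The demo writes into a temporary directory so it does not touch a real home directory.

```python
import tempfile
from pathlib import Path


def write_lamini_config(api_key: str, home: Path) -> Path:
    """Write <home>/.lamini/configure.yaml (hypothetical helper, not a Lamini API)."""
    config_path = home / ".lamini" / "configure.yaml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumed layout: `key` nested under `production`.
    config_path.write_text(f"production:\n    key: {api_key}\n")
    return config_path


# Demo against a throwaway directory; pass Path.home() to write the real file.
with tempfile.TemporaryDirectory() as tmp:
    path = write_lamini_config("<YOUR-LAMINI-API-KEY>", Path(tmp))
    written = path.read_text()
    print(written)
```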

This tuning tutorial uses the `nba_roster` sqlite database to tune a Llama 3 model.
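
The reference queries in this tutorial's data files treat `SALARY` as text such as `$12,000,000`, with `--` for missing values, and strip the `$` and `,` before casting to an integer. The sketch below shows why that cast matters, using Python's standard `sqlite3` module. The three rows and the reduced three-column table are made up for illustration; the real `nba_roster` table has more columns and real player data.

```python
import sqlite3

# Hypothetical rows for illustration only; not taken from the real database.
ROWS = [
    ("Player A", "Team X", "$9,000,000"),
    ("Player B", "Team X", "$12,000,000"),
    ("Player C", "Team X", "--"),  # missing salary marker
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nba_roster (NAME TEXT, TEAM TEXT, SALARY TEXT)")
conn.executemany("INSERT INTO nba_roster VALUES (?, ?, ?)", ROWS)

# Numeric ordering: strip '$' and ',' and cast, as the reference queries do.
numeric_sql = (
    "SELECT NAME, SALARY FROM nba_roster "
    "WHERE TEAM = ? AND SALARY != '--' "
    "ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',', '') AS INTEGER) DESC "
    "LIMIT 1;"
)
# Naive ordering: compares the salary strings character by character.
text_sql = (
    "SELECT NAME, SALARY FROM nba_roster "
    "WHERE TEAM = ? AND SALARY != '--' "
    "ORDER BY SALARY DESC LIMIT 1;"
)

top_numeric = conn.execute(numeric_sql, ("Team X",)).fetchone()
top_text = conn.execute(text_sql, ("Team X",)).fetchone()

print(top_numeric)  # ('Player B', '$12,000,000')
print(top_text)     # ('Player A', '$9,000,000'), because '9' sorts after '1'
```

Ordering by the raw string picks the wrong player here, which is the kind of mistake the tuned model is expected to avoid.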

## Additional resources

▫️ Fortune 500 case study: http://www.lamini.ai/blog/llm-text-to-sql <br>
▫️ Technical paper: https://github.com/lamini-ai/Lamini-Memory-Tuning/blob/main/research-paper.pdf <br>
▫️ Model weights: https://huggingface.co/engineering-lamini/lamini-1-random
@@ -0,0 +1,40 @@
{"question": "Who is the pointguard for the Golden State Warriors?", "answer": "Stephen Curry, Chris Paul, and Cory Joseph", "sql": "select name from nba_roster where team='Golden State Warriors' and POS='PG';"}
{"question": "What is the number of players on the Chicago Bulls who are 25 years old or younger", "answer": "10", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Chicago Bulls' AND AGE <= 25;"}
{"question": "Who is the highest-paid player on the Los Angeles Lakers", "answer": "LeBron James", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE team='Los Angeles Lakers' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "Who is the highest paid player in the NBA?", "answer": "Stephen Curry", "sql": "SELECT NAME, salary FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "What team is LaMelo Ball on?", "answer": "Charlotte Hornets", "sql": "select team from nba_roster where name='LaMelo Ball';"}
{"question": "How much does Lonzo Ball weigh?", "answer": "190 lbs", "sql": "select wt from nba_roster where name='Lonzo Ball';"}
{"question": "What college sent the most players to the current NBA?", "answer": "Kentucky", "sql": "select college from nba_roster where college != '--' group by college order by count(*) desc limit 1;"}
{"question": "How old is Lebron James?", "answer": "38", "sql": "select age from nba_roster where name='LeBron James';"}
{"question": "What is the most popular jersey number in the current NBA?", "answer": "8", "sql": "select Jersey from nba_roster where Jersey != 'NA' group by Jersey order by count(*) desc limit 1;"}
{"question": "Can you give me a list of all the players without college data?", "answer": "['Bogdan Bogdanovic', 'Clint Capela', 'Kristaps Porzingis', 'Darius Bazley', 'LaMelo Ball', 'Theo Maledon', 'James Nnaji', 'Frank Ntilikina', 'Marko Simonovic', 'Raul Neto', 'Ricky Rubio', 'Luka Doncic', 'Dante Exum', 'Jaden Hardy', 'Maxi Kleber', 'Vlatko Cancar', 'Nikola Jokic', 'Bojan Bogdanovic', 'Malcolm Cazalon', 'Killian Hayes', 'Ausar Thompson', 'Jonathan Kuminga', 'Dario Saric', 'Jalen Green', 'Boban Marjanovic', 'Alperen Sengun', 'Amen Thompson', 'Serge Ibaka', 'Daniel Theis', 'Nicolas Batum', 'KJ Martin', 'Kenyon Martin Jr.', 'Ivica Zubac', 'LeBron James', 'Vincent Valerio-Bodon', 'Tarik Biberovic', 'John Konchar', 'Isaiah Todd', 'Nikola Jovic', 'Giannis Antetokounmpo', 'Thanasis Antetokounmpo', 'MarJon Beauchamp', 'Goran Dragic', 'Rudy Gobert', 'Vit Krejci', 'Daishen Nix', 'Dyson Daniels', 'Willy Hernangomez', 'Jonas Valanciunas', 'Evan Fournier', 'Isaiah Hartenstein', 'Jaylen Martin', 'Mitchell Robinson', 'Davis Bertans', 'Ousmane Dieng', 'Josh Giddey', 'Vasilije Micic', 'Aleksej Pokusevski', 'Goga Bitadze', 'Joe Ingles', 'Furkan Korkmaz', 'Bismack Biyombo', 'Ibou Badji', 'Scoot Henderson', 'Jusuf Nurkic', 'Anfernee Simons', 'Sasha Vezenkov', 'Dominick Barlow', 'Sidy Cissoko', 'Cedi Osman', 'Victor Wembanyama', 'Dennis Schroder', 'Simone Fontecchio', 'Luka Samanic', 'Dennis Schroder', 'Deni Avdija', 'Bilal Coulibaly', 'Danilo Gallinari', 'Tristan Vukcevic']", "sql": "SELECT name FROM nba_roster WHERE COLLEGE IS NULL OR COLLEGE = '--';"}
{"question": "What team has the smallest roster?", "answer": "Brooklyn Nets", "sql": "select team from nba_roster group by team order by count(*) asc limit 1;"}
{"question": "What team has the largest roster?", "answer": "Toronto Raptors", "sql": "select team, count(*) from nba_roster group by team order by count(*) desc limit 1;"}
{"question": "What team is paying its players the most in total?", "answer": "Toronto Raptors", "sql": "select team, sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) from nba_roster group by team order by sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) desc limit 1;"}
{"question": "Which team is paying its players the least?", "answer": "San Antonio Spurs", "sql": "select team from nba_roster group by team order by sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) asc limit 1;"}
{"question": "Which team is on average the tallest?","answer":"Boston Celtics", "sql": "select team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster group by team order by height desc limit 1;"}
{"question": "Which team is on average the shortest?", "answer": "Golden State Warriors", "sql": "select team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster group by team order by height asc limit 1;"}
{"question": "Who are the tallest 5 centers in the league?", "answer": "Boban Marjanovic, Kristaps Porzingis, Victor Wembanyama, Luke Kornet, Bol Bol", "sql": "SELECT name, HT FROM nba_roster WHERE POS = 'C' ORDER BY HT DESC LIMIT 5;"}
{"question": "Who are the top 5 highest paid power forwards in the league?", "answer": "Kevin Durant, Giannis Antetokounmpo, Anthony Davis, Tobias Harris, Pascal Siakam", "sql": "SELECT NAME, salary FROM nba_roster WHERE POS = 'PF' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
{"question": "What is the median salary in the NBA?", "answer": "6012840", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*50/100-1;"}
{"question": "What is the average salary in the NBA?", "answer": "10696803", "sql": "SELECT avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--';"}
{"question": "What is the 99th percentile salary in the NBA?", "answer": "46741590", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
{"question": "What is the 75th percentile salary in the NBA?", "answer": "13932008", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
{"question": "What is the 25th percentile salary in the NBA?", "answer": "2413304", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
{"question": "What is the median weight in the NBA?", "answer": "215", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
{"question": "What is the average weight in the NBA?", "answer": "214.98", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
{"question": "What is the median height in the NBA?", "answer": "6.58333333333333", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
{"question": "What is the average height in the NBA?", "answer": "6.54986111111111", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
{"question": "Can you tell me how many players are in the NBA?", "answer": "600", "sql": "select count(*) from nba_roster;"}
{"question": "Would you please let me know what the highest paid players are for each position?", "answer": "The highest paid players are Nikola Jokic (C), Paul George (F), Norman Powell (G), Kevin Durant (PF), Stephen Curry (PG), LeBron James (SF), Bradley Beal (SG).", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
{"question": "Is Jalen Johnson 23 years old?", "answer": "No, Jalen Johnson is 21 years old", "sql" : "Select name, age from nba_roster where name='Jalen Johnson';"}
{"question": "Who is the oldest player on the Brooklyn Nets?", "answer": "Spencer Dinwiddie, Dorian Finney-Smith, Royce O'Neale", "sql" : "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
{"question": "Who has the higest salary on the Memphis Grizzlies?", "answer": "Ja Morant", "sql" : "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "answer": "Darius Garland", "sql" : "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "Who is the highest paid center on the Dallas Mavericks?", "answer": "Dereck Lively II", "sql" : "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "How much is Marcus Smart getting paid?", "answer": "$18,833,712", "sql" : "select salary from nba_roster where name='Marcus Smart';"}
{"question": "What's the average age of the Trail Blazers?", "answer": "24", "sql" : "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
{"question": "What's the median age of the NBA?", "answer": "25", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
{"question": "What's the median age of the Miami Heat?", "answer": "26", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')*50/100-1;"}
{"question": "What are the 5 teams with the oldest average age in the NBA", "answer": "Golden State Warriors, Milwaukee Bucks, Miami Heat, LA Clippers, Phoenix Suns", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
{"question": "What is the average salary of Power Forward players in the NBA", "answer": "$10948045", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}
@@ -0,0 +1,20 @@
{"question": "What is the 99th percentile salary in the NBA?", "answer": "46741590", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
{"question": "What is the 75th percentile salary in the NBA?", "answer": "13932008", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
{"question": "What is the 25th percentile salary in the NBA?", "answer": "2413304", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
{"question": "What is the median weight in the NBA?", "answer": "215", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
{"question": "What is the average weight in the NBA?", "answer": "214.98", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
{"question": "What is the median height in the NBA?", "answer": "6.58333333333333", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
{"question": "What is the average height in the NBA?", "answer": "6.54986111111111", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
{"question": "Can you tell me how many players are in the NBA?", "answer": "600", "sql": "select count(*) from nba_roster;"}
{"question": "Would you please let me know what the highest paid players are for each position?", "answer": "The highest paid players are Nikola Jokic (C), Paul George (F), Norman Powell (G), Kevin Durant (PF), Stephen Curry (PG), LeBron James (SF), Bradley Beal (SG).", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
{"question": "Is Jalen Johnson 23 years old?", "answer": "No, Jalen Johnson is 21 years old", "sql" : "Select name, age from nba_roster where name='Jalen Johnson';"}
{"question": "Who is the oldest player on the Brooklyn Nets?", "answer": "Spencer Dinwiddie, Dorian Finney-Smith, Royce O'Neale", "sql" : "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
{"question": "Who has the higest salary on the Memphis Grizzlies?", "answer": "Ja Morant", "sql" : "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "answer": "Darius Garland", "sql" : "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "Who is the highest paid center on the Dallas Mavericks?", "answer": "Dereck Lively II", "sql" : "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
{"question": "How much is Marcus Smart getting paid?", "answer": "$18,833,712", "sql" : "select salary from nba_roster where name='Marcus Smart';"}
{"question": "What's the average age of the Trail Blazers?", "answer": "24", "sql" : "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
{"question": "What's the median age of the NBA?", "answer": "25", "sql" : "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
{"question": "What's the median age of the Miami Heat?", "answer": "26", "sql" : "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}
{"question": "What are the 5 teams with the oldest average age in the NBA", "answer": "Golden State Warriors, Milwaukee Bucks, Miami Heat, LA Clippers, Phoenix Suns", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
{"question": "What is the average salary of Power Forward players in the NBA", "answer": "$10948045", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}
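
The median queries in the two data files use different offsets: `count(*)*50/100-1` in the 40-line file and `count(*)/2` in the 20-line file. With an even row count these select the lower and upper middle rows respectively, so the two forms can return different values. The sketch below illustrates this on a made-up four-row table; the table `ages` and its values are hypothetical and are not drawn from `nba_roster`.

```python
import sqlite3


def pick(conn: sqlite3.Connection, offset_expr: str) -> int:
    """Return the single value at the given OFFSET in ascending order."""
    sql = f"SELECT v FROM ages ORDER BY v LIMIT 1 OFFSET {offset_expr};"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ages (v INTEGER)")
conn.executemany("INSERT INTO ages VALUES (?)", [(v,) for v in (10, 20, 30, 40)])

# 4*50/100-1 = 1, so this selects the second-smallest value.
lower_middle = pick(conn, "(SELECT COUNT(*) FROM ages)*50/100-1")
# 4/2 = 2, so this selects the third-smallest value.
upper_middle = pick(conn, "(SELECT COUNT(*) FROM ages)/2")

print(lower_middle, upper_middle)  # 20 30
```

Neither form averages the two middle rows, so for an even count both are approximations of the conventional median.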