meta-llama · HamidShojanazeri · Jun 28, 2024 · Jun 26, 2024 · Jun 26, 2024 · Jun 27, 2024
diff --git a/.github/scripts/check_copyright_header.py b/.github/scripts/check_copyright_header.py
@@ -11,7 +11,7 @@
 # This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.\n\n"""
 
 #Files in black list must be relative to main repo folder
-BLACKLIST = ["eval/open_llm_leaderboard/hellaswag_utils.py"]
+BLACKLIST = ["tools/benchmarks/llm_eval_harness/open_llm_leaderboard/hellaswag_utils.py"]
 
 if __name__ == "__main__":
     for ext in ["*.py", "*.sh"]:

diff --git a/.github/scripts/spellcheck_conf/wordlist.txt b/.github/scripts/spellcheck_conf/wordlist.txt
@@ -1390,4 +1390,7 @@ chatbot's
 Lamini
 lamini
 nba
-sqlite
+sqlite
+customerservice
+fn
+ExecuTorch
diff --git a/README.md b/README.md
@@ -136,14 +136,9 @@ Contains examples are organized in folders by topic:
 | Subfolder | Description |
 |---|---|
 [quickstart](./recipes/quickstart) | The "Hello World" of using Llama, start here if you are new to using Llama.
-[finetuning](./recipes/finetuning)|Scripts to finetune Llama on single-GPU and multi-GPU setups
-[inference](./recipes/inference)|Scripts to deploy Llama for inference locally and using model servers
 [use_cases](./recipes/use_cases)|Scripts showing common applications of Meta Llama3
+[3p_integration](./recipes/3p_integration)|Partner owned folder showing common applications of Meta Llama3
 [responsible_ai](./recipes/responsible_ai)|Scripts to use PurpleLlama for safeguarding model outputs
-[llama_api_providers](./recipes/llama_api_providers)|Scripts to run inference on Llama via hosted endpoints
-[benchmarks](./recipes/benchmarks)|Scripts to benchmark Llama models inference on various backends
-[code_llama](./recipes/code_llama)|Scripts to run inference with the Code Llama models
-[evaluation](./recipes/evaluation)|Scripts to evaluate fine-tuned Llama models using `lm-evaluation-harness` from `EleutherAI`
 
 ### `src/`
 

diff --git a/UPDATES.md b/UPDATES.md
@@ -1,19 +1,19 @@
 ## System Prompt Update
 
 ### Observed Issue
-We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses. 
+We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses.
 
 ### Updated approach
-Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting.  Pull request [#626](https://github.com/facebookresearch/llama/pull/626) removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it. 
+Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting.  Pull request [#626](https://github.com/facebookresearch/llama/pull/626) removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it.
 
 ## Token Sanitization Update
 
 ### Observed Issue
-The PyTorch scripts currently provided for tokenization and model inference allow for direct prompt injection via string concatenation. Prompt injections allow for the addition of special system and instruction prompt strings from user-provided prompts. 
+The PyTorch scripts currently provided for tokenization and model inference allow for direct prompt injection via string concatenation. Prompt injections allow for the addition of special system and instruction prompt strings from user-provided prompts.
 
-As noted in the documentation, these strings are required to use the fine-tuned chat models. However, prompt injections have also been used for manipulating or abusing models by bypassing their safeguards, allowing for the creation of content or behaviors otherwise outside the bounds of acceptable use. 
+As noted in the documentation, these strings are required to use the fine-tuned chat models. However, prompt injections have also been used for manipulating or abusing models by bypassing their safeguards, allowing for the creation of content or behaviors otherwise outside the bounds of acceptable use.
 
 ### Updated approach
-We recommend sanitizing [these strings](https://github.com/meta-llama/llama?tab=readme-ov-file#fine-tuned-chat-models) from any user provided prompts. Sanitization of user prompts mitigates malicious or accidental abuse of these strings. The provided scripts have been updated to do this. 
+We recommend sanitizing [these strings](https://github.com/meta-llama/llama?tab=readme-ov-file#fine-tuned-chat-models) from any user provided prompts. Sanitization of user prompts mitigates malicious or accidental abuse of these strings. The provided scripts have been updated to do this.
 
-Note: even with this update safety classifiers should still be applied to catch unsafe behaviors or content produced by the model. An [example](./recipes/inference/local_inference/inference.py) of how to deploy such a classifier can be found in the llama-recipes repository.
+Note: even with this update safety classifiers should still be applied to catch unsafe behaviors or content produced by the model. An [example](./recipes/quickstart/inference/local_inference/inference.py) of how to deploy such a classifier can be found in the llama-recipes repository.
diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -16,7 +16,7 @@ Here we discuss frequently asked questions that may occur and we found useful al
 
 4. Can I add custom datasets?
 
-    Yes, you can find more information on how to do that [here](../recipes/finetuning/datasets/README.md).
+    Yes, you can find more information on how to do that [here](../recipes/quickstart/finetuning/datasets/README.md).
 
 5. What are the hardware SKU requirements for deploying these models?
 

diff --git a/docs/LLM_finetuning.md b/docs/LLM_finetuning.md
@@ -35,9 +35,9 @@ Full parameter fine-tuning has its own advantages, in this method there are mult
 You can also keep most of the layers frozen and only fine-tune a few layers. There are many different techniques to choose from to freeze/unfreeze layers based on different criteria.
 
 <div style="display: flex;">
-    <img src="./images/feature-based_FN.png" alt="Image 1" width="250" />
-    <img src="./images/feature-based_FN_2.png" alt="Image 2" width="250" />
-    <img src="./images/full-param-FN.png" alt="Image 3" width="250" />
+    <img src="./img/feature_based_fn.png" alt="Image 1" width="250" />
+    <img src="./img/feature_based_fn_2.png" alt="Image 2" width="250" />
+    <img src="./img/full_param_fn.png" alt="Image 3" width="250" />
 </div>
 
 

diff --git a/docs/images/feature-based_FN.png → docs/img/feature_based_fn.png b/docs/images/feature-based_FN.png → docs/img/feature_based_fn.png
diff --git a/docs/images/feature-based_FN_2.png → docs/img/feature_based_fn_2.png b/docs/images/feature-based_FN_2.png → docs/img/feature_based_fn_2.png
diff --git a/docs/images/full-param-FN.png → docs/img/full_param_fn.png b/docs/images/full-param-FN.png → docs/img/full_param_fn.png
diff --git a/docs/images/llama2-gradio.png → docs/img/llama2_gradio.png b/docs/images/llama2-gradio.png → docs/img/llama2_gradio.png
diff --git a/docs/images/llama2-streamlit.png → docs/img/llama2_streamlit.png b/docs/images/llama2-streamlit.png → docs/img/llama2_streamlit.png
diff --git a/docs/images/llama2-streamlit2.png → docs/img/llama2_streamlit2.png b/docs/images/llama2-streamlit2.png → docs/img/llama2_streamlit2.png
diff --git a/docs/images/messenger_api_settings.png → docs/img/messenger_api_settings.png b/docs/images/messenger_api_settings.png → docs/img/messenger_api_settings.png
diff --git a/docs/images/messenger_llama_arch.jpg → docs/img/messenger_llama_arch.jpg b/docs/images/messenger_llama_arch.jpg → docs/img/messenger_llama_arch.jpg
diff --git a/docs/images/wandb_screenshot.png → docs/img/wandb_screenshot.png b/docs/images/wandb_screenshot.png → docs/img/wandb_screenshot.png
diff --git a/docs/images/whatsapp_dashboard.jpg → docs/img/whatsapp_dashboard.jpg b/docs/images/whatsapp_dashboard.jpg → docs/img/whatsapp_dashboard.jpg
diff --git a/docs/images/whatsapp_llama_arch.jpg → docs/img/whatsapp_llama_arch.jpg b/docs/images/whatsapp_llama_arch.jpg → docs/img/whatsapp_llama_arch.jpg
diff --git a/docs/multi_gpu.md b/docs/multi_gpu.md
@@ -9,7 +9,7 @@ To run fine-tuning on multi-GPUs, we will  make use of two packages:
 Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 3 8B model on multiple GPUs in one node or multi-node.
 
 ## Requirements
-To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../recipes/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
+To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../recipes/quickstart/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
 
 **Please note that the llama_recipes package will install PyTorch 2.0.1 version, in case you want to run FSDP + PEFT, please make sure to install PyTorch nightlies.**
 

diff --git a/recipes/inference/model_servers/README.md → recipes/3p_integration/README.md b/recipes/inference/model_servers/README.md → recipes/3p_integration/README.md
@@ -1,2 +1,2 @@
-## [Running Llama 3 On-Prem with vLLM and TGI](llama-on-prem.md)
+## [Running Llama 3 On-Prem with vLLM and TGI](llama_on_prem.md)
 This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference) to build Llama 3 on-prem apps.
diff --git a/...g_started_llama_3_on_amazon_bedrock.ipynb → ...g_started_llama_3_on_amazon_bedrock.ipynb b/...g_started_llama_3_on_amazon_bedrock.ipynb → ...g_started_llama_3_on_amazon_bedrock.ipynb
diff --git a/...ring_with_Llama_2_On_Amazon_Bedrock.ipynb → ...ring_with_llama_2_on_amazon_bedrock.ipynb b/...ring_with_Llama_2_On_Amazon_Bedrock.ipynb → ...ring_with_llama_2_on_amazon_bedrock.ipynb
diff --git a/...s_with_aws/ReAct_Llama_3_Bedrock-WK.ipynb → ...ration/aws/react_llama_3_bedrock_wk.ipynb b/...s_with_aws/ReAct_Llama_3_Bedrock-WK.ipynb → ...ration/aws/react_llama_3_bedrock_wk.ipynb
diff --git a/...Azure_API_example/azure_api_example.ipynb → ...integration/azure/azure_api_example.ipynb b/...Azure_API_example/azure_api_example.ipynb → ...integration/azure/azure_api_example.ipynb
diff --git a/...erce/Function-Calling-101-Ecommerce.ipynb → ...erce/Function-Calling-101-Ecommerce.ipynb b/...erce/Function-Calling-101-Ecommerce.ipynb → ...erce/Function-Calling-101-Ecommerce.ipynb
diff --git a/...ction-calling-101-ecommerce/customers.csv → ...ction-calling-101-ecommerce/customers.csv b/...ction-calling-101-ecommerce/customers.csv → ...ction-calling-101-ecommerce/customers.csv
diff --git a/...function-calling-101-ecommerce/orders.csv → ...function-calling-101-ecommerce/orders.csv b/...function-calling-101-ecommerce/orders.csv → ...function-calling-101-ecommerce/orders.csv
diff --git a/...nction-calling-101-ecommerce/products.csv → ...nction-calling-101-ecommerce/products.csv b/...nction-calling-101-ecommerce/products.csv → ...nction-calling-101-ecommerce/products.csv
diff --git a/...nction-calling-for-sql/data/employees.csv → ...nction-calling-for-sql/data/employees.csv b/...nction-calling-for-sql/data/employees.csv → ...nction-calling-for-sql/data/employees.csv
diff --git a/...nction-calling-for-sql/data/purchases.csv → ...nction-calling-for-sql/data/purchases.csv b/...nction-calling-for-sql/data/purchases.csv → ...nction-calling-for-sql/data/purchases.csv
diff --git a/.../json-mode-function-calling-for-sql.ipynb → .../json-mode-function-calling-for-sql.ipynb b/.../json-mode-function-calling-for-sql.ipynb → .../json-mode-function-calling-for-sql.ipynb
diff --git a/...-queries/employees-without-purchases.yaml → ...-queries/employees-without-purchases.yaml b/...-queries/employees-without-purchases.yaml → ...-queries/employees-without-purchases.yaml
diff --git a/...fied-queries/most-expensive-purchase.yaml → ...fied-queries/most-expensive-purchase.yaml b/...fied-queries/most-expensive-purchase.yaml → ...fied-queries/most-expensive-purchase.yaml
diff --git a/...rified-queries/most-recent-purchases.yaml → ...rified-queries/most-recent-purchases.yaml b/...rified-queries/most-recent-purchases.yaml → ...rified-queries/most-recent-purchases.yaml
diff --git a/...ql/verified-queries/number-of-teslas.yaml → ...ql/verified-queries/number-of-teslas.yaml b/...ql/verified-queries/number-of-teslas.yaml → ...ql/verified-queries/number-of-teslas.yaml
diff --git a/...terminants-of-health/SDOH-Json-mode.ipynb → ...terminants-of-health/SDOH-Json-mode.ipynb b/...terminants-of-health/SDOH-Json-mode.ipynb → ...terminants-of-health/SDOH-Json-mode.ipynb
diff --git a/...nts-of-health/clinical_notes/00456321.txt → ...nts-of-health/clinical_notes/00456321.txt b/...nts-of-health/clinical_notes/00456321.txt → ...nts-of-health/clinical_notes/00456321.txt
diff --git a/...nts-of-health/clinical_notes/00567289.txt → ...nts-of-health/clinical_notes/00567289.txt b/...nts-of-health/clinical_notes/00567289.txt → ...nts-of-health/clinical_notes/00567289.txt
diff --git a/...nts-of-health/clinical_notes/00678934.txt → ...nts-of-health/clinical_notes/00678934.txt b/...nts-of-health/clinical_notes/00678934.txt → ...nts-of-health/clinical_notes/00678934.txt
diff --git a/...nts-of-health/clinical_notes/00785642.txt → ...nts-of-health/clinical_notes/00785642.txt b/...nts-of-health/clinical_notes/00785642.txt → ...nts-of-health/clinical_notes/00785642.txt
diff --git a/...nts-of-health/clinical_notes/00893247.txt → ...nts-of-health/clinical_notes/00893247.txt b/...nts-of-health/clinical_notes/00893247.txt → ...nts-of-health/clinical_notes/00893247.txt
diff --git a/...lama3-stock-market-function-calling.ipynb → ...lama3-stock-market-function-calling.ipynb b/...lama3-stock-market-function-calling.ipynb → ...lama3-stock-market-function-calling.ipynb
diff --git a/...parallel-tool-use/parallel-tool-use.ipynb → ...parallel-tool-use/parallel-tool-use.ipynb b/...parallel-tool-use/parallel-tool-use.ipynb → ...parallel-tool-use/parallel-tool-use.ipynb
diff --git a/...okbook/parallel-tool-use/requirements.txt → ...okbook/parallel-tool-use/requirements.txt b/...okbook/parallel-tool-use/requirements.txt → ...okbook/parallel-tool-use/requirements.txt
diff --git a/...ential-speeches/presidential_speeches.csv → ...ential-speeches/presidential_speeches.csv b/...ential-speeches/presidential_speeches.csv → ...ential-speeches/presidential_speeches.csv
diff --git a/...rag-langchain-presidential-speeches.ipynb → ...rag-langchain-presidential-speeches.ipynb b/...rag-langchain-presidential-speeches.ipynb → ...rag-langchain-presidential-speeches.ipynb
diff --git a/...onversational-chatbot-langchain/README.md → ...onversational-chatbot-langchain/README.md b/...onversational-chatbot-langchain/README.md → ...onversational-chatbot-langchain/README.md
diff --git a/.../conversational-chatbot-langchain/main.py → .../conversational-chatbot-langchain/main.py b/.../conversational-chatbot-langchain/main.py → .../conversational-chatbot-langchain/main.py
diff --git a/...tional-chatbot-langchain/requirements.txt → ...tional-chatbot-langchain/requirements.txt b/...tional-chatbot-langchain/requirements.txt → ...tional-chatbot-langchain/requirements.txt
diff --git a/...example-templates/crewai-agents/README.md → ...example-templates/crewai-agents/README.md b/...example-templates/crewai-agents/README.md → ...example-templates/crewai-agents/README.md
diff --git a/...q-example-templates/crewai-agents/main.py → ...q-example-templates/crewai-agents/main.py b/...q-example-templates/crewai-agents/main.py → ...q-example-templates/crewai-agents/main.py
diff --git a/...-templates/crewai-agents/requirements.txt → ...-templates/crewai-agents/requirements.txt b/...-templates/crewai-agents/requirements.txt → ...-templates/crewai-agents/requirements.txt
diff --git a/...ickstart-conversational-chatbot/README.md → ...ickstart-conversational-chatbot/README.md b/...ickstart-conversational-chatbot/README.md → ...ickstart-conversational-chatbot/README.md
diff --git a/...quickstart-conversational-chatbot/main.py → ...quickstart-conversational-chatbot/main.py b/...quickstart-conversational-chatbot/main.py → ...quickstart-conversational-chatbot/main.py
diff --git a/...t-conversational-chatbot/requirements.txt → ...t-conversational-chatbot/requirements.txt b/...t-conversational-chatbot/requirements.txt → ...t-conversational-chatbot/requirements.txt
diff --git a/...-market-function-calling-llama3/README.md → ...-market-function-calling-llama3/README.md b/...-market-function-calling-llama3/README.md → ...-market-function-calling-llama3/README.md
diff --git a/...ck-market-function-calling-llama3/main.py → ...ck-market-function-calling-llama3/main.py b/...ck-market-function-calling-llama3/main.py → ...ck-market-function-calling-llama3/main.py
diff --git a/...-function-calling-llama3/requirements.txt → ...-function-calling-llama3/requirements.txt b/...-function-calling-llama3/requirements.txt → ...-function-calling-llama3/requirements.txt
diff --git a/...ational-chatbot-with-llamaIndex/README.md → ...ational-chatbot-with-llamaIndex/README.md b/...ational-chatbot-with-llamaIndex/README.md → ...ational-chatbot-with-llamaIndex/README.md
diff --git a/...rsational-chatbot-with-llamaIndex/main.py → ...rsational-chatbot-with-llamaIndex/main.py b/...rsational-chatbot-with-llamaIndex/main.py → ...rsational-chatbot-with-llamaIndex/main.py
diff --git a/...-chatbot-with-llamaIndex/requirements.txt → ...-chatbot-with-llamaIndex/requirements.txt b/...-chatbot-with-llamaIndex/requirements.txt → ...-chatbot-with-llamaIndex/requirements.txt
diff --git a/...tial-speeches-rag-with-pinecone/README.md → ...tial-speeches-rag-with-pinecone/README.md b/...tial-speeches-rag-with-pinecone/README.md → ...tial-speeches-rag-with-pinecone/README.md
diff --git a/...ential-speeches-rag-with-pinecone/main.py → ...ential-speeches-rag-with-pinecone/main.py b/...ential-speeches-rag-with-pinecone/main.py → ...ential-speeches-rag-with-pinecone/main.py
diff --git a/...eeches-rag-with-pinecone/requirements.txt → ...eeches-rag-with-pinecone/requirements.txt b/...eeches-rag-with-pinecone/requirements.txt → ...eeches-rag-with-pinecone/requirements.txt
diff --git a/...templates/text-to-sql-json-mode/README.md → ...templates/text-to-sql-json-mode/README.md b/...templates/text-to-sql-json-mode/README.md → ...templates/text-to-sql-json-mode/README.md
diff --git a/.../text-to-sql-json-mode/data/employees.csv → .../text-to-sql-json-mode/data/employees.csv b/.../text-to-sql-json-mode/data/employees.csv → .../text-to-sql-json-mode/data/employees.csv
diff --git a/.../text-to-sql-json-mode/data/purchases.csv → .../text-to-sql-json-mode/data/purchases.csv b/.../text-to-sql-json-mode/data/purchases.csv → .../text-to-sql-json-mode/data/purchases.csv
diff --git a/...e-templates/text-to-sql-json-mode/main.py → ...e-templates/text-to-sql-json-mode/main.py b/...e-templates/text-to-sql-json-mode/main.py → ...e-templates/text-to-sql-json-mode/main.py
diff --git a/...-to-sql-json-mode/prompts/base_prompt.txt → ...-to-sql-json-mode/prompts/base_prompt.txt b/...-to-sql-json-mode/prompts/base_prompt.txt → ...-to-sql-json-mode/prompts/base_prompt.txt
diff --git a/...es/text-to-sql-json-mode/requirements.txt → ...es/text-to-sql-json-mode/requirements.txt b/...es/text-to-sql-json-mode/requirements.txt → ...es/text-to-sql-json-mode/requirements.txt
diff --git a/...s/verified-sql-function-calling/README.md → ...s/verified-sql-function-calling/README.md b/...s/verified-sql-function-calling/README.md → ...s/verified-sql-function-calling/README.md
diff --git a/...d-sql-function-calling/data/employees.csv → ...d-sql-function-calling/data/employees.csv b/...d-sql-function-calling/data/employees.csv → ...d-sql-function-calling/data/employees.csv
diff --git a/...d-sql-function-calling/data/purchases.csv → ...d-sql-function-calling/data/purchases.csv b/...d-sql-function-calling/data/purchases.csv → ...d-sql-function-calling/data/purchases.csv
diff --git a/...tes/verified-sql-function-calling/main.py → ...tes/verified-sql-function-calling/main.py b/...tes/verified-sql-function-calling/main.py → ...tes/verified-sql-function-calling/main.py
diff --git a/...ied-sql-function-calling/requirements.txt → ...ied-sql-function-calling/requirements.txt b/...ied-sql-function-calling/requirements.txt → ...ied-sql-function-calling/requirements.txt
diff --git a/...-queries/employees-without-purchases.yaml → ...-queries/employees-without-purchases.yaml b/...-queries/employees-without-purchases.yaml → ...-queries/employees-without-purchases.yaml
diff --git a/...fied-queries/most-expensive-purchase.yaml → ...fied-queries/most-expensive-purchase.yaml b/...fied-queries/most-expensive-purchase.yaml → ...fied-queries/most-expensive-purchase.yaml
diff --git a/...rified-queries/most-recent-purchases.yaml → ...rified-queries/most-recent-purchases.yaml b/...rified-queries/most-recent-purchases.yaml → ...rified-queries/most-recent-purchases.yaml
diff --git a/...ng/verified-queries/number-of-teslas.yaml → ...ng/verified-queries/number-of-teslas.yaml b/...ng/verified-queries/number-of-teslas.yaml → ...ng/verified-queries/number-of-teslas.yaml
diff --git a/...providers/Groq/llama3_cookbook_groq.ipynb → ...tegration/groq/llama3_cookbook_groq.ipynb b/...providers/Groq/llama3_cookbook_groq.ipynb → ...tegration/groq/llama3_cookbook_groq.ipynb
diff --git a/...s/lamini/text2sql_memory_tuning/README.md → ...n/lamini/text2sql_memory_tuning/README.md b/...s/lamini/text2sql_memory_tuning/README.md → ...n/lamini/text2sql_memory_tuning/README.md
@@ -1,10 +1,10 @@
 # Tune Llama 3 for text-to-SQL and improve accuracy from 30% to 95%
 
-This repo and notebook `meta-lamini.ipynb` demonstrate how to tune Llama 3 to generate valid SQL queries and improve accuracy from 30% to 95%.
+This repo and notebook `meta_lamini.ipynb` demonstrate how to tune Llama 3 to generate valid SQL queries and improve accuracy from 30% to 95%.
 
-In this notebook we'll be using Lamini, and more specifically, Lamini Memory Tuning. 
+In this notebook we'll be using Lamini, and more specifically, Lamini Memory Tuning.
 
-Lamini is an integrated platform for LLM inference and tuning for the enterprise. Lamini Memory Tuning is a new tool you can use to embed facts into LLMs that improves factual accuracy and reduces hallucinations. Inspired by information retrieval, this method has set a new standard of accuracy for LLMs with less developer effort. 
+Lamini is an integrated platform for LLM inference and tuning for the enterprise. Lamini Memory Tuning is a new tool you can use to embed facts into LLMs that improves factual accuracy and reduces hallucinations. Inspired by information retrieval, this method has set a new standard of accuracy for LLMs with less developer effort.
 
 Learn more about Lamini Memory Tuning: https://www.lamini.ai/blog/lamini-memory-tuning
 

diff --git a/...memory_tuning/assets/manual_filtering.png → ...memory_tuning/assets/manual_filtering.png b/...memory_tuning/assets/manual_filtering.png → ...memory_tuning/assets/manual_filtering.png
diff --git a/...text2sql_memory_tuning/assets/website.png → ...text2sql_memory_tuning/assets/website.png b/...text2sql_memory_tuning/assets/website.png → ...text2sql_memory_tuning/assets/website.png
diff --git a/...memory_tuning/data/gold-test-set-v2.jsonl → ...memory_tuning/data/gold-test-set-v2.jsonl b/...memory_tuning/data/gold-test-set-v2.jsonl → ...memory_tuning/data/gold-test-set-v2.jsonl
diff --git a/...ql_memory_tuning/data/gold-test-set.jsonl → ...ql_memory_tuning/data/gold-test-set.jsonl b/...ql_memory_tuning/data/gold-test-set.jsonl → ...ql_memory_tuning/data/gold-test-set.jsonl
diff --git a/...ated_queries_large_filtered_cleaned.jsonl → ...ated_queries_large_filtered_cleaned.jsonl b/...ated_queries_large_filtered_cleaned.jsonl → ...ated_queries_large_filtered_cleaned.jsonl
diff --git a/...d_queries_v2_large_filtered_cleaned.jsonl → ...d_queries_v2_large_filtered_cleaned.jsonl b/...d_queries_v2_large_filtered_cleaned.jsonl → ...d_queries_v2_large_filtered_cleaned.jsonl
diff --git a/...ata/training_data/generated_queries.jsonl → ...ata/training_data/generated_queries.jsonl b/...ata/training_data/generated_queries.jsonl → ...ata/training_data/generated_queries.jsonl
diff --git a/...aining_data/generated_queries_large.jsonl → ...aining_data/generated_queries_large.jsonl b/...aining_data/generated_queries_large.jsonl → ...aining_data/generated_queries_large.jsonl
diff --git a/...ta/generated_queries_large_filtered.jsonl → ...ta/generated_queries_large_filtered.jsonl b/...ta/generated_queries_large_filtered.jsonl → ...ta/generated_queries_large_filtered.jsonl
diff --git a/.../training_data/generated_queries_v2.jsonl → .../training_data/generated_queries_v2.jsonl b/.../training_data/generated_queries_v2.jsonl → .../training_data/generated_queries_v2.jsonl
diff --git a/...ing_data/generated_queries_v2_large.jsonl → ...ing_data/generated_queries_v2_large.jsonl b/...ing_data/generated_queries_v2_large.jsonl → ...ing_data/generated_queries_v2_large.jsonl
diff --git a/...generated_queries_v2_large_filtered.jsonl → ...generated_queries_v2_large_filtered.jsonl b/...generated_queries_v2_large_filtered.jsonl → ...generated_queries_v2_large_filtered.jsonl
diff --git a/.../text2sql_memory_tuning/meta-lamini.ipynb → .../text2sql_memory_tuning/meta_lamini.ipynb b/.../text2sql_memory_tuning/meta-lamini.ipynb → .../text2sql_memory_tuning/meta_lamini.ipynb
diff --git a/...mini/text2sql_memory_tuning/nba_roster.db → ...mini/text2sql_memory_tuning/nba_roster.db b/...mini/text2sql_memory_tuning/nba_roster.db → ...mini/text2sql_memory_tuning/nba_roster.db
diff --git a/..._tuning/util/get_default_finetune_args.py → ..._tuning/util/get_default_finetune_args.py b/..._tuning/util/get_default_finetune_args.py → ..._tuning/util/get_default_finetune_args.py
diff --git a/...text2sql_memory_tuning/util/get_rubric.py → ...text2sql_memory_tuning/util/get_rubric.py b/...text2sql_memory_tuning/util/get_rubric.py → ...text2sql_memory_tuning/util/get_rubric.py
diff --git a/...text2sql_memory_tuning/util/get_schema.py → ...text2sql_memory_tuning/util/get_schema.py b/...text2sql_memory_tuning/util/get_schema.py → ...text2sql_memory_tuning/util/get_schema.py
diff --git a/...xt2sql_memory_tuning/util/load_dataset.py → ...xt2sql_memory_tuning/util/load_dataset.py b/...xt2sql_memory_tuning/util/load_dataset.py → ...xt2sql_memory_tuning/util/load_dataset.py
diff --git a/...memory_tuning/util/make_llama_3_prompt.py → ...memory_tuning/util/make_llama_3_prompt.py b/...memory_tuning/util/make_llama_3_prompt.py → ...memory_tuning/util/make_llama_3_prompt.py
diff --git a/...sql_memory_tuning/util/parse_arguments.py → ...sql_memory_tuning/util/parse_arguments.py b/...sql_memory_tuning/util/parse_arguments.py → ...sql_memory_tuning/util/parse_arguments.py
diff --git a/...t2sql_memory_tuning/util/setup_logging.py → ...t2sql_memory_tuning/util/setup_logging.py b/...t2sql_memory_tuning/util/setup_logging.py → ...t2sql_memory_tuning/util/setup_logging.py
diff --git a/.../inference/model_servers/llama-on-prem.md → recipes/3p_integration/llama_on_prem.md b/.../inference/model_servers/llama-on-prem.md → recipes/3p_integration/llama_on_prem.md
@@ -1,14 +1,14 @@
 # Llama 3 On-Prem Inference Using vLLM and TGI
 
-Enterprise customers may prefer to deploy Llama 3 on-prem and run Llama in their own servers. This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference), two leading open-source tools to deploy and serve LLMs, and how to create vLLM and TGI hosted Llama 3 instances with [LangChain](https://www.langchain.com/), an open-source LLM app development framework which we used for our other demo apps: [Getting to Know Llama](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb), Running Llama 3 <!-- markdown-link-check-disable -->[locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb) <!-- markdown-link-check-disable --> and [in the cloud](https://github.com/meta-llama/llama-recipes/blob/main/recipes/use_cases/RAG/HelloLlamaCloud.ipynb). See [here](https://medium.com/@rohit.k/tgi-vs-vllm-making-informed-choices-for-llm-deployment-37c56d7ff705) for a detailed comparison of vLLM and TGI.
+Enterprise customers may prefer to deploy Llama 3 on-prem and run Llama in their own servers. This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference), two leading open-source tools to deploy and serve LLMs, and how to create vLLM and TGI hosted Llama 3 instances with [LangChain](https://www.langchain.com/), an open-source LLM app development framework which we used for our other demo apps: [Getting to Know Llama](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb), Running Llama 3 <!-- markdown-link-check-disable -->[locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb) <!-- markdown-link-check-disable --> and [in the cloud](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/RAG/hello_llama_cloud.ipynb). See [here](https://medium.com/@rohit.k/tgi-vs-vllm-making-informed-choices-for-llm-deployment-37c56d7ff705) for a detailed comparison of vLLM and TGI.
 
 For [Ollama](https://ollama.com) based on-prem inference with Llama 3, see the Running Llama 3 locally notebook above.
 
 We'll use the Amazon EC2 instance running Ubuntu with an A10G 24GB GPU as an example of running vLLM and TGI with Llama 3, and you can replace this with your own server to implement on-prem Llama 3 deployment.
 
 The Colab notebook to connect via LangChain with Llama 3 hosted as the vLLM and TGI API services is [here](https://colab.research.google.com/drive/1rYWLdgTGIU1yCHmRpAOB2D-84fPzmOJg), also shown in the sections below.
 
-This tutorial assumes that you you have been granted access to the Meta Llama 3 on Hugging Face - you can open a Hugging Face Meta model page [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) to confirm that you see "Gated model You have been granted access to this model"; if you see "You need to agree to share your contact information to access this model", simply complete and submit the form in the page. 
+This tutorial assumes that you you have been granted access to the Meta Llama 3 on Hugging Face - you can open a Hugging Face Meta model page [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) to confirm that you see "Gated model You have been granted access to this model"; if you see "You need to agree to share your contact information to access this model", simply complete and submit the form in the page.
 
 You'll also need your Hugging Face access token which you can get at your Settings page [here](https://huggingface.co/settings/tokens).
 
@@ -108,7 +108,7 @@ On a Google Colab notebook, first install two packages:
 !pip install langchain openai
 ```
 
-Note that you only need to install the `openai` package with an `EMPTY` OpenAI API key to complete the LangChain integration with the OpenAI-compatible vLLM deployment of Llama 3. 
+Note that you only need to install the `openai` package with an `EMPTY` OpenAI API key to complete the LangChain integration with the OpenAI-compatible vLLM deployment of Llama 3.
 
 Then replace the <vllm_server_ip_address> below and run the code:
 
@@ -165,7 +165,7 @@ curl 127.0.0.1:8080/generate_stream -X POST -H 'Content-Type: application/json'
         "parameters": {
             "max_new_tokens":200
         }
-    }'     
+    }'
 ```
 
 and see the answer generated by Llama 3 via TGI like below:
@@ -199,4 +199,3 @@ llm("What wrote the book innovators dilemma?")
 ```
 
 With the Llama 3 instance `llm` created this way, you can integrate seamlessly with LangChain to build powerful on-prem Llama 3 apps.
-
diff --git a/...Chatbot_example/RAG_Chatbot_Example.ipynb → ...chatbot_example/RAG_chatbot_example.ipynb b/...Chatbot_example/RAG_Chatbot_Example.ipynb → ...chatbot_example/RAG_chatbot_example.ipynb
diff --git a/...mple/data/Llama Getting Started Guide.pdf → ...mple/data/Llama Getting Started Guide.pdf b/...mple/data/Llama Getting Started Guide.pdf → ...mple/data/Llama Getting Started Guide.pdf
diff --git a/...ples/RAG_Chatbot_example/requirements.txt → ...toai/RAG_chatbot_example/requirements.txt b/...ples/RAG_Chatbot_example/requirements.txt → ...toai/RAG_chatbot_example/requirements.txt
diff --git a/..._example/vectorstore/db_faiss/index.faiss → ..._example/vectorstore/db_faiss/index.faiss b/..._example/vectorstore/db_faiss/index.faiss → ..._example/vectorstore/db_faiss/index.faiss
diff --git a/...ot_example/vectorstore/db_faiss/index.pkl → ...ot_example/vectorstore/db_faiss/index.pkl b/...ot_example/vectorstore/db_faiss/index.pkl → ...ot_example/vectorstore/db_faiss/index.pkl
diff --git a/..._API_examples/Getting_to_know_Llama.ipynb → ...ration/octoai/getting_to_know_llama.ipynb b/..._API_examples/Getting_to_know_Llama.ipynb → ...ration/octoai/getting_to_know_llama.ipynb
diff --git a/...OctoAI_API_examples/HelloLlamaCloud.ipynb → ...ntegration/octoai/hello_llama_cloud.ipynb b/...OctoAI_API_examples/HelloLlamaCloud.ipynb → ...ntegration/octoai/hello_llama_cloud.ipynb
diff --git a/...viders/OctoAI_API_examples/LiveData.ipynb → ...pes/3p_integration/octoai/live_data.ipynb b/...viders/OctoAI_API_examples/LiveData.ipynb → ...pes/3p_integration/octoai/live_data.ipynb
diff --git a/...s/OctoAI_API_examples/Llama2_Gradio.ipynb → ...3p_integration/octoai/llama2_gradio.ipynb b/...s/OctoAI_API_examples/Llama2_Gradio.ipynb → ...3p_integration/octoai/llama2_gradio.ipynb
diff --git a/...rs/OctoAI_API_examples/VideoSummary.ipynb → ...3p_integration/octoai/video_summary.ipynb b/...rs/OctoAI_API_examples/VideoSummary.ipynb → ...3p_integration/octoai/video_summary.ipynb
diff --git a/...rs/hf_text_generation_inference/README.md → recipes/3p_integration/tgi/README.md b/...rs/hf_text_generation_inference/README.md → recipes/3p_integration/tgi/README.md
@@ -2,14 +2,14 @@
 
 This document shows how to serve a fine tuned Llama mode with HuggingFace's text-generation-inference server. This option is currently only available for models that were trained using the LoRA method or without using the `--use_peft` argument.
 
-## Step 0: Merging the weights (Only required if LoRA method was used) 
+## Step 0: Merging the weights (Only required if LoRA method was used)
 
 In case the model was fine tuned with LoRA method we need to merge the weights of the base model with the adapter weight. For this we can use the script `merge_lora_weights.py` which is located in the same folder as this README file.
 
 The script takes the base model, the peft weight folder as well as an output as arguments:
 
 ```
-python -m llama_recipes.inference.hf_text_generation_inference.merge_lora_weights --base_model llama-7B --peft_model ft_output --output_dir data/merged_model_output
+python -m llama_recipes.recipes.3p_integration.tgi.merge_lora_weights --base_model llama-7B --peft_model ft_output --output_dir data/merged_model_output
 ```
 
 ## Step 1: Serving the model
@@ -40,9 +40,3 @@ curl 127.0.0.1:8080/generate_stream \
 ```
 
 Further information can be found in the documentation of the [hf text-generation-inference](https://github.com/huggingface/text-generation-inference) solution.
-
-
-
-
-
-
diff --git a/...eneration_inference/merge_lora_weights.py → .../3p_integration/tgi/merge_lora_weights.py b/...eneration_inference/merge_lora_weights.py → .../3p_integration/tgi/merge_lora_weights.py
diff --git a/...viders/Using_Externally_Hosted_LLMs.ipynb → ...ration/using_externally_hosted_llms.ipynb b/...viders/Using_Externally_Hosted_LLMs.ipynb → ...ration/using_externally_hosted_llms.ipynb
diff --git a/...inference/model_servers/vllm/inference.py → recipes/3p_integration/vllm/inference.py b/...inference/model_servers/vllm/inference.py → recipes/3p_integration/vllm/inference.py
diff --git a/recipes/README.md b/recipes/README.md
@@ -3,12 +3,6 @@ This folder contains examples organized by topic:
 | Subfolder | Description |
 |---|---|
 [quickstart](./quickstart)|The "Hello World" of using Llama 3, start here if you are new to using Llama 3
-[multilingual](./multilingual)|Scripts to add a new language to Llama
-[finetuning](./finetuning)|Scripts to finetune Llama 3 on single-GPU and multi-GPU setups
-[inference](./inference)|Scripts to deploy Llama 3 for inference [locally](./inference/local_inference/), on mobile [Android](./inference/mobile_inference/android_inference/) and using [model servers](./inference/mobile_inference/)
 [use_cases](./use_cases)|Scripts showing common applications of Llama 3
+[3p_integration](./3p_integration)|Partner owned folder showing common applications of Meta Llama3
 [responsible_ai](./responsible_ai)|Scripts to use PurpleLlama for safeguarding model outputs
-[llama_api_providers](./llama_api_providers)|Scripts to run inference on Llama via hosted endpoints
-[benchmarks](./benchmarks)|Scripts to benchmark Llama 3 models inference on various backends
-[code_llama](./code_llama)|Scripts to run inference with the Code Llama models
-[evaluation](./evaluation)|Scripts to evaluate fine-tuned Llama 3 models using `lm-evaluation-harness` from `EleutherAI`
diff --git a/recipes/benchmarks/inference_throughput/tokenizer/special_tokens_map.json b/recipes/benchmarks/inference_throughput/tokenizer/special_tokens_map.json
-Original file line number
+Diff line change
@@ Expand Up / @@ -1390,4 +1390,7 @@ chatbot's @@
     Lamini
     lamini
     nba
-    sqlite
+    sqlite
+    customerservice
+    fn
+    ExecuTorch