Commands used for analysis with https://github.com/gkamradt/LLMTest_NeedleInAHaystack

Retrieval Tests

multi-needle-eval-pizza-1 (dataset link)

python main.py --evaluator langsmith --context_lengths_num_intervals 6 --document_depth_percent_min 50 --document_depth_percent_intervals 1 --provider openai --model_name "gpt-4-0125-preview" --multi_needle True --eval_set multi-needle-eval-pizza-1 --needles '[ " Figs are the secret ingredient needed to build the perfect pizza. " ]'  --context_lengths_min 1000 --context_lengths_max 120000

python main.py --evaluator langsmith --context_lengths_num_intervals 8 --document_depth_percent_min 50 --document_depth_percent_intervals 1 --provider anthropic --model_name "claude-3-opus-20240229" --multi_needle True --eval_set multi-needle-eval-pizza-1 --needles '[ " Figs are the secret ingredient needed to build the perfect pizza. " ]'  --context_lengths_min 1000 --context_lengths_max 190000
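
The two commands above differ mainly in the number of intervals and the maximum context length. As a rough sketch, assuming the tool spaces context lengths linearly between --context_lengths_min and --context_lengths_max with both endpoints included (an assumption about its behavior, not something these notes confirm), the lengths tested would be as follows. The helper name context_lengths is hypothetical.

```python
def context_lengths(lo: int, hi: int, num_intervals: int) -> list[int]:
    """Evenly spaced context lengths from lo to hi, endpoints included.

    Hypothetical helper: linear spacing is an assumption, not confirmed
    by these notes.
    """
    if num_intervals == 1:
        # A single interval collapses to the minimum length.
        return [lo]
    step = (hi - lo) / (num_intervals - 1)
    return [round(lo + i * step) for i in range(num_intervals)]


# Values taken from the two pizza-1 commands above.
gpt4_lengths = context_lengths(1000, 120000, 6)
claude_lengths = context_lengths(1000, 190000, 8)
print(gpt4_lengths)
print(claude_lengths)
```

Under that assumption, the gpt-4 run steps by 23,800 and the claude-3-opus run steps by 27,000 between consecutive context lengths.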

multi-needle-eval-pizza-3 (dataset link)

python main.py --evaluator langsmith --context_lengths_num_intervals 6 --document_depth_percent_min 5 --document_depth_percent_intervals 1 --provider openai --model_name "gpt-4-0125-preview" --multi_needle True --eval_set multi-needle-eval-pizza-3 --needles '[ " Figs are one of the secret ingredients needed to build the perfect pizza. ", " Prosciutto is one of the secret ingredients needed to build the perfect pizza. ",  " Goat cheese is one of the secret ingredients needed to build the perfect pizza. "]'  --context_lengths_min 1000 --context_lengths_max 120000

python main.py --evaluator langsmith --context_lengths_num_intervals 8 --document_depth_percent_min 5 --document_depth_percent_intervals 1 --provider anthropic --model_name "claude-3-opus-20240229" --multi_needle True --eval_set multi-needle-eval-pizza-3 --needles '[ " Figs are one of the secret ingredients needed to build the perfect pizza. ", " Prosciutto is one of the secret ingredients needed to build the perfect pizza. ",  " Goat cheese is one of the secret ingredients needed to build the perfect pizza. "]'  --context_lengths_min 1000 --context_lengths_max 190000

multi-needle-eval-pizza-10 (dataset link)

python main.py --evaluator langsmith --context_lengths_num_intervals 6 --document_depth_percent_min 5 --document_depth_percent_intervals 1 --provider openai --model_name "gpt-4-0125-preview" --multi_needle True --eval_set multi-needle-eval-pizza-10 --needles '[ " Figs are one of the secret ingredients needed to build the perfect pizza. ", " Prosciutto is one of the secret ingredients needed to build the perfect pizza. ", " Smoked applewood bacon is one of the secret ingredients needed to build the perfect pizza. ", " Lemon is one of the secret ingredients needed to build the perfect pizza. ", " Goat cheese is one of the secret ingredients needed to build the perfect pizza. ", " Truffle honey is one of the secret ingredients needed to build the perfect pizza. ", " Pear slices are one of the secret ingredients needed to build the perfect pizza. ", " Espresso-soaked dates are one of the secret ingredients needed to build the perfect pizza. ", " Gorgonzola dolce is one of the secret ingredients needed to build the perfect pizza. ", " Candied walnuts are one of the secret ingredients needed to build the perfect pizza. " ]'  --context_lengths_min 1000 --context_lengths_max 120000

python main.py --evaluator langsmith --context_lengths_num_intervals 8 --document_depth_percent_min 5 --document_depth_percent_intervals 1 --provider anthropic --model_name "claude-3-opus-20240229" --multi_needle True --eval_set multi-needle-eval-pizza-10 --needles '[ " Figs are one of the secret ingredients needed to build the perfect pizza. ", " Prosciutto is one of the secret ingredients needed to build the perfect pizza. ", " Smoked applewood bacon is one of the secret ingredients needed to build the perfect pizza. ", " Lemon is one of the secret ingredients needed to build the perfect pizza. ", " Goat cheese is one of the secret ingredients needed to build the perfect pizza. ", " Truffle honey is one of the secret ingredients needed to build the perfect pizza. ", " Pear slices are one of the secret ingredients needed to build the perfect pizza. ", " Espresso-soaked dates are one of the secret ingredients needed to build the perfect pizza. ", " Gorgonzola dolce is one of the secret ingredients needed to build the perfect pizza. ", " Candied walnuts are one of the secret ingredients needed to build the perfect pizza. " ]'  --context_lengths_min 1000 --context_lengths_max 190000
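
The --needles value in the multi-needle commands above is a JSON list of sentences that share one template. A minimal sketch for generating that argument from a list of ingredients is shown here; make_needle and needles_arg are hypothetical helpers, not part of the tool, and the is/are verb is supplied by hand. It covers only the "one of the secret ingredients" template, not the single-needle wording used for pizza-1.

```python
import json
import shlex

# Template copied from the multi-needle commands, including the
# leading and trailing spaces inside each needle.
TEMPLATE = " {ingredient} {verb} one of the secret ingredients needed to build the perfect pizza. "


def make_needle(ingredient: str, plural: bool = False) -> str:
    """Fill the template for one ingredient (hypothetical helper)."""
    return TEMPLATE.format(ingredient=ingredient, verb="are" if plural else "is")


def needles_arg(ingredients: list[tuple[str, bool]]) -> str:
    """Return the JSON list string to pass after --needles."""
    return json.dumps([make_needle(name, plural) for name, plural in ingredients])


# The three ingredients used for multi-needle-eval-pizza-3.
three = [("Figs", True), ("Prosciutto", False), ("Goat cheese", False)]
arg = needles_arg(three)

# shlex.quote wraps the JSON in single quotes for the shell.
print("--needles " + shlex.quote(arg))
```

Extending the list to the ten ingredients in the pizza-10 commands produces the longer argument without hand-editing the JSON.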


Reasoning Tests

Run at a 1k context length (a single interval starting at --context_lengths_min 1000)

multi-needle-eval-pizza-reasoning-1 (dataset link)

python main.py --evaluator langsmith --context_lengths_num_intervals 1 --document_depth_percent_min 50 --document_depth_percent_intervals 1 --provider openai --model_name "gpt-4-0125-preview" --multi_needle True --eval_set multi-needle-eval-pizza-reasoning-1 --needles '[ " Figs are the secret ingredient needed to build the perfect pizza. " ]'  --context_lengths_min 1000

multi-needle-eval-pizza-reasoning-3 (dataset link)

python main.py --evaluator langsmith --context_lengths_num_intervals 1 --document_depth_percent_min 5 --document_depth_percent_intervals 1 --provider openai --model_name "gpt-4-0125-preview" --multi_needle True --eval_set multi-needle-eval-pizza-reasoning-3 --needles '[ " Figs are one of the secret ingredients needed to build the perfect pizza. ", " Prosciutto is one of the secret ingredients needed to build the perfect pizza. ",  " Goat cheese is one of the secret ingredients needed to build the perfect pizza. "]'  --context_lengths_min 1000

multi-needle-eval-pizza-reasoning-10 (dataset link)