Prompt Engineering

Prompt Engineering Guide - https://www.promptingguide.ai/

  • A prompt contains information, instructions, questions, context, inputs, examples, etc. that you pass to the LLM.
    • Instruction = a specific task you want the model to perform
    • Context = external, additional information to guide the model towards better responses
    • Input data = the input or question we want a response for
    • Output indicator = formatting of the output
  • Prompt engineering allows for higher-quality responses from LLMs by carefully crafting optimal prompts. (The guide's examples default to the text-davinci-003 model.)

Parameters

Temperature

  • The lower the temperature, the more deterministic the results (the most probable next token is always picked). Suitable for fact-based Q&A.
  • Increasing temperature leads to more creative outputs. Suitable for text generation.

Top p

  • Controls how deterministic a model is through nucleus sampling: only the tokens comprising the top_p probability mass are considered.

  • Low values for deterministic responses; high values for diverse responses.

  • Only one of temperature OR top_p should be altered (see the sketch below).
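A minimal sketch of where these parameters are set, using the OpenAI Python SDK (the model name and values are illustrative; any API exposing temperature/top_p works the same way):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low temperature -> near-deterministic output, suited to fact-based Q&A.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.1,      # alter temperature OR top_p, not both
)
print(response.choices[0].message.content)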

General tips for designing prompts

  • Format of prompts can be based on the task, e.g. this few-shot sentiment-classification format:

This is awesome! // Positive
This is bad! // Negative
Wow that movie was rad! // Positive
What a horrible show! //
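The model should complete the last line with "Negative".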
  • Specificity, simplicity, conciseness
    • Write descriptive and detailed prompts without overwhelming the model. Details should be relevant and contribute to the task. A good format matters more than specific keywords.
    • Avoid imprecision. Rather than saying "short", specify the desired length, e.g. 2-3 sentences.
    • Avoid saying what not to do; say what to do instead. Simple, concise, direct instructions lead to better results, e.g.:
The following is an agent that recommends movies to a customer. The agent is responsible to recommend a movie from the top global trending movies. It should refrain from asking users for their preferences and avoid asking for personal information. If the agent doesn't have a movie to recommend, it should respond "Sorry, couldn't find a movie to recommend today.".
Customer: Please recommend a movie based on my interests.
Agent:
  • Large tasks should be broken down into simpler subtasks.
  • Try different keywords, context, and data to see what works best.
  • Use a clear separator like '###' to separate the instruction from the context.
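For example (a pattern adapted from the guide):

### Instruction ###
Translate the text below to Spanish:

Text: "hello!"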

Tasks

Text summarization

Explain <topic>:
<Text>                 <-- model's detailed answer
Explain the above in one sentence:

Information extraction

<Text> 
Mention the ... in the paragraph above:

Question answering

Answer the question based on the context below. Respond "Unsure about answer" if not sure about the answer.
Context: ...
Question: ...
Answer: ...

Text classification

Classify the text into neutral, negative or positive.
Text: ...
Sentiment: neutral     <-- formatting example
Text: ...
Sentiment:             <-- actual question

Conversation

  • Role prompting = Instruct the LLM on how to behave, its intent, and its identity.
The following is a conversation with an AI research assistant. The assistant's tone is technical and scientific.
Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of black holes?
AI:

Less technical, more accessible answers

The following is a conversation with an AI research assistant. The assistant's answers should be easy to understand even by primary school students.
Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of black holes?
AI: 

Code Generation

Generate code without specifying language

/*
Ask the user for their name and say "Hello"
*/
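With no language specified, the model picks one itself; in Python the completion might look like:

name = input("What is your name? ")
print(f"Hello, {name}!")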

Generate code for specific language, provide data

"""
Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science Department
"""

Reasoning

  • Reasoning requires more advanced prompt engineering techniques to work properly; be aware that LLMs often produce incorrect answers on such tasks.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even.
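A correct completion identifies the odd numbers (15, 5, 13, 7, 1), adds them (15 + 5 + 13 + 7 + 1 = 41), and notes that 41 is odd, so the original claim is false.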

Techniques

Zero-shot prompting

  • Directly prompt the model for a response without any examples about the task.
  • Instruction tuning finetunes models on datasets described via instructions.
  • Reinforcement learning from human feedback (RLHF) can scale instruction tuning to align with human preferences.
Q: <Question>?
A: 

Few-shot prompting

  • Show some examples, enabling in-context learning.
  • Min et al. (2022)
    • "the label space and the distribution of the input text specified by the demonstrations are both important (regardless of whether the labels are correct for individual inputs)"
    • The format you use also plays a key role in performance—even if you just use random labels, this is much better than no labels at all.
    • Selecting random labels from a true distribution of labels (instead of a uniform distribution) also helps.
  • Limitation: Showing more examples may not be enough to resolve complex reasoning queries.
<Question>?
<Answer>
<Question>?
<Answer>
<Question>?

Chain-of-Thought (CoT) prompting

  • Breaking the task down into intermediate steps to promote reasoning by the LLM; introduced by Wei et al. (2022).
  • Can be combined with few-shot prompting, e.g. the demonstration below.
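A one-shot CoT demonstration (example from the guide):

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A: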

Zero-shot CoT

<Instruction>
Let's think step by step.

Auto-CoT

  • Automatically constructs demonstrations: sample diverse questions and generate reasoning chains for them to show the LLM.
  • Stages (a rough sketch in code follows)
    1. Question clustering = partition the questions of a given dataset into a few clusters
    2. Demonstration sampling = select a representative question from each cluster and generate its reasoning chain using zero-shot CoT + simple heuristics (e.g. length, number of steps, etc.)
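A rough Python sketch of the two stages (embed and zero_shot_cot are hypothetical helpers standing in for an embedding model and an LLM call; the k-means choice follows the Auto-CoT paper):

from sklearn.cluster import KMeans

def build_demos(questions, embed, zero_shot_cot, k=8, max_steps=5):
    # Stage 1: cluster questions by embedding similarity.
    labels = KMeans(n_clusters=k, n_init="auto").fit_predict([embed(q) for q in questions])
    demos = []
    # Stage 2: take a representative question per cluster and generate its
    # reasoning chain via zero-shot CoT ("Let's think step by step").
    for c in range(k):
        for q in (q for q, label in zip(questions, labels) if label == c):
            chain = zero_shot_cot(q)
            if chain.count("\n") < max_steps:  # simple heuristic: keep short chains
                demos.append((q, chain))
                break
    return demos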

Self-Consistency

  • Replaces the naive greedy decoding used in CoT prompting.
  • Sample multiple, diverse reasoning paths via few-shot CoT and select the most consistent answer across the generations (majority vote; see the sketch below).
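A minimal majority-vote sketch (generate is an assumed helper that samples one CoT completion at temperature > 0 and returns its final answer):

from collections import Counter

def self_consistent_answer(prompt, generate, k=10):
    # Sample k diverse reasoning paths and keep only their final answers.
    answers = [generate(prompt) for _ in range(k)]
    # The most consistent answer across the sampled paths wins.
    return Counter(answers).most_common(1)[0][0]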

Generated knowledge prompting

  • Simple prompts may not be enough for tasks that require more world knowledge.
  • LLMs can incorporate additional knowledge or information to make more accurate predictions.
  • The LLM itself can generate that knowledge, which is then used as part of the prompt before making a prediction, as the two-stage example below shows.

First, generate knowledge

Input: Greece is larger than Mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge:

Then use the knowledge (output of above) as context

Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Explain and Answer: 
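The two stages chained together in a rough Python sketch (generate is an assumed LLM-call helper; knowledge_prefix is the few-shot knowledge-generation prompt shown above):

def answer_with_generated_knowledge(question, generate, knowledge_prefix):
    # Stage 1: generate knowledge for the question.
    knowledge = generate(f"{knowledge_prefix}Input: {question}\nKnowledge:")
    # Stage 2: answer using the generated knowledge as context.
    return generate(
        f"Question: {question} Yes or No?\n"
        f"Knowledge: {knowledge}\n"
        f"Explain and Answer:"
    )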

Tree of Thoughts (ToT)

  • Proposed by Yao et al. (2023); generalizes over CoT prompting and encourages exploration over thoughts that serve as intermediate steps for problem solving with LLMs.
  • The LLM self-evaluates intermediate thoughts (i.e. coherent language sequences) through reasoning; this is combined with search algorithms to systematically explore thoughts with lookahead and backtracking (see the sketch below).
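A breadth-first-search sketch over thoughts (propose and evaluate are hypothetical LLM-call helpers; the real ToT framework also supports DFS with backtracking):

def tot_bfs(problem, propose, evaluate, breadth=5, depth=3):
    # propose(state) asks the LLM for candidate next thoughts;
    # evaluate(state) asks the LLM to score how promising a partial solution is.
    frontier = [problem]
    for _ in range(depth):
        candidates = [s + "\n" + t for s in frontier for t in propose(s)]
        # Lookahead + pruning: keep only the most promising partial solutions.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:breadth]
    return frontier[0]  # most promising chain of thoughts found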