Reason then respond with DeepSeek-R1 and Mistral Tiny
When working with LLMs we often want to generate high quality responses while keeping costs low. In this blog post we will explore a promising new technique: using a reasoning model to generate a detailed thought trace, and feeding that trace into another model that constructs the final response.
Most models still have problems with basic tasks
It's 2025 and many models still have problems with surprisingly basic questions.
"Count the number of r's in the word strawberry" has become the classic example of such a question, one that most models struggle with. In a quick test of 15 models, only 4 repeatedly produced the correct answer of 3; the others answered 2.
There is a simple reason for this. Since models tokenize text not by characters but by "tokens", they often cannot "see" the difference between r and rr.
The solution is well known. Before asking the model to respond, we need to ask it to think out loud so that it breaks the word up into separate characters, which fits the tokenization process better. While this works with some more advanced prompting, it is not a general solution that scales well to arbitrary questions.
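For illustration, the manual workaround boils down to a prompt along these lines (the exact wording is an assumption, not something used later in this post):

# A manual "think out loud" prompt that forces character-level spelling before answering
prompt = (
    "Spell the word 'strawberry' one letter per line. "
    "Then count how many of those letters are 'r'. "
    "Only after that, state the final answer."
)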
Of course this applies to more complex problems too, such as complex scenario analysis, multivariable decision making, and math and science problems. A generalized way of creating reasoning traces would therefore be a promising breakthrough.
Using a reasoning model to generate thought traces
A few months ago OpenAI released o1, which they labelled as a new type of reasoning model. The idea with reasoning models is that they are trained to produce extensive reasoning traces as part of their generation process, without needing to be prompted for it. o1 showed that this approach yields superior results on many benchmarks. The downside of OpenAI's approach was that they chose to keep the actual reasoning trace hidden, probably to prevent it from being used to train other models.
On January 21st of this year the company DeepSeek released their reasoning model called DeepSeek-R1. DeepSeek-R1 also proved to be very smart, and much cheaper than the proprietary alternatives. But more interestingly it is open source / open weights and produces an open and extensive (often a bit hilarious) reasoning trace.
For example, here is what it looks like when DeepSeek-R1 is asked to tackle the Strawberry question:
Okay, let's see. The user asked how many times the letter 'r' appears in the word
"strawberry." Hmm, let me start by writing down the word so I can visualize each
letter. S-T-R-A-W-B-E-R-R-Y. Wait, did I spell that right? Let me check again.
S-T-R-A-W-B-E-R-R-Y. Yeah, strawberry. So breaking it down: S, T, R, A, W, B, E,
R, R, Y. Let's count the R's. The third letter is R. Then after E, there are two
R's. So that's one in the third position, then the eighth and ninth letters are
both R? Wait, let me count again. S (1), T (2), R (3), A (4), W (5), B (6), E
(7), R (8), R (9), Y (10). So at positions 3, 8, and 9. That makes three R's. Let
me verify that. If the word is strawberry, then the correct spelling does have
double R in the middle. Wait, isn't strawberry spelled S-T-R-A-W-B-E-R-R-Y? Yeah,
after the 'be' part, it's 'erry', right? So that's two R's there. Plus the first
R in 'str'. So total three R's. Wait, but sometimes people might misspell it.
Maybe as strawberyy, but no, the user spelled it correctly. So the answer should
be three. Hmm, but let me make absolutely sure. Let me write it out again: S T R A
W B E R R Y. Now go through each letter:
1. S
2. T
3. R (1)
4. A
5. W
6. B
7. E
8. R (2)
9. R (3)
10. Y
Yes, there are three R's. So the answer is 3. But let me cross-verify by maybe
checking the spelling again. Strawberry is spelled with two R's towards the end.
Yeah. So first R at position 3, then two R's at positions 8 and 9. So total 3.
Alright, confident that's correct.
As you can see, this is a verbose and open exploration of the problem. The model is not "fine-tuned" to produce an overconfident, well-defined response. Instead it is tuned to produce a detailed thought process that explores the problem from multiple angles, going back and forth while questioning its own assumptions and reasoning.
This has proven to be highly useful as we will see in this blog post.
Implementing a two step reason-then-respond pattern
A pattern that has emerged is to utilise the reasoning trace as an input to another model for constructing a final response. The idea is that a reasoning trace can act as an "upgrade" to an existing model, since it allows the response model to see an extensive exploration of the problem and solution before constructing a response.
In this blog post we will build an example of this pattern. We will use DeepSeek-R1 to generate a reasoning trace, and then feed that into Mistral Tiny to produce a final response.
Here is the basic flow (with the actual implementation of each step below):
async def main():
    question = "How many occurrences of the letter r are there in the word strawberry?"

    # Function to generate a reasoning trace
    reasoning = await reason(question)

    # Function to generate a final response
    response = await respond(question, reasoning)

    print(response)
    # There are 3 r's in the word strawberry
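To make this snippet runnable you also need the imports and an entry point. Here is a minimal sketch, assuming the Opper Python SDK and Pydantic are installed (the AsyncOpper import path and client setup are assumptions):

import asyncio

from opperai import AsyncOpper  # assumed import path for the Opper SDK
from pydantic import BaseModel  # used by the response schema further down

opper = AsyncOpper()  # assumes OPPER_API_KEY is set in the environment

if __name__ == "__main__":
    asyncio.run(main())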
Let's look at the implementation of both the reasoning and the response steps.
Generating a reasoning trace with DeepSeek-R1
We start by building a function that uses an Opper call with DeepSeek-R1 to generate a detailed reasoning trace for the user question.
# We utilize deepseek-r1 to generate a detailed reasoning trace for the user question
async def reason(user_input: str):
    reasoning, _ = await opper.call(
        name="reasoning_step",
        model="fireworks/deepseek-r1",
        instructions="Generate a detailed reasoning trace for the user question",
        input={
            "user_question": user_input,
        },
    )
    return reasoning
The result of this call will be the detailed reasoning trace shown in the example above.
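As shown in the output further down, the returned trace comes wrapped in <think> tags. Feeding the whole trace onwards works fine, but if you only want the inner thoughts, a small optional helper (not part of the example above) can strip them:

import re

def strip_think_tags(trace: str) -> str:
    # Remove the surrounding <think>...</think> markers and trim whitespace
    return re.sub(r"</?think>", "", trace).strip()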
Baking a well-reasoned response with Mistral Tiny
We now build a function that uses Mistral Tiny to generate a structured response to the user question. We feed the reasoning trace as an additional input to an Opper call. We also define a simple response schema that summarizes the question, answer and explanation.
# We utilize mistral-tiny-eu to generate a structured response to the user question using the reasoning trace
async def respond(user_input: str, reasoning: str):
    # We define a structured response to the user question
    class Response(BaseModel):
        question: str
        answer: str
        explanation: str

    response, _ = await opper.call(
        name="response_step",
        model="mistral/mistral-tiny-eu",
        instructions="Generate a response to the user question using the reasoning thoughts",
        input={
            "user_question": user_input,
            "thoughts": reasoning,
        },
        output_type=Response,
    )
    return response
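If the call returns an instance of Response (an assumption about the SDK's behaviour), the result can be inspected field by field or serialized directly. A small usage sketch, assuming Pydantic v2 and that the call is made inside main():

result = await respond(question, reasoning)
print(result.answer)                       # e.g. "3"
print(result.model_dump_json(indent=2))    # the full structured response as JSON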
Inspecting the final output
Finally we run this and print the result of the reasoning and response steps. We also add timing information for clarity.
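A minimal sketch of how that timing could be wired into main() (illustrative, not the exact code used for the run below):

import time

start = time.perf_counter()
reasoning = await reason(question)
response = await respond(question, reasoning)
elapsed = time.perf_counter() - start
print(f"Time: {elapsed:.2f} seconds")

Running the full script produces output along these lines: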
Question:
How many occurrences of the letter r are there in the word strawberry?
Reasoning:
<think>
Okay, let's see. The user asked how many times the letter 'r' appears in the word
"strawberry." Hmm, let me start by writing down the word so I can visualize each
letter. S-T-R-A-W-B-E-R-R-Y. Wait, did I spell that right? Let me check again.
S-T-R-A-W-B-E-R-R-Y. Yeah, strawberry. So breaking it down: S, T, R, A, W, B, E,
R, R, Y. Let's count the R's. The third letter is R. Then after E, there are two
R's. So that's one in the third position, then the eighth and ninth letters are
both R? Wait, let me count again. S (1), T (2), R (3), A (4), W (5), B (6), E
(7), R (8), R (9), Y (10). So at positions 3, 8, and 9. That makes three R's. Let
me verify that. If the word is strawberry, then the correct spelling does have
double R in the middle. Wait, isn't strawberry spelled S-T-R-A-W-B-E-R-R-Y? Yeah,
after the 'be' part, it's 'erry', right? So that's two R's there. Plus the first
R in 'str'. So total three R's. Wait, but sometimes people might misspell it.
Maybe as strawberyy, but no, the user spelled it correctly. So the answer should
be three. Hmm, but let me make absolutely sure. Let me write it out again: S T R A
W B E R R Y. Now go through each letter:
1. S
2. T
3. R (1)
4. A
5. W
6. B
7. E
8. R (2)
9. R (3)
10. Y
Yes, there are three R's. So the answer is 3. But let me cross-verify by maybe
checking the spelling again. Strawberry is spelled with two R's towards the end.
Yeah. So first R at position 3, then two R's at positions 8 and 9. So total 3.
Alright, confident that's correct.
</think>
Response:
question: How many occurrences of the letter r are there in the word strawberry
answer: 3
explanation: The letter 'r' appears three times in the word 'strawberry', at
positions 3, 8, and 9. The word 'strawberry' is spelled S-T-R-A-W-B-E-R-R-Y,
with three distinct 'r' characters.
Time: 23.31 seconds
As we can see, the final response now includes both a thoughtful explanation and a correct answer.
For comparison, this is the output of Mistral Tiny without the reasoning trace:
Response:
question: How many occurrences of the letter r are there in the word strawberry?
answer: 5
explanation: The letter 'r' appears 5 times in the word 'strawberry'.
Time: 2.32 seconds
While the reasoning approach adds about 20 seconds to the call, it produces a final response that is correct and well-reasoned.
Benefits of this approach
Using a reasoning model to generate detailed and open thought traces before producing the final response has several key benefits:
- Improved quality on hard problems: The reasoning model's step-by-step thought process explores the problem from multiple angles, providing the basis for a comprehensive, well-reasoned response. This is especially valuable for tasks that require careful reasoning across many aspects or facts, or for precise counting. These are problems where models that try to generate answers directly often end up with overconfident and wrong answers.
- Efficiency through specialized models: By using a specialized reasoning model to generate the thought trace and another model for the final response, we can optimize each step independently. For example, DeepSeek-R1 is a very smart model, but it is not multilingual (or multimodal). Mistral Tiny, on the other hand, is multilingual and follows instructions very well. By combining models we can get the best of both worlds. Another benefit is that compound calls like this are usually cheaper than using a much larger all-in-one model; this example only cost around $0.05 to generate.
- Transparency and verifiability: Having the reasoning trace generated in the open makes it possible to follow the model's reasoning process and verify its conclusions. This transparency helps build trust and makes it easier to debug when things go wrong: we can quickly figure out whether the problem lies in the reasoning step or the response step and optimise the right part.
Another benefit of this pattern is that it is virtually plug and play. Pretty much any call could be upgraded to use it, since it is mostly about feeding in an additional input. We could also evaluate the complexity of the question first and only apply the reasoning step when it is needed, as sketched below. Anything is possible :)
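As a rough sketch of that last idea, a cheap gating call could decide whether to invoke the reasoning model at all. The complexity_check call and its schema below are assumptions for illustration, reusing the same opper.call pattern as above:

class ComplexityCheck(BaseModel):
    needs_reasoning: bool

async def answer(question: str):
    # Ask the cheap model whether step-by-step reasoning is worth the extra latency and cost
    check, _ = await opper.call(
        name="complexity_check",
        model="mistral/mistral-tiny-eu",
        instructions="Decide whether this question requires careful step-by-step reasoning",
        input={"user_question": question},
        output_type=ComplexityCheck,
    )

    # Only pay for the reasoning step when it is likely to help
    reasoning = await reason(question) if check.needs_reasoning else ""
    return await respond(question, reasoning)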
Conclusion
The pattern of using a reasoning model to generate detailed thought traces before producing final responses offers a powerful approach to improving AI system outputs. And since it mostly involves feeding an additional input to an existing call, it may very well be fairly plug and play.
By leveraging the strengths of different models and combining them we can achieve state of the art results while keeping costs low.
The transparency of the reasoning process also provides valuable insights into how the model arrives at its conclusions, which allows us to debug, verify and steer the outputs.
Try it yourself
If you would like to try this example yourself, you can find a complete runnable code snippet in this GitHub Gist. Both DeepSeek-R1 and Mistral Tiny are available in our free tier - simply sign up for an account and generate an Opper API key to get started. Feel free to experiment with different prompts and see how the reason-then-respond pattern improves the outputs for your use case.