I've been using GenSpark.ai for the past month to do research (its agents usually take ~20 minutes, but I've seen them go almost 2 hours on a task) - it uses a Mixture-of-Agents approach with GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro and searches hundreds of sources.
I reran some of these searches and I've so far found OpenAI Deep Research to be superior for technical tasks. Here's one example:
I've been giving Deep Research a good workout, although I'm still mystified as to whether switching between the different base models matters, besides o1 pro always seeming to fail to execute the Deep Research tool.
Yeah, it seems unable to execute the tool calling properly. Maybe it's a bad interaction with its own async calling ability, or something else (e.g., how search and code interpreter can't seem to run at the same time for 4o).
Any public comparisons of OAI Deep Research report quality with Perplexity + DeepSeek-R1, on the same query?
How do cost and query limits compare?