Solving Chollet's ARC-AGI with GPT4o

Machine Learning Street Talk
Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," where he used GPT-4o to reach state-of-the-art accuracy on Francois Chollet's ARC Challenge by generating many Python programs.
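
To make "generating many Python programs" concrete, here is a minimal sketch of a generic generate-and-filter loop. It is not Ryan's actual pipeline. The selection rule (keep candidates that reproduce every training example, then majority-vote on the test input), the `transform` entry point, and the names `run_candidate` and `solve` are all assumptions made for illustration. The hard-coded candidate strings stand in for programs that a model such as GPT-4o would sample.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def run_candidate(source: str, grid: Grid) -> Optional[Grid]:
    """Run candidate source that should define transform(grid).

    Returns None if the code fails to compile or raises at runtime.
    Hypothetical helper: real model-written code should run in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        # Pass a copy so a candidate cannot mutate the caller's grid.
        return namespace["transform"]([row[:] for row in grid])
    except Exception:
        return None


def solve(candidates: List[str], train: List[Example], test_input: Grid) -> Optional[Grid]:
    """Keep candidates that match every training pair, then majority-vote."""
    votes: Counter = Counter()
    outputs: Dict[str, Grid] = {}
    for source in candidates:
        if all(run_candidate(source, x) == y for x, y in train):
            out = run_candidate(source, test_input)
            if out is not None:
                key = repr(out)  # grids are lists, so vote on their repr
                votes[key] += 1
                outputs[key] = out
    if not votes:
        return None
    return outputs[votes.most_common(1)[0][0]]


# Stand-ins for sampled programs (assumed, not real model output).
candidates = [
    "def transform(grid):\n    return [row[::-1] for row in grid]",  # mirror left-right
    "def transform(grid):\n    return grid[::-1]",  # flip top-bottom
    "def transform(grid):\n    return grid[0]",  # wrong output shape
    "def transform(grid)  return grid",  # syntax error, discarded
]
train = [([[1, 0], [2, 0]], [[0, 1], [0, 2]])]
test_input = [[3, 0, 0], [0, 4, 0]]
print(solve(candidates, train, test_input))
```

Only the mirroring candidate reproduces the training pair here, so its output on the test grid is the one returned. Sampling many candidates matters in a scheme like this because most of them are expected to fail the training check.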

Sponsor:
Sign up to Kalshi at https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer: in case it's not obvious, this is basically gambling and a high-risk activity. Only trade what you can afford to lose.

We discuss:
- Ryan's approach to the ARC Challenge: using GPT-4o to generate many Python programs, and the state-of-the-art accuracy it reached.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.

https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/

TOC
00:00:00 Intro
00:01:38 Prelude on goals in LLMs
00:02:42 Ryan intro
00:03:11 Ryan's ARC Challenge Approach
00:38:15 Language models, reasoning and agency
01:14:14 Timelines on superintelligence
01:27:05 Growth of superintelligence
02:06:41 Reflections on ARC
02:11:49 Why wouldn't AI knowledge be subjective

Host: Dr. Tim Scarfe

Pod: https://podcasters.spotify.com/pod/sh...

Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/...

On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547

Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/pers...

Software 2.0 [Andrej Karpathy]
Medium: software-2-0

Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
https://amzn.to/3Wfy2E0

Biographical account of Terence Tao's mathematical development [M. A. (Ken) Clements]
https://gwern.net/doc/iq/high/smpy/19...

Model Evaluation and Threat Research (METR)
https://metr.org/

Why Tool AIs Want to Be Agent AIs
https://gwern.net/tool-ai

Simulators [Janus]
https://www.lesswrong.com/posts/vJFdj...

AI Control: Improving Safety Despite Intentional Subversion
https://www.lesswrong.com/posts/d9FJH...
https://arxiv.org/abs/2312.06942

What a Compute-Centric Framework Says About Takeoff Speeds
https://www.openphilanthropy.org/rese...

Global GDP over the long run
https://ourworldindata.org/grapher/gl...

Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462

The Danger of a "Safety Case"
http://sunnyday.mit.edu/The-Danger-of...

The Future Of Work Looks Like A UPS Truck (~02:15:50)
https://www.npr.org/sections/money/20...

SWE-bench
https://www.swebench.com/

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990

Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-...
Published a month ago, on 1403/04/16 (Solar Hijri calendar).
Viewed 22,504 times.