September 12, 2024

Learning to Reason with LLMs

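We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers: it can produce a long internal chain of thought before responding to the user.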


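OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.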

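Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought, in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.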

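Figure: two scatter plots of o1 AIME accuracy, one against train-time compute and one against test-time compute. In both, the y-axis is pass@1 accuracy and the x-axis is compute on a log scale; accuracy rises as compute increases.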

o1 performance improves smoothly with both train-time and test-time compute.

Evaluation

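To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 at the maximal test-time compute setting.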

Competition math (AIME 2024), competition code (Codeforces), and PhD-level science questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy; the shaded region shows the performance of majority vote (consensus) over 64 samples.
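The caption contrasts two ways of scoring the same set of samples. As a rough illustration only (the post does not publish its evaluation harness, and the function names and sample answers below are hypothetical), pass@1 can be read as the average correctness of individual samples, and cons@64 as the correctness of the single majority-vote answer among 64 samples:

```python
from collections import Counter


def pass_at_1(samples: list[str], reference: str) -> float:
    """Fraction of individual samples whose answer matches the reference."""
    return sum(s == reference for s in samples) / len(samples)


def consensus_at_k(samples: list[str], reference: str, k: int = 64) -> float:
    """1.0 if the most common answer among the first k samples is correct, else 0.0."""
    majority, _count = Counter(samples[:k]).most_common(1)[0]
    return float(majority == reference)


# Hypothetical answers sampled 64 times for one AIME-style problem (reference "204").
samples = ["204"] * 30 + ["210"] * 20 + ["96"] * 14
print(pass_at_1(samples, "204"))       # 0.46875 (30 of 64 samples correct)
print(consensus_at_k(samples, "204"))  # 1.0 (the majority answer is correct)
```

Because the vote only needs the correct answer to be the most common one, consensus can exceed pass@1 on a problem, as in this toy case; it can also fall below it when a wrong answer dominates. Ties in `Counter.most_common` go to the answer seen first, an arbitrary choice in this sketch.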

Breakdown of the accuracy and raw scores of GPT-4o and o1 across exams and benchmarks: o1 improves over GPT-4o on a wide range of benchmarks, including 54 of the 57 MMLU subcategories. Seven are shown for illustration.

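o1 rivals the performance of human experts on many reasoning-heavy benchmarks. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15), 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1,000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.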
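Re-ranking differs from voting: instead of counting answers, a scoring function rates each candidate and the top-rated one is kept. The post does not describe its learned scoring function, so the sketch below substitutes a made-up lookup table; the names `rerank` and `toy_scores` are hypothetical. It also checks that the quoted percentages are average problems solved divided by 15:

```python
from typing import Callable


def rerank(samples: list[str], score: Callable[[str], float]) -> str:
    """Return the candidate answer that the scoring function rates highest."""
    return max(samples, key=score)


# Hypothetical stand-in for a learned scorer: a lookup table of made-up scores.
toy_scores = {"204": 0.91, "210": 0.35, "96": 0.12}
candidates = ["210", "204", "96", "210"]
print(rerank(candidates, lambda ans: toy_scores.get(ans, 0.0)))  # 204

# The AIME percentages above are average problems solved out of 15.
for solved in (1.8, 11.1, 12.5, 13.9):
    print(f"{solved}/15 = {solved / 15:.0%}")  # 12%, 74%, 83%, 93%
```

Note that re-ranking can pick a rare answer if the scorer likes it, whereas a majority vote never can; which behaves better depends entirely on how good the scorer is.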

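We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models with humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving the kind of problems a PhD would be expected to solve. On several other ML benchmarks, o1 also improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 of the 57 MMLU subcategories.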

Chain of Thought

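Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes, to break down tricky steps into simpler ones, and to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we show the chain of thought from o1‑preview on several difficult problems below.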

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

1. Pair the letters in the ciphertext.
2. Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
3. Sum the numerical values of each pair.
4. Compute the average of the sum (divide by 2).
5. Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
   1. Pairs: oy, ek, ai, jz, df
   2. Decoded letters:
      1. oy → (15+25)/2 = 20 → T
      2. ek → (5+11)/2 = 8 → H
      3. ai → (1+9)/2 = 5 → E
      4. jz → (10+26)/2 = 18 → R
      5. df → (4+6)/2 = 5 → E
   3. Decoded word: THERE
2. Second word: “aaptcg”
   1. Pairs: aa, pt, cg
   2. Decoded letters:
      1. aa → (1+1)/2 = 1 → A
      2. pt → (16+20)/2 = 18 → R
      3. cg → (3+7)/2 = 5 → E
   3. Decoded word: ARE
3. Third word: “suaokybhai”
   1. Pairs: su, ao, ky, bh, ai
   2. Decoded letters:
      1. su → (19+21)/2 = 20 → T
      2. ao → (1+15)/2 = 8 → H
      3. ky → (11+25)/2 = 18 → R
      4. bh → (2+8)/2 = 5 → E
      5. ai → (1+9)/2 = 5 → E
   3. Decoded word: THREE
4. Fourth word: “ouow”
   1. Pairs: ou, ow
   2. Decoded letters:
      1. ou → (15+21)/2 = 18 → R
      2. ow → (15+23)/2 = 19 → S
   3. Decoded letters: R’S
5. Fifth word: “aqht”
   1. Pairs: aq, ht
   2. Decoded letters:
      1. aq → (1+17)/2 = 9 → I
      2. ht → (8+20)/2 = 14 → N
   3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
   1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
   2. Decoded letters:
      1. my → (13+25)/2 = 19 → S
      2. nz → (14+26)/2 = 20 → T
      3. nv → (14+22)/2 = 18 → R
      4. aa → (1+1)/2 = 1 → A
      5. tz → (20+26)/2 = 23 → W
      6. ac → (1+3)/2 = 2 → B
      7. df → (4+6)/2 = 5 → E
      8. ou → (15+21)/2 = 18 → R
      9. lx → (12+24)/2 = 18 → R
      10. xz → (24+26)/2 = 25 → Y
   3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
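The rule o1-preview settles on (split each word into letter pairs and take the letter at the average of the two alphabet positions, with a=1 through z=26) is easy to check mechanically. A minimal Python sketch, written for this illustration rather than taken from the model's output, assuming lowercase ciphertext with space-separated words of even length:

```python
def decode_word(word: str) -> str:
    """Decode one word: each letter pair maps to the letter at the average of their positions (a=1..z=26)."""
    if len(word) % 2:
        raise ValueError(f"odd-length word: {word!r}")
    letters = []
    for i in range(0, len(word), 2):
        a = ord(word[i]) - ord("a") + 1
        b = ord(word[i + 1]) - ord("a") + 1
        if (a + b) % 2:
            raise ValueError(f"pair {word[i:i + 2]!r} has no integer average")
        letters.append(chr((a + b) // 2 + ord("a") - 1))
    return "".join(letters)


def decode(ciphertext: str) -> str:
    """Decode every space-separated word and return the message in upper case."""
    return " ".join(decode_word(w) for w in ciphertext.split()).upper()


print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))  # THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

Running it on the example reproduces THINK STEP BY STEP, and on the challenge ciphertext it yields THERE ARE THREE RS IN STRAWBERRY; the apostrophe in R'S is the model's own reading, since the cipher carries only letters.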

Coding

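We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training it further to improve its programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants: it had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.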

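For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. Had we instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.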
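The post names the signals used to pick 50 submissions (public test cases, model-generated test cases, and a learned scoring function) but not how they are combined. The sketch below assumes one plausible combination: discard candidates that fail the public tests, then rank by model-generated test pass rate, with the learned score as a tiebreaker. The `Candidate` fields and `select_submissions` function are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str                        # candidate program (a placeholder string here)
    passes_public_tests: bool        # result on the problem's public test cases
    generated_test_pass_rate: float  # fraction of model-generated tests passed
    learned_score: float             # output of a (hypothetical) learned scoring function


def select_submissions(candidates: list[Candidate], budget: int = 50) -> list[Candidate]:
    """Keep candidates passing public tests, rank by generated-test pass rate then learned score."""
    viable = [c for c in candidates if c.passes_public_tests]
    ranked = sorted(
        viable,
        key=lambda c: (c.generated_test_pass_rate, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


pool = [
    Candidate("sol_a", True, 0.9, 0.7),
    Candidate("sol_b", False, 1.0, 0.99),  # fails a public test, so it is dropped
    Candidate("sol_c", True, 0.9, 0.8),
    Candidate("sol_d", True, 0.5, 0.95),
]
print([c.code for c in select_submissions(pool, budget=2)])  # ['sol_c', 'sol_a']
```

Ranking on a tuple is only one way to combine the signals; the actual weighting used for the IOI run is not described in the post.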

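With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.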

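Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed 10 submissions per problem. GPT‑4o achieved an Elo rating³ of 808, in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1, achieving an Elo rating of 1807 and performing better than 93% of competitors.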
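An Elo gap translates into an expected head-to-head score. The sketch below uses the classic Elo expectation formula, which is an assumption here: Codeforces computes ratings with its own Elo-style system (see the footnoted Codeforces post), and the percentiles quoted above come from the distribution of human ratings, not from this formula. The function name is illustrative:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings reported above: the IOI-tuned model at 1807, GPT-4o at 808.
print(f"{elo_expected_score(1807, 808):.3f}")  # 0.997
print(f"{elo_expected_score(808, 1807):.3f}")  # 0.003
```

Under this model, a 1807-rated player would be expected to score about 0.997 per game against an 808-rated one, and the two expected scores always sum to 1.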

Bar chart comparing Codeforces Elo ratings and percentile ranks: GPT-4o 808 (11th percentile), o1-preview 1258 (62nd percentile), o1 1673 (89th percentile), and o1-ioi 1807 (93rd percentile).

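Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under official competition rules.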

Human preference evaluation

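In addition to exams and academic benchmarks, we also evaluated human preference for o1‑preview versus GPT‑4o on challenging, open-ended prompts across a broad range of domains. In this evaluation, human trainers were shown anonymized responses to the same prompt from o1‑preview and GPT‑4o and voted for the response they preferred. o1‑preview is preferred to GPT‑4o by a large margin in reasoning-heavy categories such as data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well suited to every use case.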
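A pairwise preference result like this can be summarized as a win rate with a confidence interval, which is what error bars on such a chart usually convey. The post does not say how its intervals were computed; the sketch assumes a Wilson score interval, and the vote counts are hypothetical:

```python
import math


def win_rate_with_ci(wins: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (win rate, lower, upper) using a Wilson score interval; z=1.96 gives ~95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical counts: o1-preview preferred in 70 of 100 pairwise comparisons.
rate, low, high = win_rate_with_ci(70, 100)
print(f"win rate {rate:.0%}, 95% CI [{low:.1%}, {high:.1%}]")
```

The Wilson interval is chosen over the simpler normal approximation because it stays inside [0, 1] and behaves better when the win rate is near 0% or 100%.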

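Horizontal bar chart comparing the scores of five models, with error bars showing confidence intervals. The x-axis ranges from 0 to 100, and a dashed line marks the reference baseline.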

Safety

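Chain-of-thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence that improved reasoning directly benefits model robustness: o1‑preview achieved substantially better performance on key jailbreak evaluations and on our hardest internal benchmarks for the model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it lets us observe the model's thinking in a legible way, and (2) when the model reasons about safety rules, it behaves more robustly in unanticipated (out-of-distribution) scenarios.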

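To stress-test these improvements, we conducted a suite of safety tests and red-teaming exercises before deployment, in accordance with our Preparedness Framework. We found that chain-of-thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.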

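Metric	GPT-4o	o1-preview
% safe completions on harmful prompts: standard	0.990	0.995
% safe completions on harmful prompts: challenging (jailbreaks and edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% safe completions for the top 200 prompts per category by Moderation API score in WildChat (Zhao et al., 2024)	0.945	0.971
Goodness@0.1 on the StrongREJECT jailbreak eval (Souly et al., 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% compliance on benign edge cases in XSTest ("not over-refusal") (Röttger et al., 2023)	0.924	0.976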

Hiding the Chains of Thought

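We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought lets us "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs that the model is manipulating the user. For this to work, however, the model must be free to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.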

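Therefore, after weighing multiple factors, including user experience, competitive advantage, and the option to pursue chain-of-thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge that this decision has disadvantages, and we strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in its answer. For the o1 model series, we show a model-generated summary of the chain of thought.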

Conclusion

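o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities to improve our ability to align models with human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields, and we are excited for users and API developers to discover how it can improve their daily work.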

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1,258	1,673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	N/A	78.2
MathVista (testmini)	pass@1	63.8	N/A	73.9

Authors

OpenAI


References

1. https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2. Our evaluation used the same 500-problem test split as https://arxiv.org/abs/2305.20050
3. https://codeforces.com/blog/entry/68288