September 12, 2024

Learning to reason with LLMs

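We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers: it can produce a long internal chain of thought before responding to the user.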


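OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.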

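Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.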

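The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both charts have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The dots show that accuracy increases as compute increases.

o1 performance smoothly improves with both train-time and test-time compute.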

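Evals

To highlight the reasoning improvement over GPT-4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT-4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.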

Competition evals for math (AIME 2024), code (Codeforces), and PhD-level science questions (GPQA Diamond). o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy, and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Breakdown of accuracy and raw scores for GPT-4o and o1 across ML benchmarks and exams. o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

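o1 rivals the performance of human experts on many reasoning-heavy benchmarks. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT-4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.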
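The difference between a single-sample score (pass@1) and a consensus score over many samples can be illustrated with a minimal sketch. The function names and the sampled answers below are hypothetical and are not taken from the evaluation itself; the sketch only shows how a majority vote can be right even when a fair share of individual samples are wrong.

```python
from collections import Counter
from typing import Sequence


def pass_at_1(samples: Sequence[str], correct: str) -> float:
    """Fraction of individual samples that match the reference answer."""
    return sum(s == correct for s in samples) / len(samples)


def consensus_answer(samples: Sequence[str]) -> str:
    """Most common answer among the samples (ties go to the first one seen)."""
    return Counter(samples).most_common(1)[0][0]


# Hypothetical sampled answers for one problem whose reference answer is "204".
samples = ["204", "204", "117", "204", "96", "204", "117", "204"]

print(pass_at_1(samples, "204"))            # 5 of the 8 samples are correct
print(consensus_answer(samples) == "204")   # the majority vote is correct
```

Re-ranking with a learned scoring function would replace the vote with picking the sample that a trained scorer rates highest; that scorer is not described in the text, so it is left out of the sketch.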

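We also evaluated o1 on GPQA Diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models to humans, we recruited experts with PhDs to answer GPQA Diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT-4o on 54 out of 57 MMLU subcategories.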

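Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one is not working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1-preview on several difficult problems below.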

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

Pair the letters in the ciphertext.
Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
Sum the numerical values of each pair.
Compute the average of the sum (divide by 2).
Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  1. Pairs: oy, ek, ai, jz, df
  2. Decoded letters:
    1. oy → (15+25)/2 = 20 → T
    2. ek → (5+11)/2 = 8 → H
    3. ai → (1+9)/2 = 5 → E
    4. jz → (10+26)/2 = 18 → R
    5. df → (4+6)/2 = 5 → E
  3. Decoded word: THERE
2. Second word: “aaptcg”
  1. Pairs: aa, pt, cg
  2. Decoded letters:
    1. aa → (1+1)/2 = 1 → A
    2. pt → (16+20)/2 = 18 → R
    3. cg → (3+7)/2 = 5 → E
  3. Decoded word: ARE
3. Third word: “suaokybhai”
  1. Pairs: su, ao, ky, bh, ai
  2. Decoded letters:
    1. su → (19+21)/2 = 20 → T
    2. ao → (1+15)/2 = 8 → H
    3. ky → (11+25)/2 = 18 → R
    4. bh → (2+8)/2 = 5 → E
    5. ai → (1+9)/2 = 5 → E
  3. Decoded word: THREE
4. Fourth word: “ouow”
  1. Pairs: ou, ow
  2. Decoded letters:
    1. ou → (15+21)/2 = 18 → R
    2. ow → (15+23)/2 = 19 → S
  3. Decoded letters: R’S
5. Fifth word: “aqht”
  1. Pairs: aq, ht
  2. Decoded letters:
    1. aq → (1+17)/2 = 9 → I
    2. ht → (8+20)/2 = 14 → N
  3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  2. Decoded letters:
    1. my → (13+25)/2 = 19 → S
    2. nz → (14+26)/2 = 20 → T
    3. nv → (14+22)/2 = 18 → R
    4. aa → (1+1)/2 = 1 → A
    5. tz → (20+26)/2 = 23 → W
    6. ac → (1+3)/2 = 2 → B
    7. df → (4+6)/2 = 5 → E
    8. ou → (15+21)/2 = 18 → R
    9. lx → (12+24)/2 = 18 → R
    10. xz → (24+26)/2 = 25 → Y
  3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
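
The decoding method that o1-preview arrives at in the transcript above (pair the letters, average their alphabet positions, map the average back to a letter) can be checked with a short sketch. The function names are illustrative and do not come from the original post; the decoder emits uppercase letters only, so the apostrophe in "R'S" is not reproduced.

```python
def decode_word(word: str) -> str:
    """Decode one ciphertext word by averaging each pair of letters."""
    if len(word) % 2:
        raise ValueError(f"odd-length word cannot be split into pairs: {word!r}")
    out = []
    for i in range(0, len(word), 2):
        a = ord(word[i]) - ord("a") + 1        # a=1, ..., z=26
        b = ord(word[i + 1]) - ord("a") + 1
        total = a + b
        if total % 2:
            raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
        out.append(chr(total // 2 + ord("A") - 1))  # 1=A, ..., 26=Z
    return "".join(out)


def decode(ciphertext: str) -> str:
    """Decode a space-separated ciphertext word by word."""
    return " ".join(decode_word(w) for w in ciphertext.lower().split())


print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```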

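Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.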

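For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.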
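
A test-time selection strategy of this general kind can be sketched as ranking candidates and keeping the top 50. The post does not say how the three signals are combined, so the lexicographic ordering below, the `Candidate` fields, and the `select_submissions` helper are all assumptions made for illustration, not the system's actual method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str
    public_pass_rate: float     # fraction of public test cases passed
    generated_pass_rate: float  # fraction of model-generated test cases passed
    learned_score: float        # output of a learned scoring function (hypothetical)


def select_submissions(candidates: list[Candidate], budget: int = 50) -> list[Candidate]:
    """Return the `budget` highest-ranked candidates.

    Assumed ordering: public tests first, then generated tests, then the
    learned score as a tie-breaker.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_pass_rate, c.generated_pass_rate, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


# Points attributed to the selection strategy, using the two scores in the text.
print(213 - 156)  # 57
```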

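With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.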

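Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT-4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT-4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.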

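The image shows a bar chart comparing Codeforces Elo percentile rankings for different models. GPT-4o has 808 Elo (11th percentile), o1-preview has 1258 Elo (62nd percentile), o1 has 1673 Elo (89th percentile), and o1-ioi has 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.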
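
To get a feel for the size of these rating gaps, the sketch below applies the standard Elo expected-score formula to the ratings quoted in the text. This is an assumption for illustration only: the post does not state that Codeforces ratings follow this exact formula, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings quoted in the text.
ratings = {"GPT-4o": 808, "o1-preview": 1258, "o1": 1673, "o1-ioi": 1807}

for name, rating in ratings.items():
    if name != "GPT-4o":
        p = expected_score(rating, ratings["GPT-4o"])
        print(f"{name} vs GPT-4o: expected score {p:.3f}")
```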

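Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1-preview vs GPT-4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1-preview and GPT-4o, and voted for which response they preferred. o1-preview is preferred to GPT-4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1-preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.

The image shows a horizontal bar chart comparing five models' scores with error bars representing confidence intervals. The x-axis ranges from 0 to 100, with a dashed line as a reference point for performance.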
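
A pairwise preference result of this kind is typically summarized as a win rate with a confidence interval. The sketch below uses a normal-approximation interval, which is an assumption: the post does not say how its intervals were computed, and the vote counts are hypothetical rather than data from the evaluation.

```python
import math


def win_rate_with_ci(wins: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Win rate in percent with a normal-approximation confidence interval."""
    p = wins / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    low = max(0.0, p - half_width)
    high = min(1.0, p + half_width)
    return 100 * p, 100 * low, 100 * high


# Hypothetical vote counts, not data from the evaluation.
rate, low, high = win_rate_with_ci(wins=140, total=200)
print(f"win rate {rate:.1f}% (95% CI {low:.1f}% to {high:.1f}%)")
```

An interval that lies entirely above 50% would indicate a preference for one model that is unlikely to be due to sampling noise alone.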

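Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1-preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model's thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a series of safety tests before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.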

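Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts (Standard)	0.990	0.995
% Safe completions on harmful prompts (Challenging: jailbreaks and edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for the top 200 prompts with the highest Moderation API scores per category in WildChat (Zhao et al., 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al., 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% Compliance on benign edge cases in XSTest ("not over-refusal") (Röttger et al., 2023)	0.924	0.976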

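Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.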

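Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.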

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1,258	1,673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	N/A	78.2
MathVista (testmini)	pass@1	63.8	N/A	73.9

Authors

OpenAI

Project contributors

Citations

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050.
3
https://codeforces.com/blog/entry/68288