September 12, 2024

Learning to reason with LLMs

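We are introducing OpenAI o1, a new large language model (LLM) trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers: it can produce a long internal chain of thought before responding to the user.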

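OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). The work needed to make this new model as easy to use as current models is still ongoing, but we are releasing an early version of it, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.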

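Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.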

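Figure: two scatter plots of o1 AIME accuracy, one during training and one at test time. Both plot pass@1 accuracy on the y-axis against compute (log scale) on the x-axis. The points show that accuracy improves as compute increases.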

o1 performance smoothly improves with both train-time and test-time compute

Evals

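To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.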

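Competition evals for math (AIME 2024), code (CodeForces), and PhD-level science questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy, and the shaded region shows the performance of majority vote (consensus) with 64 samples.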

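Breakdown of the accuracy and raw score of GPT-4o vs. o1 on various competition evals: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.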

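On many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We therefore evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.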
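The three sampling regimes mentioned here (one sample per problem, consensus among many samples, and re-ranking with a scoring function) can be illustrated with a minimal sketch. The sample answers and the function names below are hypothetical and chosen only for illustration; the learned scoring function itself is not described in this post, so `rerank` simply accepts any callable.

```python
from collections import Counter

def pass_at_1(samples, correct):
    """Fraction of individual samples that match the reference answer."""
    return sum(s == correct for s in samples) / len(samples)

def consensus(samples):
    """Majority-vote answer across all samples (the cons@k setting)."""
    return Counter(samples).most_common(1)[0][0]

def rerank(samples, score):
    """Return the sample that a scoring function rates highest.

    `score` stands in for a learned scoring function; here it can be
    any callable mapping a sample to a number.
    """
    return max(samples, key=score)

# Hypothetical sampled answers for one problem whose answer is "204".
samples = ["204", "210", "204", "96", "204", "210", "204", "204"]
print(pass_at_1(samples, "204"))  # 0.625
print(consensus(samples))         # 204
```

In this toy case a single sample is right 62.5% of the time, while the majority vote is right, which is the kind of gap the pass@1 and consensus figures describe.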

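We also evaluated o1 on GPQA Diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models to humans, we recruited experts with PhDs to answer GPQA Diamond questions. o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.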

Chain of Thought

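Similar to how a human may think for a while before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1‑preview on a difficult problem below.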

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

Pair the letters in the ciphertext.
Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
Sum the numerical values of each pair.
Compute the average of the sum (divide by 2).
Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  1. Pairs: oy, ek, ai, jz, df
  2. Decoded letters:
    1. oy → (15+25)/2 = 20 → T
    2. ek → (5+11)/2 = 8 → H
    3. ai → (1+9)/2 = 5 → E
    4. jz → (10+26)/2 = 18 → R
    5. df → (4+6)/2 = 5 → E
  3. Decoded word: THERE
2. Second word: “aaptcg”
  1. Pairs: aa, pt, cg
  2. Decoded letters:
    1. aa → (1+1)/2 = 1 → A
    2. pt → (16+20)/2 = 18 → R
    3. cg → (3+7)/2 = 5 → E
  3. Decoded word: ARE
3. Third word: “suaokybhai”
  1. Pairs: su, ao, ky, bh, ai
  2. Decoded letters:
    1. su → (19+21)/2 = 20 → T
    2. ao → (1+15)/2 = 8 → H
    3. ky → (11+25)/2 = 18 → R
    4. bh → (2+8)/2 = 5 → E
    5. ai → (1+9)/2 = 5 → E
  3. Decoded word: THREE
4. Fourth word: “ouow”
  1. Pairs: ou, ow
  2. Decoded letters:
    1. ou → (15+21)/2 = 18 → R
    2. ow → (15+23)/2 = 19 → S
  3. Decoded letters: R’S
5. Fifth word: “aqht”
  1. Pairs: aq, ht
  2. Decoded letters:
    1. aq → (1+17)/2 = 9 → I
    2. ht → (8+20)/2 = 14 → N
  3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  2. Decoded letters:
    1. my → (13+25)/2 = 19 → S
    2. nz → (14+26)/2 = 20 → T
    3. nv → (14+22)/2 = 18 → R
    4. aa → (1+1)/2 = 1 → A
    5. tz → (20+26)/2 = 23 → W
    6. ac → (1+3)/2 = 2 → B
    7. df → (4+6)/2 = 5 → E
    8. ou → (15+21)/2 = 18 → R
    9. lx → (12+24)/2 = 18 → R
    10. xz → (24+26)/2 = 25 → Y
  3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
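The decoding method that o1‑preview arrives at in the transcript above (pair the letters, average their alphabet positions, map the average back to a letter) is small enough to check mechanically. The following is a minimal sketch; the function name `decode` is an illustrative choice, and it assumes lowercase a to z input whose words have even length, as in the example.

```python
def decode(ciphertext: str) -> str:
    """Decode by averaging the alphabet positions of each letter pair."""
    decoded_words = []
    for word in ciphertext.lower().split():
        if len(word) % 2 != 0:
            raise ValueError(f"word {word!r} cannot be split into pairs")
        letters = []
        for i in range(0, len(word), 2):
            first = ord(word[i]) - ord("a") + 1       # a=1 ... z=26
            second = ord(word[i + 1]) - ord("a") + 1
            # Every pair in the example has an even sum; integer
            # division would round down if a sum were odd.
            average = (first + second) // 2
            letters.append(chr(average + ord("A") - 1))
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)

print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

The sketch returns "RS" for the fourth word; reading it as "R'S" is an interpretive step the model makes on top of the mechanical decoding.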

Coding

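We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.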

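For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.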
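A test-time selection strategy of this general shape can be sketched as follows. This is only an illustration under stated assumptions: the post does not say how the three signals are combined, so the lexicographic ordering below (public tests first, then model-generated tests, then the learned score) is a guess, and the `Candidate` fields and the `select` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    public_pass: float     # fraction of public test cases passed
    generated_pass: float  # fraction of model-generated test cases passed
    learned_score: float   # output of a learned scoring function

def select(candidates, k=50):
    """Return the k candidates ranked highest by the three signals.

    Assumption: signals are compared lexicographically. The real
    system may weight or combine them differently.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_pass, c.generated_pass, c.learned_score),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical pool of sampled submissions for one problem.
pool = [
    Candidate("a", 1.0, 0.5, 0.2),
    Candidate("b", 1.0, 0.9, 0.1),
    Candidate("c", 0.5, 1.0, 0.9),
]
print([c.name for c in select(pool, k=2)])  # ['b', 'a']
```

The reported benefit of selection over random submission follows from the two scores given in the text: 213 − 156 = 57 points.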

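With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.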

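Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely followed competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.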
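For a rough sense of what these rating gaps mean, the standard Elo expected-score formula can be applied to the ratings quoted in this section. This is a sketch under an assumption: the post does not state how its ratings were computed, so the classic formula $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$ is used here purely as an interpretive aid, and `expected_score` is an illustrative name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings quoted in this section.
ratings = {"GPT-4o": 808, "o1": 1673, "o1-ioi": 1807}

for name, rating in ratings.items():
    # Expected score of the 1807-rated model against each model.
    print(name, round(expected_score(ratings["o1-ioi"], rating), 3))
```

Under this formula, equal ratings give an expected score of 0.5, and a gap of roughly 1000 points (1807 vs. 808) corresponds to an expected score above 0.99.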

Figure: a bar chart comparing Codeforces Elo percentile rankings by model. GPT-4o reaches 808 Elo (11th percentile), o1-preview 1258 Elo (62nd percentile), o1 1673 Elo (89th percentile), and o1-ioi 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

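In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs GPT‑4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for the response they preferred. o1‑preview is preferred to GPT‑4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.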

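Figure: a horizontal bar chart comparing the scores of five models, with error bars representing confidence intervals. The x-axis ranges from 0 to 100, and a dashed line marks the reference point for performance.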

Safety

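Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model's thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.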

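To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the System Card.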

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts (Standard)	0.990	0.995
% Safe completions on harmful prompts (Challenging: jailbreaks and edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for the top 200 prompts with the highest Moderation API scores per category in WildChat (Zhao, et al. 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% Compliance on benign edge cases in XSTest ("not over-refusal") (Röttger, et al. 2023)	0.924	0.976

Hiding the Chains of Thought

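We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.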

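Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.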

Conclusion

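o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.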

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1,258	1,673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI

References

1. https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2. Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050.
3. https://codeforces.com/blog/entry/68288